Data Mining

Kira Schroder

Overview

Data mining is the process of examining data already collected in a data warehouse to find hidden patterns, trends, and correlations that support subjective analysis.  It is used extensively in business, government, scientific research, and many other sectors, and in a wide range of profiling practices such as marketing, surveillance, fraud detection, and scientific discovery.[2]  These are just a few examples of its uses.

The question remains as to how reliable such subjective analysis is.  The accuracy of the analysis depends heavily on the sample size and the technique used.  Generally, the larger the sample, the more reliable the findings.  There are many data mining techniques, several of which are discussed below, and the effectiveness of each depends on the type of analysis it is intended for.  The result of an analysis is by no means foolproof, because it is still considered subjective.  However, the analysis can be verified using rules of thumb and validation techniques.  Sample data usually come from either day-to-day operations or external sources.  In-house data such as sales, accounting, payroll, cost, and customer profiles are used for analysis.  External data, such as competitor data and industry sales, are also helpful in the data mining process.
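The link between sample size and reliability can be illustrated with the standard error of a sample mean, which shrinks in proportion to the square root of the sample size. The sketch below assumes a simple random sample and an analysis that estimates an average; the function name and the figures are hypothetical and are not taken from any of the sources cited here.

```python
import math


def standard_error(std_dev: float, sample_size: int) -> float:
    """Standard error of a sample mean: std_dev / sqrt(sample_size)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return std_dev / math.sqrt(sample_size)


# Hypothetical figure: purchase amounts with a standard deviation of 20.
for n in (25, 100, 400):
    print(n, standard_error(20.0, n))
```

Under these assumptions, quadrupling the sample size halves the uncertainty of the estimate, which is why larger samples tend to give more dependable findings.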


Data Mining Elements

Data mining consists of these elements:

  •     Extract data from the data warehouse.
  •     Store and manage the relevant data.
  •     Transform the data into useful information using an appropriate technique.
  •     Analyze the data using specialized software.
  •     Summarize the data in a useful format and present it as a table or graph.
  •     Provide the data and analysis to the intended professionals.
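The steps above can be sketched end to end in a few lines. This is only an illustration: an in-memory SQLite database stands in for the data warehouse, and the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory stand-in for a data warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("East", "widget", 120.0),
            ("East", "gadget", 80.0),
            ("West", "widget", 200.0),
            ("West", "widget", 100.0),
        ],
    )
    return conn


def summarize_sales(conn: sqlite3.Connection) -> dict:
    """Extract the sales rows and reduce them to total sales per region."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return {region: total for region, total in rows}


def format_table(summary: dict) -> str:
    """Present the summary as a small plain-text table."""
    lines = [f"{'Region':<8}{'Total':>8}"]
    for region, total in summary.items():
        lines.append(f"{region:<8}{total:>8.2f}")
    return "\n".join(lines)


conn = build_demo_warehouse()
summary = summarize_sales(conn)
print(format_table(summary))
```

In practice the extraction step would query a real warehouse and the analysis would be done in specialized software; the sketch only shows how extract, transform, summarize, and present fit together.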


Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. These include:[1]
  • operational or transactional data, such as sales, cost, inventory, payroll, and accounting. [1]
  • nonoperational data, such as industry sales, forecast data, and macroeconomic data. [1]
  • metadata - data about the data itself, such as logical database design or data dictionary definitions. [1]

Data Mining Techniques

Since data mining analyzes data from different perspectives to find hidden patterns, several techniques are required to produce better predictive results.  The three most common techniques are classification, clustering, and regression.

  • Classification - Arranges the data into predefined groups.[2]  For example, a credit company classifies credit risk as good or bad by analyzing customer profiles.

  • Clustering - Similar to classification, but the groups are not predefined, so the algorithm tries to group similar items together. [2]  For example, a marketing department may want to find clusters of brands in a sample of customer surveys.

  • Regression - Develops a predictive model in which a set of variables is used to predict a variable of interest.[3]  An example is predicting weight loss after taking a diet pill for a certain period.
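A minimal sketch of the three techniques follows, using only the Python standard library and one-dimensional data. The specific algorithms chosen here (a nearest-centroid classifier, 1-D k-means, and a least-squares line) are simple stand-ins picked for illustration, not the methods prescribed by the cited sources, and all names and numbers are hypothetical.

```python
from statistics import mean


def nearest_centroid_classify(labeled, point):
    """Classification: assign `point` to the predefined group whose mean is closest."""
    centroids = {label: mean(values) for label, values in labeled.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))


def k_means_1d(values, k, iterations=10):
    """Clustering: group similar values without predefined labels (1-D k-means)."""
    ordered = sorted(values)
    # Start from k evenly spaced values of the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - v))
            clusters[nearest].append(v)
        # Recompute each center; keep the old one if a cluster ended up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Hypothetical credit scores for customers already labeled good or bad.
risk = nearest_centroid_classify(
    {"good": [720, 750, 700], "bad": [540, 580, 560]}, 690
)

# Hypothetical survey ratings with no labels attached.
groups = k_means_1d([1, 2, 2, 8, 9, 10], k=2)

# Hypothetical weight loss (kg) after 1 to 4 weeks on a diet pill.
slope, intercept = fit_line([1, 2, 3, 4], [0.5, 1.0, 1.5, 2.0])

print(risk, groups, slope, intercept)
```

The classifier needs labeled examples up front, the clustering routine discovers its two groups from the ratings alone, and the fitted line can be used to predict weight loss for a week that was not observed.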

Data Mining in Business

As organizations migrate from the traditional product-focused model toward a customer-driven one, they are recognizing their customers as experts, not just revenue generators.[5]  The area of business practice that deals with this trend is called Customer Relationship Management (CRM).  Customer relationship management is a broadly recognized, widely implemented strategy for managing and nurturing a company’s interactions with clients and sales prospects.[6]  Organizations that understand and successfully apply this concept will gain a competitive advantage over their competitors.  CRM uses data collected from the accounting system, order fulfillment system, inventory system, and customer service system, and applies the data mining process to generate valuable information about customers and the market.  Thus data mining is used extensively in organizations that have a strong consumer focus: retail, financial, communication, and marketing organizations.  "It enables these companies to determine relationships among internal factors such as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to drill down into summary information to view detail transactional data."[1]

Data Mining Software and Services


1.  What is Data Mining? Anderson/UCLA/edu.
2.  Data Mining Wiki.
3.  Data Mining Techniques. StatSoft Electronic Statistics Textbook.  Creator of STATISTICA data analysis software and services.
4.  Data Mining: An Introduction. Databases.
5.  Paige Baltzan, Amy Phillips (2009).  Business Driven Information Systems, 2/e.  Toronto: McGraw-Hill, p. 316.
6.  CRM Wiki.
