What is Data Mining?
The application of statistical techniques to find patterns and relationships among data for classification and prediction.
The process that results in the discovery of new patterns and relationships in large data sets.
Different Types of Data Mining:
Classes: Using stored data to locate data in predetermined groups. One example is a restaurant mining customer orders to learn its customers' habits and tailor its menu to those patterns. This way it can plan daily specials.
Cluster: Grouping data items according to logical relationships or consumer preferences. One example is companies mining customer data in order to identify market segments.
Association: Data is mined in order to identify associations. One example is a video rental store predicting which DVDs will be rented based on similar DVDs being borrowed. Customers who rent action movies are more likely to rent other action movies.
Sequential Patterns: Data is mined in order to anticipate trends and patterns. One example is an electronics company trying to predict the likelihood of printers being bought based on purchases of computers.
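As a rough illustration of the Association type above, the sketch below measures how often customers who rented one title also rented another. The rental histories and movie titles are made up for the example, and the "confidence" measure is just one common way to score an association, not the only one.

```python
# Hypothetical rental histories: each set holds the titles one customer rented.
rentals = [
    {"Action A", "Action B"},
    {"Action A", "Action B", "Comedy C"},
    {"Action A", "Drama D"},
    {"Comedy C", "Drama D"},
    {"Action A", "Action B"},
]


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


def share(baskets, item):
    """Share of all baskets that contain `item` (the baseline rate)."""
    return sum(1 for b in baskets if item in b) / len(baskets)


conf = confidence(rentals, "Action A", "Action B")
baseline = share(rentals, "Action B")
print(f"Renters of Action A who also rented Action B: {conf:.2f}")
print(f"All customers who rented Action B:            {baseline:.2f}")
```

When the first number is clearly higher than the second, renting Action A is associated with renting Action B in this data.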
Unsupervised Data Mining:
Analysts do not create a model before running the analysis.
Instead they apply the techniques to the data and observe the results.
After analysis is complete, they form a hypothesis to explain the patterns.
Cluster Analysis is a common form of unsupervised data mining.
It involves using many fields to group similar sets of data into clusters.
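A minimal sketch of cluster analysis, assuming the k-means method (one of several clustering techniques) and made-up customer data with two fields. The naive choice of starting centroids is an assumption made to keep the example short.

```python
import math


def kmeans(points, k, iterations=10):
    """Tiny k-means: returns (centroids, labels) for a list of numeric tuples."""
    # Naive initialization (an assumption for brevity): use the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers described by two fields: (age, monthly spend).
customers = [(20, 15), (22, 18), (25, 20), (60, 90), (62, 95), (65, 88)]
centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

No model is stated in advance here: the groups come out of the data, and the analyst explains them afterwards.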
Supervised Data Mining:
Analysts create a model before running the analysis.
They gather the available information and predict trends and patterns.
Then they run an analysis and compare the results with their prediction.
One technique, known as Regression Analysis, focuses on finding a connection between a dependent variable and an independent variable.
Once the connection is established, it can be used to see how one factor affects the other.
Another technique, known as Neural Networks, classifies the data into categories such as useful and non-useful.
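The Regression Analysis technique described above can be sketched with ordinary least squares for a single independent variable. The advertising and sales figures are made up, and `fit_line` is a name chosen for this example only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6 + intercept  # estimate sales for a spend of 6
print(f"sales = {slope:.1f} * ad_spend + {intercept:.1f}")
print(f"predicted sales at ad_spend 6: {predicted:.1f}")
```

Once the slope and intercept are known, the line shows how a change in one factor is expected to affect the other.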
Other Uses of Data Mining:
Data Mining can be used in many ways.
Classified information that is meant to be destroyed can be reconstructed through data mining.
Businesses use information obtained through data mining to forecast sales and adjust their marketing strategies.
Engineers and scientists also use data mining to scan and map data more quickly.
History of Data Mining
Data mining emerged from a blend of statistics, mathematics, and the artificial intelligence and machine learning fields of computer science.
Data Mining is also known as KDD (knowledge discovery in databases).
Constant improvements over the last 10 years have made data mining easier, helped by the growing use of cheap computers.
How Does it Work?
Data-mining tools are used to make predictions.
For example, if a correlation is found between purchases of toothpaste and purchases of sofas, a store may place those two products closer together.
The goal of data mining is to gather knowledge from an existing data set and turn it into a structure that is easier to understand and use.
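To make the toothpaste and sofa example concrete, here is a small sketch that computes the Pearson correlation between two purchase indicators. The purchase records are invented, and Pearson correlation is only one way a tool might measure such a relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical shopping trips: 1 means the item was bought, 0 means it was not.
toothpaste = [1, 1, 0, 1, 0, 0, 1, 0]
sofa = [1, 1, 0, 1, 0, 0, 0, 0]

r = pearson(toothpaste, sofa)
print(f"correlation between toothpaste and sofa purchases: {r:.2f}")
```

A value near 1 means the two items tend to be bought together, a value near 0 means no linear relationship, and a value near -1 means one tends to be bought when the other is not.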
Why Do People Use Data Mining?
Retail, financial, communication, and marketing organizations use data mining because it enables them to determine what drives their sales, customer satisfaction, and corporate profits.
There are 5 steps in Data Mining:
First is Selection, which involves choosing what to search for.
Second is Pre-processing, which involves gathering the raw data. Data warehouses usually hold a good amount of this information.
Third is Transformation, which involves cleaning the data of noise and missing values.
Fourth is Data Mining, which is composed of six common tasks: Anomaly Detection, Association Rule Learning, Clustering, Classification, Regression, and Summarization.
Fifth is Interpretation and Evaluation, which tries to make sense of all the information gathered and find patterns that companies can use.
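The five steps above can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the `WAREHOUSE` rows, the function names, the choice of Anomaly Detection as the mining task, and the threshold of 1.5 standard deviations.

```python
from statistics import mean, pstdev

# Hypothetical rows standing in for a data warehouse table.
WAREHOUSE = [
    {"store": "A", "daily_sales": 100.0},
    {"store": "A", "daily_sales": 104.0},
    {"store": "A", "daily_sales": None},   # missing value
    {"store": "A", "daily_sales": 98.0},
    {"store": "A", "daily_sales": 102.0},
    {"store": "A", "daily_sales": 250.0},  # unusually high day
    {"store": "B", "daily_sales": 40.0},
]


def gather(store):
    """Pre-processing: pull the raw rows for the selected store."""
    return [row for row in WAREHOUSE if row["store"] == store]


def clean(rows, field):
    """Transformation: drop rows whose value is missing."""
    return [row[field] for row in rows if row[field] is not None]


def find_anomalies(values, threshold=1.5):
    """Data Mining (Anomaly Detection): flag values far from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Selection: decide what to search for.
target_store, target_field = "A", "daily_sales"

rows = gather(target_store)
values = clean(rows, target_field)
anomalies = find_anomalies(values)

# Interpretation and Evaluation: report the result for a person to judge.
print(f"Store {target_store}: {len(values)} usable days, unusual values: {anomalies}")
```

The last step is deliberately left to a person: the code can flag an unusual day, but deciding whether it was a promotion, a holiday, or a data entry error is the interpretation.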
Some common goals of data mining include:
Extracting, transforming, and loading transaction data into a data warehouse system.
Storing and managing the data in a multidimensional database system.
Providing data access to business analysts.
Analyzing the data by using application software.
Presenting the data in a useful format, including graphs and tables.