Data mining is the process of discovering interesting and useful patterns and relationships in large quantities of data. The field combines tools from statistics and artificial intelligence with database management to analyze large digital collections, known as data sets. Common applications of data mining include insurance, banking, retail, astronomical and medical research, and government security.
Data Mining Techniques:
Data-mining techniques can be grouped by the type of information in the data and the kind of knowledge to be derived from the data-mining model.
- Predictive Modeling. This technique is used to predict the value of a specific attribute when training data are available in which the values of that attribute are already known. Classification, for instance, takes a set of data that is already divided into predefined categories and looks for patterns that distinguish those groups. The discovered patterns can then be used to assign other data to a group when the correct group is unknown but the other attributes are known.
- Descriptive Modeling. This technique also sorts data into groups, a process known as “clustering”. Unlike classification, however, the appropriate groups are not known beforehand; the patterns that emerge while analyzing the data are used to define the groups. For example, a general population could be analyzed to divide likely customers into clusters, and separate advertising campaigns could then target each cluster.
- Anomaly Detection. This can be viewed as the opposite of clustering, since it focuses on finding data instances that are unusual and do not fit any established pattern. Fraud detection is a typical example: the system models normal behavior in order to spot unusual transactions. Anomaly detection is also used in many monitoring systems, such as intrusion detection.
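The predictive-modeling idea above (learn from labeled training data, then assign a group to new records) can be sketched with a nearest-centroid classifier, one of the simplest classification methods. This is a minimal illustration, not a method prescribed by the text: the function names `train_centroids` and `classify`, the two features, and the "high_risk"/"low_risk" labels are all hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(samples):
    """Learn one mean feature vector (centroid) per predefined class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid lies nearest to the new record."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: (annual income in thousands, debt ratio) -> known class.
training = [
    ((25.0, 0.9), "high_risk"),
    ((30.0, 0.8), "high_risk"),
    ((80.0, 0.2), "low_risk"),
    ((95.0, 0.1), "low_risk"),
]
centroids = train_centroids(training)

# A new record whose class is unknown but whose other attributes are known.
print(classify(centroids, (85.0, 0.3)))  # prints: low_risk
```

The features here are deliberately left unscaled, so income dominates the distance; a real model would usually normalize attributes and might use a richer classifier such as a decision tree.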
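Clustering, as described under descriptive modeling, finds groups without predefined labels. A minimal sketch using k-means, a common clustering algorithm chosen here for illustration, might look like the following. The customer data, the choice of k = 2, and the naive initialization (the first k points become the starting centroids) are assumptions made to keep the example deterministic.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Minimal k-means: split points into k clusters with no predefined labels."""
    # Naive initialization (assumption): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical customers: (age, monthly spend). Features are not scaled here.
customers = [(22, 300), (25, 350), (24, 320), (58, 1200), (60, 1100), (55, 1150)]
labels, centers = kmeans(customers, k=2)
print(labels)  # prints: [0, 0, 0, 1, 1, 1]
```

Each resulting cluster could then be targeted by its own advertising campaign. In practice the number of clusters is not known in advance and is often chosen by trying several values of k.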
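The anomaly-detection idea of modeling normal behavior and flagging departures from it can be sketched with a simple z-score rule: summarize past transaction amounts by their mean and standard deviation, then flag new amounts that fall too many standard deviations away. This is only an illustrative assumption; real fraud-detection systems use far richer models. The amounts, the helper names `fit_normal_profile` and `flag_anomalies`, and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def fit_normal_profile(history):
    """Model 'normal' behavior as the mean and sample standard deviation of past amounts."""
    return mean(history), stdev(history)


def flag_anomalies(profile, transactions, threshold=3.0):
    """Return transactions whose z-score exceeds the (hypothetical) threshold."""
    mu, sigma = profile
    return [t for t in transactions if abs(t - mu) / sigma > threshold]


# Hypothetical past transactions assumed to reflect normal behavior.
history = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 44.1, 40.6]
profile = fit_normal_profile(history)

incoming = [46.0, 39.0, 480.0, 52.0]
print(flag_anomalies(profile, incoming))  # prints: [480.0]
```

The profile is fitted on historical data rather than on the incoming batch, so that a single extreme value does not inflate the standard deviation and hide itself.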
Various other data-mining techniques have also evolved, including pattern discovery in time-series data (such as stock prices), mining of data streams, and relational learning (for example, on social networks).