Mining data involves several steps, as shown in the picture.
- Data Integration: First, the data are collected from all the different sources and integrated.
- Data Selection: We may not need all the data collected in the first step, so in this step we select only the data we think are useful for data mining.
- Data Cleaning: The collected data are rarely clean and may contain errors, missing values, and noisy or inconsistent entries, so we need to apply different techniques to get rid of such anomalies.
- Data Transformation: Even after cleaning, the data are not ready for mining; they must first be transformed into forms appropriate for mining. Techniques used to accomplish this include smoothing, aggregation, and normalization.
- Data Mining: Now we are ready to apply data mining techniques to the data to discover interesting patterns. Clustering and association analysis are among the many techniques used for data mining.
- Pattern Evaluation and Knowledge Presentation: This step involves visualizing and transforming the generated patterns, removing redundant ones, and so on.
- Decisions / Use of Discovered Knowledge: This step helps the user apply the acquired knowledge to make better decisions.
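
To make the steps above more concrete, here is a minimal sketch of the whole pipeline in plain Python. Everything in it is an assumption made for illustration: the two toy sources (`store_a`, `store_b`), the field names, and the helper functions are hypothetical. Cleaning is reduced to dropping records with missing values, transformation to min-max normalization, and mining to a tiny two-cluster k-means on a single field. Real projects would typically use far richer techniques at each stage.

```python
from statistics import mean

# Hypothetical toy records from two sources; all field names are made up.
store_a = [
    {"id": 1, "age": 23, "spend": 120.0, "note": "walk-in"},
    {"id": 2, "age": 45, "spend": None, "note": "online"},  # missing value
]
store_b = [
    {"id": 3, "age": 31, "spend": 900.0, "note": "online"},
    {"id": 4, "age": 52, "spend": 1000.0, "note": "walk-in"},
    {"id": 5, "age": 27, "spend": 100.0, "note": "online"},
]


def integrate(*sources):
    """Data integration: combine records from every source into one list."""
    return [row for source in sources for row in source]


def select(rows, fields):
    """Data selection: keep only the fields considered useful for mining."""
    return [{f: row.get(f) for f in fields} for row in rows]


def clean(rows):
    """Data cleaning: here, simply drop records that have a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def normalize(rows, field):
    """Data transformation: min-max normalization of one field to [0, 1]."""
    values = [row[field] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    return [{**row, field: (row[field] - lo) / span} for row in rows]


def two_means(values, iterations=10):
    """Data mining: a tiny 1-D k-means with k=2, seeded at the min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centers.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of its assigned values.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centers[k] = mean(members)
    return labels, centers


rows = integrate(store_a, store_b)
rows = select(rows, ["id", "spend"])
rows = clean(rows)
rows = normalize(rows, "spend")
labels, centers = two_means([row["spend"] for row in rows])

# Pattern evaluation and presentation: report which group each record fell into.
for row, label in zip(rows, labels):
    print(f"id={row['id']} spend={row['spend']:.2f} cluster={label}")
```

On this toy data the record with the missing `spend` value is dropped during cleaning, and the remaining four records split into a low-spend group and a high-spend group. Interpreting those groups and acting on them corresponds to the final step in the list.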