Oct 31, 2008

Steps of Data Mining


Mining data involves several steps, as shown in the picture.
  1. Data Integration: First, data are collected from all the different sources and integrated.
  2. Data Selection: We may not need all the data we collected in the first step, so in this step we select only the data we think will be useful for data mining.
  3. Data Cleaning: The collected data may not be clean and may contain errors, missing values, or noisy or inconsistent records. We need to apply different techniques to get rid of such anomalies.
  4. Data Transformation: Even after cleaning, the data are not ready for mining, because we need to transform them into forms appropriate for mining. Techniques used to accomplish this include smoothing, aggregation and normalization.
  5. Data Mining: Now we are ready to apply data mining techniques to the data to discover interesting patterns. Clustering and association analysis are among the many techniques used for data mining.
  6. Pattern Evaluation and Knowledge Presentation: This step evaluates and presents the patterns we generated, using visualization, transformation, removal of redundant patterns and so on.
  7. Decisions / Use of Discovered Knowledge: This step helps the user make use of the acquired knowledge to make better decisions.
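To make steps 1 to 5 more concrete, here is a minimal Python sketch that runs them on a handful of made-up sales records. The sources `store_a` and `store_b`, the field names and the function names are all hypothetical. Cleaning here simply drops records with a missing or negative amount, transformation is min-max normalization, and the "mining" step is only a toy frequency count standing in for real techniques such as clustering or association analysis.

```python
from collections import Counter

# Hypothetical sample records from two different sources (input to step 1).
store_a = [
    {"customer": "c1", "product": "milk", "amount": 3.0, "note": "x"},
    {"customer": "c2", "product": "bread", "amount": None, "note": "y"},
]
store_b = [
    {"customer": "c3", "product": "milk", "amount": 5.0, "note": "z"},
    {"customer": "c1", "product": "eggs", "amount": -1.0, "note": "w"},
    {"customer": "c4", "product": "bread", "amount": 7.0, "note": "v"},
]

def integrate(*sources):
    """Step 1: combine the records from all sources into one list."""
    return [dict(r) for src in sources for r in src]

def select(records, fields):
    """Step 2: keep only the fields judged useful for mining."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Step 3: drop records whose amount is missing or invalid (negative)."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def normalize(records):
    """Step 4: min-max normalize 'amount' into the range [0, 1]."""
    if not records:
        return []
    amounts = [r["amount"] for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [{**r, "amount": (r["amount"] - lo) / span} for r in records]

def mine(records):
    """Step 5 (toy stand-in): count how often each product appears."""
    return Counter(r["product"] for r in records)

data = normalize(clean(select(integrate(store_a, store_b), ["customer", "product", "amount"])))
print([r["amount"] for r in data])  # [0.0, 0.5, 1.0]
print(mine(data).most_common(1))    # [('milk', 2)]
```

Chaining plain functions keeps each step separate, so any one of them (for example the cleaning rule) can be swapped out without touching the others.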

Oct 25, 2008

Difference between Data Mining and Data Warehousing

Data mining provides the enterprise with intelligence, and data warehousing provides it with a memory.

Data warehousing is the process of integrating data from multiple sources and formats into a single unified schema. It therefore provides the enterprise with a storage mechanism for its huge amount of data. Data mining, on the other hand, is the process of extracting interesting patterns and knowledge from huge amounts of data. So we can apply data mining techniques to the data warehouse of an enterprise to discover useful patterns.

Oct 24, 2008

What is Data Warehousing?

An organization may have a number of different systems, such as transaction processing systems (TPS), ERP and accounting systems. Each of these systems collects data, but not always in the same format. So we need to combine the data from these different sources into a single standard form, and this process of combining multiple databases into a single homogeneous form is called data warehousing.
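As a small illustration of this combining step, the Python sketch below maps records from two hypothetical systems, each with its own date format, amount units and field names, into one common schema. The systems, field names and formats are assumptions made up for the example; in practice this is usually done in an ETL (extract, transform, load) process.

```python
from datetime import datetime

# Hypothetical TPS records: ISO dates, amounts stored in cents.
tps_rows = [{"txn_date": "2008-10-24", "amount_cents": 1250, "cust": "C1"}]
# Hypothetical accounting records: day/month/year dates, amounts as strings.
acct_rows = [{"Date": "25/10/2008", "Amount": "99.90", "CustomerID": "c2"}]

def from_tps(row):
    """Map a TPS record into the common warehouse schema."""
    return {
        "date": datetime.strptime(row["txn_date"], "%Y-%m-%d").date(),
        "amount": row["amount_cents"] / 100,  # cents -> currency units
        "customer_id": row["cust"].upper(),
        "source": "TPS",
    }

def from_accounting(row):
    """Map an accounting record into the same schema."""
    return {
        "date": datetime.strptime(row["Date"], "%d/%m/%Y").date(),
        "amount": float(row["Amount"]),
        "customer_id": row["CustomerID"].upper(),  # make IDs consistent across systems
        "source": "Accounting",
    }

warehouse = [from_tps(r) for r in tps_rows] + [from_accounting(r) for r in acct_rows]
for rec in warehouse:
    print(rec)
```

Each source gets its own small mapping function, so adding another system only means writing one more function that produces the same fields.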

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.

Subject-oriented
The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Time-variant
The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.
Non-volatile
Data in the data warehouse is never overwritten or deleted; once committed, the data is static, read-only, and retained for future reporting.
Integrated
The data warehouse contains data from most or all of an organization's operational systems and this data is made consistent.
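The time-variant and non-volatile properties can be sketched in a few lines of Python: facts are only ever appended together with the date they were recorded, never updated or deleted, so the store can answer both "what was the value then?" and "how has it changed over time?". The class and field names are hypothetical, and a real data warehouse would keep this in database tables rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)  # frozen: a committed fact cannot be modified (read-only)
class Fact:
    key: str           # the subject the fact is about, e.g. a customer ID
    value: float       # the measured value, e.g. a credit limit
    recorded_on: date  # when this version was loaded (time-variant)

@dataclass
class AppendOnlyStore:
    facts: List[Fact] = field(default_factory=list)

    def load(self, fact: Fact) -> None:
        # Non-volatile: new versions are appended; old ones are never updated or deleted.
        self.facts.append(fact)

    def as_of(self, key: str, day: date) -> Optional[float]:
        """Latest value for `key` recorded on or before `day`, or None."""
        versions = [f for f in self.facts if f.key == key and f.recorded_on <= day]
        return max(versions, key=lambda f: f.recorded_on).value if versions else None

    def history(self, key: str) -> List[Fact]:
        """All versions for `key`, oldest first, for change-over-time reports."""
        return sorted((f for f in self.facts if f.key == key), key=lambda f: f.recorded_on)

store = AppendOnlyStore()
store.load(Fact("C1", 1000.0, date(2008, 1, 1)))
store.load(Fact("C1", 1500.0, date(2008, 6, 1)))
print(store.as_of("C1", date(2008, 3, 1)))    # 1000.0
print(store.as_of("C1", date(2008, 12, 31)))  # 1500.0
```

Because the old value is still there after the update, a report can show that the credit limit rose from 1000 to 1500 in June, which an operational system that overwrites the value could not.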

History of Data Mining


Data mining is a fairly new concept that emerged in the late 1980s. It soon attracted a great deal of research interest and flourished throughout the 1990s, with many new and remarkable techniques being discovered. Data mining has evolved from a number of different disciplines, such as statistics, machine learning, artificial intelligence and database technology.

Why Data Mining?

"We are overwhelmed with data but starved of knowledge"

In today's world we are overwhelmed with data and information from various sources. Advances in IT make collecting data easier than ever before. A business enterprise has various systems, such as a transaction processing system, an HR management system and an accounting system, and each of these systems collects huge piles of data every day.

Do these data by themselves help us make more intelligent decisions? The answer is no. To make better decisions, you need to discover and understand the underlying patterns in your business from these data. For example, in this highly competitive business environment it is no longer enough for a retailer to know just its sales, profit and expenses. To expand its business and achieve higher goals, it has to search for answers to questions like:

  • Which products are bought most often together?
  • Which segment of customers is most likely to buy certain products, so that those products can be promoted only to them?
  • What is the likely profit range five years from now?

Data mining can help answer all these questions. It can help an organization gain useful insights into its business from the data it has collected over the years and make better decisions to achieve new heights.
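The first question above, which products are bought together most often, is the kind of question association analysis addresses. A minimal Python sketch of its first step is to count, over a set of made-up shopping baskets, how many transactions contain each pair of products; the fraction of transactions containing a pair is called its support. This is only pair counting, not a full algorithm such as Apriori, which also prunes infrequent itemsets and derives association rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; each set is one customer transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) are counted together
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    support = n / len(baskets)  # fraction of all transactions containing the pair
    print(pair, n, f"support={support:.2f}")
```

With these baskets, bread and milk appear together in 3 of the 5 transactions (support 0.60), which makes them the most frequent pair.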

Oct 17, 2008

What is Data Mining?


The word "mining" refers to the extraction of valuable things like minerals from the earth. Likewise, data mining is the process by which we extract interesting patterns and knowledge from huge amounts of data. Data mining is a relatively new field of study and research and has generated huge interest among business communities. It is an important part of business intelligence, which deals with how an organization uses, analyzes, manages and stores the data it collects from various sources to make better decisions.

In his popular article "IT Doesn't Matter", Nicholas Carr argued that IT is nowadays so widespread that no particular organization gains a strategic advantage over the others from using it, so IT has lost its strategic importance. But data mining is one of the most important concepts in IT that proves him wrong. It shows that an organization can create strategic advantages over its competitors by using data mining to gain important insights from the data it collects. The way an organization collects and analyzes its data differs from one organization to another, so an organization can gain a competitive advantage over others using data mining.

A Welcome Note !!!!

Hello and welcome, all of you, to the fascinating and rapidly changing world of data mining and warehousing. Being very interested in this topic, I would like to share all the new things I come across. I have been trying to understand data mining concepts, techniques and tools for quite some time now, but I have found it difficult to get useful information on the topic at a single site. I will use this blog to make things easier for all of you interested in data mining with useful posts, and to keep notes on all the things I learn.