Nov 20, 2009

Data Warehouse Design

Data Warehouse can be designed in two different ways based on Ralph Kimball and Bill Inmon's model.
In Inmon's architecture data from OLTP databse is stored in a warehouse before it is transferred to the data marts. In Inmon's view the data warehouse is a real database.

According to Kimball's architecture data is transferred directly from OLTP database to the data mart. In Kimball's model the data warehouse is actually a collection of integrated data marts.

Nov 12, 2009

SQL Server Replication

Replication is the process of creating and maintaining a separate copy of database in another server or the same server and keeping both synchronized. Replication is useful in a number of scenario:

1. To integrate multiple database to build a central warehouse: Businesses operating on a number of distant locations usually collect data in separate database. However the senior management needs to generate reports showing analysis of data collected from all these sites. Hence the need for central warehouse is increasing and replication proves to be a powerful way to create and maintain the warehouse. It can also be used to create and maintain a separate data mart for OLAP operations.

2. To ensure higher availability and improved performance of web applications: Web applications with the increasing number of users faces huge problem in avoiding the extra load on the database. Replication along with other techniques can offer the right solution by distributing users to a number of replicated database in different servers and thus improving the performance and ensure high availability.

Sep 28, 2009

OLAP Video Tutorial

This is an exciting video I came across where OLAP concepts are explained in a easy to understand way using cartoons.

Sep 26, 2009

Dimensional Modeling


Dimensional Modeling refers to the technique used to design logical models for database which can support OLAP and Data Warehousing operations. It differs from the relational modeling or ER modeling in the sense that it can be used for data warehouse as well as relational databases.

Dimensional modeling is based on two concepts called facts and dimensions. Facts are usually numeric values or something that can be measured or aggregated. For example: monthly sales amount. Dimensions are groups of hierarchies and attributes that are used to define facts. For example product and its color. In a dimensional model a group of facts forms a fact table while a group of dimensions form a dimension table.


Sep 20, 2009

Market Basket Analysis


Market Basket Analysis is used to identify or gain interesting insights into the way customers buy items. It finds out the combination of items which are most likely to be bought together. For example buying milk and bread together can lead to buying butter. Identifying such patterns from the transaction data collected over the years provide new window of opportunity to explore and understand the purchasing behavior and thus can be used to increase sales through cross selling and targeted marketing. Market basket analysis is not limited to retail sales but are applied to a wide range of areas like analyzing credit card purchases, fraud detection for insurance claims and so on. Data Mining technique called association rule mining is usually used. Algorithms like FP-Growth and Apriori algorithm are usually used for mining the transaction data to get the desired patterns.

Sep 18, 2009

MDX - Multi-Dimensional Expressions

OLAP cubes stores data in multidimensional form whereas in the relational database data is stored in the 2 dimensions: rows and columns. OLAP cubes can store data in more than these 2 dimensions. The SQL or Structured Query Language enables us to access and manipulate the data resinding in the relational database. However SQL cannot be used for the same purposes for OLAP cubes. MDX is the language used for accessing and manipulating data in the cubes.

Jul 30, 2009

Web Usage Mining

Web Usage Mining: Whenever anyone logs on to a web site the user's access information like time spent on the site, links visited, geographical location from where accessed etc are stored in the log file of the website. These data can be mined to discover user access patterns for the site. The site can then be redesigned to maximize hits and better user experience like having the most accessed link to be available on the front page of the site. This also provides patterns which can be used to run successful ad campaigns on the site.

Jul 24, 2009

Web Mining

The WWW has revolutionize the way in which people interact, carry out their works and gather information. It has proved itself to be a useful interface for its users to carry out such activities with ease. With hundreds of millions of people around the world using it a huge pile of data are collected everyday. These data carries interesting insights on the way people interact with it. Web Mining is the process of using various data mining techniques to analyze and discover patterns from the data.

Jul 11, 2009

OLAP And OLTP

OLAP and OLTP are two of the most common words in the world of database. However the differences between the two are not well understood. OLTP and OLAP refer to Online Transaction Processing and Online Analytical Processing respectively. The word "online" is common to both and highlights that both the systems are capable to carry out certain things almost instantaneously.

The OLTP systems are used to register the transactions occurring in real time. In a retail shop the buying information needs to be instantaneously recorded by the salesman during any purchase of goods. OLTP systems allow such transactions to be carried out within a very short span of time. Hence to process any kinds of transactions OLTP systems are used.

It is not worth anymore to merely store the transaction data. We need to carry out various operations on those data to get the real picture of various aspects of the business. Reports generated based on these data are crucial part of the Decision Support System(DSS). However such complex operations on large volume of transaction data makes the report generation very slow and puts extra load on the OLTP system in use. So to overcome these limitations OLAP are used. OLAP enables data to be stored in multidimensional forms which is often called OLAP cube. OLAP cube ensures very fast retrieval or analysis of data.

Jan 23, 2009

Data Mining in Economic Crisis

In this time of economic turmoil the whole world is slowly plunging into recession. We have seen bailouts from governments worth billions of dollars for giant companies in US, Europe and Asia. The businesses are finding hard times to operate and so ability to make smarter and intelligent business decisions are imperative to overcome in such harsh conditions.

Businesses who have already invested in business intelligence solutions will be in a better position to undertake right measures to survive and continue its growth. However, many may argue that most of the giant companies were using Data mining and BI solutions but still could not avoid the collapse. The important thing to note here is that data mining solutions provides an analytical perspective into the performance of an organization based on historical data but the economic impact on an organization is linked to many issues and in many cases to external forces and unscrupulous activities. The failure to predict this does not undermine the role of data mining for organizations but on the contrary makes it more important especially for regulatory bodies of governments to predict and identify such practices in advance and take necessary measures to avoid such circumstances in future.