Data Mining & Data Warehousing

This document defines data mining and data warehousing. Data mining is the process of extracting hidden predictive information from large databases to help companies focus on important information. Data warehousing involves collecting and structuring data from multiple systems to reduce the time needed to produce reliable information. The goals of data mining include prediction, identification, classification, and optimization. Typical data mining approaches are verification-driven and discovery-driven. Data warehousing is needed to facilitate better decision making in large companies by providing a consistent source of historical corporate information.
Copyright
© Attribution Non-Commercial (BY-NC)

DATA MINING & DATA WAREHOUSING
DEFINITION
Data mining is the extraction of valid and previously unknown information from data.
OR
Data mining, the process of extracting hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses.
Why Do We Need Data Mining?
• To handle the bulk of data held by enterprises and thereby increase their margins.
• To turn incomprehensible data into usable information.
• It combines ideas from statistics, machine learning, databases and parallel computing.
Goals of Data Mining
Prediction
Identification
Classification
Optimization
Prediction
Prediction shows how certain attributes within the data will behave in the future. For example:
• What customers buy when offered a discount.
• How much sales value a store generates in a given period.
• Whether deleting a sales line would yield more profit.
Prediction uses techniques such as regression and correlation.
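As a minimal sketch of the regression and correlation techniques mentioned above, the following fits a straight line to a small, made-up series of discount levels and units sold, then uses it to forecast sales at a new discount. The data and the function names (fit_line, pearson) are illustrative assumptions, not part of the original slides.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical observations: discount offered (%) and units sold.
discounts = [0, 5, 10, 15, 20]
units_sold = [100, 110, 120, 130, 140]

slope, intercept = fit_line(discounts, units_sold)
forecast_at_25 = slope * 25 + intercept      # predicted units at a 25% discount
strength = pearson(discounts, units_sold)    # how closely the two series move together
```

Here the correlation tells us whether a linear model is reasonable at all, and the fitted line gives the actual forecast.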
Identification
Data patterns are used to identify the existence of an item, an event or an activity.
• Intruders trying to break into a computer system may be identified by the programs executed, the files accessed and the CPU time per session.
• The existence of a gene is identified by a certain sequence of nucleotide symbols present in the DNA sequence.
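The gene example above can be illustrated with a very small pattern search. This is only a sketch of pattern-based identification: real gene finding is far more involved, and the sequence, the motif and the function name find_motif are assumptions chosen for illustration.

```python
def find_motif(dna, motif):
    """Return every start index at which motif occurs in dna (overlaps allowed)."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions

# Hypothetical sequence; "ATG" is used here purely as an example pattern.
sequence = "CCATGGTATGA"
hits = find_motif(sequence, "ATG")
pattern_present = bool(hits)   # the item is "identified" if the pattern occurs
```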
Classification
Data is partitioned to identify different classes or patterns based on combinations of parameters.
• Customers can be identified as discount seekers, shoppers in a rush, loyal regular customers, shoppers attached to name brands etc.
• Classification can help in categorizing food as health food, party food, school lunch food etc.
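One simple way to picture classification by a combination of parameters is a hand-written rule set. The sketch below assigns a customer to one of the classes named above; the four input parameters and every threshold are hypothetical, and in practice such rules would be learned from data rather than written by hand.

```python
def classify_customer(discount_share, visits_per_month, minutes_per_visit, brand_share):
    """Assign a customer to a class using hypothetical, hand-picked thresholds.

    discount_share: fraction of purchases made with a discount
    brand_share:    fraction of purchases that are name-brand items
    """
    if discount_share >= 0.5:
        return "discount seeker"
    if minutes_per_visit <= 10:
        return "shopper in a rush"
    if visits_per_month >= 4:
        return "loyal regular customer"
    if brand_share >= 0.6:
        return "shopper attached to name brands"
    return "unclassified"

label_a = classify_customer(0.7, 2, 30, 0.2)
label_b = classify_customer(0.1, 6, 25, 0.3)
```

The order of the rules matters: a customer who matches several rules receives the first matching class.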
Data Mining Approaches
• Verification-driven data mining: querying and reporting, with the output presented in graphical, tabular and textual forms, through multi-dimensional analysis and through statistical analysis.
• Discovery-driven data mining: there are at present four different discovery-driven data mining approaches:
  • Predictive modeling, including neural nets.
  • Link analysis, which attempts to establish links between records.
  • Database segmentation, which partitions the data into collections of related records.
  • Deviation detection, which identifies points that do not fit in a segment.
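Deviation detection, the last approach listed above, can be sketched with a z-score test: a point is flagged when it lies more than a chosen number of standard deviations from the mean of its segment. The threshold of 2.0, the sample data and the function name find_deviations are assumptions for illustration; other deviation measures are equally possible.

```python
from statistics import mean, pstdev

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []          # all points identical: nothing deviates
    return [v for v in values if abs(v - centre) / spread > threshold]

# Hypothetical segment of daily sales figures with one unusual day.
daily_sales = [10, 11, 9, 10, 12, 10, 50]
outliers = find_deviations(daily_sales)
```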


Data Mining Process (a KDD Process)
Data mining is the core of knowledge discovery.
Steps of a KDD Process
• Learning the application domain: relevant prior knowledge and the goals of the application.
• Creating a target data set: data selection.
• Data cleaning and preprocessing.
• Data reduction and transformation: finding useful features, dimensionality/variable reduction, invariant representation.
• Choosing the functions of data mining: summarization, classification, regression, association, clustering.
• Choosing the mining algorithms.
• Data mining: searching for patterns of interest.
• Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns etc.
• Use of discovered knowledge.
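The steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the records, the field names, the choice of association (co-occurring item pairs) as the mining function, and the minimum-support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def select(records, fields):
    """Create the target data set: keep only the fields of interest."""
    return [{f: r.get(f) for f in fields} for r in records]

def clean(records):
    """Data cleaning: drop records that have missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Data reduction: represent each record as a set of items."""
    return [frozenset(r["items"]) for r in records]

def mine(baskets):
    """Mining step (association): count how often each item pair co-occurs."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep only pairs that meet the minimum support."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}

# Hypothetical raw records from a sales system.
raw = [
    {"id": 1, "items": ["bread", "milk"], "store": "A"},
    {"id": 2, "items": ["bread", "milk", "eggs"], "store": "A"},
    {"id": 3, "items": None, "store": "B"},
    {"id": 4, "items": ["milk", "eggs"], "store": "B"},
]

baskets = transform(clean(select(raw, ["items"])))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.6)
```

Each function corresponds to one step in the list, so a step can be swapped out (for example, a different mining function) without touching the others.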
Typical Data Mining System Architecture
[Figure: layered architecture, from top to bottom]
• Graphical user interface
• Pattern evaluation
• Data mining engine (with the knowledge base drawn beside it)
• Data warehouse
• Data cleaning & filtering, data integration
• Databases, data warehouse
Process of Data Mining
[Figure: data warehouse → Select → selected data → Transform → transformed data → Mine → extracted data → Assimilate → assimilation]
APPLICATIONS OF DATA MINING
• Data mining predicts future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
• Using a method called “neural segmentation”, a number of different types of purchase patterns can be identified, and customer groupings can then be associated with this data.
• To minimize resource use, it is necessary to identify which factors, out of such items as chemical fertilizers and additives, affect the crop yield.
[Figure: questions that data warehousing helps to answer]
• Which are the lowest/highest margin customers?
• What is the most effective distribution channel?
• Who are my customers and what products are they buying?
• What product promotions have the biggest impact on revenue?
• What impact will new products/services have on revenue & margins?
• Which customers are most likely to go to the competition?
WHAT IS A DATA WAREHOUSE?
A data warehouse is data collected from one or many systems that exist within and outside the organization. The data is structured in such a way as to reduce the amount of time that it takes to produce reliable information.
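As a rough illustration of collecting data from several systems into one structure, the sketch below maps rows from two source systems onto a common record layout and precomputes a per-customer total, so that a report no longer has to query and reconcile each system. The two source systems, their field names and the target layout are all hypothetical.

```python
def from_billing(row):
    """Hypothetical billing system: customer key 'cust', amount in cents."""
    return {"customer_id": row["cust"], "amount": row["cents"] / 100, "source": "billing"}

def from_web_shop(row):
    """Hypothetical web shop: customer key 'customerId', amount as a string."""
    return {"customer_id": row["customerId"], "amount": float(row["total"]), "source": "web_shop"}

def load_warehouse(billing_rows, web_rows):
    """Bring both sources into one layout and precompute totals per customer."""
    facts = [from_billing(r) for r in billing_rows] + [from_web_shop(r) for r in web_rows]
    totals = {}
    for fact in facts:
        key = fact["customer_id"]
        totals[key] = totals.get(key, 0.0) + fact["amount"]
    return facts, totals

billing = [{"cust": "C1", "cents": 1250}]
web_shop = [{"customerId": "C1", "total": "7.50"}, {"customerId": "C2", "total": "3.00"}]
facts, totals = load_warehouse(billing, web_shop)
```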
WHY DO WE NEED DATA WAREHOUSING?
• It has both hardware and software components, which facilitate better decision making in large companies.
• To provide a consistent common source for corporate information.
• To store large volumes of historical detail data from mission-critical applications.
• To improve the ability to access, report against and analyze information.
• To solve or improve business processes.
