Dmbi PPT 1

The document outlines a course on data mining and warehousing. It lists 8 chapters to be covered: introduction to data mining, data warehousing and business intelligence concepts, data warehousing and online analytical processing, data pre-processing, mining frequent patterns and associations, classification, clustering, and applications. It then details chapter 1, which introduces data mining, including definitions, functionalities, architectures, and the knowledge discovery process. It also discusses some major issues in data mining and its applications.

Uploaded by

Harsha Gangwani


Data Mining and Warehousing

(203105431)
Sandeep Jangir, Assistant Professor
Department of Computer Science & Engineering
The Course Outline
Chapter 1: Introduction to Data Mining (DM)
Chapter 2: Overview and Concepts of Data Warehousing and Business Intelligence
Chapter 3: Data Warehousing and Online Analytical Processing
Chapter 4: Data Pre-processing
Chapter 5: Mining Frequent Patterns, Associations, and Correlations
Chapter 6: Classification
Chapter 7: Clustering
Chapter 8: Applications
CHAPTER-1
Introduction to Data Mining
Outline
1.1 Introduction to Data Mining
1.2 Data Mining: Definition and Functionalities
1.3 Classification of Data Mining Systems
1.4 Data Mining Architecture
1.5 Data Mining: A KDD Process
1.6 Major Issues in Data Mining
1.7 Applications of Data Mining

Image source : Google

Introduction to Data Mining

• Extraction of implicit, previously unknown and potentially useful information from data
• Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

Introduction to Data Mining

• Data mining is also known as knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting, business intelligence, etc.
• As data grows at a remarkable rate, there is a need to analyze large, complex, and information-rich data sets to uncover hidden information. This can result in greater customer satisfaction and higher turnover for the firm.

Why Do We Need Data Mining?

• Lots of data is being collected and warehoused
- Web data, e-commerce
- Purchases at department/grocery stores
- Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
- Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Data Mining Functionality
• Concept description: characterization and discrimination
- Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality):
- Multi-dimensional vs. single-dimensional association
- age(X, "20..29") ^ income(X, "20..29K") -> buys(X, "PC") [support = 2%, confidence = 60%]
- contains(T, "computer") -> contains(T, "software") [1%, 75%]
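To make the [support, confidence] notation above concrete, here is a minimal Python sketch. The baskets are hypothetical toy data, not taken from the slides, and the function names are illustrative. Support is the fraction of transactions that contain every item of the rule; the confidence of A -> B is support(A ∪ B) / support(A).

```python
from typing import Iterable, List, Set

def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions: List[Set[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Confidence of antecedent -> consequent: support(A ∪ B) / support(A)."""
    a = set(antecedent)
    both = support(transactions, a | set(consequent))
    base = support(transactions, a)
    return both / base if base else 0.0

# Hypothetical toy baskets (one set of items per transaction)
transactions = [
    {"computer", "software", "mouse"},
    {"computer", "software"},
    {"computer", "printer"},
    {"software"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(f"computer -> software [{rule_support:.0%}, {rule_confidence:.0%}]")
```

This prints `computer -> software [50%, 67%]`: two of the four baskets contain both items, and two of the three baskets containing "computer" also contain "software".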


Contd.....
• Classification and Prediction:
- Finding models (functions) that describe and distinguish classes or concepts for future prediction
- E.g., classify countries based on climate, or classify cars based on gas mileage
- Presentation: decision tree, classification rule, neural network
- Prediction: predict some unknown or missing numerical values
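The slides list decision trees as one way to present a classification model. A minimal sketch, assuming a single numeric attribute (hypothetical gas-mileage figures) and two class labels, is a decision stump: a decision tree with exactly one split. The names `fit_stump` and `predict` are illustrative, not from any library.

```python
from typing import List, Tuple

Model = Tuple[float, str, str]  # (threshold, label below, label at/above)

def fit_stump(samples: List[Tuple[float, str]]) -> Model:
    """Learn a one-level decision tree (decision stump) on one numeric feature.

    Tries every midpoint between consecutive distinct feature values and every
    pair of labels, keeping the split with the fewest training errors.
    """
    values = sorted({x for x, _ in samples})
    labels = sorted({y for _, y in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below in labels:
            for above in labels:
                errors = sum(1 for x, y in samples
                             if (below if x < threshold else above) != y)
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above

def predict(model: Model, x: float) -> str:
    """Classify x by comparing it with the learned threshold."""
    threshold, below, above = model
    return below if x < threshold else above

# Hypothetical training data: (gas mileage in mpg, class label)
cars = [(15, "low"), (18, "low"), (22, "low"), (30, "high"), (35, "high"), (40, "high")]
model = fit_stump(cars)
print(model)               # (26.0, 'low', 'high')
print(predict(model, 33))  # high
```

Full decision-tree learners repeat this kind of split search recursively over many attributes; the stump only shows the core idea of learning a rule from labelled data and applying it to new cases.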


Contd.....
• Cluster analysis:
- Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
- Clustering is based on the principle of maximizing intra-class similarity and minimizing inter-class similarity
• Outlier analysis:
- Outlier: a data object that does not comply with the general behaviour of the data


Contd.....
- It can be considered noise or an exception, but it is quite useful in fraud detection and rare-event analysis
• Trend and evolution analysis:
- Trend and deviation: regression analysis
- Sequential pattern mining, periodicity analysis
- Similarity-based analysis
• Other pattern-directed or statistical analyses
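The slides do not name a clustering algorithm. k-means is one widely used method that follows the stated principle: points are assigned to the closest centroid, so members of a cluster stay close together while clusters stay apart. The sketch below assumes 2-D numeric points, a fixed k, and the simplest initialisation (the first k points); the house coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its cluster, until the centroids stop changing."""
    centroids: List[Point] = list(points[:k])  # naive init: first k points
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical house locations (x, y) on a city grid
houses = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (2, 1), (9, 8)]
centroids, clusters = kmeans(houses, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

The result depends on k and on the initial centroids; practical implementations usually try several initialisations and keep the best one.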


Data Mining Tasks
• Data mining tasks are broadly divided into two categories:
- Predictive data mining
- Descriptive data mining


Data Mining Tasks
• Predictive data mining:
The objective of predictive tasks is to use the values of some variables to predict the values of other variables.
- Ex: Web mining is used by online marketers to predict purchases by online users on a website
• Classification
- Maps data into predefined groups (classes).
• Regression
- Maps a data item to a real-valued prediction variable.
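Regression, as described above, maps an input to a real-valued prediction. A minimal sketch, assuming a single input variable and a straight-line model fitted by ordinary least squares (one common choice; the slides do not prescribe a method), with hypothetical advertising-spend and sales figures:

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) vs. units sold (y)
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(spend, units)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # 11.0  (predicted units for a spend of 5)
```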


Data Mining Tasks
• Clustering
- Groups similar data together.
• Summarization
- Maps data into subsets.
• Link analysis
- Defines relationships among data.
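Summarization can be illustrated with a small sketch that maps records into subsets and describes each subset briefly. The grouping key, the choice of count and mean as the description, and the sales records are all assumptions made for illustration; the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

def summarize(records: List[Dict], key: str, value: str) -> Dict[str, Dict[str, float]]:
    """Split records into subsets by `key` and describe each subset by count and mean."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {g: {"count": len(vals), "mean": mean(vals)} for g, vals in groups.items()}

# Hypothetical sales records
sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 100.0},
    {"region": "south", "amount": 60.0},
    {"region": "east", "amount": 90.0},
]
summary = summarize(sales, "region", "amount")
print(summary)
```

For these records the north subset is described as 2 sales with a mean of 110.0, south as 2 with 70.0, and east as 1 with 90.0.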


Data Mining Tasks
• Descriptive data mining:
The objective of descriptive tasks is to find human-readable patterns that describe the relationships among data.


Classification of Data Mining Systems
• Origin of data mining

Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems

Figure: Origin of Data Mining
Classification of Data Mining Systems
Traditional techniques may be unsuitable due to:
- Enormity of data
- High dimensionality of data
- Heterogeneous, distributed nature of data

Figure: Data mining drawing on statistics/machine learning/AI, pattern recognition, and database systems
Classification of Data Mining Systems
• Decision in Data Mining

• Databases to be mined:
- Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined:
- Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels
• Techniques utilized
Architecture of Data Mining Systems
• Four data mining architectures:

1. No coupling
2. Loose coupling
3. Semi-tight coupling
4. Tight coupling
No Coupling
• In this architecture, the data mining system does not use any functionality of a database or data warehouse system.
• Data is retrieved from data sources such as the file system, processed using data mining algorithms, and the results are stored back in the file system.
• This architecture is considered a poor architecture for data mining systems because it does not take advantage of a database or data warehouse.
• However, it is used for simple data mining processes.

Loose Coupling
• The loose coupling data mining system uses a database or data warehouse for data retrieval.
• In this architecture, the data mining system retrieves data from the database or data warehouse, processes it using data mining algorithms, and stores the results back in those systems.
• Loose coupling architecture suits memory-based data mining systems that do not require high scalability and high performance.
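The loose coupling flow (retrieve from the database, mine in memory, store the result back) can be sketched with Python's built-in `sqlite3` module standing in for the database. The `purchases` and `item_counts` tables, the basket rows, and the helper `mine_item_counts` are hypothetical; the "mining" step is just item counting, kept deliberately simple.

```python
import sqlite3
from collections import Counter

def mine_item_counts(conn: sqlite3.Connection) -> Counter:
    # 1. Retrieve raw rows from the database (no other DBMS feature is used)
    rows = conn.execute("SELECT basket_id, item FROM purchases").fetchall()
    # 2. Mine in application memory: count how often each item is bought
    counts = Counter(item for _, item in rows)
    # 3. Store the mining result back into the database
    conn.execute("CREATE TABLE IF NOT EXISTS item_counts (item TEXT PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO item_counts VALUES (?, ?)", counts.items())
    conn.commit()
    return counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (3, "milk"), (3, "bread"), (3, "eggs")],
)
mine_item_counts(conn)
stored = dict(conn.execute("SELECT item, n FROM item_counts").fetchall())
print(stored)
```

Because every row is pulled into application memory, this style is simple but does not scale well, which matches the slide's note that loose coupling suits memory-based systems.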
Semi-tight Coupling
• In semi-tight coupling architecture, the data mining system is not only linked to a database or data warehouse system, but also uses several of its features to perform some data mining tasks, such as sorting and indexing.
• Moreover, intermediate results can also be stored in the database or data warehouse system for better performance.
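By contrast, a semi-tight sketch hands indexing, counting, filtering and sorting to the database itself and keeps the intermediate result in a table. Again `sqlite3` stands in for the database; the table names, data, and the minimum-count threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (3, "milk"), (3, "bread"), (3, "eggs")],
)

# Semi-tight: ask the DBMS to build an index on the column being mined ...
conn.execute("CREATE INDEX idx_purchases_item ON purchases (item)")

# ... and to do the counting and filtering itself (GROUP BY / HAVING),
# storing the intermediate result in a table instead of application memory.
min_count = 2  # hypothetical minimum-frequency threshold
conn.execute("CREATE TABLE frequent_items (item TEXT PRIMARY KEY, n INTEGER)")
conn.execute(
    "INSERT INTO frequent_items (item, n) "
    "SELECT item, COUNT(*) FROM purchases GROUP BY item HAVING COUNT(*) >= ?",
    (min_count,),
)
conn.commit()

# Later mining steps read the stored intermediate result, sorted by the DBMS (ORDER BY).
frequent = conn.execute(
    "SELECT item, n FROM frequent_items ORDER BY n DESC, item"
).fetchall()
print(frequent)  # [('bread', 3), ('milk', 2)]
```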
Tight Coupling
• In this architecture, the database or data warehouse is treated as an information retrieval component.
• Tight-coupling data mining architecture provides scalability, high performance, and integrated information.
Architecture of Data Warehouse
Data mining: A KDD Process

Figure: Data mining: A KDD Process
Data mining: A KDD Process
• The KDD process comprises several steps leading from raw data collections to some form of new knowledge.
• The iterative process consists of the following steps:
- Data cleaning
- Data integration
- Data selection
- Data transformation
- Data mining
- Pattern evaluation
- Knowledge representation
Data mining: A KDD Process
• Data cleaning
- Noisy and irrelevant data are removed from the collection

• Data integration
- Multiple (heterogeneous) data sources may be combined into a common source

• Data selection
- Data relevant to the analysis is decided on and retrieved from the data collection
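The first three KDD steps can be sketched on two small in-memory sources. Everything here is an assumption for illustration: the customer and billing records, the rule for what counts as noise (missing or impossible ages), and the helper names `clean`, `integrate` and `select`. Real cleaning often fills in missing values instead of dropping records.

```python
from typing import Dict, List

# Hypothetical source 1: customer records containing noise
crm = [
    {"id": 1, "name": "Asha", "age": 34},
    {"id": 2, "name": "Ravi", "age": None},   # missing value
    {"id": 3, "name": "Meena", "age": -5},    # impossible value (noise)
    {"id": 4, "name": "Karan", "age": 41},
]
# Hypothetical source 2: a billing system keyed by the same customer id
billing = {1: 2500.0, 3: 900.0, 4: 4100.0}

def clean(records: List[Dict]) -> List[Dict]:
    """Data cleaning: drop records whose age is missing or implausible."""
    return [r for r in records if r["age"] is not None and 0 < r["age"] < 120]

def integrate(records: List[Dict], spend_by_id: Dict[int, float]) -> List[Dict]:
    """Data integration: join the two sources on the shared customer id."""
    return [{**r, "spend": spend_by_id[r["id"]]} for r in records if r["id"] in spend_by_id]

def select(records: List[Dict], fields: List[str]) -> List[Dict]:
    """Data selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

prepared = select(integrate(clean(crm), billing), ["age", "spend"])
print(prepared)  # [{'age': 34, 'spend': 2500.0}, {'age': 41, 'spend': 4100.0}]
```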


Data mining: A KDD Process
• Data transformation
- Also known as data consolidation
- A phase in which the selected data is transformed into forms appropriate for the mining procedure

• Data mining
- Clever techniques are applied to extract potentially useful patterns

• Pattern evaluation
- Interesting patterns representing knowledge are identified based on given measures
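One common data transformation, assumed here as an example since the slides do not name a specific one, is min-max normalization, which rescales an attribute into a target range: v' = (v - min) / (max - min) × (new_max - new_min) + new_min. The income values below are hypothetical.

```python
from typing import List

def min_max_normalize(values: List[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Min-max normalization: v' = (v - min) / (max - min) * (new_max - new_min) + new_min."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to new_min
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Hypothetical income values
incomes = [12000.0, 73600.0, 98000.0]
scaled = min_max_normalize(incomes)
print([round(s, 3) for s in scaled])  # [0.0, 0.716, 1.0]
```

Putting attributes on a common scale keeps large-valued attributes (such as income) from dominating distance-based mining methods like clustering.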


Data mining: A KDD Process

• Knowledge representation
- The final phase, in which the discovered knowledge is visually represented to the user

Issues in Data Mining

Applications of Data Mining
Database analysis and decision support

• Market analysis and management:
- Target marketing, customer relationship management, market basket analysis, cross selling, market segmentation
• Risk analysis and management:
- Forecasting, customer retention, improved underwriting, quality control, competitive analysis
• Fraud detection and management

• Other applications:
- Text mining (newsgroups, email, documents) and Web analysis
- Intelligent query answering

Advantages of Data Mining
• Marketing/Retail
- Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns.

- Marketers can then take an appropriate approach to targeted customers.

- Data mining helps retail companies as well. Using market basket analysis, a store can arrange products so that items frequently bought together are placed together, making shopping more convenient for customers. It also helps retail companies offer discounts that attract more customers.

Advantages of Data Mining

• Finance/Banking
- By building a model from historical customer loan data, banks and financial institutions can distinguish good loans from bad ones.
- Data mining also helps banks detect fraudulent credit card transactions.

• Manufacturing
- Data mining applied to operational engineering data can detect faulty equipment and determine optimal control parameters.
- Data mining can determine the range of control parameters that leads to the production of perfect products. Hence, optimal control parameters can provide the desired quality.

Advantages of Data Mining
• Governments
- Data mining helps government agencies analyze records of financial transactions to build patterns that can detect money laundering or criminal activities.

• Market segmentation
- Data mining helps identify the common characteristics of customers who buy the same products from your company.

Advantages of Data Mining
• Customer anticipation
- It helps predict which customers may leave your company for a competitor.

• Fraud detection
- It identifies which transactions are most likely to be fraudulent.
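Fraud detection is often framed as outlier analysis. A minimal sketch, assuming a simple z-score rule (flag values far from the mean in units of standard deviation) rather than any specific production method, applied to hypothetical card transaction amounts:

```python
from statistics import mean, pstdev
from typing import List

def zscore_outliers(amounts: List[float], threshold: float = 2.0) -> List[int]:
    """Return indices of values whose z-score |x - mean| / std exceeds `threshold`."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]

# Hypothetical card transaction amounts for one customer
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.0, 900.0]
print(zscore_outliers(amounts))  # [7]
```

The mean and standard deviation are themselves inflated by the outlier, and with n values the largest attainable z-score is sqrt(n - 1) (about 2.65 here), which is why a threshold of 2 rather than the common 3 is used. Robust alternatives based on the median are often preferred on real data.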


Advantages of Data Mining
• Direct marketing
- Direct marketing identifies which prospects should be included to obtain the highest response rate.

• Interactive marketing
- It is useful for predicting what each user on a Web site is most interested in seeing.

• Market basket analysis
- It helps to understand which products or services are commonly purchased together.

• Trend analysis
- Trend analysis identifies the difference between a typical customer this month and last month.
Disadvantages of Data Mining
• Privacy Issues
- As the Internet booms with social networks, e-commerce, blogs, etc., concerns about personal privacy have been increasing.
- Users worry that their information might be collected and used in unethical ways, which can potentially cause a lot of trouble.
- Businesses collect information about their users to set up marketing strategies, but a business might be taken over by another firm or shut down, and that is where concerns about misuse or leakage of personal information arise.

www.paruluniversity.ac.in
