
Dmbi PPT 1

The document outlines a course on data mining and warehousing. It lists the 8 chapters to be covered: introduction to data mining, data warehousing and business intelligence concepts, data warehousing and online analytical processing, data pre-processing, mining frequent patterns and associations, classification, clustering, and applications. It then details Chapter 1, which introduces data mining, including definitions, functionalities, architectures, and the knowledge discovery process. It also discusses some major issues in data mining and its applications.

Uploaded by

Harsha Gangwani
Copyright © All Rights Reserved

Data Mining and Warehousing

(203105431)
Sandeep Jangir, Assistant Professor
Department of Computer Science & Engineering
The Course Outline
Chapter 1: Introduction to Data Mining (DM)
Chapter 2: Overview and Concepts of Data Warehousing and Business Intelligence
Chapter 3: Data Warehousing and Online Analytical Processing
Chapter 4: Data Pre-processing
Chapter 5: Mining Frequent Patterns, Associations, and Correlations
Chapter 6: Classification
Chapter 7: Clustering
Chapter 8: Applications
CHAPTER-1
Introduction to Data Mining
Outline
1.1 Introduction to Data Mining
1.2 Data Mining: Definition and Functionalities
1.3 Classification of Data Mining Systems
1.4 Data Mining Architecture
1.5 Data Mining: A KDD Process
1.6 Major Issues in Data Mining
1.7 Applications of Data Mining

Image source : Google


Introduction to Data Mining

• Extraction of implicit, previously unknown and potentially useful information from data

• Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

Introduction to Data Mining

• Data mining is also known as knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting, business intelligence, etc.

• As data grows at a remarkable rate, there is a need to analyze large, complex, information-rich data sets to uncover hidden information. This can lead to greater customer satisfaction and higher turnover for the firm.

Why Do We Need Data Mining?

• Lots of data is being collected and warehoused
- Web data, e-commerce
- Purchases at department/grocery stores
- Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
- Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Data Mining Functionalities
• Concept description: characterization and discrimination
- Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality):
- Multi-dimensional vs. single-dimensional association
- age(X, "20..29") ∧ income(X, "20..29K") ⇒ buys(X, "PC") [support = 2%, confidence = 60%]
- contains(T, "computer") ⇒ contains(T, "software") [support = 1%, confidence = 75%]
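To make the support and confidence figures concrete, here is a minimal Python sketch that computes both measures for a rule of the form contains(T, "computer") ⇒ contains(T, "software"). The transactions and the helper names `support` and `confidence` are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions; each transaction T is a set of purchased items.
transactions: List[FrozenSet[str]] = [
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer", "software"}),
    frozenset({"computer", "mouse"}),
    frozenset({"printer", "paper"}),
]


def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in db if itemset <= t)
    return hits / len(db)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Estimate of P(rhs | lhs): support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs, db) / support(lhs, db)


lhs, rhs = frozenset({"computer"}), frozenset({"software"})
print(f"support = {support(lhs | rhs, transactions):.0%}")       # support = 50%
print(f"confidence = {confidence(lhs, rhs, transactions):.0%}")  # confidence = 67%
```

A rule is usually reported only when both values clear user-chosen minimum thresholds, which is how percentages such as [2%, 60%] above are meant to be read.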

Contd.....
• Classification and prediction:
- Finding models (functions) that describe and distinguish classes or concepts for future prediction
- E.g., classify countries based on climate, or classify cars based on gas mileage
- Model representation: decision trees, classification rules, neural networks
- Prediction: predict some unknown or missing numerical values
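A decision tree of depth one (a "decision stump") is about the smallest classification model of the kind listed above. The sketch below learns a single mileage threshold from a few labelled cars and uses it to classify a new car. The data, the labels "efficient"/"inefficient", and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical training data: (gas mileage in km per litre, class label).
training: List[Tuple[float, str]] = [
    (8.0, "inefficient"),
    (9.5, "inefficient"),
    (11.0, "inefficient"),
    (14.0, "efficient"),
    (16.5, "efficient"),
    (19.0, "efficient"),
]


def classify(mileage: float, threshold: float) -> str:
    """Decision stump rule: mileage >= threshold -> "efficient", else "inefficient"."""
    return "efficient" if mileage >= threshold else "inefficient"


def fit_stump(data: List[Tuple[float, str]]) -> float:
    """Return the candidate threshold (midpoint between sorted neighbours)
    that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(th: float) -> int:
        return sum(1 for x, label in data if classify(x, th) != label)

    return min(candidates, key=errors)


threshold = fit_stump(training)
print(threshold)                  # 12.5
print(classify(13.2, threshold))  # efficient
```

Full decision-tree learners repeat this kind of split search recursively over many attributes; the single split here only shows the idea.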

Contd.....
• Cluster analysis:
- Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
- Clustering is based on the principle of maximizing intra-class similarity and minimizing inter-class similarity
• Outlier analysis:
- Outlier: a data object that does not comply with the general behaviour of the data
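Cluster analysis is commonly illustrated with k-means, one widely used clustering algorithm (the slides do not name a specific one). The sketch below groups 2-D points, such as hypothetical house locations, by alternating between assigning each point to its nearest centroid and recomputing each centroid as the mean of its points. Seeding with the first k points is a simplifying assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means: returns (centroids, cluster index of each point)."""
    centroids = list(points[:k])  # simplifying assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


houses = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(houses, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Each step can only reduce the total within-cluster distance, which matches the principle of high intra-class and low inter-class similarity. The result still depends on the initial seeds.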

Contd.....
- It can be considered noise or an exception, but it is quite useful in fraud detection and rare-event analysis
• Trend and evolution analysis:
- Trend and deviation: regression analysis
- Sequential pattern mining, periodicity analysis
- Similarity-based analysis
• Other pattern-directed or statistical analyses
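For the "trend and deviation: regression analysis" item, a least-squares straight line is the simplest trend model. This sketch fits y = a + b·x to a hypothetical monthly sales series using the closed-form estimates b = Σ(x − mean_x)(y − mean_y) / Σ(x − mean_x)² and a = mean_y − b·mean_x. The data are illustrative only.

```python
from typing import List, Tuple


def linear_trend(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 118, 131, 139, 152]  # hypothetical monthly sales
a, b = linear_trend(months, sales)
print(f"slope = {b:.2f} per month")                  # slope = 10.29 per month
print(f"forecast for month 7 = {a + b * 7:.1f}")     # forecast for month 7 = 161.0
```

The deviation part of the analysis corresponds to the residuals y − (a + b·x): months far from the fitted line are candidates for closer inspection.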

Data Mining Tasks
• Data mining tasks are broadly divided into two categories:
- Predictive data mining
- Descriptive data mining

Data Mining Tasks
• Predictive data mining:
The objective of predictive tasks is to use the values of some variables to predict the values of other variables.
- E.g., online marketers use web mining to predict what a user will purchase on a website
• Classification
- Maps data into predefined groups (classes)
• Regression
- Maps a data item to a real-valued prediction variable

Data Mining Tasks
• Clustering
- Groups similar data items together
• Summarization
- Maps data into subsets with concise descriptions
• Link analysis
- Identifies relationships among data

Data Mining Tasks
• Descriptive data mining:
The objective of descriptive tasks is to find human-readable patterns that describe the relationships in the data.

Classification of Data Mining System
• Origin of data mining: draws ideas from machine learning/AI, pattern recognition, statistics, and database systems

Figure: Origin of Data Mining
Classification of Data Mining System
Traditional techniques may be unsuitable due to:
- Enormity of data
- High dimensionality of data
- Heterogeneous, distributed nature of data

Figure (labels): Statistics/Machine Learning/AI, Pattern Recognition, Data Mining, Database systems
Classification of Data Mining System
• Decision in Data Mining

• Databases to be mined:
- Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined:
- Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels
• Techniques utilized
Architecture of Data Mining System
• Four data mining architectures:

1. No coupling
2. Loose coupling
3. Semi-tight coupling
4. Tight coupling
No Coupling
• In this architecture, the data mining system does not use any functionality of a database or data warehouse system.

• Data is retrieved from sources such as the file system, processed using data mining algorithms, and the results are stored in the file system.

• This is considered a poor architecture for a data mining system because it does not take advantage of a database or data warehouse.

• However, it is used for simple data mining processes.


Loose Coupling
• A loosely coupled data mining system uses a database or data warehouse for data retrieval.

• In this architecture, the data mining system retrieves data from the database or data warehouse, processes it using data mining algorithms, and stores the results in those systems.

• Loose coupling suits memory-based data mining systems that do not require high scalability or high performance.
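A minimal Python sketch of loose coupling, assuming an in-memory SQLite database stands in for the database or data warehouse. The mining program uses the database only to fetch raw rows and to store its results, while all counting happens in the program's own memory. The table names `purchases` and `item_counts` are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # stand-in for the database / data warehouse
conn.execute("CREATE TABLE purchases (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "computer"), (1, "software"), (2, "computer"), (3, "printer")],
)

# 1. Retrieve raw data from the database.
rows = conn.execute("SELECT item FROM purchases").fetchall()

# 2. Mine it in application memory (here: simple item frequencies).
counts = Counter(item for (item,) in rows)

# 3. Store the result back in the database.
conn.execute("CREATE TABLE item_counts (item TEXT, n INTEGER)")
conn.executemany("INSERT INTO item_counts VALUES (?, ?)", counts.items())
conn.commit()

print(sorted(conn.execute("SELECT item, n FROM item_counts")))
# [('computer', 2), ('printer', 1), ('software', 1)]
```

Because every row crosses into the program before it is mined, this style depends on the data fitting in memory, which is why loose coupling suits systems that do not need high scalability.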
Semi-tight Coupling
• In the semi-tight coupling architecture, the data mining system is not only linked to a database or data warehouse system but also uses several of its features to carry out some basic data mining operations, such as sorting and indexing.

• Moreover, intermediate results can be stored in the database or data warehouse system for better performance.
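Compared with loose coupling, a semi-tight coupled system pushes some primitive operations into the database itself. In this sketch, an in-memory SQLite database is assumed to play the database role, and the `purchases` table, index name, and `item_counts` table are hypothetical. An index is created, grouping and sorting for frequency counting run as SQL `GROUP BY` and `ORDER BY`, and the intermediate result is kept in a database table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "computer"), (1, "software"), (2, "computer"), (3, "printer")],
)

# Database feature: an index on the column the mining step groups by.
conn.execute("CREATE INDEX idx_purchases_item ON purchases (item)")

# Database features: grouping and sorting run inside the DBMS, and the
# intermediate result is materialised as a table for later mining steps.
conn.execute(
    """
    CREATE TABLE item_counts AS
    SELECT item, COUNT(*) AS n
    FROM purchases
    GROUP BY item
    ORDER BY n DESC, item
    """
)

# The mining program only reads the (much smaller) intermediate result.
frequent = conn.execute("SELECT item, n FROM item_counts WHERE n >= 2").fetchall()
print(frequent)  # [('computer', 2)]
```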
Tight Coupling
• In this architecture, the data mining system is smoothly integrated into the database or data warehouse system, which is treated as an information retrieval component.

• Tight-coupling data mining architecture provides scalability, high performance, and integrated information.
Architecture of Data Warehouse
Data mining: A KDD Process

Figure: KDD Process
Data mining: A KDD Process
• The KDD process consists of several steps leading from raw data collections to some form of new knowledge.
• The iterative process consists of the following steps:
- Data cleaning
- Data integration
- Data selection
- Data transformation
- Data mining
- Pattern evaluation
- Knowledge representation
Data mining: A KDD Process
• Data cleaning
- Noisy and irrelevant data are removed from the collection

• Data integration
- Multiple, often heterogeneous, data sources may be combined into a common source

• Data selection
- The data relevant to the analysis are identified and retrieved from the data collection
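The first three KDD steps can be illustrated on toy records. In this Python sketch, the two source lists, their field names, and the rule that a missing or negative amount counts as noise are all hypothetical. It cleans one source, integrates it with a second source on a shared customer ID, and selects only the attributes relevant to a spending analysis.

```python
from typing import Dict, List

# Two hypothetical, heterogeneous sources describing the same customers.
sales_db: List[Dict] = [
    {"cust_id": 1, "amount": 250.0},
    {"cust_id": 2, "amount": None},    # missing value -> treated as noise
    {"cust_id": 3, "amount": -40.0},   # impossible value -> treated as noise
    {"cust_id": 4, "amount": 90.0},
]
crm_file: List[Dict] = [
    {"id": "1", "city": "Vadodara", "email": "a@example.com"},
    {"id": "4", "city": "Surat", "email": "b@example.com"},
]

# 1. Data cleaning: drop records whose amount is missing or invalid.
clean_sales = [r for r in sales_db if r["amount"] is not None and r["amount"] >= 0]

# 2. Data integration: reconcile key names/types (str "id" vs int "cust_id"),
#    then join the two sources on the customer ID.
crm_by_id = {int(r["id"]): r for r in crm_file}
integrated = [
    {**s, **crm_by_id[s["cust_id"]]} for s in clean_sales if s["cust_id"] in crm_by_id
]

# 3. Data selection: keep only the attributes relevant to the analysis.
selected = [{"city": r["city"], "amount": r["amount"]} for r in integrated]
print(selected)
# [{'city': 'Vadodara', 'amount': 250.0}, {'city': 'Surat', 'amount': 90.0}]
```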

Data mining: A KDD Process
• Data transformation
- Also known as data consolidation
- The phase in which the selected data are transformed into forms appropriate for the mining procedure

• Data mining
- Intelligent techniques are applied to extract potentially useful patterns

• Pattern evaluation
- Interesting patterns representing knowledge are identified based on given measures
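One common form of data transformation is min-max normalization. The slides do not name it, so treat it as one illustrative choice. It rescales an attribute to [0, 1] via v' = (v − min) / (max − min), so that attributes with large ranges do not dominate distance-based mining methods. A minimal Python sketch with hypothetical income values:

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20000.0, 35000.0, 50000.0, 80000.0]  # hypothetical
print(min_max_normalize(incomes))  # [0.0, 0.25, 0.5, 1.0]
```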

Data mining: A KDD Process

• Knowledge representation
- The final phase, in which the discovered knowledge is presented visually to the user

Issues in Data Mining
Applications of Data Mining
Database analysis and decision support

• Market analysis and management:
- Target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
• Risk analysis and management:
- Forecasting, customer retention, improved underwriting, quality control, competitive analysis
• Fraud detection and management

• Other applications:
- Text mining (newsgroups, email, documents) and Web analysis
- Intelligent query answering

Advantages of Data Mining
• Marketing/Retail
- Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns.

- Marketers can then choose an appropriate approach for the targeted customers.

- Data mining helps retail companies as well. Using market basket analysis, a store can arrange products so that customers can conveniently buy frequently co-purchased items together. It also helps retail companies offer discounts that attract more customers.

Advantages of Data Mining

• Finance/Banking
- By building a model from historical customer loan data, banks and other financial institutions can distinguish good loans from bad ones.
- Data mining also helps banks detect fraudulent credit card transactions.

• Manufacturing
- Applied to operational engineering data, data mining can detect faulty equipment and determine optimal control parameters.
- Data mining can determine the range of control parameters that leads to a flawless product; using these optimal parameters helps achieve the desired quality.

Advantages of Data Mining
• Governments
- Data mining helps government agencies analyze records of financial transactions to build patterns that can detect money laundering or other criminal activities.

• Market segmentation
- Data mining helps to identify the common characteristics of customers who
buy the same products from your company.

Advantages of Data Mining
• Customer anticipation
- It helps to predict which customers may leave your company and go to a
competitor.

• Fraud detection
- It identifies which transactions are most likely to be fraudulent.
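Fraud detection is often framed as outlier analysis. One simple, illustrative approach, which is an assumption since the slides do not prescribe a method, flags transactions whose amount lies more than a chosen number of standard deviations from the mean of a customer's history. The amounts and the threshold of 2 below are hypothetical.

```python
import statistics
from typing import List


def flag_outliers(amounts: List[float], z_threshold: float = 2.0) -> List[int]:
    """Return indices of amounts whose absolute z-score exceeds z_threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all amounts identical: nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mean) / stdev > z_threshold]


history = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.0, 950.0]  # hypothetical card spend
print(flag_outliers(history))  # [7]
```

The mean and standard deviation are themselves pulled toward extreme values, so median-based scores are a more robust alternative when the history may already contain fraud.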

Advantages of Data Mining
• Direct marketing
- Direct marketing identifies which prospects should be included to obtain the
highest response rate.

• Interactive marketing
- It is useful for predicting what each user on a Web site is most likely
interested in seeing.

• Market basket analysis
- It helps to understand what products or services are commonly purchased together.
• Trend analysis
- Trend analysis identifies the difference between a typical customer this month and last.
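Comparing "a typical customer this month and last" can be as simple as comparing a summary statistic of each month's customer spending. This sketch assumes the median as the "typical" value, and the spending figures are hypothetical.

```python
import statistics

last_month = [120.0, 80.0, 95.0, 300.0, 110.0]   # hypothetical spend per customer
this_month = [130.0, 90.0, 105.0, 310.0, 140.0]

# The median is used as the "typical" customer because it ignores extreme spenders.
typical_last = statistics.median(last_month)
typical_this = statistics.median(this_month)
change = (typical_this - typical_last) / typical_last
print(f"typical spend: {typical_last} -> {typical_this} ({change:+.1%})")
# typical spend: 110.0 -> 130.0 (+18.2%)
```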
Disadvantages of Data Mining
• Privacy Issues
- As social networks, e-commerce, blogs, etc. boom on the Internet, concerns about personal privacy have been increasing.
- Users worry that their information might be collected and used unethically, which can cause serious problems.
- Businesses collect information about their users to set up marketing strategies, but a business may be acquired by another firm or shut down, which raises concerns that the personal information could be misused or leaked.

www.paruluniversity.ac.in
