Dmbi PPT 1
(203105431)
Sandeep Jangir, Assistant Professor
Department of Computer Science & Engineering
The Course Outline
Chapter 1: Introduction to Data Mining (DM)
• As data grows at a remarkable rate, there is a need to analyze large,
complex, information-rich data sets to uncover hidden information. This
can lead to greater customer satisfaction and higher turnover for the
firm.
Figure: Origin of Data Mining (contributing fields: Statistics / Machine
Learning / AI, Pattern Recognition, Database systems)
Traditional techniques may be unsuitable due to:
- Enormity of data
- High dimensionality of data
- Heterogeneous, distributed nature of data
Classification of Data Mining System
• Decision in Data Mining
• Databases to be mined:
- Relational, transactional, object-oriented, object-relational,
active, spatial, time-series, text, multi-media, heterogeneous,
legacy, WWW, etc.
• Knowledge to be mined:
- Characterization, discrimination, association, classification,
clustering, trend, deviation and outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels
• Techniques utilized
Architecture of Data Mining System
• Four data mining architectures:
1. No coupling
2. Loose coupling
3. Semi-tight coupling
4. Tight coupling
No Coupling
• In this architecture, the data mining system does not use any functionality
of a database or data warehouse system.
• Data is retrieved from data sources such as a file system and processed using
data mining algorithms, and the results are stored in the file system.
Loose Coupling
• In this architecture, the data mining system retrieves data from a database
or data warehouse, processes it using data mining algorithms, and stores the
results back in those systems.
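A minimal sketch contrasting the file-based arrangement (no coupling) with the database-backed one (loose coupling). The purchases and mining_results tables, the file names, and the most_common_item stand-in "algorithm" are illustrative assumptions, not part of any particular data mining system.

```python
import csv
import os
import sqlite3
import tempfile
from collections import Counter


def most_common_item(items):
    """Stand-in 'mining algorithm' (assumption): return the most frequent item."""
    return Counter(items).most_common(1)[0][0]


def mine_no_coupling(in_path, out_path):
    """No coupling: read from a flat file, write the result to a flat file."""
    with open(in_path, newline="") as f:
        items = [row[0] for row in csv.reader(f)]
    result = most_common_item(items)
    with open(out_path, "w") as f:
        f.write(result)
    return result


def mine_loose_coupling(conn):
    """Loose coupling: fetch from the database, store the result back in it."""
    items = [row[0] for row in conn.execute("SELECT item FROM purchases")]
    result = most_common_item(items)
    conn.execute("CREATE TABLE IF NOT EXISTS mining_results (top_item TEXT)")
    conn.execute("INSERT INTO mining_results VALUES (?)", (result,))
    conn.commit()
    return result


def demo():
    data = ["milk", "bread", "milk", "eggs"]  # hypothetical purchase log

    # File-based run: input and output both live in the file system.
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "purchases.csv")
        out_path = os.path.join(tmp, "result.txt")
        with open(in_path, "w", newline="") as f:
            csv.writer(f).writerows([[d] for d in data])
        file_result = mine_no_coupling(in_path, out_path)

    # Database-backed run: input and output both live in the database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (item TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?)", [(d,) for d in data])
    db_result = mine_loose_coupling(conn)
    stored = conn.execute("SELECT top_item FROM mining_results").fetchone()[0]
    conn.close()
    return file_result, db_result, stored
```

In both cases the mining logic is identical; only the place the data comes from and the place the result goes to differ.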
Figure: KDD Process
Data mining: A KDD Process
• The KDD process comprises a few steps leading from raw data collections
to some form of new knowledge.
• The iterative process consists of the following steps:
- Data cleaning
- Data integration
- Data selection
- Data transformation
- Data mining
- Pattern evaluation
- Knowledge representation
Data mining: A KDD Process
• Data cleaning
- noisy data and irrelevant data are removed from the collection
• Data integration
- multiple (heterogeneous) data sources may be combined into a common
source
• Data selection
- data relevant to the analysis is decided on and retrieved from the data
collection
• Data transformation
- the selected data is transformed into forms appropriate for the mining
procedure
• Data mining
- clever techniques are applied to extract potentially useful patterns
• Pattern evaluation
- interesting patterns representing knowledge are identified based on given
measures
• Knowledge representation
- final phase in which the discovered knowledge is visually represented to the
user
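The steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two record sources, the field names, the frequency-count "mining" step, and the min_count threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical raw records from two separate sources.
STORE_A = [{"customer": "c1", "item": "Milk "}, {"customer": "c2", "item": None}]
STORE_B = [
    {"customer": "c3", "item": "milk"},
    {"customer": "c4", "item": "BREAD"},
    {"customer": "c5", "item": "milk"},
]


def clean(records):
    """Data cleaning: drop records whose item is missing."""
    return [r for r in records if r["item"]]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [r["item"] for r in records]


def transform(items):
    """Data transformation: normalise spelling so equal items compare equal."""
    return [i.strip().lower() for i in items]


def mine(items):
    """Data mining: here, simply count how often each item occurs."""
    return Counter(items)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet the given measure."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def represent(patterns):
    """Knowledge representation: present the result in readable form."""
    return [f"{item}: bought {count} times" for item, count in sorted(patterns.items())]


def kdd(min_count=2):
    records = integrate(clean(STORE_A), clean(STORE_B))
    patterns = mine(transform(select(records)))
    return represent(evaluate(patterns, min_count))
```

Because the process is iterative, any step can be revisited with different settings (for example, a different min_count) once the output has been inspected.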
• Other Applications:
- Text mining (news group, email, documents) and Web analysis
- Intelligent query answering
• Retail
- Data mining helps retail companies as well. Using market basket analysis, a
store can arrange its products so that customers can conveniently buy
frequently purchased items together. It also helps retail companies offer
targeted discounts that attract more customers.
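A minimal sketch of the idea behind market basket analysis: count how often pairs of items appear in the same basket and keep the pairs whose support (fraction of baskets containing both items) reaches a threshold. The baskets and the min_support value are made-up illustrations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one list of items per customer basket.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs bought together often enough."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted(set(...)) ignores duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }
```

With these baskets and min_support=0.5, only bread and butter qualify (they appear together in 3 of 4 baskets), which is the kind of result a store could use when deciding shelf placement.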
• Finance / Banking
- By building a model from historical customer loan data, bank officials and
financial institutions can distinguish good loans from bad ones.
- Data mining also helps banks detect fraudulent credit card transactions.
• Manufacturing
- Applied to operational engineering data, data mining can detect faulty
equipment and determine optimal control parameters.
- Data mining can determine the range of control parameters that leads to
a defect-free product. Operating within this range then gives the desired
quality.
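As a rough illustration of finding a control parameter range, the sketch below takes production runs recorded as (parameter value, passed inspection) pairs and reports the span of values seen among passing runs. The temperature readings are invented, and a real analysis would involve several parameters and far more data.

```python
# Hypothetical runs: (oven temperature, did the product pass inspection?)
RUNS = [(180, False), (195, True), (200, True), (205, True), (220, False)]


def acceptable_range(runs):
    """Return (low, high) of the parameter over runs that passed, or None."""
    good = [value for value, passed in runs if passed]
    if not good:
        return None
    return min(good), max(good)


def pass_rate_within(runs, low, high):
    """Fraction of runs inside [low, high] that passed inspection."""
    inside = [passed for value, passed in runs if low <= value <= high]
    if not inside:
        return 0.0
    return sum(inside) / len(inside)
```

Checking the pass rate inside the reported range guards against a range that happens to contain failed runs as well.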
• Market segmentation
- Data mining helps to identify the common characteristics of customers who
buy the same products from your company.
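A small sketch of this idea: group purchase records by product and report the most common value of each buyer attribute. The records and the attribute names (age_group, city) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records with two buyer attributes.
PURCHASES = [
    {"product": "laptop", "age_group": "18-25", "city": "Pune"},
    {"product": "laptop", "age_group": "18-25", "city": "Delhi"},
    {"product": "laptop", "age_group": "26-35", "city": "Pune"},
    {"product": "phone", "age_group": "26-35", "city": "Delhi"},
    {"product": "phone", "age_group": "26-35", "city": "Delhi"},
]


def common_traits(purchases, attributes=("age_group", "city")):
    """Map each product to the most common value of each buyer attribute."""
    by_product = defaultdict(list)
    for record in purchases:
        by_product[record["product"]].append(record)
    return {
        product: {
            attr: Counter(r[attr] for r in rows).most_common(1)[0][0]
            for attr in attributes
        }
        for product, rows in by_product.items()
    }
```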
• Fraud detection
- It identifies which transactions are most likely to be fraudulent.
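One simple way to rank transactions by how unusual they are is a robust outlier score based on the median and the median absolute deviation (MAD). This is a stand-in technique chosen for illustration, not a method the slides prescribe; the amounts and the 3.5 threshold are assumptions.

```python
from statistics import median

# Hypothetical transaction amounts for one card.
AMOUNTS = [20, 25, 22, 19, 24, 21, 500]


def outlier_scores(amounts):
    """Modified z-score of each amount, using the median and MAD."""
    centre = median(amounts)
    mad = median(abs(a - centre) for a in amounts)
    if mad == 0:
        # All amounts (nearly) identical: nothing stands out.
        return [0.0 for _ in amounts]
    return [0.6745 * (a - centre) / mad for a in amounts]


def flag_suspicious(amounts, threshold=3.5):
    """Indices of transactions whose absolute score exceeds the threshold."""
    scores = outlier_scores(amounts)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]
```

Here only the 500 transaction is flagged; the median-based score is used because a single extreme value distorts the mean and standard deviation much more than it distorts the median.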
• Interactive marketing
- It is useful for predicting what each user on a Web site is most likely
interested in seeing.
• Trend analysis
- Trend analysis identifies the difference between a typical customer this
month and a typical customer last month.
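A minimal sketch of such a comparison, assuming the "typical customer" is summarised by the median of each numeric attribute. The customer records and attribute names are invented for illustration.

```python
from statistics import median

# Hypothetical customer snapshots for two consecutive months.
LAST_MONTH = [
    {"age": 30, "spend": 100},
    {"age": 40, "spend": 120},
    {"age": 50, "spend": 110},
]
THIS_MONTH = [
    {"age": 25, "spend": 150},
    {"age": 35, "spend": 130},
    {"age": 30, "spend": 170},
]


def typical_customer(customers):
    """Median value of every attribute across the given customers."""
    keys = customers[0].keys()
    return {k: median(c[k] for c in customers) for k in keys}


def profile_change(last_month, this_month):
    """Per-attribute difference between this month's and last month's profile."""
    before = typical_customer(last_month)
    after = typical_customer(this_month)
    return {k: after[k] - before[k] for k in before}
```

For the sample data, the typical customer is 10 years younger and spends 40 more than last month.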