Unit - I

Data mining involves extracting useful patterns and knowledge from large amounts of data. It is the process of discovering new patterns from data. The key steps in data mining are data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Data mining utilizes algorithms to analyze data and identify patterns. It has applications in business intelligence for areas like prediction, classification, and optimization.

Uploaded by

Dev

UNIT - I

CA614: DATA MINING & ANALYTICS


INTRODUCTION

• Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and individuals extract valuable information from huge data sets.
• Data mining is also called Knowledge Discovery in Databases (KDD).
• The knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
WHAT IS DATA MINING?

• Data mining (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  • Data mining: a misnomer?
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”?
  • Simple search and query processing
  • (Deductive) expert systems
WHAT IS DATA MINING?

• Data mining is the process of extracting information from huge data sets to identify patterns, trends, and useful data that allow a business to make data-driven decisions.
• Data mining is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures.
• Data mining uses complex mathematical algorithms to segment data and evaluate the probability of future events.
• Data mining is the mining, or discovery, of new information in terms of patterns or rules from vast amounts of data.
• To be useful, data mining must be carried out efficiently on large files and databases.
• Data mining is a process used by organizations to extract specific data from huge databases to solve business problems.
• It primarily turns raw data into useful information.
DATA MINING IN BUSINESS INTELLIGENCE

[Figure: pyramid of layers, with the potential to support business decisions increasing from bottom to top. Layers, listed top to bottom, with the associated role in parentheses where the figure gives one:]
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
• Business intelligence (BI) can be described as "a set of techniques and tools for the acquisition and transformation of raw data into meaningful and useful information for business analysis purposes".
GOALS OF DATA MINING

• Prediction: Determine how certain attributes will behave in the future. For example, how much sales volume a store will generate in a given period.
• Identification: Identify patterns in data. For example, newlywed couples tend to spend more money buying furniture.
• Classification: Partition data into classes. For example, customers can be classified into categories with different shopping behavior.
• Optimization: Optimize the use of limited resources such as time, space, money, or materials. For example, how best to use advertising to maximize profits (sales).
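The prediction goal can be illustrated with a minimal sketch, assuming a simple least-squares trend line is an acceptable stand-in for a real forecasting method. The function name `fit_line` and the monthly sales figures are hypothetical and exist only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly sales volumes for one store (made-up numbers).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)

# Extrapolate the fitted trend one period ahead.
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.1f}")
```

A straight-line trend is the simplest possible predictor; real sales data would usually call for seasonality handling and validation on held-out periods.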
TYPES OF KNOWLEDGE DISCOVERED DURING DATA MINING
• Association rules: For example, when a male shopper buys a new car, he is likely to buy a car CD.
• Classification hierarchies: For example, mutual funds may be classified into three categories: growth, income, and stable.
• Sequence patterns: Sequence patterns are temporal associations. For example, if the mortgage interest rate drops, house sales will increase by a certain percentage within a six-month period.
• Patterns within time series: For example, the behavior of stock price data over time.
• Similarity detection, or segmentation: For example, health data may indicate similarity among subgroups of people.
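As a minimal sketch of how an association rule such as "car implies car CD" might be quantified, the snippet below computes the two standard measures, support and confidence, over a handful of purchase baskets. The baskets are made-up data and the helper names `support` and `confidence` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical purchase baskets (made-up data).
baskets = [
    {"car", "car CD"},
    {"car", "car CD", "floor mats"},
    {"car"},
    {"floor mats"},
]

# Rule under test: {car} -> {car CD}
rule_support = support(baskets, {"car", "car CD"})
rule_confidence = confidence(baskets, {"car"}, {"car CD"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Support says how often the two items occur together at all; confidence says how often the consequent appears given the antecedent. A rule is usually reported only when both exceed user-chosen thresholds.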
STEPS IN DATA MINING PROCESS
• Data from a variety of sources is integrated into a single data store, called the target data.
• The data is then pre-processed and transformed into a standard format.
• The data mining algorithms process the data and produce output in the form of patterns or rules.
• Those patterns and rules are then interpreted to yield new or useful knowledge or information.
Stages of Data Mining
1. Data cleaning
• to remove noise and inconsistent data
2. Data integration
• where multiple data sources may be combined
3. Data selection
• where data relevant to the analysis task are retrieved from the database
4. Data transformation
• where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations
5. Data mining
• an essential process where intelligent methods are applied in order to extract data patterns
6. Pattern evaluation
• to identify the truly interesting patterns representing knowledge, based on some interestingness measures
7. Knowledge presentation
• where visualization and knowledge representation techniques are used to present the mined knowledge to the user
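The seven stages above can be sketched as a chain of small functions. This is an illustrative toy under stated assumptions, not a real mining system: the records, field names, age threshold, and every function name are hypothetical, and the "mining" step is just a per-group average standing in for an actual algorithm.

```python
from collections import Counter

# Hypothetical raw records from two sources (made-up data).
store_a = [
    {"customer": "c1", "age": 25, "amount": 120.0},
    {"customer": "c2", "age": None, "amount": 80.0},   # missing age
]
store_b = [
    {"customer": "c3", "age": 41, "amount": 300.0},
    {"customer": "c4", "age": 37, "amount": -5.0},     # inconsistent amount
    {"customer": "c5", "age": 29, "amount": 95.0},
]


def clean(records):
    """Stage 1: drop records with missing or inconsistent values."""
    return [r for r in records if r["age"] is not None and r["amount"] >= 0]


def integrate(*sources):
    """Stage 2: combine multiple sources into one data store."""
    return [r for source in sources for r in source]


def select(records):
    """Stage 3: keep only the attributes relevant to the task."""
    return [{"age": r["age"], "amount": r["amount"]} for r in records]


def transform(records):
    """Stage 4: consolidate ages into coarse groups."""
    return [
        {"age_group": "under_35" if r["age"] < 35 else "35_plus",
         "amount": r["amount"]}
        for r in records
    ]


def mine(records):
    """Stage 5: stand-in mining step, average spend per age group."""
    totals, counts = Counter(), Counter()
    for r in records:
        totals[r["age_group"]] += r["amount"]
        counts[r["age_group"]] += 1
    return {g: totals[g] / counts[g] for g in totals}


def evaluate(patterns, min_amount=100.0):
    """Stage 6: keep patterns that pass an interestingness threshold."""
    return {g: avg for g, avg in patterns.items() if avg >= min_amount}


def present(patterns):
    """Stage 7: render the mined knowledge as readable lines."""
    return [f"{g}: average spend {avg:.2f}" for g, avg in sorted(patterns.items())]


# Cleaning is applied per source before integration, matching the listed order.
data = integrate(clean(store_a), clean(store_b))
patterns = mine(transform(select(data)))
report = present(evaluate(patterns))
print("\n".join(report))
```

Keeping each stage as a separate function mirrors the staged view of the process and makes it easy to swap in a real algorithm at stage 5 without touching the preparation steps.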
DATA MINING ARCHITECTURE
• Knowledge base:
  • This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns.
  • Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
  • Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included.
  • Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
• Data mining engine:
  • This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as
    • characterization,
    • association and correlation analysis,
    • classification,
    • prediction,
    • cluster analysis,
    • outlier analysis, and
    • evolution analysis.
• Pattern evaluation module:
  • This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns.
  • It may use interestingness thresholds to filter out discovered patterns.
  • Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
• User interface:
  • This module communicates between users and the data mining system.
  • It allows the user to interact with the system by specifying a data mining query or task.
  • It provides information to help focus the search, and supports exploratory data mining based on the intermediate data mining results.
  • In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
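The pattern evaluation module described above can be sketched as a threshold filter over candidate patterns. The example assumes the patterns are association rules scored by support and confidence; the `Rule` class, the function `filter_interesting`, the threshold values, and the candidate rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: str
    consequent: str
    support: float      # fraction of records containing both sides
    confidence: float   # estimated P(consequent | antecedent)


def filter_interesting(rules, min_support, min_confidence):
    """Keep rules meeting both thresholds, highest confidence first."""
    kept = [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


# Hypothetical candidate rules handed over by the mining engine.
candidates = [
    Rule("car", "car CD", 0.50, 0.67),
    Rule("floor mats", "car", 0.25, 0.50),
    Rule("car CD", "car", 0.50, 1.00),
]

interesting = filter_interesting(candidates, min_support=0.4, min_confidence=0.6)
for r in interesting:
    print(f"{r.antecedent} -> {r.consequent} "
          f"(support={r.support:.2f}, confidence={r.confidence:.2f})")
```

In this sketch the thresholds play the role of the interestingness constraints held in the knowledge base; a tighter integration would push them into the mining step itself so that uninteresting candidates are never generated.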
MULTI-DIMENSIONAL VIEW OF DATA MINING

• Data to be mined
  • Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multimedia, graphs, and social and information networks
• Knowledge to be mined (or: data mining functions)
  • Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  • Descriptive vs. predictive data mining
  • Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  • Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
• Applications adapted
  • Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
DATA MINING: ON WHAT KINDS OF DATA?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structure data, graphs, social networks and multi-linked data
  • Object-relational databases
  • Heterogeneous databases and legacy databases
  • Spatial data and spatiotemporal data
  • Multimedia database
  • Text databases
  • The World-Wide Web
DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES

[Figure: Data Mining at the center, surrounded by the disciplines it draws on]
• Machine Learning
• Pattern Recognition
• Statistics
• Applications
• Visualization
• Algorithm
• Database Technology
• High-Performance Computing
MAJOR ISSUES IN DATA MINING (1)

• Efficiency and Scalability
  • Efficiency and scalability of data mining algorithms
  • Parallel, distributed, stream, and incremental mining methods
• Diversity of data types
  • Handling complex types of data
  • Mining dynamic, networked, and global data repositories
• Data mining and society
  • Social impacts of data mining
  • Privacy-preserving data mining
  • Invisible data mining
MAJOR ISSUES IN DATA MINING (2)

• Mining Methodology
  • Mining various and new kinds of knowledge
  • Mining knowledge in multi-dimensional space
  • Data mining: An interdisciplinary effort
  • Boosting the power of discovery in a networked environment
  • Handling noise, uncertainty, and incompleteness of data
  • Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
  • Interactive mining
  • Incorporation of background knowledge
  • Presentation and visualization of data mining results
