Unit - I

Data mining involves extracting useful patterns and knowledge from large amounts of data. It is the process of discovering new patterns from data. The key steps in data mining are data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Data mining utilizes algorithms to analyze data and identify patterns. It has applications in business intelligence for areas like prediction, classification, and optimization.

Uploaded by

Dev


UNIT - I

CA614: DATA MINING & ANALYTICS

INTRODUCTION

• Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals extract valuable information from huge sets of data.
• Data mining is also called Knowledge Discovery in Databases (KDD).
• The knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
WHAT IS DATA MINING?

• Data mining (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  • Data mining: a misnomer?
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything "data mining"?
  • Simple search and query processing
  • (Deductive) expert systems
WHAT IS DATA MINING?

• Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions.
• Data mining is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures.
• Data mining uses complex mathematical algorithms to segment data and evaluate the probability of future events.
• Data mining is the mining, or discovery, of new information in terms of patterns or rules from vast amounts of data.
• To be useful, data mining must be carried out efficiently on large files and databases.
• Data mining is a process used by organizations to extract specific data from huge databases to solve business problems.
• It primarily turns raw data into useful information.
DATA MINING IN BUSINESS INTELLIGENCE

[Figure: pyramid of layers, listed here from top to bottom. The potential to support business decisions increases toward the top. The role associated with each layer is given in brackets.]
• Decision Making [End User]
• Data Presentation: Visualization Techniques [Business Analyst]
• Data Mining: Information Discovery [Data Analyst]
• Data Exploration: Statistical Summary, Querying, and Reporting [Data Analyst]
• Data Preprocessing/Integration, Data Warehouses [DBA]
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems [DBA]

• Business intelligence (BI) can be described as "a set of techniques and tools for the acquisition and transformation of raw data into meaningful and useful information for business analysis purposes".
GOALS OF DATA MINING

• Prediction: Determine how certain attributes will behave in the future. For example, how much sales volume a store will generate in a given period.
• Identification: Identify patterns in data. For example, newlywed couples tend to spend more money buying furniture.
• Classification: Partition data into classes. For example, customers can be classified into categories with different shopping behavior.
• Optimization: Optimize the use of limited resources such as time, space, money, or materials. For example, how best to use advertising to maximize profits (sales).
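The prediction goal can be made concrete with a small sketch. The example below fits a straight-line trend to past sales by ordinary least squares and extrapolates one period ahead. The sales figures and the function names `fit_linear_trend` and `forecast` are assumptions made for illustration. They do not come from the slides, and a linear trend is only one of many possible prediction methods.

```python
# Hypothetical monthly sales volumes (units sold); illustrative data only.
monthly_sales = [120.0, 132.0, 141.0, 155.0, 163.0, 178.0]


def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance and variance terms of the least-squares solution.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend steps_ahead periods past the last value."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


if __name__ == "__main__":
    print(f"Forecast for next month: {forecast(monthly_sales):.1f}")
```

A straight line is used here only because it is the simplest model that can be checked by hand; real sales data would usually also need seasonality and other attributes.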
TYPES OF KNOWLEDGE DISCOVERED DURING DATA MINING

• Association rules: For example, when a male shopper buys a new car, he is likely to buy a car CD.
• Classification hierarchies: For example, mutual funds may be classified into three categories: growth, income, and stable.
• Sequence patterns: Sequence patterns are temporal associations. For example, if the mortgage interest rate drops, then within a six-month period the sales of houses will increase by a certain percentage.
• Patterns within time series: For example, the behavior of stock price data over time.
• Detection of similarity, or segmentation: For example, health data may indicate similarity among subgroups of people.
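Association rules such as "new car implies car CD" are usually judged by two standard measures: support (the fraction of transactions that contain all items of the rule) and confidence (among transactions that contain the left-hand side, the fraction that also contain the right-hand side). The sketch below computes both. The transactions are invented for illustration and the function names are assumptions, not part of the course material.

```python
# Hypothetical purchase transactions; illustrative data only.
transactions = [
    {"new car", "car CD"},
    {"new car", "car CD", "floor mats"},
    {"new car"},
    {"car CD"},
    {"floor mats"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


if __name__ == "__main__":
    rule = ({"new car"}, {"car CD"})
    print("support:", support(rule[0] | rule[1], transactions))
    print("confidence:", confidence(rule[0], rule[1], transactions))
```

With this toy data, the rule holds in 2 of 5 transactions (support 0.4) and in 2 of the 3 transactions that contain a new car (confidence of about 0.67).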
STEPS IN DATA MINING PROCESS

• Data from a variety of sources is integrated into a single data store called the target data.
• The data is then pre-processed and transformed into a standard format.
• The data mining algorithms process the data and produce output in the form of patterns or rules.
• Those patterns and rules are then interpreted to yield new or useful knowledge or information.
Stages of Data Mining
1. Data cleaning
• to remove noise and inconsistent data
2. Data integration
• where multiple data sources may be combined
3. Data selection
• where data relevant to the analysis task are retrieved from the database
4. Data transformation
• where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance
5. Data mining
• an essential process where intelligent methods are applied in order to extract data patterns
6. Pattern evaluation
• to identify the truly interesting patterns representing knowledge based on some interestingness measures
7. Knowledge presentation
• where visualization and knowledge representation techniques are used to present the mined knowledge to the user
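The seven stages above can be sketched as a chain of small functions. Everything in this example is an assumption made for illustration: the two data sources, the field names, and the threshold. The "mining" step is a deliberately trivial stand-in (picking the region with the highest total), not a real mining algorithm. The point is only to show how each stage feeds the next.

```python
from collections import defaultdict

# Hypothetical raw records from two sources; field names are illustrative.
store_a = [
    {"customer": "c1", "region": "north", "amount": "120.5"},
    {"customer": "c2", "region": "north", "amount": ""},    # missing value
    {"customer": "c3", "region": "south", "amount": "80"},
]
store_b = [
    {"customer": "c4", "region": "south", "amount": "200"},
    {"customer": "c5", "region": "north", "amount": "-5"},  # inconsistent value
]


def clean(records):
    """1. Data cleaning: drop records with missing or negative amounts."""
    cleaned = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        if amount >= 0:
            cleaned.append({**r, "amount": amount})
    return cleaned


def integrate(*sources):
    """2. Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """3. Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """4. Data transformation: aggregate amounts per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """5. Data mining (trivial stand-in): region with the highest total."""
    region = max(totals, key=totals.get)
    return {"top_region": region, "total": totals[region]}


def evaluate(pattern, min_total):
    """6. Pattern evaluation: keep the pattern only above a threshold."""
    return pattern if pattern["total"] >= min_total else None


def present(pattern):
    """7. Knowledge presentation: render the result for the user."""
    if pattern is None:
        return "No interesting pattern found."
    return (f"Region '{pattern['top_region']}' leads with total sales "
            f"{pattern['total']:.1f}.")


def run_pipeline(min_total=100.0):
    data = integrate(clean(store_a), clean(store_b))
    data = select(data, ["region", "amount"])
    return present(evaluate(mine(transform(data)), min_total))


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as a separate function mirrors the list above and makes it easy to replace the stand-in mining step with a real algorithm later.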
DATA MINING ARCHITECTURE

• Knowledge base:
  • This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns.
  • Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
  • Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included.
  • Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
• Data mining engine:
  • This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as
    • characterization,
    • association and correlation analysis,
    • classification,
    • prediction,
    • cluster analysis,
    • outlier analysis, and
    • evolution analysis.
• Pattern evaluation module:
  • This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns.
  • It may use interestingness thresholds to filter out discovered patterns.
  • Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
• User interface:
  • This module communicates between users and the data mining system.
  • It allows the user to interact with the system by specifying a data mining query or task.
  • It provides information to help focus the search and supports exploratory data mining based on the intermediate data mining results.
  • In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
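The concept hierarchies mentioned under the knowledge base can be illustrated with a small sketch. Here a location attribute is organized into three levels of abstraction (city, state, country), and sales recorded per city are rolled up to a coarser level. The mapping tables, the sales figures, and the function names are assumptions chosen for illustration; an actual system would store such hierarchies in its knowledge base or warehouse schema.

```python
from collections import Counter

# Hypothetical concept hierarchy for a location attribute:
# city -> state -> country. Illustrative values only.
CITY_TO_STATE = {
    "Chennai": "Tamil Nadu",
    "Madurai": "Tamil Nadu",
    "Mumbai": "Maharashtra",
    "Pune": "Maharashtra",
}
STATE_TO_COUNTRY = {
    "Tamil Nadu": "India",
    "Maharashtra": "India",
}


def generalize(city, level):
    """Map a city to its value at the requested level of abstraction."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


def roll_up(sales, level):
    """Aggregate (city, amount) pairs at the requested hierarchy level."""
    totals = Counter()
    for city, amount in sales:
        totals[generalize(city, level)] += amount
    return dict(totals)


# Hypothetical sales per city.
sales = [("Chennai", 10), ("Madurai", 5), ("Mumbai", 7), ("Pune", 3)]

if __name__ == "__main__":
    for level in ("city", "state", "country"):
        print(level, roll_up(sales, level))
```

Mining at a higher level of abstraction can reveal patterns that are too sparse to see at the city level, which is one reason such hierarchies are kept as domain knowledge.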
MULTI-DIMENSIONAL VIEW OF DATA MINING

• Data to be mined
  • Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multimedia, graphs, and social and information networks
• Knowledge to be mined (or: data mining functions)
  • Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  • Descriptive vs. predictive data mining
  • Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  • Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
• Applications adapted
  • Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
DATA MINING: ON WHAT KINDS OF DATA?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structured data, graphs, social networks, and multi-linked data
  • Object-relational databases
  • Heterogeneous databases and legacy databases
  • Spatial data and spatiotemporal data
  • Multimedia databases
  • Text databases
  • The World-Wide Web
DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES

[Figure: Data Mining at the center, surrounded by the disciplines it draws on: Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithm, and Applications]
MAJOR ISSUES IN DATA MINING (1)

• Efficiency and Scalability
  • Efficiency and scalability of data mining algorithms
  • Parallel, distributed, stream, and incremental mining methods
• Diversity of data types
  • Handling complex types of data
  • Mining dynamic, networked, and global data repositories
• Data mining and society
  • Social impacts of data mining
  • Privacy-preserving data mining
  • Invisible data mining
MAJOR ISSUES IN DATA MINING (2)

• Mining Methodology
  • Mining various and new kinds of knowledge
  • Mining knowledge in multi-dimensional space
  • Data mining: An interdisciplinary effort
  • Boosting the power of discovery in a networked environment
  • Handling noise, uncertainty, and incompleteness of data
  • Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
  • Interactive mining
  • Incorporation of background knowledge
  • Presentation and visualization of data mining results
