
DMDW IMP QUES

UNIT 1
2M
1)WHY DATA MINING IS USED
➢ Better Decision-Making: Provides actionable insights for informed choices.
➢ Operational Efficiency: Improves processes and reduces costs.
➢ Risk Management: Detects fraud and predicts potential risks.
➢ Innovation: Generates new ideas and opportunities.

2)DEFINE DATA MINING


Data mining is the process of discovering patterns, correlations, and insights from large datasets
using statistical, mathematical, and computational techniques. It involves analysing data to extract
useful information and transform it into actionable knowledge.

3)GIVE FEW APPLICATIONS OF DATA MINING


➢ Marketing: Customer segmentation and targeted campaigns.
➢ Finance: Fraud detection and credit scoring.
➢ Healthcare: Predicting patient outcomes and optimizing treatments.
➢ Retail: Inventory management and product recommendations.
4)DEFINE BUCKETING
Bucketing is a data pre-processing method used to minimize the effects of small observation errors.

There are 2 methods of dividing data into bins:

• Equal Frequency Binning: bins have an equal frequency.

• Equal Width Binning: bins have equal width; the bin boundaries are defined as [min + w], [min + 2w], ..., [min + nw], where w = (max − min) / (number of bins).
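
A minimal sketch of the two binning methods, assuming pandas is available; the price values are sample data (the same values used in the smoothing problem later in this unit), and pd.cut / pd.qcut are used for equal-width and equal-frequency binning respectively.

```python
import pandas as pd

# Sample price data (same values as the smoothing problem later in this unit)
prices = pd.Series([9, 8, 4, 15, 24, 21, 21, 25, 26, 34, 29, 28])

n_bins = 3

# Equal-width binning: each bin spans w = (max - min) / n_bins
equal_width = pd.cut(prices, bins=n_bins)

# Equal-frequency (equal-depth) binning: each bin holds roughly the same number of values
equal_freq = pd.qcut(prices, q=n_bins)

print(equal_width.value_counts())
print(equal_freq.value_counts())
```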
5)SOLVE PROBLEMS FOR NORMALIZATION
i) Min-max normalization: maps a value v of attribute A to v' in [new_min_A, new_max_A]:

v' = ((v − min_A) / (max_A − min_A)) × (new_max_A − new_min_A) + new_min_A

Ex. Let income range from $12,000 to $98,000 be normalized to [0.0, 1.0]. Then $73,600 is mapped to:

((73,600 − 12,000) / (98,000 − 12,000)) × (1.0 − 0) + 0 = 0.716

ii) Normalization by decimal scaling:

v' = v / 10^j, where j is the smallest integer such that Max(|v'|) < 1

iii) Z-score normalization (μ_A: mean of A, σ_A: standard deviation of A):

v' = (v − μ_A) / σ_A

Eg: for a value 3 with mean 21.1 and standard deviation 29.8, v' = (3 − 21.1) / 29.8 ≈ −0.61. Do the same for all values.
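
A minimal sketch of the three normalization methods, assuming NumPy; the values are a small hypothetical income sample that includes the figures from the worked example above.

```python
import numpy as np

# Hypothetical income sample containing the values from the worked example
values = np.array([12_000, 54_000, 73_600, 98_000], dtype=float)

# i) Min-max normalization to [new_min, new_max]
new_min, new_max = 0.0, 1.0
min_max = (values - values.min()) / (values.max() - values.min()) \
          * (new_max - new_min) + new_min        # 73,600 -> ~0.716

# ii) Normalization by decimal scaling: divide by 10^j,
#     j = smallest integer such that max(|v'|) < 1
j = int(np.ceil(np.log10(np.abs(values).max() + 1)))
decimal_scaled = values / 10 ** j

# iii) Z-score normalization: (v - mean) / standard deviation
z_scores = (values - values.mean()) / values.std()

print(min_max, decimal_scaled, z_scores, sep="\n")
```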
12M
1)EXPLAIN KNOWLEDGE DISCOVERY PROCESS (KDD)

Data cleaning

To remove noise and inconsistent data

Data integration

Where multiple data sources may be combined

Data selection

Where data relevant to the analysis task are retrieved from the database

Data transformation

Where data are transformed and consolidated into forms appropriate for mining by performing
summary or aggregation operations

Data mining

An essential process where intelligent methods are applied to extract data patterns

Pattern evaluation

To identify the truly interesting patterns representing knowledge based on interestingness measures

Knowledge presentation

Where visualization and knowledge representation techniques are used to present mined knowledge
to users
Data to be mined

Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks

Knowledge to be mined (or: Data mining functions)

Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.

Descriptive vs. predictive data mining

Multiple/integrated functions and mining at multiple levels

Techniques utilized

Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.

Applications adapted

Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text
mining, Web mining, etc.

2)EXPLAIN SYSTEM ARCHITECTURE OF DATA MINING


Components of data mining systems

Data source

➢ The actual source of data is the Database, data warehouse, World Wide Web (WWW),
text files, and other documents.
➢ We need a huge amount of historical data for data mining to be successful.

Data mining engine

➢ It comprises instruments and software used to obtain insights and knowledge from data
collected from various data sources and stored within the data warehouse.
➢ It contains several modules for operating data mining tasks, including association,
characterization, classification, clustering, prediction, time-series analysis, etc.

Data warehouse server

➢ The database or data warehouse server consists of the original data that is ready to be
processed.
➢ The server is responsible for retrieving the relevant data based on the user's data mining request.

Pattern evaluation module

➢ It is primarily responsible for evaluating the interestingness of discovered patterns using a threshold
value. It collaborates with the data mining engine to focus the search on interesting patterns.

Graphical user interface

➢ The graphical user interface (GUI) module communicates between the data mining system
and the user.
➢ This module helps the user to easily and efficiently use the system without knowing the
complexity of the process.
➢ This module cooperates with the data mining system when the user specifies a query or a
task and displays the results.

Knowledge base

➢ It holds domain knowledge that helps to guide the search or evaluate the interestingness of the resulting patterns.
➢ The knowledge base may even contain user views and data from user experiences that
might be helpful in the data mining process.
➢ The pattern evaluation module regularly interacts with the knowledge base to get
inputs and also to update it.
3)EXPLAIN DATA PREPROCESSING IN DETAIL
Data preprocessing refers to the collection, manipulation, and management of data to extract
meaningful information and insights. It involves various steps to transform raw data into a structured
format that can be analysed and utilized effectively.

Steps or Major Tasks in Data Preprocessing

Data cleaning

Data in the real world is dirty: it contains lots of potentially incorrect values, e.g., due to faulty
instruments, human or computer error, or transmission errors.

i)incomplete: lacking attribute values, lacking certain attributes of interest, or containing only
aggregate data

e.g., Occupation= “” (missing data)

ii)noisy: containing noise, errors, or outliers

e.g., Salary= “−10” (an error)
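
A minimal sketch, assuming pandas, of how the two kinds of dirty data shown above (a missing Occupation and an erroneous Salary) might be cleaned; the table and the chosen fixes are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative dirty data: "" is a missing Occupation, -10 is an erroneous Salary
df = pd.DataFrame({
    "Occupation": ["engineer", "", "teacher"],
    "Salary": [52000.0, -10.0, 61000.0],
})

# Incomplete data: treat empty strings as missing and fill with a default label
df["Occupation"] = df["Occupation"].replace("", np.nan).fillna("unknown")

# Noisy data: treat impossible salaries as errors and replace them
# with the mean of the valid values
valid_mean = df.loc[df["Salary"] > 0, "Salary"].mean()
df.loc[df["Salary"] <= 0, "Salary"] = valid_mean

print(df)
```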

Data integration

➢ Combines data from multiple sources into a coherent store (Data Warehouse)
➢ Schema integration: e.g., A.cust-id ≡ B.cust-#

Detecting and resolving data value conflicts

➢ For the same real-world entity, attribute values from different sources are different
➢ Possible reasons: different representations, different scales, e.g., metric vs. British units

Data reduction

➢ Obtains a reduced representation of the data set that is much smaller in volume but yet
produces the same (or almost the same) analytical results
➢ A database/data warehouse may store terabytes of data. Complex data analysis may
take a very long time to run on the complete data set.

Data transformation and data discretization

➢ A function that maps the entire set of values of a given attribute to a new set of replacement
values s.t. each old value can be identified with one of the new values
➢ Methods
➢ Smoothing: Remove noise from data
➢ Normalization: Scaled to fall within a smaller, specified range
➢ min-max normalization
➢ z-score normalization
➢ normalization by decimal scaling

Discretization: Divide the range of a continuous attribute into intervals


4)EXPLAIN SMOOTHING BY BINS WITH AN EXAMPLE
➢ Smoothing by bin means: In smoothing by bin means, each value in a bin is replaced by
the mean value of the bin.
➢ Smoothing by bin median: In this method each bin value is replaced by its bin median
value.
➢ Smoothing by bin boundary: In smoothing by bin boundaries, the minimum and
maximum values in a given bin are identified as the bin boundaries. Each bin value is
then replaced by the closest boundary value.

METHOD

➢ Sort the array of the given data set.
➢ Divide the range into N intervals, each containing approximately the same number of
samples (equal-depth partitioning).
➢ Store the mean / median / boundaries in each bin.

PROBLEM

Data for price (in dollars): 9, 8, 4, 15, 24, 21, 21, 25, 26, 34, 29, 28

SOLN

Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition into three equal-frequency (equal-depth) bins of 4 values each:
Bin 1: 4, 8, 9, 15
Bin 2: 21, 21, 24, 25
Bin 3: 26, 28, 29, 34

Smoothing by bin means (each value replaced by its bin mean):
Bin 1: 9, 9, 9, 9
Bin 2: 22.75, 22.75, 22.75, 22.75
Bin 3: 29.25, 29.25, 29.25, 29.25

Smoothing by bin boundaries (each value replaced by the closest bin boundary):
Bin 1: 4, 4, 4, 15
Bin 2: 21, 21, 25, 25
Bin 3: 26, 26, 26, 34
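
A minimal sketch, assuming NumPy, of equal-depth partitioning with smoothing by bin means and by bin boundaries, applied to the price data above.

```python
import numpy as np

# Price data from the problem above
prices = sorted([9, 8, 4, 15, 24, 21, 21, 25, 26, 34, 29, 28])

# Equal-depth partitioning into 3 bins of equal size
bins = [list(b) for b in np.array_split(prices, 3)]

# Smoothing by bin means: each value is replaced by the mean of its bin
by_means = [[round(float(np.mean(b)), 2)] * len(b) for b in bins]

# Smoothing by bin boundaries: each value is replaced by the closer of
# the bin's minimum and maximum
def smooth_boundaries(b):
    lo, hi = min(b), max(b)
    return [lo if abs(v - lo) <= abs(v - hi) else hi for v in b]

by_boundaries = [smooth_boundaries(b) for b in bins]

print(bins)
print(by_means)
print(by_boundaries)
```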
UNIT 2
2M
1)DEFINE DATA WAREHOUSE
➢ A data warehouse is a collection of data marts representing historical data from different
operations in the company.

➢ It collects data from multiple heterogeneous sources (database files, flat files, text files, etc.).

➢ It stores 5 to 10 years of data in huge volumes. This data is stored in a structure
optimized for querying and data analysis.

2)GIVE THE KEY PROPERTIES OF A DATA WAREHOUSE


➢ Subject Oriented: Data that gives information about a particular subject instead of about a
company’s ongoing operations.
➢ Integrated: Data that is gathered into the data warehouse from a variety of sources and
merged into a coherent whole.
➢ Time-variant: All data in the data warehouse is identified with a particular time period.
➢ Non-volatile: Data is stable in a data warehouse. More data is added but data is never
removed.

3)GIVE THE DATA WAREHOUSE CHARACTERISTICS


➢ It is a database designed for analytical tasks

➢ Its content is periodically updated

➢ It contains current and historical data to provide historical perspective of information.

4)ADVANTAGES OF MULTI-DIMENSIONAL DATA MODEL


➢ A multi-dimensional data model is easy to handle.
➢ It is easy to maintain.
➢ Its performance is better than that of normal databases
➢ The representation of data is better than traditional databases. That is because the
multi-dimensional databases are multi-viewed and carry different types of factors.

5)DISADVANTAGES OF MULTI-DIMENSIONAL DATA MODEL


➢ The multi-dimensional data model is slightly complicated in nature and requires
professionals to recognize and examine the data in the database.
➢ While working with a multi-dimensional data model, system caching has a great effect on
the performance of the system.
➢ It is complicated in nature, due to which the databases are generally dynamic in design.
6) OLAP OPERATIONS
Drill down

In the drill-down operation, less detailed data is converted into more detailed data. It can be done
by:

➢ Moving down in the concept hierarchy


➢ Adding a new dimension

Roll up

It is just the opposite of the drill-down operation. It performs aggregation on the OLAP cube. It can be
done by:

➢ Climbing up in the concept hierarchy


➢ Reducing the dimensions

Dice

➢ It selects a sub-cube from the OLAP cube by selecting two or more dimensions.

Slice

It selects a single dimension from the OLAP cube which results in a new sub-cube creation.

Pivot

It is also known as rotation operation as it rotates the current view to get a new view of the
representation.
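
A minimal sketch, assuming pandas, of how these operations look on a small hypothetical sales table (dimensions: year, quarter, region; measure: sales).

```python
import pandas as pd

# Hypothetical sales data with a few dimensions and one measure
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2", "Q2"],
    "region":  ["East", "East", "West", "West", "East", "West"],
    "sales":   [100, 120, 80, 90, 130, 110],
})

# Roll up: aggregate to a coarser level (quarter -> year)
roll_up = sales.groupby("year")["sales"].sum()

# Drill down: move to more detailed data (year -> year and quarter)
drill_down = sales.groupby(["year", "quarter"])["sales"].sum()

# Slice: fix a value of a single dimension
slice_2023 = sales[sales["year"] == 2023]

# Dice: select on two or more dimensions
dice = sales[(sales["year"] == 2023) & (sales["region"] == "East")]

# Pivot: rotate the view (regions as rows, quarters as columns)
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="sales", aggfunc="sum")

print(roll_up, drill_down, slice_2023, dice, pivot, sep="\n\n")
```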

7)DIFFERENTIATE BETWEEN OLAP AND OLTP


➢ Data: OLAP (Online Analytical Processing) consists of historical data from various databases; OLTP (Online Transaction Processing) consists only of current operational data.
➢ Orientation: OLAP is subject oriented and is used for data mining, analytics, decision making, etc.; OLTP is application oriented and is used for business tasks.
➢ Usage: OLAP data is used in planning, problem solving and decision making; OLTP data is used to perform day-to-day fundamental operations.
➢ Size: OLAP stores large amounts of data, typically in TB or PB; OLTP data is relatively small (MB, GB) as historical data is archived.
➢ Speed: OLAP is relatively slow as the amount of data involved is large and queries may take hours; OLTP is very fast as the queries operate on only about 5% of the data.
➢ Backup: OLAP needs backup only from time to time as compared to OLTP; in OLTP the backup and recovery process is maintained rigorously.
➢ Users: OLAP data is generally managed by the CEO, MD, GM; OLTP data is managed by clerks and managers.
➢ Operations: OLAP involves only read and rarely write operations; OLTP involves both read and write operations.
12M
1)EXPLAIN THE ARCHITECTURE OR COMPONENTS OF DATA WAREHOUSING

➢ The data warehouse architecture is based on a database management system (DBMS) server.

➢ The central information repository is surrounded by a number of key components.

➢ A data warehouse is an environment, not a product; it is based on a relational database
management system.

➢ The data entered into the data warehouse is transformed into an integrated structure and
format; the transformation process involves conversion, summarization and filtering.

➢ The data warehouse must be capable of holding and managing large volumes of data as well
as different data structures over time.

Key components

➢ Data sourcing, cleanup, transformation, and migration tools

➢ Metadata repository

➢ Warehouse/database technology

➢ Data marts, Information delivery system

➢ Data query, reporting, analysis, and mining tools

➢ Data warehouse administration and management


Data sourcing, cleanup, transformation, and migration tools

➢ They perform conversions, summarization, key changes, structural changes

➢ Data transformation is required so that the data can be used by decision-support tools.

➢ The transformation produces programs, control statements.

➢ It moves the data into data warehouse from multiple operational systems.

The Functionalities of these tools are listed below:

➢ To remove unwanted data from operational db

➢ Converting to common data names and attributes

➢ Calculating summaries and derived data

➢ Establishing defaults for missing data

➢ Accommodating source data definition changes

Metadata repository

➢ It is data about data. It is used for maintaining, managing and using the data warehouse.

Data warehouse database

➢ This is the central part of the data warehousing environment. It is implemented based on
RDBMS technology.

Data marts

It is an inexpensive alternative to the data warehouse and is based on a subject area. A data mart
is used in the following situations:

➢ Extremely urgent user requirement

➢ The absence of a budget for a full-scale data warehouse strategy

➢ The decentralization of business needs

Query and reporting tools

Used to generate query and report

➢ Production reporting tool used to generate regular operational reports

➢ Desktop report writers are inexpensive desktop tools designed for end users.

Application development tools

This is a graphical data access environment which integrates OLAP tools with data warehouse and
can be used to access all db systems.

➢ OLAP Tools: Are used to analyze the data in multidimensional and complex views.

➢ Data mining tools: Are used to discover knowledge from the data warehouse data.
2)EXPLAIN HOW TO BUILD A DATA WAREHOUSE
Business factors:
➢ Business users want to make decisions quickly and correctly using all available data.
➢ Top-Down Approach: enterprise-wide business requirements are collected and an
enterprise data warehouse with subset data marts is built.
➢ Bottom-Up Approach: data marts are built first and then integrated or combined together
to form a data warehouse.
➢ Data marts are developed and integrated as and when the requirements are clear.
➢ The advantage of the Bottom-Up approach is that it does not require high
initial costs and has a faster implementation time.
Technological factors:
➢ To address the incompatibility of operational data stores
➢ IT infrastructure is changing rapidly. Its capacity is increasing and cost is decreasing
so that building a data warehouse is easy
Design considerations:
➢ In general, a data warehouse integrates data from multiple heterogeneous sources into a query
database; this is also one of the reasons why a data warehouse is difficult to build.
Data content
➢ The content and structure of the data warehouse are reflected in its data model.
Meta data
➢ It defines the location and contents of data in the warehouse.
➢ Meta data is searchable by users to find definitions or subject areas.
Data distribution
➢ Data volumes continue to grow in nature. Therefore, it becomes necessary to know
how the data should be divided across multiple servers.
➢ The data can be distributed based on the subject area, location (geographical region),
or time (current, month, year)
Hardware platforms
➢ An important consideration when choosing a data warehouse server capacity for
handling the high volumes of data.
➢ It has large data and through put.
➢ The modern server can also support large volumes and large number of flexible GUI
3)EXPLAIN THE THREE-TIER DATA WAREHOUSE ARCHITECTURE

Data Warehouses usually have a three-level (tier) architecture that includes:

Bottom Tier (Data Warehouse Server)

A bottom-tier that consists of the Data Warehouse server, which is almost always an RDBMS. It may
include several specialized data marts and a metadata repository.

Data from operational databases and external sources are extracted using application program
interfaces called gateways. A gateway is provided by the underlying DBMS and allows client
programs to generate SQL code to be executed at a server.

Examples of gateways include ODBC (Open Database Connectivity) and JDBC (Java Database
Connectivity).
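
A minimal sketch, assuming the pyodbc package, of a client program using an ODBC gateway to run SQL at the source server; the DSN, credentials and table are hypothetical.

```python
import pyodbc

# Connect through an ODBC gateway (hypothetical data source name and credentials)
conn = pyodbc.connect("DSN=OperationalDB;UID=etl_user;PWD=secret")
cursor = conn.cursor()

# The SQL generated by the client is executed at the source server;
# the extracted rows can then be cleaned and loaded into the warehouse.
cursor.execute("SELECT cust_id, SUM(amount) FROM sales GROUP BY cust_id")
rows = cursor.fetchall()

conn.close()
```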

Top Tier (Front end Tools).

A top-tier that contains front-end tools for displaying results provided by OLAP, as well as additional
tools for data mining of the OLAP-generated data.
Middle Tier (OLAP Server)

A middle-tier which consists of an OLAP server for fast querying of the data warehouse.

(1) A Relational OLAP model, i.e., an extended relational DBMS that maps functions on
multidimensional data to standard relational operations.

(2) A Multidimensional OLAP model, i.e., a special-purpose server that directly implements
multidimensional data and operations.

The metadata repository stores information that defines DW objects. It includes the following
parameters and information for the middle and the top-tier applications:

1. A description of the DW structure, including the warehouse schema, dimensions, hierarchies,
data mart locations, and contents, etc.

2. Operational metadata, which usually describes the currency level of the stored data, i.e.,
active, archived or purged, and warehouse monitoring information, i.e., usage statistics,
error reports, audit, etc.

Load Performance

Data warehouses require incremental loading of new data on a periodic basis within narrow time
windows; performance of the load process should be measured in hundreds of millions of rows and
gigabytes per hour, and must not artificially constrain the volume of data required by the business.

Load Processing

Many steps must be taken to load new or updated data into the data warehouse, including data
conversion, filtering, reformatting, indexing, and metadata updates.

Data Quality Management

Fact-based management demands the highest data quality. The warehouse ensures local consistency,
global consistency, and referential integrity despite "dirty" sources and massive database size.

Query Performance

Fact-based management must not be slowed by the performance of the data warehouse RDBMS;
large, complex queries must be complete in seconds, not days.
4)EXPLAIN THE SCHEMAS FOR MULTI-DIMENSIONAL DATA MODEL
Schema is a logical description of the entire database. It includes the name and description of
records of all record types including all associated data-items and aggregates.

Much like a database, a data warehouse also requires a schema to be maintained. A database uses the
relational model, while a data warehouse uses the Star, Snowflake, or Fact Constellation schema.

Star Schema

➢ Each dimension in a star schema is represented with only one-dimension table and this
dimension table contains the set of attributes.

➢ The following diagram shows the sales data of a company with respect to the four
dimensions, namely time, item, branch, and location.

➢ There is a fact table at the center. It contains the keys to each of four dimensions.

➢ The fact table also contains the attributes, namely dollars sold and units sold.
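
A minimal sketch, assuming pandas, of the star schema described above: a central sales fact table whose keys join to the time, item, branch and location dimension tables; all table and column names are illustrative.

```python
import pandas as pd

# Dimension tables (illustrative attributes)
time_dim = pd.DataFrame({"time_key": [1, 2], "year": [2023, 2024], "quarter": ["Q1", "Q1"]})
item_dim = pd.DataFrame({"item_key": [10, 11], "item_name": ["pen", "book"], "brand": ["A", "B"]})
branch_dim = pd.DataFrame({"branch_key": [100], "branch_name": ["B1"]})
location_dim = pd.DataFrame({"location_key": [200, 201], "city": ["Chennai", "Mumbai"]})

# Fact table at the centre: keys to each dimension plus the measures
sales_fact = pd.DataFrame({
    "time_key":     [1, 1, 2],
    "item_key":     [10, 11, 10],
    "branch_key":   [100, 100, 100],
    "location_key": [200, 201, 200],
    "dollars_sold": [500.0, 750.0, 620.0],
    "units_sold":   [50, 30, 60],
})

# A typical star-schema query: total dollars sold per city and year,
# obtained by joining the fact table to its dimension tables
report = (sales_fact
          .merge(time_dim, on="time_key")
          .merge(location_dim, on="location_key")
          .groupby(["city", "year"])["dollars_sold"]
          .sum())
print(report)
```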

Snowflake Schema

➢ Some dimension tables in the Snowflake schema are normalized and the normalization splits
up the data into additional tables.

➢ The dimension tables in a snowflake schema are normalized. For example, the item
dimension table in the star schema is normalized and split into two dimension tables, namely
the item and supplier tables.

➢ Now the item dimension table contains the attributes item_key, item_name, type, brand,
and supplier-key.
Fact Constellation Schema

➢ A fact constellation has multiple fact tables. It is also known as galaxy schema.

➢ The shipping fact table has the five dimensions, namely item_key, time_key, shipper_key,
from_location, to_location.

➢ The shipping fact table also contains two measures, namely dollars sold and units sold.

It is also possible to share dimension tables between fact tables. For example, time, item, and
location dimension tables are shared between the sales and shipping fact table.
