
Vishwaniketan’s

Institute of Management Entrepreneurship & Engineering Technology


[ViMEET]
Department of Computer Science & Engineering (AI & ML)

Data Warehousing & Mining Lab

LIST OF EXPERIMENTS

Sr. No.   Name of the Experiment

01   Construct Data Warehouse in SQL Server 2008
02   Construct Star Schema and Snowflake Schema
03   Implementation of OLAP cube using Analysis Services project in SQL Server 2008
04   Implement Classification via decision tree using WEKA tool
05   Implementation of Naive Bayesian Classifier in Java/Python
06   Implementation of K-Means Clustering algorithm in Java/Python
07   Implement Hierarchical Clustering using WEKA tool
08   Perform Association Rule Mining using WEKA tool
09   Implementation of PageRank algorithm
10   Implementation of HITS algorithm



EXPERIMENT NO. 01

AIM:- Construct data warehouse in SQL Server 2008

Problem Statement: Create a warehouse in SQL Server 2008 and import various databases from external sources such as Access/Excel/Text files by using the Data Transformation Services (DTS) tool.

Theory and Concept:


The Data Transformation Services (DTS) Import/Export Wizard offers the simplest
method of building a DTS package, interactively guiding you through the process of copying
and transferring data. Following are the basic steps for creating a package with the DTS
Import/Export Wizard.

Data Warehouse-
A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis workload
from transaction workload and enables an organization to consolidate data from several
sources.

In addition to a relational database, a data warehouse environment includes an extraction,
transportation, transformation, and loading (ETL) solution, an online analytical processing
(OLAP) engine, client analysis tools, and other applications that manage the process of
gathering data and delivering it to business users.
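
As an illustration of the ETL idea (separate from the DTS wizard used in this experiment), the short Python sketch below extracts rows from a text/CSV export, applies a small transformation, and loads them into a SQL Server table with pandas and SQLAlchemy. The file name, connection string, ODBC driver, and table name are all assumptions for the example.

import pandas as pd
from sqlalchemy import create_engine

# Extract: read a text/CSV export (pd.read_excel would handle an Excel source similarly)
sales = pd.read_csv("sales_extract.csv")                      # hypothetical source file

# Transform: light clean-up before loading
sales.columns = [c.strip().lower() for c in sales.columns]    # normalize column names
sales = sales.dropna(subset=["order_id"])                     # assumed key column

# Load: append the rows into a warehouse table in SQL Server
# (the connection string and ODBC driver name depend on the local setup)
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/SalesDW"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
sales.to_sql("FactSales", engine, if_exists="append", index=False)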

1]Subject Oriented:

Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.

2]Integrated:

Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.



3]Nonvolatile:

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.

4]Time Variant:

In order to discover trends in business, analysts need large amounts of data. This is very much
in contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.

Output:

EXPERIMENT NO. 02

Title: Implementation of Star Schema and Snowflake Schema

Problem Statement:

Mumbai University wants to design a star schema to record grades for courses completed by
students. There are four dimension tables, namely Course Section, Student, Professor, and
Lecture, with attributes as follows:
Coursesection = Course_id, Sec_No, Course_Name, Units, Room_id, Room_capacity
Professor = Prof_id, Prof_Name, Dept_id, Dept_Name
Student = Stud_id, Stud_Name, Address, Con_No
Lecture = Sem_Id, Year, Class
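
A possible implementation of this star schema is sketched below in Python using the built-in sqlite3 module, purely for illustration (the DDL for SQL Server 2008 would be analogous). The fact table name Fact_Grade and its Grade_Point measure are assumptions, since the problem statement names only the dimension tables.

import sqlite3

conn = sqlite3.connect("grade_dw.db")
conn.executescript("""
-- dimension tables, attributes as given in the problem statement
CREATE TABLE CourseSection (Course_id INTEGER, Sec_No INTEGER, Course_Name TEXT,
                            Units INTEGER, Room_id INTEGER, Room_capacity INTEGER,
                            PRIMARY KEY (Course_id, Sec_No));
CREATE TABLE Professor     (Prof_id INTEGER PRIMARY KEY, Prof_Name TEXT,
                            Dept_id INTEGER, Dept_Name TEXT);
CREATE TABLE Student       (Stud_id INTEGER PRIMARY KEY, Stud_Name TEXT,
                            Address TEXT, Con_No TEXT);
CREATE TABLE Lecture       (Sem_Id INTEGER PRIMARY KEY, Year INTEGER, Class TEXT);

-- assumed fact table: foreign keys to the four dimensions plus the grade measure
CREATE TABLE Fact_Grade (
    Course_id INTEGER, Sec_No INTEGER,
    Prof_id   INTEGER REFERENCES Professor(Prof_id),
    Stud_id   INTEGER REFERENCES Student(Stud_id),
    Sem_Id    INTEGER REFERENCES Lecture(Sem_Id),
    Grade_Point REAL,
    FOREIGN KEY (Course_id, Sec_No) REFERENCES CourseSection(Course_id, Sec_No)
);
""")
conn.commit()

Each row of Fact_Grade records one grade and is joined to the four dimension tables through its foreign keys, which is exactly the star arrangement described below.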

Theory and Concept:

A schema is a logical description of the entire database. It includes the name and description of
records of all record types, including all associated data items and aggregates.
A fact table works with dimension tables. The fact table holds the data to be analyzed, and the
dimension tables store data about the ways in which the data in the fact table can be analyzed.
Thus, the fact table consists of two types of columns: the foreign key columns allow joins with
the dimension tables, and the measure columns contain the data that is being analyzed.

Star Schema

• Each dimension in a star schema is represented with only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the four
  dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the attributes, namely dollars sold and units sold.



Output:



Snowflake Schema

• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike the star schema, the dimension tables in a snowflake schema are normalized. For
  example, the item dimension table in the star schema is normalized and split into two
  dimension tables, namely the item and supplier tables.
• Now the item dimension table contains the attributes item_key, item_name, type,
  brand, and supplier_key.
• The supplier_key is linked to the supplier dimension table. The supplier dimension table
  contains the attributes supplier_key and supplier_type; a brief DDL sketch of this split is
  given below.
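
The split described in the list above can be sketched as follows (same illustrative Python/sqlite3 style as before; the database name is an assumption):

import sqlite3

conn = sqlite3.connect("sales_snowflake.db")    # assumed database name
conn.executescript("""
-- the supplier attributes are split out of the item dimension into their own table
CREATE TABLE Supplier (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_type TEXT
);

-- the item dimension now keeps only item attributes plus a foreign key to Supplier
CREATE TABLE Item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES Supplier(supplier_key)
);
""")
conn.commit()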

Output:

EXPERIMENT NO. 03

Title: Implementation of OLAP cube using analysis services project in SQL 2008.

Problem Statement: A car manufacturing company has a sales department. Consider suitable
dimension tables and a fact table, and create an OLAP cube for the car manufacturing company.

Theory and Concept:

An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It
allows managers and analysts to gain insight into information through fast, consistent, and
interactive access to it. This experiment covers the types of OLAP servers, operations on OLAP,
and the differences between OLAP, statistical databases, and OLTP.

Types of OLAP Servers

We have four types of OLAP servers:

• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers

Relational OLAP

ROLAP servers are placed between the relational back-end server and the client front-end tools.
To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS
(a small query sketch is given after the list below).

ROLAP includes the following:

• Implementation of aggregation navigation logic.
• Optimization for each DBMS back end.
• Additional tools and services.
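
Because a ROLAP server answers analytical queries directly against the relational store, a typical aggregation is simply a join-and-GROUP-BY over the fact and dimension tables. The Python sketch below runs such a query against the grade warehouse assumed in the Experiment 2 sketch (the file, table, and column names come from that assumed schema):

import sqlite3

# the warehouse file created in the Experiment 2 sketch (an assumption for this example)
conn = sqlite3.connect("grade_dw.db")
query = """
SELECT c.Course_Name, l.Year, AVG(f.Grade_Point) AS avg_grade
FROM Fact_Grade AS f
JOIN CourseSection AS c ON f.Course_id = c.Course_id AND f.Sec_No = c.Sec_No
JOIN Lecture       AS l ON f.Sem_Id = l.Sem_Id
GROUP BY c.Course_Name, l.Year;
"""
for course_name, year, avg_grade in conn.execute(query):
    print(course_name, year, avg_grade)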



Multidimensional OLAP

MOLAP uses array-based multidimensional storage engines for multidimensional views of data.
With multidimensional data stores, the storage utilization may be low if the data set is sparse.
Therefore, many MOLAP servers use two levels of data storage representation to handle dense
and sparse data sets.

Hybrid OLAP (HOLAP)

Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers can store large volumes of
detailed information in the relational store, while the aggregations are kept separately in a
MOLAP store.
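
To connect these ideas to the car-manufacturing problem statement, the sketch below approximates a small cube in Python with a pandas pivot table. The DataFrame columns (model, region, quarter and the two measures) are assumptions for the example; an Analysis Services cube in SQL Server 2008 plays the same role at full scale.

import pandas as pd

# assumed car-sales fact data: model, region and quarter are the dimensions,
# units_sold and revenue are the measures
sales = pd.DataFrame({
    "model":      ["Hatch", "Hatch", "Sedan", "Sedan", "SUV", "SUV"],
    "region":     ["West",  "North", "West",  "North", "West", "North"],
    "quarter":    ["Q1",    "Q1",    "Q2",    "Q2",    "Q1",   "Q2"],
    "units_sold": [120,     90,      75,      60,      40,     55],
    "revenue":    [6.0,     4.5,     6.8,     5.4,     5.2,    7.1],   # assumed units (lakhs)
})

# a cube-like roll-up: measures aggregated over the model x region dimensions
cube = sales.pivot_table(index="model", columns="region",
                         values=["units_sold", "revenue"],
                         aggfunc="sum", margins=True)   # margins add the 'All' totals
print(cube)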

Output:

EXPERIMENT NO. 04

Aim: Classification via decision tree using WEKA tool.

Theory and Concept:


Classification models predict categorical class labels, while prediction models predict
continuous-valued functions. For example, we can build a classification model to categorize
bank loan applications as either safe or risky, or a prediction model to predict the expenditures
in dollars of potential customers on computer equipment, given their income and occupation.

Fig 1. Data set Screenshot



Fig 2. Data Preprocess

Selecting Classifier

At the top of the Classify section is the Classifier box. This box has a text field that gives
the name of the currently selected classifier and its options. Clicking on the text box with
the left mouse button brings up a Generic Object Editor dialog box, just the same as for
filters, which you can use to configure the options of the current classifier. With a right click
(or Alt+Shift+left click) you can once again copy the setup string to the clipboard or display
the properties in a Generic Object Editor dialog box. The Choose button allows you to
choose one of the classifiers that are available in WEKA.



Fig 3. Classifier Selection Screenshot

Test Options

The result of applying the chosen classifier will be tested according to the options that are
set by clicking in the Test options box. There are four test modes:

1. Use training set. The classifier is evaluated on how well it predicts the class of
the instances it was trained on.

2. Supplied test set. The classifier is evaluated on how well it predicts the class of
a set of instances loaded from a file. Clicking the Set... button brings up a dialog
allowing you to choose the file to test on.

3. Cross-validation. The classifier is evaluated by cross-validation, using the
number of folds entered in the Folds text field.

4. Percentage split. The classifier is evaluated on how well it predicts a certain
percentage of the data, which is held out for testing. The amount of data held out
depends on the value entered in the % field (a small illustration of modes 3 and 4
is given after this list).
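
WEKA runs these test modes from the Explorer GUI; purely as an illustration of modes 3 and 4 outside WEKA, the following Python sketch uses scikit-learn's decision tree (not WEKA's J48) on the bundled Iris sample:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# test mode 3: 10-fold cross-validation (like the Folds field in WEKA)
scores = cross_val_score(tree, X, y, cv=10)
print("10-fold CV accuracy:", scores.mean())

# test mode 4: percentage split, e.g. 66% train / 34% test (like the % field)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.34, random_state=0)
print("Percentage-split accuracy:", tree.fit(X_tr, y_tr).score(X_te, y_te))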

Classifier Evaluation:

Fig 4. Classification



Fig 5. Decision Tree

Classifier Rules:

Conclusion: Thus, we have implemented classification via decision tree using the WEKA tool.



EXPERIMENT NO. 05

Aim: Implementation of Naive Bayesian Classifier in Java/Python

Theory and Concept:


Classification models predict categorical class labels, while prediction models predict
continuous-valued functions. For example, we can build a classification model to categorize
bank loan applications as either safe or risky, or a prediction model to predict the expenditures
in dollars of potential customers on computer equipment, given their income and occupation.
Naive Bayesian Classifier: a Naive Bayesian classifier applies Bayes' theorem with the
assumption that attribute values are conditionally independent of one another given the class
label, and assigns a new tuple to the class with the highest posterior probability.

Algorithmic steps for Naive Bayesian Classifier:

1. From the training data, estimate the prior probability P(C) of each class C.
2. For each attribute, estimate the conditional probability P(x_k | C) of every attribute value
   given each class.
3. For a new tuple X = (x_1, ..., x_n), compute P(C) · P(x_1 | C) · ... · P(x_n | C) for every
   class, using the assumption of conditional independence.
4. Assign X to the class that maximizes this product (the maximum posterior).

Program:
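
Since the program itself is left to the student, a minimal Python sketch of a categorical Naive Bayesian classifier is given below, following the algorithmic steps above. The toy training data and the use of Laplace smoothing are assumptions for the example.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Steps 1 and 2: estimate priors P(C) and per-attribute value counts for each class."""
    class_totals = Counter(labels)
    priors = {c: class_totals[c] / len(labels) for c in class_totals}
    counts = defaultdict(lambda: defaultdict(Counter))   # counts[class][attribute][value]
    values = defaultdict(set)                            # distinct values seen per attribute
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            counts[c][i][value] += 1
            values[i].add(value)
    return priors, counts, class_totals, values

def predict(model, row, alpha=1.0):
    """Steps 3 and 4: score every class with P(C) * prod P(x_k|C) and pick the maximum."""
    priors, counts, class_totals, values = model
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            # Laplace-smoothed conditional probability P(value | class) for attribute i
            score *= (counts[c][i][value] + alpha) / (class_totals[c] + alpha * len(values[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# assumed toy data: (outlook, humidity) -> play?
X = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"), ("overcast", "normal")]
y = ["no", "yes", "no", "yes"]
model = train_naive_bayes(X, y)
print(predict(model, ("sunny", "high")))   # prints "no" for this toy data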

Output:

Conclusion: Thus, we have implemented the Naive Bayesian classifier.
