DM PYQ merged

The document outlines examination papers for Data Mining and Business Intelligence at Gujarat Technological University for various semesters in 2018 and 2019. It includes questions on topics such as data mining definitions, data warehousing, clustering, regression, and the application of data mining in business intelligence. Each section contains specific questions requiring explanations, comparisons, and examples related to data mining concepts and techniques.

Uploaded by

khatripriyal1920

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VII (NEW) EXAMINATION – WINTER 2018
Subject Code: 2170715 Date: 03/12/2018
Subject Name: Data Mining and Business Intelligence
Time: 10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.

Q.1 (a) What is Data Mining? Why is it called data mining rather than 03
knowledge mining?

(b) Explain various features of Data Warehouse. 04

(c) Differentiate between Operational Database System and Data Warehouse 07

Q.2 (a) What is the difference between KDD and Data Mining? 03

(b) What is Concept Hierarchy? List and briefly explain types of Concept 04
Hierarchy

(c) Explain Mean, Median, Mode, Variance, Standard Deviation & five number 07
summary with suitable database example.
OR
(c) What is noise? Explain data smoothing methods as noise removal technique 07
to divide given data into bins of size 3 by bin partition (equal frequency), by
bin means, by bin medians and by bin boundaries.

Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22
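A minimal sketch of how the binning steps asked for above might be checked, assuming the usual textbook procedure: sort the data, split it into equal-frequency bins of size 3, then replace each value by its bin mean, bin median, or nearest bin boundary. The function names are illustrative, and sending ties to the lower boundary is an assumption (every middle value in this data set happens to be equidistant from both boundaries).

```python
from statistics import mean, median

def equal_frequency_bins(values, size):
    """Sort the values and cut them into consecutive bins of `size` items."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def smooth_by(bins, func):
    """Replace every value in a bin by func(bin), e.g. the mean or median."""
    return [[func(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max.

    Assumption: a value equidistant from both boundaries goes to the lower one.
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
bins = equal_frequency_bins(data, 3)
by_means = smooth_by(bins, mean)
by_medians = smooth_by(bins, median)
by_boundaries = smooth_by_boundaries(bins)

print("bins:         ", bins)
print("by means:     ", by_means)
print("by medians:   ", by_medians)
print("by boundaries:", by_boundaries)
```

For this data the bin means and bin medians coincide, so the first two smoothing methods give the same result.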

Q.3 (a) Differentiate Fact table vs. Dimension table 03

(b) Suppose that the data for analysis includes the attribute age. 04

The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72

Use min-max normalization to transform the value 45 for age onto the range
[0.0, 1.0]
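A short sketch of the min-max computation, assuming the standard formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min, with min and max taken from the listed age values. The helper name `min_max` is illustrative.

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map `value` from [old_min, old_max] onto [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]
scaled = min_max(45, min(ages), max(ages))
print(round(scaled, 3))  # (45 - 13) / (72 - 13) = 32 / 59, about 0.542
```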

(c) Explain mining in following Databases with example. 07


1. Temporal Databases
2. Sequence Databases
3. Spatial Databases
4. Spatiotemporal Databases.
OR
Q.3 (a) List and describe methods for handling missing values in data cleaning. 03

(b) Explain the following as attribute selection measure: 04


(i) Information Gain
(ii) Gain Ratio
(c) Explain three tier data warehouse Architecture in details. 07

Q.4 (a) How does the K-Means clustering method differ from the K-Medoids clustering method? 03

(b) Define data cube and explain 3 operations on it. 04

(c) State the Apriori Property. Generate large itemsets and association rules 07
using Apriori algorithm on the following data set with minimum support
value and minimum confidence value set as 50% and 75% respectively

TID Items Purchased


T101 Cheese, Milk, Cookies
T102 Butter, Milk, Bread
T103 Cheese, Butter, Milk, Bread
T104 Butter, Bread

OR

Q.4 (a) Define following terms : 03


Data Mart , Enterprise Warehouse & Virtual Warehouse

(b) Discuss the application of data warehousing and data mining 04

(c) What is web log? Explain web structure mining and web usage mining in 07
detail

Q.5 (a) Draw the topology of a multilayer, feed-forward Neural Network. 03

(b) Explain Linear regression with example. 04

(c) Explain the major issues in data mining 07


OR
Q.5 (a) Briefly explain text mining 03

(b) What is market basket analysis? Explain the two measures of rule 04
interestingness: support and confidence

(c) What is Big Data? What is big data analytics? Explain the big data 07
distributed file system.

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE – SEMESTER-VII (NEW SYLLABUS) EXAMINATION- SUMMER - 2018

Subject Code: 2170715 Date:08/05/2018


Subject Name: Data Mining and Business Intelligence (Department Elective-II)
Time: 02:30 PM TO 05:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.

MARKS
Q.1 (a) Explain KDD process using figure. 03
(b) Do feature wise comparison between BI and DW. 04
(c) Explain research issues in Data Mining. 07

Q.2 (a) Explain schemas: Stars, snowflakes and fact constellations using 03
figures.
(b) Do feature wise comparison between ROLAP and MOLAP. 04
(c) Enlist the preprocessing steps with example. Explain procedure 07
of any technique of preprocessing.
OR
(c) Explain what is concept description? Explain data 07
generalization, summarization-based characterization using
example.
Q.3 (a) Do feature wise comparison between classification and 03
prediction.
(b) Write a note on incremental Association Rule Mining. 04
(c) Generate frequent itemsets and generate association rules based 07
on it using apriori algorithm. Minimum support is 50% and
minimum confidence is 70%
TID Items
100 1, 3, 4
200 2, 3, 5
300 1, 2, 3, 5
400 2, 5
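A compact sketch of the level-wise Apriori procedure applied to the four transactions above, assuming support is measured as a fraction of transactions and confidence as support(X ∪ Y) / support(X). The function names `apriori` and `strong_rules` are illustrative; this is one straightforward way to organise the join and prune steps, not a tuned implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support_count} for every frequent itemset."""
    min_count = min_support * len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    found = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    found.append((lhs, itemset - lhs, conf))
    return found

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(transactions, 0.5)
strong = strong_rules(frequent, 0.7)

for itemset, count in frequent.items():
    print(sorted(itemset), "support count =", count)
for lhs, rhs, conf in strong:
    print(sorted(lhs), "->", sorted(rhs), f"confidence = {conf:.2f}")
```

With these thresholds the only frequent 3-itemset is {2, 3, 5}, and five rules reach the 70% confidence level.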

OR
Q.3 (a) Differentiate between Overfitting and Tree Pruning w.r.to 03
following parameters.
i). definition figure
ii). use in particular situation
iii). limitation
(b) Explain Mining Multiple-Level Association Rules using 04
example.
(c) Generate decision tree using CART algorithm for the following 07
dataset.
Sr. no. Outlook Temperature Humidity Wind Play
1 Sunny hot high FALSE No
2 Sunny hot high TRUE No
3 Overcast hot high FALSE Yes
4 Rain mild high FALSE Yes
5 Rain cool normal FALSE Yes
6 Rain cool normal TRUE No
7 Overcast cool normal TRUE Yes
8 Sunny mild high FALSE No
9 Sunny cool normal FALSE Yes
10 Rain mild normal FALSE Yes
11 Sunny mild normal TRUE Yes
12 Overcast mild high TRUE Yes
13 Overcast hot normal FALSE Yes
14 Rain mild high TRUE No

Q.4 (a) Do feature wise comparison between OLAP and OLTP. 03


(b) Define data cube and explain 3 operations on it. 04
(c) Define linear and nonlinear regression using figures. Calculate 07
the value of Y for X=100 based on Linear regression prediction
method.
X Y
4 390
9 580
10 650
14 730
4 410
7 530
12 600
22 790
1 350
3 400
8 590
11 640
5 450
6 520
10 690
11 690
16 770
13 700
13 730
10 640
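A sketch of the least-squares fit for the table above, assuming simple linear regression Y = a + bX with b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The helper name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3, 8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
ys = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
      590, 640, 450, 520, 690, 690, 770, 700, 730, 640]

slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 100
print(f"Y = {intercept:.2f} + {slope:.2f} * X")
print(f"Predicted Y for X = 100: {prediction:.1f}")
```

Note that X = 100 lies far outside the observed range of X (1 to 22), so the prediction is an extrapolation of the fitted line.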

OR
Q.4 (a) Explain Spatial mining using example. 03
(b) Calculate the weights using neural network single layer 04
perceptron model. Three inputs are x0, x1, x2, bias and weights
are as follows:
w1(0) = 30 , w2(0) = 300
b(0) = 50, η = 0.01, x0 = +1
Activation function is:
sgn(x) = +1, if x >= 0
sgn(x) = -1, if x < 0
(a) Calculate x2 for x1 = 100 and 200.
(b) For bias b(0) = -1230, recalculate the weights w1 and w2.
(c) How is data mining useful for Business Intelligence applications, 07
viz. Balanced Scorecard, Fraud Detection, Clickstream Mining,
Market Segmentation, retail industry, telecommunications
industry, banking & finance and CRM?
Q.5 (a) Explain text mining using example. 03
(b) Explain big data and big data analytics. Explain key roles and 04
their responsibilities for successful analytic project.
(c) Calculate 2 clusters using the k-means clustering algorithm. For finding 07
the distance use Euclidean distance.
Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
Assume mean1 as subject1 and mean2 as subject4
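A minimal sketch of batch k-means (Lloyd's iteration) on the seven subjects above, starting from subject 1 and subject 4 as the initial means. The function name `k_means` is illustrative. Two choices here are assumptions: centroids are recomputed only after all points have been assigned, and a point equidistant from two centroids goes to the lower-numbered cluster. Subject 3 is exactly such a tie in the first pass, but the final clusters come out the same whichever way that tie is broken.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
clusters, centroids = k_means(points, [points[0], points[3]])

for cluster, centre in zip(clusters, centroids):
    print("centre", centre, "members", cluster)
```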
OR
Q.5 (a) Explain web mining using example. 03
(b) Explain Hadoop architecture using figure. 04
(c) Explain MapReduce. Explain any example using MapReduce. 07
Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER– VII (New) EXAMINATION – WINTER 2019
Subject Code: 2170715 Date: 30/11/2019
Subject Name: Data Mining and Business Intelligence
Time: 10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
MARKS
Q.1 (a) Define data mining and list its features. 03
(b) Differentiate between OLTP and OLAP. 04
(c) Describe the steps involved in data mining when viewed as a process of 07
knowledge discovery.
Q.2 (a) Can BI be used for DM? Or vice versa? Justify. 03
(b) Explain in detail the extract/transform/load (ETL) design of an automated 04
warehouse.
(c) Explain Mean, Median, Mode, Variance, Standard Deviation & five number 07
summary with suitable database example.
OR
(c) Is graphical visualization better than text data? Justify your answer and 07
explain different data visualization techniques.
Q.3 (a) Explain why data warehouses are needed for developing business solutions 03
from today’s perspective. Discuss the role of data marts.
(b) Draw and Explain Snowflakes and Fact constellations Schema. 04
(c) Define outlier analysis. Why is outlier mining important? Briefly describe 07
the different approaches: statistical-based outlier detection, distance-based
outlier detection and deviation-based outlier detection.
OR
Q.3 (a) Discuss Following: (i) Meta Data (ii) Virtual Warehouse 03
(b) Briefly outline the major steps of decision tree classification. Why is tree 04
pruning useful in decision tree induction?
(c) In data pre-processing, why do we need data smoothing? Discuss data smoothing 07
by Binning.
Q.4 (a) Draw the topology of a multilayer, feed-forward Neural Network. 03
(b) Describe Concept Hierarchy. List and briefly explain types of Concept 04
Hierarchy
(c) In real-world data, tuples with missing values for some attributes are a 07
common occurrence. Describe various methods for handling this problem.

OR
Q.4 (a) Define “clustering”. Mention any two applications of clustering. 03
(b) Briefly explain Linear and Non-linear regression. 04
(c) Consider the following dataset and find frequent item sets and generate 07
association rules for them using Apriori Algorithm.

Minimum support count is 2 and minimum confidence is 60%.
Q.5 (a) Differentiate Fact table vs. Dimension table 03
(b) What is market basket analysis? Explain the two measures of rule 04
interestingness: support and confidence
(c) Briefly explain the life-cycle of Data Analytics and discuss the role of data 07
scientists.
OR
Q.5 (a) Explain text mining using example. 03
(b) Explain data mining application for fraud detection. 04
(c) Discuss the main features of Hadoop Distributed File System. 07

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VII(NEW) EXAMINATION – SUMMER 2019
Subject Code:2170715 Date:18/05/2019
Subject Name:Data Mining and Business Intelligence
Time:02:30 PM TO 05:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.

Q.1 (a) Explain cluster analysis and outlier analysis with example. 03
(b) A data warehouse is a subject-oriented, integrated, time-variant, and 04
nonvolatile collection of data – Justify.
(c) Consider following database of ten transactions. Let min_sup = 30% and
min_confidence = 60%.
A) Find all frequent itemsets using Apriori algorithm. 05
B) Generate strong association rules. 02
TID items bought
T1 pen, pencil
T2 book, eraser, pencil
T3 book, chalk, eraser, pen
T4 chalk, eraser, pen
T5 book, pen, pencil
T6 book, eraser, pen, pencil
T7 ink, pen
T8 book, pen, pencil
T9 eraser, pen, pencil
T10 book, chalk, pencil

Q.2 (a) Discuss following terms. 03


1) Supervised learning 2) Correlation analysis 3) Tree pruning
(b) What is noise? Explain binning methods for data smoothing. 04
(c) Discuss data warehouse architecture in detail. 07
OR
(c) Write and discuss the algorithm which is used to generate frequent itemsets 07
using an iterative level-wise approach based on candidate generation.

Q.3 (a) Which are the two measures of rule interestingness? Explain with example. 03
(b) Discuss Hash-based technique to improve efficiency of Apriori algorithm. 04
(c) Explain various data normalization techniques. 07
OR
Q.3 (a) Discuss Big Data. 03
(b) Discuss possible ways for integration of a Data Mining system with a Database 04
or DataWarehouse system.
(c) Enlist data reduction strategies and explain any two. 07
Q.4 (a) Discuss various layers of multilayer feed-forward neural network with 03
diagram.
(b) What is apex cuboid? Discuss drill down and roll up operation with diagram. 04
(c) Using Naive Bayesian classification method, predict class label of X = (age = 07
youth, income = medium, student = yes, credit_rating = fair) using following
training dataset.

age income student credit_rating Class: buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
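A sketch of the Naive Bayesian calculation for the tuple X above, assuming the usual approach: score each class by P(C) multiplied by the product of P(x_i | C), with all probabilities estimated as plain relative frequencies from the 14 training rows (no Laplace smoothing). The names `ROWS` and `naive_bayes_scores` are illustrative, and class and attribute values are written in lower case throughout.

```python
from fractions import Fraction

# Columns: age, income, student, credit_rating, buys_computer
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(rows, query):
    """Return {class_label: P(C) * product of P(x_i | C)} as exact fractions."""
    scores = {}
    for label in sorted({r[-1] for r in rows}):
        in_class = [r for r in rows if r[-1] == label]
        score = Fraction(len(in_class), len(rows))  # prior P(C)
        for i, value in enumerate(query):
            matches = sum(1 for r in in_class if r[i] == value)
            score *= Fraction(matches, len(in_class))  # likelihood P(x_i | C)
        scores[label] = score
    return scores

x = ("youth", "medium", "yes", "fair")
scores = naive_bayes_scores(ROWS, x)
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(f"P(X | {label}) * P({label}) = {float(score):.4f}")
print("Predicted class:", prediction)
```

The score for buys_computer = yes (about 0.028) exceeds the score for no (about 0.007), so the sketch predicts yes for X.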
OR
Q.4 (a) Explain various conflict resolution strategies in rule based classification. 03
(b) What is classification? Explain classification as a two step process with 04
diagram.
(c) Discuss fraud detection and click-stream analysis using data mining. 07

Q.5 (a) Compare data mart and data warehouse. 03


(b) Discuss star schema and fact constellation schema with diagram. 04
(c) What do you mean by learning-by-observation? Explain k-Means clustering 07
algorithm in detail.
OR
Q.5 (a) Discuss following terms. 03
1) DataNode 2) NameNode 3) Text mining
(b) Discuss attribute subset selection. 04
(c) Compare OLAP and OLTP in detail. 07
*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE- SEMESTER–VII (NEW) EXAMINATION – WINTER 2020
Subject Code:2170715 Date:28/01/2021
Subject Name:Data Mining and Business Intelligence
Time:10:30 AM TO 12:30 PM Total Marks: 56
Instructions:
1. Attempt any FOUR questions out of EIGHT questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
MARKS

Q.1 (a) Differentiate OLAP and OLTP. 03


(b) Define Schema. Explain the following schemas with suitable example. 04
1) Star 2) Snowflakes 3) Constellations
(c) Define noise data. Enlist the reasons for the presence of noise in data 07
collection. Explain the methods to deal with noise.

Q.2 (a) Define market basket analysis. Explain support and confidence with 03
suitable example for finding the rules.
(b) Explain the following terms. 04
1) Bias 2) Variance 3) Generalization 4) Outlier
(c) Define the Apriori Property. Generate candidate itemsets, frequent itemsets 07
and association rules using Apriori algorithm on the following data set with
minimum support count is 2 and minimum confidence is 60%.

TID Items
T1 BREAD, BUTTER, TOAST
T2 BUTTER, MILK
T3 BUTTER, BISCUIT
T4 BREAD, BUTTER, MILK
T5 BREAD, BISCUIT
T6 BUTTER, BISCUIT
T7 BREAD, BISCUIT
T8 BREAD, BUTTER, BISCUIT, TOAST
T9 BREAD, BUTTER, BISCUIT

Q.3 (a) Minimum salary is 20,000/- Rs and Maximum salary is 1,70,000/- Rs. Map 03
the salary 1,00,000/- Rs in new Range of (60,000 , 2,60,000) Rs using min-
max normalization method.
(b) Define time series database. Explain how to characterize time series data 04
using trend analysis.
(c) Define data cube and explain any 3 operations on it. 07

Q.4 (a) If the mean salary is 54,000 Rs and the standard deviation is 16,000 Rs, 03
then find the z-score value of a salary of 73,600 Rs.
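A one-line check of the z-score, assuming the standard definition z = (v - mean) / standard deviation. The helper name `z_score` is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which `value` lies above the mean."""
    return (value - mean) / std_dev

z = z_score(73_600, 54_000, 16_000)
print(round(z, 3))  # 19,600 / 16,000 = 1.225
```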
(b) Briefly explain linear and non-linear regression. 04
(c) Define Map and Reduce operations with suitable example. 07

Q.5 (a) Discuss the following terms: 03


1) Tree Pruning 2) Information Gain 3) Spatiotemporal Databases
(b) Discuss the major issues/challenges in data mining. 04
(c) Enlist the steps of K-Mean clustering algorithm. Explain it with suitable 07
example.
Q.6 (a) Discuss the following terms: 03
1) Correlation analysis 2) Gain Ratio 3) Sequence Databases
(b) Differentiate classification and prediction. 04
(c) Enlist the steps of ID3 decision tree generation algorithm. Explain it with 07
suitable example and generate the tree.
Q.7 (a) Explain Web structure and Web usage mining. 03
(b) Draw and explain the topology of a multilayer, feed-forward Neural 04
Network.
(c) Define big data and big data analytics. Explain big data distributed file 07
system.

Q.8 (a) Draw and explain Hadoop architecture. 03


(b) Discuss Hash-based technique to improve efficiency of Apriori algorithm. 04
(c) Explain Bayes’ Theorem and Naïve Bayesian Classification with suitable 07
example.

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER– VIII EXAMINATION – SUMMER 2020
Subject Code: 2170715 Date:02/11/2020
Subject Name: DATA MINING AND BUSINESS INTELLIGENCE
Time: 10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.

Q.1 (a) Define the following terms: 03


1) Data warehouse.
2) Business Intelligence
3) Metadata in Data warehouse
(b) List the major steps involved in the ETL process. 04
(c) Draw and explain the Data warehouse architecture. 07

Q.2 (a) Briefly discuss the schemas for multidimensional databases. 03


(b) Differentiate between OLTP and OLAP. 04
(c) Explain Various Data Mining Functionalities with an example. 07
OR
(c) Explain the different issues in data mining. 07
Q.3 (a) What is the need for preprocessing the data? 03
(b) How are concept hierarchies useful in data mining? 04
(c) Consider a transactional database where 1, 2, 3, 4, 5, 6, 7 are items. 07

ID ITEMS
T_1 1, 2, 3, 5
T_2 1, 2, 3, 4, 5
T_3 1, 2, 3, 7
T_4 1, 3, 6
T_5 1, 2, 4, 5, 6

Suppose the minimum support is 60%. Find all frequent itemsets using
Apriori algorithm.
OR
Q.3 (a) What is dimensionality reduction? 03
(b) Explain the Data Transformation method with a suitable example. 04
(c) Discuss the variations of the Apriori algorithm to improve the efficiency. 07
Q.4 (a) What is meant by multidimensional association rules? 03
(b) Discuss the Information gain as attribute selection measure. 04
(c) Differentiate classification and prediction. State the issues regarding 07
classification and prediction.
OR
Q.4 (a) What is meant by Maximal Frequent Item Set? 03
(b) Discuss the Gain ratio as attribute selection measure. 04

(c) Why is naïve Bayesian classification called “naïve”? Briefly outline the major 07
ideas of naïve Bayesian classification.
Q.5 (a) List out the General applications of Clustering. 03
(b) What is Big Data? What is big data analytics? 04
(c) How will data mining be used in the retail industry? 07
OR
Q.5 (a) Define web mining. 03
(b) Discuss the main features of Hadoop Distributed File System. 04
(c) How will data mining be used in the telecommunication industry? 07

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER– VI (NEW) EXAMINATION – WINTER 2021
Subject Code:3160714 Date:02/12/2021
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
MARKS
Q.1 (a) Justify the importance of data mining. 03
(b) Differentiate OLTP and data warehouse. 04
(c) Briefly discuss the steps of the KDD process. 07

Q.2 (a) Explain data reduction and dimensionality reduction. 03


(b) What do you mean by correlation analysis? Justify its importance. 04
(c) List common tasks involved in data pre-processing. Explain briefly 07
any four tasks of data pre-processing with suitable example.
OR
(c) Define the following: 07
concept description, support, confidence, strong association rules, data
generalization, and unsupervised learning.
Q.3 (a) How does classification differ from prediction? Explain the phases of 03
classification.
(b) The attribute income has a minimum value of 12000 INR and a maximum 04
value of 98000 INR. Normalize income value of 73600 INR,
(i) Using min-max normalization in the range of [0,1]
(ii) Using z-score normalization. Take mean value of income as 54000
and standard deviation is 16000.

(c) Using Apriori algorithm, find all frequent itemsets for following 07
transaction data.
( Take min_sup=60% and min_conf=80% )

ID Items
1 {M,O,N,K,E,Y}
2 {D,O,N,K,E,Y}
3 {M,A,K,E}
4 {M,U,C,K,Y}
5 {C,O,O,K,I,E}
OR
Q.3 (a) What is the use of proximity measures? Explain any one proximity 03
measure with its equation.
(b) Explain Bayesian learning and inference with suitable example. 04
(c) List the accuracy parameters used for the performance evaluation of 07
classification and discuss any five parameters with appropriate
example.
Q.4 (a) Differentiate supervised and unsupervised learning. 03
(b) Explain logistic regression with appropriate example. 04

(c) Explain working of decision tree algorithm with suitable example. 07

OR
Q.4 (a) Differentiate agglomerative and divisive methods of clustering. 03

(b) What do you mean by perceptron? Discuss single-layer and multi layer 04
perceptron.
(c) Explain the K-means clustering algorithm and prove that outliers adversely 07
affect the performance of the algorithm.
Q.5 (a) Give strength and weakness of k-means in comparison of k-medoids 03
algorithm.
(b) What is an outlier? Why is outlier mining important? 04
(c) Write about different clustering approaches with their strength and 07
weakness.
OR
Q.5 (a) Briefly explain the spatial data mining and temporal mining. 03

(b) Discuss any four data mining features available in the WEKA. 04

(c) How is data mining useful for web mining? Discuss any four web 07
mining applications.

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI(NEW) EXAMINATION – WINTER 2022
Subject Code:3160714 Date:16-12-2022
Subject Name:Data Mining
Time:02:30 PM TO 05:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
Marks
Q.1 (a) Compare descriptive and predictive data mining. 03
(b) Explain the data mining functionalities. 04
(c) Explain major requirements and challenges in data mining. 07

Q.2 (a) What do you mean by concept hierarchy? 03


(b) Explain the smoothing techniques. 04
(c) What is Data Cleaning? Describe various methods of Data Cleaning. 07
OR
(c) Explain the different Data Reduction techniques. 07

Q.3 (a) What are the techniques to improve the efficiency of Apriori algorithm? 03
(b) What is an Itemset? What is a Frequent Itemset? 04
(c) For the given data 07

Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips

Find the frequent itemsets and generate association rules on this. Assume
that minimum support threshold (s = 33.33%) and minimum confidence
threshold (c = 60%).
OR
Q.3 (a) Describe the different classifications of Association rule mining. 03
(b) What is meant by Reduced Minimum Support? 04
(c) Explain the steps of the “Apriori Algorithm” for mining frequent itemsets 07
with suitable example.

Q.4 (a) What are Bayesian Classifiers? 03


(b) What are the hierarchical methods used in classification? 04
(c) Describe in detail about Rule based Classification. 07

OR
Q.4 (a) What is attribute selection measure? 03
(b) What is the difference between supervised and unsupervised learning 04
scheme.
(c) Describe the issues regarding classification and prediction. Write an 07
algorithm for decision tree.

Q.5 (a) List the requirements of clustering in data mining. 03


(b) Differentiate Agglomerative and Divisive Hierarchical Clustering. 04
(c) Write a short note: Web content mining. 07
OR
Q.5 (a) What is meant by hierarchical clustering? 03
(b) Illustrate strength and weakness of k-mean in comparison with k- 04
medoid algorithm.
(c) Write a short note: Web usage mining. 07
*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – SUMMER 2023
Subject Code:3160714 Date:12-07-2023
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.

MARKS
Q.1 (a) What is market basket analysis? Precisely explain the meaning of the 03
following association rule:
computer → antivirus_software [support = 60%, confidence = 60%]
(b) In real-world data, tuples with missing values for some attributes are a 04
common occurrence. List and describe various methods for handling this
problem.
(c) With the help of a suitable diagram, describe the steps involved in data 07
mining when viewed as a process of knowledge discovery.

Q.2 (a) Give a short example to show that items in a strong association rule are 03
not always interesting.
(b) Briefly describe how partitioning technique may improve the efficiency 04
of Apriori algorithm.
(c) Discuss how frequent itemsets can be generated using FP-Growth 07
algorithm with the help of the following transactions. Let the minimum
support threshold be 2.
Transaction ID Item IDs
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3

OR
(c) A database has the following six transactions. 07
Transaction ID Items
T1 HotDogs, Buns, Ketchup
T2 Chips, Coke
T3 Coke, Chips, HotDogs
T4 Ketchup, Chips
T5 Buns, HotDogs
T6 HotDogs, Chips, Coke

Find all frequent itemsets and also generate the strong association rules
using Apriori algorithm. Let the minimum support threshold be 33.34% and
the minimum confidence threshold be 60%.

Q.3 (a) Describe any three primitives for specifying a data mining task. 03
(b) The following table shows the midterm and final exam grades obtained 04
by students in a database course.
x (Midterm exam) y (Final exam)
72 84
50 63
81 77
74 78
94 90
86 75
59 49
83 79
65 77
33 52
88 74
81 90
Use the method of least squares to find an equation for the prediction of
a student’s final exam grade based on the student’s midterm grade in the
course. Predict the final exam grade of a student who received 86 grade
in the midterm exam.
(c) What is noise? Describe the possible reasons for noisy data. Explain the 07
different techniques to remove the noise from data.
OR
Q.3 (a) Discuss outlier analysis as a data mining functionality with the help of 03
an example.
(b) Explain how classification rules are extracted from a decision tree with 04
the help of an example.
(c) Explain in detail - min-max normalization method. Use this method to 07
normalize the following group of data by setting min = 0 and max = 1.
200, 400, 600, 1000

Q.4 (a) Differentiate classification and clustering. 03


(b) Discuss data matrix and dissimilarity matrix with respect to clustering. 04
(c) Apply ID3 classification algorithm on the following data and construct a 07
decision tree. Show all the stepwise calculations clearly.
age income student credit_rating Class: buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
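A sketch of the first ID3 step for the table above: computing the information gain of each attribute to choose the root of the tree. It assumes the standard definitions Info(D) = -Σ p_i log2 p_i and Gain(A) = Info(D) - Σ (|D_j| / |D|) Info(D_j). The names `ROWS`, `ATTRS`, `entropy` and `information_gain` are illustrative, and building the full tree would repeat the same calculation on each partition.

```python
from collections import Counter
from math import log2

ATTRS = ("age", "income", "student", "credit_rating")
# Columns: age, income, student, credit_rating, buys_computer
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    """Info(D): expected number of bits needed to identify a class label."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Reduction in entropy obtained by partitioning the rows on one attribute."""
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRS)}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
print("Root attribute:", root)
```

On this data age has the highest gain, so it would be chosen as the root split.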
OR
Q.4 (a) Discuss cross-validation method for evaluating the accuracy of a 03
classifier.
(b) How k-means clustering method differs from k-medoids clustering 04
method? Discuss major drawbacks of k-means clustering method.
(c) Predict class label of the tuple X = (age = youth, income = medium, 07
student = yes, credit_rating = fair) with the help of Naive Bayesian
classification method and the following data. Show all the stepwise
calculations clearly.
age income student credit_rating Class: buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no

Q.5 (a) Discuss web structure mining. 03


(b) Discuss multimedia mining. 04
(c) Suppose that the data mining task is to cluster the following eight points 07
(with (x, y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4,
9)
The distance function is Euclidean distance. Suppose initially we assign
A1, B1, and C1 as the center of each cluster, respectively. With the help
of k-means algorithm calculate,
(i) The three cluster centers after the first round execution
(ii) The final three clusters
OR
Q.5 (a) Discuss agglomerative hierarchical clustering method in brief. 03
(b) Explain any four typical requirements of clustering in data mining. 04
(c) What is web mining? Explain web usage mining in detail. 07

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – WINTER 2023
Subject Code:3160714 Date:11-12-2023
Subject Name:Data Mining
Time:02:30 PM TO 05:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.

Marks

Q.1 (a) Define Data Warehouse. State its features. 03


(b) Differentiate between OLAP and OLTP. 04
(c) Explain in detail different steps of KDD process. 07

Q.2 (a) Why preprocess the data in data mining? 03


(b) Explain Binning method with the help of example. 04
(c) Explain following terms related to Association Rule Mining: 07
Itemset, Support Count, support, and Association rule.

Transaction ID Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
For given example find support & confidence for
{Milk, Chocolate} ⇒ Pepsi.
{Milk, Pepsi} → {Chocolate}
{Chocolate, Pepsi} → {Milk}
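A small sketch of the support and confidence calculations for the three rules above, assuming support(X) is the fraction of transactions containing X and confidence(X → Y) = support(X ∪ Y) / support(X). The function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

rules = [
    ({"Milk", "Chocolate"}, {"Pepsi"}),
    ({"Milk", "Pepsi"}, {"Chocolate"}),
    ({"Chocolate", "Pepsi"}, {"Milk"}),
]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          "support =", support(lhs | rhs),
          "confidence =", confidence(lhs, rhs))
```

All three rules share the itemset {Milk, Chocolate, Pepsi}, so they have the same support (2/5) and differ only in confidence.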
OR
(c) Solve the following problem using Apriori algorithm. 07
Find the frequent itemsets and generate association rules on this.
Assume that minimum support threshold (s = 33.33%), minimum
confidence threshold (c = 60%), minimum support count = 2.

Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips

Q.3 (a) Define the following terms in Data Transformation: 03


i. Smoothing
ii. Normalization

iii. Discretization
(b) Differentiate between Classification and Prediction. 04
(c) Explain Decision Tree Classification algorithm with the help of 07
example.
OR
Q.3 (a) Differentiate between supervised learning and unsupervised 03
learning.
(b) What is Regression? Explain Linear Regression in short. 04
(c) Explain Naïve Bayes Classifier with example. 07
Q.4 (a) What do you mean by Tree Pruning? Explain with example. 03
(b) Explain the following as attribute selection measure: 04
(i) Information Gain
(ii) Gain Ratio
(c) What do you mean by learning-by-observation? Explain k-Means 07
clustering algorithm in detail.
OR
Q.4 (a) Define Data Cube. Explain any two operations on it. 03
(b) Differentiate between Partition method and Hierarchical method of 04
Clustering.
(c) What are the requirements of Clustering in Data Mining? 07

Q.5 (a) How does the K-Means clustering method differ from the K-Medoids 03
clustering method?
(b) Draw and explain the topology of a multilayer, feed-forward Neural 04
Network.
(c) Explain the major issues in data mining. 07
OR
Q.5 (a) Give the difference between text mining and web mining. 03
(b) Why is Hadoop important? 04
(c) What is a web log? Explain web structure mining and web usage 07
mining in detail.
************

Enrolment No./Seat No_____________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – SUMMER 2024
Subject Code: 3160714 Date:22-05-2024
Subject Name: Data Mining
Time: 10:30 AM TO 01:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
Marks
Q.1 (a) Define data mining. Describe three challenges to data mining regarding data 03
mining methodology and user interaction issues.
(b) Explain the steps in knowledge discovery. 04
(c) Explain the various data mining issues. 07

Q.2 (a) What are the smoothing techniques available to remove noise? 03
(b) Discuss normalization in detail. 04
(c) In real-world data, tuples with missing values for some attributes are a 07
common occurrence. Describe various methods for handling this problem.
OR
(c) Discuss data discretization and concept hierarchy generation. 07
Q.3 (a) How are association rules mined from large databases? 03
(b) Give the difference between Boolean association rule and quantitative 04
association rule.
(c) What are the limitations of the apriori approach for mining? Briefly describe 07
the techniques to improve the efficiency of apriori algorithm.
OR
Q.3 (a) Describe two interesting measures for association rules. 03
(b) How are metarules useful in constraint-based association mining? 04
(c) Write an algorithm for finding frequent item-sets using candidate generation. 07

Q.4 (a) What are the differences between supervised learning and unsupervised 03
learning?
(b) Write a short note on backpropagation. 04
(c) What is information gain? Explain the steps required to generate a decision 07
tree from a training data set.
OR
Q.4 (a) Differentiate between linear regression and nonlinear regression. 03
(b) Explain various methods of evaluating accuracy of classifier. 04
(c) Write a short note on web content mining. 07

Q.5 (a) Explain temporal mining. 03


(b) Differentiate between partitioning and hierarchical methods for clustering. 04
(c) Explain the following clustering algorithms in detail: 07
1) CLARA
2) BIRCH
OR
Q.5 (a) List out the applications of distributed and parallel data mining. 03
(b) Illustrate the strengths and weaknesses of k-means in comparison with the 04
k-medoids algorithm.
(c) Explain the typical requirements of clustering in data mining. 07
*************
Enrolment No./Seat No_______________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE- SEMESTER–VI (NEW) EXAMINATION – WINTER 2024
Subject Code:3160714 Date:02-12-2024
Subject Name:Data Mining
Time:02:30 PM TO 05:00 PM Total Marks:70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.

Q.1 (a) Define each of the following data mining functionalities: characterization, 03
discrimination, regression.
(b) How is a data warehouse different from a database? 04
(c) Explain KDD process. 07

Q.2 (a) How to handle missing values? Explain. 03


(b) How to handle noisy data? 04
(c) Consider a database, D, consisting of 9 transactions. 07
Suppose the minimum support count required is 2.
Let the minimum confidence required be 70%.
Find the frequent itemsets using the Apriori algorithm.

OR
(c) A database has five transactions. Let min sup=60% and min conf =80%. 07

TID Items bought


T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y}
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}

Find all frequent item sets using Apriori and FP-growth, respectively. Compare the
efficiency of the two mining processes.
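
For the FP-growth part of the question above, the first pass counts items and rewrites each transaction as a list of its frequent items in descending support order, which is the input to FP-tree construction. The sketch below covers only that first pass. It assumes min sup = 60% of 5 transactions means a support count of at least 3, treats the repeated O in T500 as a single item, and breaks ties between equally frequent items alphabetically (a choice made here for illustration, since the question does not fix a tie-break order).

```python
from collections import Counter

# Transactions copied from the table in the question.
transactions = {
    "T100": ["M", "O", "N", "K", "E", "Y"],
    "T200": ["D", "O", "N", "K", "E", "Y"],
    "T300": ["M", "A", "K", "E"],
    "T400": ["M", "U", "C", "K", "Y"],
    "T500": ["C", "O", "O", "K", "I", "E"],
}
min_count = 3  # assumption: 60% of 5 transactions

# Count each item once per transaction (set() drops the duplicate O in T500).
counts = Counter(item for items in transactions.values() for item in set(items))
frequent = {item: n for item, n in counts.items() if n >= min_count}


def order_key(item):
    # Descending support first, then alphabetical as an assumed tie-break.
    return (-frequent[item], item)


# Keep only frequent items and sort them in FP-tree insertion order.
ordered = {
    tid: sorted((i for i in set(items) if i in frequent), key=order_key)
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, items)
```

Each ordered list would then be inserted into the FP-tree as one path sharing prefixes with earlier paths. A different tie-break order gives a differently shaped tree but the same frequent itemsets.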

Q.3 (a) Explain market basket analysis. 03


(b) Explain Linear regression. 04
(c) Explain decision tree algorithm. 07

OR
Q.3 (a) Explain WEKA tool. 03
(b) Explain logistic regression. 04
(c) Explain CART Classification Method. 07
Q.4 (a) Compare classification and Clustering. 03
(b) Which metrics are used for evaluating classifier performance? 04
(c) Explain Principal Component Analysis. 07
OR
Q.4 (a) Compare classification and prediction. 03
(b) Explain outlier detection. 04
(c) Explain Backpropagation algorithm. 07
Q.5 (a) Write applications of clustering graph and network data. 03
(b) What is the structure of a web log? Discuss issues regarding web logs. 04
(c) Explain PAM clustering Algorithm. 07
OR
Q.5 (a) Write similarity measures for clustering graph and network data. 03
(b) Explain Web Structure mining. 04
(c) Write Applications of Distributed and parallel Data Mining. 07

*************

Seat No.: ________ Enrolment No.___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE - SEMESTER–VI (NEW) EXAMINATION – SUMMER 2022
Subject Code:3160714 Date:08/06/2022
Subject Name:Data Mining
Time:10:30 AM TO 01:00 PM Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
4. Simple and non-programmable scientific calculators are allowed.
Marks
Q.1 (a) What are the types of data? 03
(b) Compare descriptive and predictive data mining. 04
(c) Draw and explain the data mining architecture. 07

Q.2 (a) What is dimensionality reduction? 03


(b) What are the types of concept hierarchies? 04
(c) What is Data Cleaning? Describe various methods of Data Cleaning. 07
OR
(c) Discuss issues to be considered during data integration. 07

Q.3 (a) What is meant by association rule? 03


(b) How are association rules mined from large databases? 04
(c) Explain the various criteria for the classification of frequent pattern mining. 07
OR
Q.3 (a) List two interesting measures for association rules. 03
(b) What is meant by multidimensional association rules? 04
(c) Write short notes on Maximal Frequent Item Set and Closed Frequent Item Set. 07

Q.4 (a) What is an outlier? 03


(b) What is Bayes' theorem? 04
(c) Demonstrate how Bayesian classification helps in predicting class 07
membership probabilities.
OR
Q.4 (a) Differentiate classification and prediction. 03
(b) What is the difference between “supervised” and “unsupervised” learning 04
schemes?
(c) Explain the issues regarding the classification and prediction. 07

Q.5 (a) What is temporal mining? 03


(b) Explain web usage mining. 04
(c) Discuss the K-means clustering algorithm using examples. 07
OR
Q.5 (a) What is multimedia mining? 03
(b) Explain web content mining. 04
(c) What do you mean by Clustering? Explain the requirements of 07
Clustering.
*************
