DM PYQ merged
___________
Q.1 (a) What is Data Mining? Why is it called data mining rather than knowledge 03
mining?
Q.2 (a) What is the difference between KDD and Data Mining? 03
(b) What is Concept Hierarchy? List and briefly explain types of Concept 04
Hierarchy
(c) Explain Mean, Median, Mode, Variance, Standard Deviation & five number 07
summary with suitable database example.
OR
(c) What is noise? Explain data smoothing methods as noise removal technique 07
to divide given data into bins of size 3 by bin partition (equal frequency), by
bin means, by bin medians and by bin boundaries.
Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22
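As a study aid for the binning question above, here is a minimal Python sketch of equal-frequency partitioning and the three smoothing variants. The function names are illustrative only. The tie rule in `smooth_by_boundaries` (a value equidistant from both boundaries goes to the lower one) is an assumption; for this data, every middle value is such a tie.

```python
from statistics import mean, median

def equal_frequency_bins(values, size):
    """Sort the values and cut them into consecutive bins of `size` items."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def smooth_by_boundaries(bin_values):
    """Replace each value by the closer bin boundary.

    Assumption: a value equally far from both boundaries goes to the lower one.
    """
    low, high = bin_values[0], bin_values[-1]
    return [low if v - low <= high - v else high for v in bin_values]

data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
bins = equal_frequency_bins(data, 3)

# Each value in a bin is replaced by the bin mean, median, or nearest boundary.
by_means = [[mean(b)] * len(b) for b in bins]
by_medians = [[median(b)] * len(b) for b in bins]
by_boundaries = [smooth_by_boundaries(b) for b in bins]

print("bins:         ", bins)
print("by means:     ", by_means)
print("by medians:   ", by_medians)
print("by boundaries:", by_boundaries)
```

With a different tie rule, the middle value of each bin would move to the upper boundary instead.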
(b) Suppose that the data for analysis includes the attribute age. 04
The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
Use min-max normalization to transform the value 45 for age onto the range
[0.0, 1.0]
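As a study aid, min-max normalization maps a value v to (v - min) / (max - min) * (new_max - new_min) + new_min. The sketch below applies that formula to the age data; the helper name `min_max` is illustrative.

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map `value` from [old_min, old_max] onto [new_min, new_max]."""
    scale = (value - old_min) / (old_max - old_min)
    return scale * (new_max - new_min) + new_min

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]

# 45 is not in the list, but it lies inside [13, 72], so the mapping is defined.
normalized_45 = min_max(45, min(ages), max(ages))
print(round(normalized_45, 4))  # (45 - 13) / (72 - 13) = 32/59
```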
Q.4 (a) How does the K-Means clustering method differ from the K-Medoids clustering method? 03
(c) State the Apriori Property. Generate large itemsets and association rules 07
using Apriori algorithm on the following data set with minimum support
value and minimum confidence value set as 50% and 75% respectively
OR
(c) What is web log? Explain web structure mining and web usage mining in 07
detail
(b) What is market basket analysis? Explain the two measures of rule 04
interestingness: support and confidence
(c) What is Big Data? What is big data analytics? Explain the big data 07
distributed file system.
*************
Seat No.: ________ Enrolment No.___________
MARKS
Q.1 (a) Explain KDD process using figure. 03
(b) Do feature wise comparison between BI and DW. 04
(c) Explain research issues in Data Mining. 07
Q.2 (a) Explain schemas: Stars, snowflakes and fact constellations using 03
figures.
(b) Do feature wise comparison between ROLAP and MOLAP. 04
(c) Enlist the preprocessing steps with example. Explain procedure 07
of any technique of preprocessing.
OR
(c) Explain what is concept description? Explain data 07
generalization, summarization-based characterization using
example.
Q.3 (a) Do feature wise comparison between classification and 03
prediction.
(b) Write a note on incremental Association Rule Mining. 04
(c) Generate frequent itemsets and generate association rules based 07
on it using apriori algorithm. Minimum support is 50% and
minimum confidence is 70%
TID Items
100 1, 3, 4
200 2, 3, 5
300 1, 2, 3, 5
400 2, 5
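As a study aid for the question above, the following is a minimal, unoptimized sketch of Apriori (join, prune, count) plus rule generation for the four transactions. The function names are illustrative. A 50% minimum support is taken as a count of 2 out of 4 transactions.

```python
from itertools import combinations
from math import ceil

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count step: scan the transactions for every candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(transactions, ceil(0.5 * len(transactions)))
rules = strong_rules(frequent, 0.70)

for itemset, count in frequent.items():
    print(sorted(itemset), count)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```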
OR
Q.3 (a) Differentiate between Overfitting and Tree Pruning w.r.t. the 03
following parameters.
i). definition figure
ii). use in particular situation
iii). limitation
(b) Explain Mining Multiple-Level Association Rules using 04
example.
(c) Generate decision tree using CART algorithm for the following 07
dataset.
Sr. no. Outlook Temperature Humidity Wind Play
1 Sunny hot high FALSE No
2 Sunny hot high TRUE No
3 Overcast hot high FALSE Yes
4 Rain mild high FALSE Yes
5 Rain cool normal FALSE Yes
6 Rain cool normal TRUE No
7 Overcast cool normal TRUE Yes
8 Sunny mild high FALSE No
9 Sunny cool normal FALSE Yes
10 Rain mild normal FALSE Yes
11 Sunny mild normal TRUE Yes
12 Overcast mild high TRUE Yes
13 Overcast hot normal FALSE Yes
14 Rain mild high TRUE No
OR
Q.4 (a) Explain Spatial mining using example. 03
(b) Calculate the weights using neural network single layer 04
perceptron model. Three inputs are x0, x1, x2, bias and weights
are as follows:
w1(0) = 30, w2(0) = 300
b(0) = 50, η = 0.01, x0 = +1
Activation function is:
sgn(x) = +1, if x >= 0
sgn(x) = -1, if x < 0
(a) Calculate x2 for x1 = 100 and 200.
(b) For bias b(0) = -1230, recalculate the weights w1 and w2.
(c) How is data mining useful for Business Intelligence applications, 07
viz. Balanced Scorecard, Fraud Detection, Clickstream Mining,
Market Segmentation, retail industry, telecommunications
industry, banking & finance and CRM?
Q.5 (a) Explain text mining using example. 03
(b) Explain big data and big data analytics. Explain key roles and 04
their responsibilities for successful analytic project.
(c) Calculate 2 clusters using the k-means clustering algorithm. For finding 07
the distance use Euclidean distance.
Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
Assume mean1 as subject1 and mean2 as subject4
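As a study aid, the sketch below runs batch k-means (reassign all points, then recompute both means) on the seven subjects, starting from subjects 1 and 4. The function name is illustrative. Two assumptions are made: a point equidistant from both means goes to the first cluster (subject 3 is such a tie in the first pass), and no cluster ever becomes empty.

```python
from math import dist

def k_means(points, centers, max_iter=100):
    """Batch k-means with Euclidean distance; returns (centers, clusters)."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # min() keeps the first index on ties (assumption noted above).
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # New mean of each cluster, coordinate by coordinate.
        new_centers = [tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                       for cluster in clusters]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters

subjects = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
            (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centers, clusters = k_means(subjects, [subjects[0], subjects[3]])

print("means:   ", centers)
print("cluster1:", clusters[0])
print("cluster2:", clusters[1])
```

The final grouping does not depend on the tie rule here: sending subject 3 to the second cluster in the first pass leads to the same two clusters.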
OR
Q.5 (a) Explain web mining using example. 03
(b) Explain Hadoop architecture using figure. 04
(c) Explain MapReduce. Explain any example using MapReduce. 07
OR
Q.4 (a) Define “clustering”? Mention any two applications of clustering. 03
(b) Briefly explain Linear and Non-linear regression. 04
(c) Consider the following dataset and find frequent item sets and generate 07
association rules for them using Apriori Algorithm.
Minimum support count is 2 and minimum confidence is 60%.
Q.5 (a) Differentiate Fact table vs. Dimension table 03
(b) What is market basket analysis? Explain the two measures of rule 04
interestingness: support and confidence
(c) Briefly explain the life-cycle of Data Analytics and discuss the role of data 07
scientists.
OR
Q.5 (a) Explain text mining using example. 03
(b) Explain data mining application for fraud detection. 04
(c) Discuss the main features of Hadoop Distributed File System. 07
*************
Q.1 (a) Explain cluster analysis and outlier analysis with example. 03
(b) A data warehouse is a subject-oriented, integrated, time-variant, and 04
nonvolatile collection of data – Justify.
(c) Consider the following database of ten transactions. Let min_sup = 30% and
min_confidence = 60%.
A) Find all frequent itemsets using Apriori algorithm. 05
TID items bought
T1 pen, pencil
T2 book, eraser, pencil
Q.3 (a) Which are the two measures of rule interestingness? Explain with example. 03
(b) Discuss Hash-based technique to improve efficiency of Apriori algorithm. 04
(c) Explain various data normalization techniques. 07
OR
Q.3 (a) Discuss Big Data. 03
(b) Discuss possible ways for integration of a Data Mining system with a Database 04
or DataWarehouse system.
(c) Enlist data reduction strategies and explain any two. 07
Q.4 (a) Discuss various layers of multilayer feed-forward neural network with 03
diagram.
(b) What is apex cuboid? Discuss drill down and roll up operation with diagram. 04
(c) Using Naive Bayesian classification method, predict class label of X = (age = 07
youth, income = medium, student = yes, credit_rating = fair) using following
training dataset.
age income student credit_rating Class: buys_computer
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
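As a study aid for the question above, the sketch below scores each class as P(class) times the product of P(attribute value | class), using plain relative frequencies from the 14 training tuples. The function name is illustrative, and no Laplace smoothing is applied (an assumption; none is needed here because no count is zero).

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer), values lower-cased.
rows = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(rows, x):
    """Return {class: P(class) * product of P(x_j | class)} without smoothing."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(class)
        for j, value in enumerate(x):
            matches = sum(1 for r in rows if r[-1] == label and r[j] == value)
            score *= matches / n  # conditional P(x_j | class)
        scores[label] = score
    return scores

x = ("youth", "medium", "yes", "fair")
scores = naive_bayes_scores(rows, x)
prediction = max(scores, key=scores.get)

print(scores)
print("predicted buys_computer =", prediction)
```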
OR
Q.4 (a) Explain various conflict resolution strategies in rule based classification. 03
(b) What is classification? Explain classification as a two step process with 04
diagram.
(c) Discuss fraud detection and click-stream analysis using data mining. 07
Q.2 (a) Define market basket analysis. Explain support and confidence with 03
suitable example for finding the rules.
(b) Explain the following terms. 04
1) Bias 2) Variance 3) Generalization 4) Outlier
(c) Define the Apriori Property. Generate candidate itemsets, frequent itemsets 07
and association rules using Apriori algorithm on the following data set with
a minimum support count of 2 and a minimum confidence of 60%.
TID Items
T1 BREAD, BUTTER, TOAST
T2 BUTTER, MILK
T3 BUTTER, BISCUIT
T4 BREAD, BUTTER, MILK
T5 BREAD, BISCUIT
T6 BUTTER, BISCUIT
T7 BREAD, BISCUIT
T8 BREAD, BUTTER, BISCUIT, TOAST
T9 BREAD, BUTTER, BISCUIT
Q.3 (a) The minimum salary is Rs. 20,000 and the maximum salary is Rs. 1,70,000. Map 03
the salary Rs. 1,00,000 onto the new range (Rs. 60,000, Rs. 2,60,000) using the
min-max normalization method.
(b) Define time series database. Explain how to characterize time series data 04
using trend analysis.
(c) Define data cube and explain any 3 operations on it. 07
Q.4 (a) If the mean salary is Rs. 54,000 and the standard deviation is Rs. 16,000, find the 03
z-score of a salary of Rs. 73,600.
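As a study aid, z-score normalization computes (value - mean) / standard deviation. A minimal sketch with the figures from the question (the helper name `z_score` is illustrative):

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which `value` lies above the mean."""
    return (value - mean) / std_dev

z = z_score(73_600, 54_000, 16_000)
print(z)  # 19,600 / 16,000
```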
(b) Briefly explain linear and non-linear regression. 04
(c) Define Map and Reduce operations with suitable example. 07
*************
ID ITEMS
T_1 1, 2, 3, 5
T_2 1, 2, 3, 4, 5
T_3 1, 2, 3, 7
T_4 1, 3, 6
T_5 1, 2, 4, 5, 6
Suppose the minimum support is 60%. Find all frequent itemsets using
Apriori algorithm.
OR
Q.3 (a) What is dimensionality reduction? 03
(b) Explain about Data Transformation method with suitable example. 04
(c) Discuss the variations of the Apriori algorithm to improve the efficiency. 07
Q.4 (a) What is meant by multidimensional association rules? 03
(b) Discuss the Information gain as attribute selection measure. 04
(c) Differentiate classification and prediction. State the issues regarding 07
classification and prediction.
OR
Q.4 (a) What is meant by Maximal Frequent Item Set? 03
(b) Discuss the Gain ratio as attribute selection measure. 04
(c) Why is naïve Bayesian classification called “naïve”? Briefly outline the major 07
ideas of naïve Bayesian classification.
Q.5 (a) List out the General applications of Clustering. 03
(b) What is Big Data? What is big data analytics? 04
(c) How is data mining used in the retail industry? 07
OR
Q.5 (a) Define the web mining. 03
(b) Discuss the main features of Hadoop Distributed File System. 04
(c) How is data mining used in the telecommunication industry? 07
*************
(c) Using Apriori algorithm, find all frequent itemsets for following 07
transaction data.
( Take min_sup=60% and min_conf=80% )
ID Items
1 {M,O,N,K,E,Y}
2 {D,O,N,K,E,Y}
3 {M,A,K,E}
4 {M,U,C,K,Y}
5 {C,O,O,K,I,E}
OR
Q.3 (a) What is the use of proximity measures? Explain any one proximity 03
measure with its equation.
(b) Explain Bayesian learning and inference with suitable example. 04
(c) List the accuracy parameters used for the performance evaluation of 07
classification and discuss any five parameters with appropriate
example.
Q.4 (a) Differentiate supervised and unsupervised learning. 03
(b) Explain logistic regression with appropriate example. 04
(c) Explain working of decision tree algorithm with suitable example. 07
OR
Q.4 (a) Differentiate agglomerative and divisive methods of clustering. 03
(b) What do you mean by perceptron? Discuss single-layer and multi layer 04
perceptron.
(c) Explain the K-means clustering algorithm and show that outliers adversely 07
affect the performance of the algorithm.
Q.5 (a) Give the strengths and weaknesses of k-means in comparison with the k-medoids 03
algorithm.
(b) What is an outlier? Why is outlier mining important? 04
(c) Write about different clustering approaches with their strengths and 07
weaknesses.
OR
Q.5 (a) Briefly explain the spatial data mining and temporal mining. 03
(b) Discuss any four data mining features available in the WEKA. 04
(c) How is data mining useful for web mining? Discuss any four web 07
mining applications.
*************
Q.3 (a) What are the techniques to improve the efficiency of Apriori algorithm? 03
(b) What is an Itemset? What is a Frequent Itemset? 04
(c) For the given data 07
Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips
Find the frequent itemsets and generate association rules from them. Assume
a minimum support threshold (s = 33.33%) and a minimum confidence
threshold (c = 60%).
OR
Q.3 (a) Describe the different classifications of Association rule mining. 03
(b) What is meant by Reduced Minimum Support? 04
(c) Explain the steps of the “Apriori Algorithm” for mining frequent itemsets 07
with suitable example.
OR
Q.4 (a) What is attribute selection measure? 03
(b) What is the difference between supervised and unsupervised learning 04
schemes?
(c) Describe the issues regarding classification and prediction. Write an 07
algorithm for decision tree.
Q.1 (a) What is market basket analysis? Precisely explain the meaning of the 03
following association rule:
computer → antivirus_software [support = 60%, confidence = 60%]
(b) In real-world data, tuples with missing values for some attributes are a 04
common occurrence. List and describe various methods for handling this
problem.
(c) With the help of a suitable diagram, describe the steps involved in data 07
mining when viewed as a process of knowledge discovery.
Q.2 (a) Give a short example to show that items in a strong association rule are 03
not always interesting.
(b) Briefly describe how partitioning technique may improve the efficiency 04
of Apriori algorithm.
(c) Discuss how frequent itemsets can be generated using the FP-Growth 07
algorithm with the help of the following transactions. Let the minimum
support threshold be 2.
Transaction ID Item IDs
T1 I1, I2, I5
T2 I2, I4
T3 I2, I3
T4 I1, I2, I4
T5 I1, I3
T6 I2, I3
T7 I1, I3
T8 I1, I2, I3, I5
T9 I1, I2, I3
OR
(c) A database has the following six transactions. 07
Transaction ID Items
T1 HotDogs, Buns, Ketchup
T2 Chips, Coke
T3 Coke, Chips, HotDogs
T4 Ketchup, Chips
T5 Buns, HotDogs
T6 HotDogs, Chips, Coke
Find all frequent itemsets and also generate the strong association rules
using the Apriori algorithm. Let the minimum support threshold be 33.34% and
the minimum confidence threshold be 60%.
Q.3 (a) Describe any three primitives for specifying a data mining task. 03
(b) The following table shows the midterm and final exam grades obtained 04
by students in a database course.
x (Midterm exam) y (Final exam)
72 84
50 63
81 77
74 78
94 90
86 75
59 49
83 79
65 77
33 52
88 74
81 90
Use the method of least squares to find an equation for the prediction of
a student’s final exam grade based on the student’s midterm grade in the
course. Predict the final exam grade of a student who received 86 grade
in the midterm exam.
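As a study aid for the question above, the sketch below fits y = w0 + w1 x by least squares, where w1 is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and w0 = ȳ - w1 x̄. The function name is illustrative.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

midterm = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]
final = [84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90]

slope, intercept = least_squares(midterm, final)
predicted = intercept + slope * 86

print(f"y = {intercept:.3f} + {slope:.4f} x")
print(f"predicted final grade for midterm 86: {predicted:.2f}")
```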
(c) What is noise? Describe the possible reasons for noisy data. Explain the 07
different techniques to remove the noise from data.
OR
Q.3 (a) Discuss outlier analysis as a data mining functionality with the help of 03
an example.
(b) Explain how classification rules are extracted from a decision tree with 04
the help of an example.
(c) Explain in detail - min-max normalization method. Use this method to 07
normalize the following group of data by setting min = 0 and max = 1.
200, 400, 600, 1000
*************
Transaction ID Items
1 Bread, Milk
2 Bread, Chocolate, Pepsi, Eggs
3 Milk, Chocolate, Pepsi, Coke
4 Bread, Milk, Chocolate, Pepsi
5 Bread, Milk, Chocolate, Coke
For the given example, find the support and confidence for
{Milk, Chocolate} → {Pepsi}
{Milk, Pepsi} → {Chocolate}
{Chocolate, Pepsi} → {Milk}
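As a study aid, support of a rule A → B is the fraction of transactions containing A ∪ B, and confidence is count(A ∪ B) / count(A). A minimal sketch over the five transactions above (helper names are illustrative):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(lhs, rhs):
    """Fraction of all transactions containing both sides of the rule."""
    return count(lhs | rhs) / len(transactions)

def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return count(lhs | rhs) / count(lhs)

rules = [
    ({"Milk", "Chocolate"}, {"Pepsi"}),
    ({"Milk", "Pepsi"}, {"Chocolate"}),
    ({"Chocolate", "Pepsi"}, {"Milk"}),
]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          f"support={support(lhs, rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```

All three rules share the same support because they are built from the same itemset {Milk, Chocolate, Pepsi}; only the confidence differs.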
OR
(c) Solve the following problem using Apriori algorithm. 07
Find the frequent itemsets and generate association rules from them.
Assume a minimum support threshold (s = 33.33%), a minimum
confidence threshold (c = 60%) and a minimum support count of 2.
Transaction ID Items
T1 Hot Dogs, Buns, Ketchup
T2 Hot Dogs, Buns
T3 Hot Dogs, Coke, Chips
T4 Chips, Coke
T5 Chips, Ketchup
T6 Hot Dogs, Coke, Chips
iii. Discretization
(b) Differentiate between Classification and Prediction. 04
(c) Explain Decision Tree Classification algorithm with the help of 07
example.
OR
Q.3 (a) Differentiate between supervised learning and unsupervised 03
learning.
(b) What is Regression? Explain Linear Regression in short. 04
(c) Explain Naïve Bayes Classifier with example. 07
Q.4 (a) What do you mean by Tree Pruning? Explain with example. 03
(b) Explain the following as attribute selection measure: 04
(i) Information Gain
(ii) Gain Ratio
(c) What do you mean by learning-by-observation? Explain k-Means 07
clustering algorithm in detail.
OR
Q.4 (a) Define Data Cube. Explain any two operations on it. 03
(b) Differentiate between Partition method and Hierarchical method of 04
Clustering.
(c) What are the requirements of Clustering in Data Mining? 07
Q.5 (a) How does the K-Means clustering method differ from the K-Medoids clustering 03
method?
(b) Draw and explain the topology of a multilayer, feed-forward Neural 04
Network.
(c) Explain the major issues in data mining. 07
OR
Q.5 (a) Give difference between text mining and web mining. 03
(b) Why is Hadoop important? 04
(c) What is web log? Explain web structure mining and web usage 07
mining in detail.
************
Q.2 (a) What are the smoothing techniques available to remove noise? 03
(b) Discuss normalization in detail. 04
(c) In real-world data, tuples with missing values for some attributes are a 07
common occurrence. Describe various methods for handling this problem.
OR
(c) Discuss data discretization and concept hierarchy generation. 07
Q.3 (a) How are association rules mined from large databases? 03
(b) Give the difference between Boolean association rule and quantitative 04
association rule.
(c) What are the limitations of the apriori approach for mining? Briefly describe 07
the techniques to improve the efficiency of apriori algorithm.
OR
Q.3 (a) Describe two interesting measures for association rules. 03
(b) How are metarules useful in constraint-based association mining? 04
(c) Write an algorithm for finding frequent item-sets using candidate generation. 07
Q.4 (a) What is the difference between supervised learning and unsupervised 03
learning?
(b) Write a short note on backpropagation. 04
(c) What is information gain? Explain the steps required to generate a decision 07
tree from a training data set.
OR
Q.4 (a) Differentiate between linear regression and nonlinear regression. 03
(b) Explain various methods of evaluating accuracy of classifier. 04
(c) Write a short note on web content mining. 07
Q.1 (a) Define each of the following data mining functionalities: characterization, 03
discrimination, regression.
(b) How is a data warehouse different from database? 04
(c) Explain KDD process. 07
OR
(c) A database has five transactions. Let min sup=60% and min conf =80%. 07
Find all frequent item sets using Apriori and FP-growth, respectively. Compare the
efficiency of the two mining processes.
OR
Q.3 (a) Explain WEKA tool. 03
(b) Explain logistic regression. 04
(c) Explain CART Classification Method. 07
Q.4 (a) Compare classification and Clustering. 03
(b) Which metrics are used for evaluating classifier performance? 04
(c) Explain Principal Component Analysis. 07
OR
Q.4 (a) Compare classification and prediction. 03
(b) Explain outlier detection. 04
(c) Explain Backpropagation algorithm. 07
Q.5 (a) Write applications of clustering graph and network data. 03
(b) What is the structure of a web log? Discuss issues regarding web logs. 04
(c) Explain PAM clustering Algorithm. 07
OR
Q.5 (a) Write similarity measures for clustering graph and network data. 03
(b) Explain Web Structure mining. 04
(c) Write Applications of Distributed and parallel Data Mining. 07
*************