Question Bank For DMDW

This document contains a question bank for the subject Data Mining and Warehousing. It includes multiple choice questions, short answer questions, and long answer questions covering four units: data mining concepts and preprocessing (Unit 1), data warehousing and OLAP (Unit 2), association rules and classification (Unit 3), and clustering and web mining (Unit 4). The short answer questions define key terms and ask students to explain concepts. The long answer questions ask for detailed explanations of data preprocessing methods, data mining algorithms, and dimensional modeling in data warehouses.

Department of Computer Science and Engineering

GIET University, Gunupur

Data Mining and Warehousing (R-2018) Question Bank

Subject Code:
Multiple Choice Questions
UNIT-1
1. What is the use of Data Mining?
a. A time-variant, non-volatile collection of data
b. The actual discovery phase of a knowledge discovery process
c. The stage of selecting the right data
d. None of these
2. The operations carried out on data during processing include
a. Manipulation
b. Analysis
c. Calculation
d. None of these
3. How is Euclidean distance measured?
a. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
b. The distance between two points as calculated using the Pythagorean theorem
c. A stage of the KDD process in which new data is added to the existing selection
d. None of these
4. Information content is ________
a. A restriction that requires data in one column of a database table to be a subset of another column
b. One of the defining aspects of a data warehouse
c. The amount of information within data as opposed to the amount of redundancy or noise
d. None of these
5. Knowledge Discovery in Databases refers to ___________
a. A collection of interesting and useful patterns in a database
b. A set of columns in a database table that can be used to identify each record within this table uniquely
c. Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
d. None of these
6. Classification and regression are tasks of _________
a. Data manipulation
b. Data Analysis
c. Data mining
d. None of these
7. In data preprocessing, noise refers to ______________
a. Random errors in a database table
b. A component of a network
c. One of the defining aspects of a data warehouse
d. None of these
8. Which of the following are the properties of entities?
a. Groups
b. Table
c. Attributes
d. Switchboards
9. In which step of Knowledge Discovery, multiple data sources are combined?
a. Data Cleaning
b. Data Integration
c. Data Selection
d. Data Transformation
10. On which of the following does the critical value for a chi-square statistic depend?
a. The degrees of freedom
b. The sum of the frequencies
c. The row totals
d. The number of variables
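
Question 3 above describes Euclidean distance as the distance between two points computed with the Pythagorean theorem. A minimal Python sketch of that calculation follows; the function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the question bank.

```python
import math


def euclidean_distance(p, q):
    """Return the straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square the difference along every coordinate, sum, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the legs are 3 and 4, so the hypotenuse is 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

The same function works for any number of coordinates, since it only relies on pairing the coordinates of the two points.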

Part - A Short Answer Type Questions


1. Give some examples of data preprocessing techniques. [CO-2, PO-1]
2. List out the preprocessing techniques available in data mining. [CO-2, PO-1]
3. Define data cleaning. [CO-2, PO-1]
4. What are the smoothing techniques available to remove noise? [CO-2, PO-1]
5. Define data transformation. [CO-2, PO-1]
6. Define a concept hierarchy. [CO-2, PO-1]
7. Define data reduction. [CO-2, PO-1]
8. What are the data mining task primitives? [CO-2, PO-1]
9. Define association rule mining. [CO-2, PO-1]
10. Define DMQL. [CO-2, PO-1]
11. What is generalization? [CO-2, PO-1]
12. What is summarization? [CO-2, PO-1]
13. Define discretization. [CO-2, PO-1]
14. Define transactional databases. [CO-2, PO-1]
15. Define relational databases. [CO-2, PO-1]
16. Differentiate the two types of regression. [CO-2, PO-1]

Part - B Long Answer Type Questions


1. Explain various issues related to data cleaning. [CO-2, PO-2]
2. Explain the data preprocessing techniques in detail. [CO-2, PO-2]
3. Explain the smoothing techniques. [CO-2, PO-2]
4. Explain data transformation in detail. [CO-2, PO-2]
5. Discuss normalization in detail. [CO-2, PO-2]
6. Explain data reduction in detail. [CO-2, PO-2]
7. Explain parametric and nonparametric methods of data reduction. [CO-2, PO-2]
8. Discuss data discretization and concept hierarchy generation. [CO-2, PO-2]
9. Explain generalization and summarization. [CO-2, PO-2]
10. How are association rules mined from databases? [CO-2, PO-2]
11. Explain mining multidimensional data from transactional databases and relational databases. [CO-2, PO-2]

UNIT-2
Multiple Choice Questions
1. OLAP stands for
a) Online analytical processing
b) Online analysis processing
c) Online transaction processing
d) Online aggregate processing
2. Data that can be modeled as dimension attributes and measure attributes are called _______ data.
a) Multidimensional
b) Single-dimensional
c) Measured
d) Dimensional
3. The generalization of a cross-tab, represented visually, is a ____________, which is also called a data cube.
a) Two dimensional cube
b) Multidimensional cube
c) N-dimensional cube
d) Cuboid
4. The process of viewing the cross-tab (single-dimensional) with a fixed value of one attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both Slicing and Dicing
5. The operation of moving from finer-granularity data to a coarser granularity (by means of aggregation) is called a ________
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
6. In SQL, cross-tabs are created using
a) Slice
b) Dice
c) Pivot
d) All of the mentioned
8. What do data warehouses support?
a) OLAP
b) OLTP
c) OLAP and OLTP
d) Operational databases
9. Business intelligence (BI) is a broad category of application programs which includes _____________
a) Decision support
b) Data mining
c) OLAP
d) All of the mentioned
10. Which of the following areas are affected by BI?
a) Revenue
b) CRM
c) Sales
d) All of the mentioned
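
Questions 4 and 5 above refer to slicing (fixing the value of one attribute) and roll-up (aggregating from a finer to a coarser granularity). The sketch below illustrates both on a tiny in-memory fact table. The rows, the dimension names (country, city, quarter), and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, quarter, sales)
facts = [
    ("India", "Delhi", "Q1", 100),
    ("India", "Mumbai", "Q1", 150),
    ("India", "Delhi", "Q2", 120),
    ("USA", "Boston", "Q1", 200),
    ("USA", "Boston", "Q2", 180),
]


def slice_by_quarter(rows, quarter):
    """Slice: keep only the rows where the quarter dimension has one fixed value."""
    return [row for row in rows if row[2] == quarter]


def roll_up_to_country(rows):
    """Roll-up: aggregate sales from city level to the coarser country level."""
    totals = defaultdict(int)
    for country, _city, quarter, sales in rows:
        totals[(country, quarter)] += sales
    return dict(totals)


print(slice_by_quarter(facts, "Q1"))
print(roll_up_to_country(facts))
```

Drill-down is the reverse of the roll-up shown here: it returns from the country totals to the city-level rows.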

Part - A Short Answer Type Questions


1. Define data warehouse. [CO-1, PO-1]
2. What is the need of data warehouses? [CO-1, PO-1]
3. Define multidimensional data model. [CO-1, PO-1]
4. What is a data cube? [CO-1, PO-1]
5. Define dimensions. [CO-1, PO-1]
6. What are facts? [CO-1, PO-1]
7. Define OLTP. [CO-1, PO-1]
8. Define OLAP. [CO-1, PO-1]
9. Define dimension table. [CO-1, PO-1]
10. Define fact table. [CO-1, PO-1]
11. What is a lattice of cuboids? [CO-1, PO-1]
12. What is the apex cuboid? [CO-1, PO-1]
13. List out the various OLAP operations. [CO-1, PO-1]
14. Give the names of warehouse schemas. [CO-1, PO-1]
15. Define star schema. [CO-1, PO-1]
16. Define snowflake schema. [CO-1, PO-1]
17. Draw a neat diagram of data warehouse architecture. [CO-1, PO-1]
18. Define data mart. [CO-1, PO-1]
19. Define metadata. [CO-1, PO-1]
20. What are the applications of metadata? [CO-1, PO-1]
21. What processes are carried out in the back end of a data warehouse? [CO-1, PO-1]
22. What are the phases in the development cycle of a data warehouse? [CO-1, PO-1]
23. Give the differences between a database and a data warehouse. [CO-1, PO-1]
24. What is meant by operational environment? [CO-1, PO-1]
25. How does a pivot operation act on the data cube? [CO-1, PO-1]
26. What is generalization? [CO-1, PO-1]
27. Define the Apriori algorithm. [CO-2, PO-1]
28. What is the anti-monotone property? [CO-2, PO-1]
29. Define support and confidence. [CO-2, PO-1]
30. How are association rules mined from large databases? [CO-2, PO-1]
31. Define aggregation. [CO-2, PO-1]
32. What do you mean by numerosity reduction? [CO-2, PO-1]

Part - B Long Answer Type Questions

1. Explain the multidimensional data model with a neat diagram. [CO-1, PO-2]
2. List the OLAP operations and explain each with an example. [CO-1, PO-2]
3. Describe dimensional modeling in detail. [CO-1, PO-2]
4. Explain the various schemas of a data warehouse. [CO-1, PO-2]
5. Define data warehouse. Draw the architecture of a data warehouse and explain the three tiers in detail with a case study. [CO-1, PO-2]
6. Explain in detail the implementation of a data warehouse for an organization. [CO-1, PO-2]
7. Define metadata and explain the types of metadata. [CO-1, PO-2]
8. Discuss the development lifecycle of a data warehouse. [CO-1, PO-2]
9. Explain the processes taking place in the backend of a data warehouse. [CO-1, PO-2]
UNIT-3
Multiple Choice Questions

1) What does the Apriori algorithm do?
a. It mines all frequent patterns through pruning rules with lesser support
b. It mines all frequent patterns through pruning rules with higher support
c. Both a and b
d. None of the above
2) What techniques can be used to improve the efficiency of the Apriori algorithm?
a. Hash-based techniques
b. Transaction Reduction
c. Partitioning
d. All of the above
3) What do you mean by support(A)?
a. Total number of transactions containing A
b. Total number of transactions not containing A
c. Number of transactions containing A / Total number of transactions
d. Number of transactions not containing A / Total number of transactions
4) How do you calculate Confidence(A -> B)?
a. Support(A ∩ B) / Support(A)
b. Support(A ∩ B) / Support(B)
c. Support(A ∪ B) / Support(A)
d. Support(A ∪ B) / Support(B)
5) What is association rule mining?
a. Same as frequent itemset mining
b. Finding of strong association rules using frequent itemsets
c. Using association to analyse correlation rules
d. None of the above
6) What are tree based classifiers?
a. Classifiers which form a tree with each attribute at one level
b. Classifiers which perform a series of condition checks with one attribute at a time
c. Both a and b
d. None of the options
7) What is the Gini index?
a. It is a type of index structure
b. It is a measure of purity
c. Both a and b
d. None of the options
8) Which one of these is not a tree based learner?
a. CART
b. ID3
c. Bayesian classifier
d. Random Forest
9) Tree/Rule based classification algorithms generate ______ rules to perform the classification.
a. if-then
b. while
c. do while
d. switch
10) A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
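
Questions 3 and 4 of this unit ask for support and confidence. Support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A -> B is the support of A and B together divided by the support of A. The following sketch computes both. The transaction list and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical market-basket transactions, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


print(support({"bread"}, transactions))               # 0.75
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions also contain milk
```

A rule is usually called strong when both its support and its confidence reach user-chosen minimum thresholds.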

Part - A Short Answer Type Questions


1. What is classification? [CO-3, PO-1]
2. What is an association rule? [CO-3, PO-1]
3. Differentiate between supervised learning and unsupervised learning. [CO-3, PO-1]
4. What are the goals of time series analysis? [CO-3, PO-1]
5. What are the steps involved in preparing the data for classification? [CO-3, PO-1]
6. Define the concept of classification. [CO-3, PO-1]
7. What is a decision tree? [CO-3, PO-1]
8. What is an attribute selection measure? [CO-3, PO-1]
9. List out the tree pruning methods. [CO-3, PO-1]
10. Define pre pruning. [CO-3, PO-1]
11. Define post pruning. [CO-3, PO-1]
12. Define back propagation. [CO-3, PO-1]
13. What are outliers? [CO-3, PO-1]
14. Define the centroid of the cluster. [CO-3, PO-1]
15. What are the hierarchical methods used in classification? [CO-3, PO-1]
16. What are Bayesian classifiers? [CO-3, PO-1]

Part - B Long Answer Type Questions


1. Explain the Back Propagation technique. [CO-3, PO-1]
2. Explain Naïve Bayesian classification in detail with example. [CO-3, PO-1]
3. Explain various classification methods. [CO-3, PO-1]
4. Discuss Classifier accuracy with examples. [CO-3, PO-1]
5. Elaborate the various partitioning methods in detail. [CO-3, PO-1]
6. Explain the hierarchical methods of classification. [CO-3, PO-1]
7. Discuss classification by decision tree induction. [CO-3, PO-1]
8. Dr. No has a patient who is very sick. Without further treatment, this patient will die in about 3 months. The only treatment alternative is a risky operation. The patient is expected to live about 1 year if he survives the operation; however, the probability that the patient will not survive the operation is 0.3. Draw a decision tree for this simple decision problem. Show all the probabilities and outcome values. [CO-3, PO-2]
9. Using the Apriori algorithm, solve the given problem with support threshold = 50% and confidence = 60%.

Transaction    List of items
T1             I1, I2, I3
T2             I2, I3, I4
T3             I4, I5
T4             I1, I2, I4
T5             I1, I2, I3, I5
T6             I1, I2, I3, I4
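
One way to check a hand-worked answer to Question 9 is a short Apriori sketch over the six transactions in the table. The code below is a teaching sketch and not a reference implementation: the names `apriori` and `association_rules` are illustrative, candidates are generated by joining frequent itemsets of the previous level, and a candidate is pruned when any of its subsets is not frequent (the anti-monotone property).

```python
from itertools import combinations

# The six transactions from Question 9.
transactions = {
    "T1": {"I1", "I2", "I3"},
    "T2": {"I2", "I3", "I4"},
    "T3": {"I4", "I5"},
    "T4": {"I1", "I2", "I4"},
    "T5": {"I1", "I2", "I3", "I5"},
    "T6": {"I1", "I2", "I3", "I4"},
}
MIN_SUPPORT = 0.5      # support threshold from the question
MIN_CONFIDENCE = 0.6   # confidence threshold from the question


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} for all itemset sizes."""
    min_count = min_support * len(transactions)
    items = sorted({item for t in transactions.values() for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that meet the threshold.
        counts = {c: sum(1 for t in transactions.values() if c <= t)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = count / frequent[a]
                if conf >= min_confidence:
                    rules.append((sorted(a), sorted(itemset - a), conf))
    return rules


frequent = apriori(transactions, MIN_SUPPORT)
strong_rules = association_rules(frequent, MIN_CONFIDENCE)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for antecedent, consequent, conf in strong_rules:
    print(antecedent, "->", consequent, round(conf, 2))
```

With a 50% threshold an itemset must appear in at least 3 of the 6 transactions, so I5 (2 occurrences) is dropped at the first level, and the only frequent 3-itemset the sketch reports is {I1, I2, I3}.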

UNIT-4
Multiple Choice Questions

1. Which of the following clustering types has the characteristic shown in the figure below?

a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
2. Point out the correct statement.
a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned
3. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned
4. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
5. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
6. Which of the following functions is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
7. Which of the following clustering types requires a merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
8. Which of the following combinations is incorrect?
a) Continuous - Euclidean distance
b) Continuous - correlation similarity
c) Binary - Manhattan distance
d) None of the mentioned
Part - A Short Answer Type Questions
1. Define web mining. [CO-4, PO-1]
2. What is a multimedia database? [CO-3, PO-1]
3. Define web content mining. [CO-3, PO-1]
4. Define web structure mining. [CO-3, PO-1]
5. Define web usage mining. [CO-3, PO-1]
6. What is spatial mining? [CO-3, PO-1]
7. What is time series analysis? [CO-3, PO-1]
8. Define sequence mining. [CO-3, PO-1]
9. Define graph mining. [CO-3, PO-1]
10. What are the applications of data mining? [CO-3, PO-1]
11. What are the additional themes in data mining? [CO-3, PO-1]
12. What is page rank? [CO-3, PO-1]
13. Write notes on the k-means algorithm. [CO-3, PO-1]
14. List out the density based methods. [CO-3, PO-1]
15. List out the partitioning methods. [CO-3, PO-1]
16. Differentiate between agglomerative and divisive hierarchical clustering. [CO-3, PO-1]

Part - B Long Answer Type Questions


1. Explain the process of mining the World Wide Web. [CO-3, PO-2]
2. Explain the partitioning methods. [CO-3, PO-1]
3. Discuss model based clustering methods. [CO-3, PO-1]
4. Explain outlier analysis in detail. [CO-3, PO-1]
5. Explain the various types of web mining. [CO-3, PO-1]
6. Explain spatial mining and time series mining. [CO-3, PO-1]
7. Discuss graph mining. [CO-3, PO-1]
8. Discuss some case studies in data mining applications. [CO-3, PO-1]
9. Explain density based clustering methods in detail. [CO-3, PO-1]
10. Discuss the grid based methods. [CO-3, PO-1]
11. Given are the points A = (1, 2), B = (2, 2), C = (2, 1), D = (-1, 4), E = (-2, -1), F = (-1, -1).
a) Starting from initial clusters Cluster1 = {A}, which contains only the point A, and Cluster2 = {D}, which contains only the point D, run the K-means clustering algorithm and report the final clusters. Use the L1 distance as the distance between points, which is given by d((x1, y1), (x2, y2)) = |x1 - x2| + |y1 - y2|. [CO-3, PO-1]
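
A hand-worked answer to Question 11 can be checked with the sketch below. It rests on assumptions the question does not state: centroids are recomputed as the coordinate-wise mean of each cluster, and the function name `kmeans_l1` and its `ties_to_last` flag are illustrative. The tie rule matters here, because in the first pass E and F are equally far (L1 distance 6 and 5 respectively) from both starting centroids.

```python
# The six points from Question 11.
points = {
    "A": (1, 2), "B": (2, 2), "C": (2, 1),
    "D": (-1, 4), "E": (-2, -1), "F": (-1, -1),
}


def l1(p, q):
    """L1 (Manhattan) distance as defined in the question."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mean(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


def kmeans_l1(points, centroids, ties_to_last=False, max_iter=100):
    """Run k-means with L1 distance and return the clusters as lists of names."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = {}
        for name, p in points.items():
            dists = [l1(p, c) for c in centroids]
            nearest = [i for i, d in enumerate(dists) if d == min(dists)]
            # Assumption: a tie goes to the first cluster unless ties_to_last is set.
            new_assignment[name] = nearest[-1] if ties_to_last else nearest[0]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        updated = []
        for i, old in enumerate(centroids):
            members = [points[n] for n, k in assignment.items() if k == i]
            # Keep the previous centroid if a cluster happens to be empty.
            updated.append(mean(members) if members else old)
        centroids = updated
    return [sorted(n for n, k in assignment.items() if k == i)
            for i in range(len(centroids))]


start = [points["A"], points["D"]]
print(kmeans_l1(points, start))                     # ties go to Cluster1
print(kmeans_l1(points, start, ties_to_last=True))  # ties go to Cluster2
```

Under these assumptions, sending ties to Cluster1 ends with {A, B, C, E, F} and {D}, while sending ties to Cluster2 ends with {A, B, C} and {D, E, F}. An answer should therefore state which tie rule it uses.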
