1. ________of data means that the attributes within a given entity are fully dependent on the
entire primary key of the entity.
Correct : C. Functional dependency
2. A fact is said to be fully additive if_________.
Correct : A. It is additive over every dimension of its dimensionality
3. A fact is said to be partially additive if_______.
Correct : B. Additive over at least one but not all of the dimensions
4. A fact is said to be non-additive if_______.
Correct : C. Not additive over any dimension
5. Non-additive measures can often be combined with additive measures to create new _________.
Correct : A. Additive measures
6. A fact representing cumulative sales units over a day at a store for a product is a_________.
Correct : B. Fully additive fact
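The additivity classes in questions 2 to 6 can be illustrated with a minimal sketch. The fact table below and the `rollup` helper are hypothetical. `units_sold` behaves as a fully additive fact. The inventory level `on_hand` is partially additive. `unit_price` is non-additive, but multiplying it by `units_sold` gives revenue, which is additive.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical fact rows: (day, store, product, units_sold, unit_price, on_hand)
rows = [
    ("d1", "s1", "p1", 10, 5.0, 100),
    ("d1", "s2", "p1", 5, 5.0, 40),
    ("d2", "s1", "p1", 8, 5.5, 92),
    ("d2", "s2", "p1", 7, 5.0, 33),
]

def rollup(rows, key, value):
    """Sum value(row) for each distinct key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += value(row)
    return dict(totals)

by_day, by_store = itemgetter(0), itemgetter(1)
units, price, on_hand = itemgetter(3), itemgetter(4), itemgetter(5)

# Fully additive: units_sold can be summed along any dimension.
units_by_day = rollup(rows, by_day, units)
units_by_store = rollup(rows, by_store, units)

# Partially additive: stock levels add up across stores on one day,
# but summing one store's stock level across days is meaningless.
stock_by_day = rollup(rows, by_day, on_hand)

# Non-additive unit_price times additive units_sold gives revenue, which is additive.
revenue_by_day = rollup(rows, by_day, lambda r: units(r) * price(r))

# A ratio such as average price is rebuilt from the additive totals, never summed.
avg_price_by_day = {d: revenue_by_day[d] / units_by_day[d] for d in units_by_day}
```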
7. Which of the following is another name for data mining?
Correct : D. All of the above
8. Which of the following is a predictive model?
Correct : B. Regression
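The answer above names regression as a predictive model. Regression fits a function to historical data and uses it to map a data item to a real-valued prediction variable. Below is a minimal sketch that assumes one numeric input and ordinary least squares. The sample points are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Map a new data item x to a real-valued prediction."""
    slope, intercept = model
    return slope * x + intercept

# Historical (training) data lying exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```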
9. Which of the following is a descriptive model?
Correct : C. Sequence discovery
10. A_________model identifies patterns or relationships.
Correct : A. Descriptive
11. A predictive model makes use of______.
Correct : B. Historical data.
12. ______ maps data into predefined groups.
Correct : D. Classification
13. _____ is used to map a data item to a real-valued prediction variable.
Correct : Regression
14. In _____ , the value of an attribute is examined as it varies over time.
Correct : B. Time series analysis
15. In ______ the groups are not predefined.
Correct : C. Clustering
16. Link analysis is also called ____.
Correct : C. Both A & B
17. ______ is the input to KDD.
Correct : A. Data
18. The output of KDD is ______.
Correct : D. Useful information
19. The KDD process consists of ____steps.
Correct : C. Five
20. Treating incorrect or missing data is called ________.
Correct : B. Preprocessing
21. Converting data from different sources into a common format for processing is called ____.
Correct : C. Transformation
22. Various visualization techniques are used in the _________ step of KDD.
Correct : D. Interpretation
23. Extreme values that occur infrequently are called ___________.
Correct : A. Outliers
24. Box plot and scatter diagram techniques are_________.
Correct : B. Geometric
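A box plot is one way to flag the outliers from question 23. Below is a minimal sketch that uses the common 1.5 × IQR fence rule. The fence factor `k = 1.5` and the quartile method (Python's default "exclusive" method in `statistics.quantiles`) are assumptions. Other quartile conventions can shift the fences slightly.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_outliers(data))  # [100]
```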
25. _____ is used to proceed from very specific knowledge to more general information.
Correct : A. Induction
26. Describing some characteristics of a set of data by a general model is viewed as___________.
Correct : B. Compression
27. ______ helps to uncover hidden information about the data.
Correct : C. Approximation
28. ______ are needed to identify training data and desired results.
Correct : C. Users
29. Overfitting occurs when a model _________.
Correct : B. Does not fit in future states
30. The curse of dimensionality problem involves ___________.
Correct : D. All of the above
31. Incorrect or invalid data is known as _______.
Correct : B. Noisy data
32. ROI is an acronym for _______.
Correct : A. Return on Investment
33. The ______of data could result in the disclosure of information that is deemed to be
confidential.
Correct : B. Unauthorized use
34. _________data are noisy and have many missing attribute values.
Correct : D. D Tr
35. The rise of DBMS occurred in early _______.
Correct : C. 1970s
36. SQL stands for _________.
Correct : B. Structured Query Language
37. Which of the following is not a data mining metric?
Correct : D. All of the above
38. Reducing the number of attributes to solve the high-dimensionality problem is called _____________.
Correct : B. Dimensionality reduction
39. Data that are not of interest to the data mining task are called _____.
Correct : C. Irrelevant data
40. _________are effective tools to attack the scalability problem.
Correct : C. Both A & B
41. The market-basket problem was formulated by ____________.
Correct : A. Agrawal et al.
42. Data mining helps in________.
Correct : D. All of the above
43. The proportion of transactions supporting X in T is called _____________.
Correct : B. Support
44. The absolute number of transactions supporting X in T is called _______.
Correct : C. Support count
45. The fraction of transactions in D supporting X that also support Y is called __________.
Correct : A. Confidence
46. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and 10000 contain both bread and jam, then the support of {bread, jam} is _________.
Correct : A. 2%
47. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and 10000 contain both bread and jam, then the confidence of the rule bread ⇒ jam is ____________.
Correct : D. 50%
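The arithmetic behind questions 46 and 47 takes a few lines to reproduce. Below is a minimal sketch. The helper names are illustrative, and the counts come straight from the questions.

```python
def support(count_xy, total):
    """Fraction of all transactions that contain both X and Y."""
    return count_xy / total

def confidence(count_xy, count_x):
    """Fraction of the transactions containing X that also contain Y (rule X => Y)."""
    return count_xy / count_x

total, bread, jam, both = 500_000, 20_000, 30_000, 10_000

support_bread_jam = support(both, total)     # 10000 / 500000 = 0.02, i.e. 2%
conf_bread_to_jam = confidence(both, bread)  # 10000 / 20000 = 0.5, i.e. 50%
conf_jam_to_bread = confidence(both, jam)    # 10000 / 30000, about 33.3%
```

The last line shows that confidence depends on the direction of the rule, while support does not.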
48. The left hand side of an association rule is called________.
Correct : C. Antecedent
49. The right hand side of an association rule is called__________.
Correct : A. Consequent
50. Which of the following is not a desirable feature of any efficient algorithm?
Correct : D. To have maximal code length
51. All sets of items whose support is greater than the user-specified minimum support are called _____________
Correct : B. Frequent set
52. If a set is a frequent set and no superset of this set is a frequent set, then it is called ____________
Correct : A. Maximal frequent set
53. Any subset of a frequent set is a frequent set. This is_________
Correct : B. Downward closure property
54. Any superset of an infrequent set is an infrequent set. This is ___________
Correct : C. Upward closure property
55. If an itemset is not a frequent set but all of its proper subsets are frequent sets, then it is called a ____________
Correct : B. Border set
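The definitions in questions 51 to 55 can be checked by brute force on a tiny, made-up transaction list. The minimum support count of 2 is an assumption. A border set is taken here to be an infrequent itemset whose non-empty proper subsets are all frequent. The sketch enumerates every itemset, so it only suits toy inputs. Level-wise algorithms such as Apriori exist to avoid this enumeration.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def classify_itemsets(transactions, min_count):
    """Brute-force frequent, maximal frequent, and border sets (tiny inputs only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    all_sets = [frozenset(c) for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
    frequent = {s for s in all_sets if support_count(s, transactions) >= min_count}
    # Maximal frequent: frequent, and no proper superset is frequent.
    maximal = {s for s in frequent if not any(s < f for f in frequent)}
    # Border: not frequent, but every non-empty proper subset is frequent.
    border = set()
    for s in all_sets:
        if s in frequent:
            continue
        subsets = [frozenset(c) for k in range(1, len(s)) for c in combinations(s, k)]
        if all(sub in frequent for sub in subsets):
            border.add(s)
    return frequent, maximal, border

frequent, maximal, border = classify_itemsets(["abc", "ab", "ac", "bd"], 2)
```

On this data, {a, b} and {a, c} are the maximal frequent sets, and {d} and {b, c} are the border sets.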
56. The Apriori algorithm is also called a _________
Correct : B. Level-wise algorithm
57. The Apriori algorithm is a ____________
Correct : D. Bottom-up search
58. The first phase of the Apriori algorithm is ___________
Correct : A. Candidate generation
59. The second phase of the Apriori algorithm is ____________
Correct : C. Pruning
60. The ______ step eliminates extensions of (k-1)-itemsets that are not found to be frequent from being considered for support counting.
Correct : B. Pruning
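The two phases from questions 56 to 60 can be sketched as a level-wise search. Each pass generates candidate k-itemsets and prunes any candidate that has an infrequent (k-1)-subset. It then counts support in one scan and keeps the frequent ones. This is a minimal sketch. Its join step unions any two frequent (k-1)-itemsets whose union has size k, which is a simpler variant of the sorted-prefix join in the published Apriori-gen. The pruning step still removes every candidate that cannot be frequent.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-candidates, then prune by downward closure."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k:
                candidates.add(union)
    # Pruning: keep a candidate only if all of its (k-1)-subsets are frequent.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))}

def apriori(transactions, min_count):
    """Bottom-up, level-wise search; returns {itemset: support_count}."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    result, k = {}, 1
    while current:
        # One database scan per level to count candidate support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c for c, n in counts.items() if n >= min_count}
        result.update({c: counts[c] for c in frequent})
        k += 1
        current = apriori_gen(frequent, k)
    return result

result = apriori(["abc", "ab", "ac", "bd"], 2)
```

In the third pass, the candidate {a, b, c} is pruned without being counted because its subset {b, c} is not frequent.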
61. The Apriori frequent itemset discovery algorithm moves ______ in the itemset lattice.
Correct : A. Upward
62. After the pruning step of the Apriori algorithm, __________ will remain.
Correct : B. No candidate set
63. The number of iterations in Apriori ______
Correct : A. Increases with the size of the maximum frequent set
64. MFCS is the acronym for ____________
Correct : C. Maximal Frequent Candidate Set
65. Dynamic Itemset Counting Algorithm was proposed by
Correct : A. Brin et al.
66. Itemsets in the ______ category of structures have a counter and a stop number associated with them.
Correct : A. Dashed
67. The itemsets in the_________category structures are not subjected to any counting
Correct : C. Solid
68. Certain itemsets in the dashed circle whose support count reaches the minimum support value during an iteration move into the ______________
Correct : A. Dashed box
69. Certain itemsets enter the system afresh and go into the ______; these are essentially the supersets of the itemsets that move from the dashed circle to the dashed box.
Correct : D. Dashed circle
70. The itemsets that have completed one full pass move from the dashed circle to the ________
Correct : B. Solid circle
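The four Dynamic Itemset Counting states in questions 66 to 70 behave like a small state machine. Below is a minimal sketch under simplifying assumptions. The counter is a plain support count compared with a hypothetical `min_count`. The generation of new superset candidates (question 69) is left out. Solid itemsets are no longer counted, so their state never changes.

```python
from enum import Enum

class State(Enum):
    DASHED_CIRCLE = "suspected infrequent, still counting"
    DASHED_BOX = "suspected frequent, still counting"
    SOLID_CIRCLE = "confirmed infrequent, counting finished"
    SOLID_BOX = "confirmed frequent, counting finished"

def next_state(state, count, min_count, full_pass_done):
    """Update one itemset's state after another block of transactions is read."""
    if state in (State.SOLID_CIRCLE, State.SOLID_BOX):
        return state  # solid itemsets are not counted any further
    if state is State.DASHED_CIRCLE and count >= min_count:
        state = State.DASHED_BOX  # support threshold reached mid-pass
    if full_pass_done:
        # After one full pass over the data, the count is final.
        return State.SOLID_BOX if state is State.DASHED_BOX else State.SOLID_CIRCLE
    return state
```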
71. The FP-growth algorithm has ______ phases.
Correct : B. Two
72. A frequent pattern tree is a tree structure consisting of ________
Correct : D. Both A & B
73. Each non-root node of the item-prefix tree consists of ______ fields.
Correct : B. Three
74. The frequent-item header table consists of ______ fields.
Correct : B. Two.
75. The paths from the root node to the nodes labelled 'a' are called _________
Correct : D. Prefix subpath
76. The transformed prefix paths of a node 'a' form a truncated database of patterns that co-occur with 'a'. This is called the ________
Correct : C. Conditional pattern base
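The FP-tree structures in questions 72 to 76 can be sketched compactly. Each non-root node carries an item name, a count, and a node-link. The header table maps each frequent item to the head of its node-link chain. In this sketch, the `parent` and `children` pointers are extra bookkeeping. Breaking ties between equally frequent items by name is also an assumption. The conditional pattern base of an item collects the prefix path above each of its nodes, weighted by that node's count.

```python
from collections import defaultdict

class Node:
    """FP-tree node: item name, count, and node-link (plus tree pointers)."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.node_link = None  # next node in the tree carrying the same item
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Return (root, header), where header maps item -> first node of its node-link chain."""
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1
    frequent = {i: c for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), {}
    for t in transactions:
        # Keep only frequent items, in descending frequency order (ties by name).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                child.node_link = header.get(item)  # prepend to the item's chain
                header[item] = child
            child.count += 1
            node = child
    return root, header

def conditional_pattern_base(header, item):
    """List of (prefix path, count) pairs for every node carrying item."""
    base, node = [], header.get(item)
    while node is not None:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
        node = node.node_link
    return base

root, header = build_fp_tree(["abc", "ab", "ac", "bd"], 2)
print(conditional_pattern_base(header, "c"))  # [(['a'], 1), (['a', 'b'], 1)]
```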
77. The goal of________is to discover both the dense and sparse regions of a data set
Correct : C. Clustering
78. Which of the following is a clustering algorithm?
Correct : B. CLARA
79. ______ clustering techniques start with as many clusters as there are records, with each cluster having only one record.
Correct : A. Agglomerative
80. ______ clustering techniques start with all records in one cluster and then try to split that cluster into smaller pieces.
Correct : B. Divisive.
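The agglomerative strategy from question 79 can be sketched in a few lines. This version assumes one-dimensional numeric records and single-linkage distance (the closest pair of members). It stops at a requested number of clusters `k`. All three are illustrative choices. A divisive method runs in the opposite direction, starting from one cluster and splitting it.

```python
def agglomerative(points, k):
    """Single-linkage agglomerative clustering of 1-D points down to k clusters."""
    clusters = [[p] for p in points]  # start: one record per cluster

    def distance(c1, c2):
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerative([1, 2, 10, 11, 20], 3))  # [[1, 2], [10, 11], [20]]
```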
81. Which of the following is a data set in the popular UCI machine-learning repository?
Correct : D. MUSHROOM
82. In the ______ algorithm, each cluster is represented by the center of gravity of the cluster.
Correct : B. K-means
83. In ______, each cluster is represented by one of the objects of the cluster located near its center.
Correct : A. K-medoid
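Questions 82 and 83 contrast two ways of representing a cluster. Below is a minimal sketch that assumes 2-D points and Euclidean distance. It shows only how each representative is chosen. It does not implement the full k-means iteration or the swap search in PAM.

```python
import math

def centroid(points):
    """Center of gravity (mean point), as in k-means; need not be a member."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def medoid(points):
    """Member with the smallest total distance to the others, as in k-medoid methods."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(0, 0), (1, 1), (8, 8)]
print(centroid(cluster))  # (3.0, 3.0)
print(medoid(cluster))    # (1, 1)
```

The distant point (8, 8) pulls the centroid away from the dense part of the cluster. The medoid stays on an actual record, which is one reason k-medoid methods are less sensitive to outliers.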
84. Pick out a k-medoid algorithm
Correct : C. PAM
85. Pick out a hierarchical clustering algorithm
Correct : D. BIRCH
86. CLARANS stands for
Correct : C. Clustering Large Applications based on Randomized Search
87. BIRCH is a________
Correct : C. Hierarchical-agglomerative algorithm
88. The cluster features of different subclusters are maintained in a tree called_________
Correct : A. CF tree
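BIRCH summarizes each subcluster by a clustering feature CF = (N, LS, SS): the number of points, their linear sum, and their sum of squares. Two CFs merge by adding their components, and that additivity is what lets a CF tree absorb points incrementally. Below is a minimal sketch that assumes 1-D points, although BIRCH itself works with vectors. The tree structure is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CF:
    """BIRCH clustering feature for 1-D points: count, linear sum, square sum."""
    n: int
    ls: float
    ss: float

    @staticmethod
    def of(points):
        return CF(len(points), sum(points), sum(p * p for p in points))

    def __add__(self, other):
        # Additivity: merging two subclusters just adds their components.
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root-mean-square distance of the members from the centroid.
        return max(self.ss / self.n - self.centroid() ** 2, 0.0) ** 0.5

merged = CF.of([1, 2, 3]) + CF.of([4, 5])
print(merged)  # CF(n=5, ls=15, ss=55)
```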
89. The _______ algorithm is based on the observation that the frequent sets are normally very few in number compared to the set of all itemsets.
Correct : D. Partition
90. The partition algorithm uses ______ scans of the database to discover all frequent sets.
Correct : A. Two
91. The basic idea of the Apriori algorithm is to generate _____ itemsets of a particular size and then scan the database.
Correct : A. Candidate
92. ______ is the best-known association rule algorithm and is used in most commercial products.
Correct : A. Apriori algorithm
93. An algorithm called ________ is used to generate the candidate itemsets for each pass after the first.
Correct : B. Apriori-gen
94. The basic partition algorithm reduces the number of database scans to __________ and divides the database into partitions.
Correct : B. Two
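The two-scan idea in questions 89, 90, and 94 can be sketched as follows. Any itemset that is frequent in the whole database must be frequent, relative to size, in at least one partition. So scan 1 collects locally frequent itemsets as global candidates, and scan 2 counts those candidates once over the full database. In this minimal sketch, the per-partition step is a brute-force stand-in for the tidlist-based counting usually described. The `min_fraction` and `num_parts` parameters are assumptions.

```python
from itertools import combinations

def local_frequent(partition, min_fraction):
    """All itemsets frequent within one partition (brute force; tiny partitions only)."""
    threshold = min_fraction * len(partition)
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if sum(1 for t in partition if s <= t) >= threshold:
                found.add(s)
    return found

def partition_algorithm(transactions, min_fraction, num_parts):
    transactions = [frozenset(t) for t in transactions]
    size = -(-len(transactions) // num_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Scan 1: the union of locally frequent itemsets forms the global candidate set.
    candidates = set().union(*(local_frequent(p, min_fraction) for p in parts))
    # Scan 2: count every candidate over the whole database.
    threshold = min_fraction * len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= threshold}

frequent = partition_algorithm(["abc", "ab", "ac", "bd"], 0.5, 2)
```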
95. ______ and prediction may be viewed as types of classification.
Correct : C. Estimation.
96. ______ can be thought of as classifying an attribute value into one of a set of possible classes.
Correct : B. Prediction.
97. Prediction can be viewed as forecasting a ______ value.
Correct : C. Continuous.
98. ______ data consists of sample input data as well as the classification assignment for the data.
Correct : Training
99. Rule-based classification algorithms generate _________ rules to perform the classification.
Correct : A. If-then
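Applying if-then rules to a record is straightforward. Below is a minimal sketch that assumes an ordered rule list where the first matching rule wins, with a default class when no rule fires. The weather attributes and rules are hypothetical and are not generated by any particular algorithm.

```python
def classify(record, rules, default):
    """Return the class of the first if-then rule whose conditions all hold."""
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

# Hypothetical rules, e.g. IF outlook = sunny AND humidity = high THEN play = no
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rain", "windy": True}, "no"),
]

print(classify({"outlook": "sunny", "humidity": "high"}, rules, "yes"))  # no
```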
100. ______ are a different paradigm for computing that draws its inspiration from neuroscience.