Quiznetik

Data Mining and Data Warehouse | Set 2

1. A fact is said to be partially additive if ___________.

A. it is additive over every dimension of its dimensionality.

B. additive over at least one but not all of the dimensions.

C. not additive over any dimension.

D. none of the above.

Correct : B. additive over at least one but not all of the dimensions.

2. A fact is said to be non-additive if ___________.

A. it is additive over every dimension of its dimensionality.

B. additive over at least one but not all of the dimensions.

C. not additive over any dimension.

D. none of the above.

Correct : C. not additive over any dimension.

3. Non-additive measures can often be combined with additive measures to create new _________.

A. additive measures.

B. non-additive measures.

C. partially additive.

D. all of the above.

Correct : A. additive measures.

4. A fact representing cumulative sales units over a day at a store for a product is a _________.

A. additive fact.

B. fully additive fact.

C. partially additive fact.

D. non-additive fact.

Correct : B. fully additive fact.
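
Note: the additivity ideas in questions 1 to 4 can be illustrated with a small sketch. The fact rows, column layout, and variable names below are hypothetical and chosen only for illustration. Units sold can be summed over any dimension, while unit price cannot be summed meaningfully on its own; multiplying it by units gives revenue, which is additive again.

```python
# Hypothetical fact rows: (day, store, product, units_sold, unit_price)
rows = [
    ("mon", "s1", "pen", 10, 2.0),
    ("mon", "s2", "pen", 5, 2.5),
    ("tue", "s1", "pen", 20, 2.0),
]

# Fully additive: per-day subtotals of units add up to the grand total.
units_by_day = {}
for day, store, product, units, price in rows:
    units_by_day[day] = units_by_day.get(day, 0) + units
units_total = sum(units_by_day.values())

# unit_price is non-additive (summing prices has no business meaning),
# but combined with the additive units it yields revenue, which is additive.
revenue_total = sum(units * price for _, _, _, units, price in rows)

print(units_by_day, units_total, revenue_total)
```

Summing by store or by product instead of by day would give the same grand totals, which is what "additive over every dimension" means here.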

5. ____________ of data means that the attributes within a given entity are fully dependent on the entire primary key of the entity.

A. additivity.

B. granularity.

C. functional dependency.

D. dependency.

Correct : C. functional dependency.

6. Which of the following is the other name of Data mining?

A. exploratory data analysis.

B. data driven discovery.

C. deductive learning.

D. all of the above.

Correct : D. all of the above.

7. Which of the following is a predictive model?

A. clustering.

B. regression.

C. summarization.

D. association rules.

Correct : B. regression.

8. Which of the following is a descriptive model?

A. classification.

B. regression.

C. sequence discovery.

D. association rules.

Correct : C. sequence discovery.

9. A ___________ model identifies patterns or relationships.

A. descriptive.

B. predictive.

C. regression.

D. time series analysis.

Correct : A. descriptive.

10. A predictive model makes use of ________.

A. current data.

B. historical data.

C. both current and historical data.

D. assumptions.

Correct : B. historical data.

11. ____________ maps data into predefined groups.

A. regression.

B. time series analysis

C. prediction.

D. classification.

Correct : D. classification.

12. __________ is used to map a data item to a real valued prediction variable.

A. regression.

B. time series analysis.

C. prediction.

D. classification.

Correct : A. regression.
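
Note: mapping a data item to a real-valued prediction variable, as described in question 12, is what a regression fit does. The following is a minimal sketch of a one-variable least-squares line, assuming made-up sample points; the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up sample that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # real-valued prediction for x = 5
print(slope, intercept, predicted)
```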

13. In ____________, the value of an attribute is examined as it varies over time.

A. regression.

B. time series analysis.

C. sequence discovery.

D. prediction.

Correct : B. time series analysis.

14. In ________ the groups are not predefined.

A. association rules.

B. summarization.

C. clustering.

D. prediction.

Correct : C. clustering.

15. Link analysis is otherwise called ___________.

A. affinity analysis.

B. association rules.

C. both a & b.

D. prediction.

Correct : C. both a & b.

16. _________ is the input to KDD.

A. data.

B. information.

C. query.

D. process.

Correct : A. data.

17. The output of KDD is __________.

A. data.

B. information.

C. query.

D. useful information.

Correct : D. useful information.

18. The KDD process consists of ________ steps.

A. three.

B. four.

C. five.

D. six.

Correct : C. five.

19. Treating incorrect or missing data is called ___________.

A. selection.

B. preprocessing.

C. transformation.

D. interpretation.

Correct : B. preprocessing.

20. Converting data from different sources into a common format for processing is called ________.

A. selection.

B. preprocessing.

C. transformation.

D. interpretation.

Correct : C. transformation.

21. Various visualization techniques are used in ___________ step of KDD.

A. selection.

B. transformation.

C. data mining.

D. interpretation.

Correct : D. interpretation.

22. Extreme values that occur infrequently are called _________.

A. outliers.

B. rare values.

C. dimensionality reduction.

D. all of the above.

Correct : A. outliers.

23. Box plot and scatter diagram techniques are _______.

A. graphical.

B. geometric.

C. icon-based.

D. pixel-based.

Correct : B. geometric.
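
Note: questions 22 and 23 mention outliers and box plots. A box plot commonly flags points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that convention to made-up readings; the 1.5 multiplier and the function name iqr_outliers are assumptions for illustration, not something this quiz specifies.

```python
import statistics

def iqr_outliers(values):
    """Return values outside the usual box-plot whiskers (1.5 * IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Made-up readings with one extreme value.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = iqr_outliers(readings)
print(outliers)
```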

24. __________ is used to proceed from very specific knowledge to more general information.

A. induction.

B. compression.

C. approximation.

D. substitution.

Correct : A. induction.

25. Describing some characteristics of a set of data by a general model is viewed as ____________

A. induction.

B. compression.

C. approximation.

D. summarization.

Correct : B. compression.

26. _____________ helps to uncover hidden information about the data.

A. induction.

B. compression.

C. approximation.

D. summarization.

Correct : C. approximation.

27. _______ are needed to identify training data and desired results.

A. programmers.

B. designers.

C. users.

D. administrators.

Correct : C. users.

28. Overfitting occurs when a model _________.

A. does fit in future states.

B. does not fit in future states.

C. does fit in current state.

D. does not fit in current state.

Correct : B. does not fit in future states.

29. The problem of dimensionality curse involves ___________.

A. the use of some attributes may interfere with the correct completion of a data mining task.

B. the use of some attributes may simply increase the overall complexity.

C. some may decrease the efficiency of the algorithm.

D. all of the above.

Correct : D. all of the above.

30. Incorrect or invalid data is known as _________.

A. changing data.

B. noisy data.

C. outliers.

D. missing data.

Correct : B. noisy data.

31. ROI is an acronym of ________.

A. return on investment.

B. return on information.

C. repetition of information.

D. runtime of instruction

Correct : A. return on investment.

32. The ____________ of data could result in the disclosure of information that is deemed to be confidential.

A. authorized use.

B. unauthorized use.

C. authenticated use.

D. unauthenticated use.

Correct : B. unauthorized use.

33. ___________ data are noisy and have many missing attribute values.

A. preprocessed.

B. cleaned.

C. real-world.

D. transformed.

Correct : C. real-world.

34. The rise of DBMS occurred in early ___________.

A. 1950s.

B. 1960s.

C. 1970s.

D. 1980s.

Correct : C. 1970s.

35. SQL stands for _________.

A. standard query language.

B. structured query language.

C. standard quick list.

D. structured query list.

Correct : B. structured query language.

36. Which of the following is not a data mining metric?

A. space complexity.

B. time complexity.

C. roi.

D. all of the above.

Correct : D. all of the above.

37. Reducing the number of attributes to solve the high dimensionality problem is called ________.

A. dimensionality curse.

B. dimensionality reduction.

C. cleaning.

D. overfitting.

Correct : B. dimensionality reduction.

38. Data that are not of interest to the data mining task are called ______.

A. missing data.

B. changing data.

C. irrelevant data.

D. noisy data.

Correct : C. irrelevant data.

39. ______ are effective tools to attack the scalability problem.

A. sampling.

B. parallelization

C. both a & b.

D. none of the above.

Correct : C. both a & b.

40. Market-basket problem was formulated by __________.

A. agrawal et al.

B. steve et al.

C. toda et al.

D. simon et al.

Correct : A. agrawal et al.

41. Data mining helps in __________.

A. inventory management.

B. sales promotion strategies.

C. marketing strategies.

D. all of the above.

Correct : D. all of the above.

42. The proportion of transactions supporting X in T is called _________.

A. confidence.

B. support.

C. support count.

D. all of the above.

Correct : B. support.

43. The absolute number of transactions supporting X in T is called ___________.

A. confidence.

B. support.

C. support count.

D. none of the above.

Correct : C. support count.

44. The value that says that transactions in D that support X also support Y is called ______________.

A. confidence.

B. support.

C. support count.

D. none of the above.

Correct : A. confidence.

45. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and 10000 contain both bread and jam, then the support of bread and jam is _______.

A. 2%

B. 20%

C. 3%

D. 30%

Correct : A. 2%

46. If T consists of 500000 transactions, of which 20000 contain bread, 30000 contain jam, and 10000 contain both bread and jam, then the confidence of buying jam with bread (the rule bread => jam) is _______.

A. 33.33%

B. 66.66%

C. 45%

D. 50%

Correct : D. 50%
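
Note: the arithmetic behind questions 45 and 46 can be checked with a short sketch. The counts come from the questions; the helper functions and the five-transaction toy list are hypothetical additions for illustration. The 50% answer corresponds to the rule bread => jam, that is, transactions with both items divided by transactions with bread.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support count of (antecedent and consequent) over that of antecedent."""
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / ante

# Counts taken from questions 45 and 46.
total, bread, jam, both = 500_000, 20_000, 30_000, 10_000
support_bread_jam = both / total        # 0.02, i.e. 2%
confidence_bread_to_jam = both / bread  # 0.5, i.e. 50%
confidence_jam_to_bread = both / jam    # about 0.333, i.e. 33.33%

# The same definitions applied to a made-up transaction list.
toy = [{"bread", "jam"}, {"bread"}, {"jam", "milk"},
       {"bread", "jam", "milk"}, {"milk"}]
toy_support = support({"bread", "jam"}, toy)
toy_confidence = confidence({"bread"}, {"jam"}, toy)
print(support_bread_jam, confidence_bread_to_jam, toy_support, toy_confidence)
```

Reversing the rule to jam => bread gives 10000 / 30000, or about 33.33%, which is option A rather than the keyed answer.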

47. The left hand side of an association rule is called __________.

A. consequent.

B. onset.

C. antecedent.

D. precedent.

Correct : C. antecedent.

48. The right hand side of an association rule is called _____.

A. consequent.

B. onset.

C. antecedent.

D. precedent.

Correct : A. consequent.

49. Which of the following is not a desirable feature of any efficient algorithm?

A. to reduce number of input operations.

B. to reduce number of output operations.

C. to be efficient in computing.

D. to have maximal code length.

Correct : D. to have maximal code length.

50. All sets of items whose support is greater than the user-specified minimum support are called _____________.

A. border set.

B. frequent set.

C. maximal frequent set.

D. lattice.

Correct : B. frequent set.

51. If a set is a frequent set and no superset of this set is a frequent set, then it is called ________.

A. maximal frequent set.

B. border set.

C. lattice.

D. infrequent sets.

Correct : A. maximal frequent set.

52. Any subset of a frequent set is a frequent set. This is ___________.

A. upward closure property.

B. downward closure property.

C. maximal frequent set.

D. border set.

Correct : B. downward closure property.

53. Any superset of an infrequent set is an infrequent set. This is _______.

A. maximal frequent set.

B. border set.

C. upward closure property.

D. downward closure property.

Correct : C. upward closure property.

54. If an itemset is not a frequent set but all of its proper subsets are frequent sets, then it is a _______.

A. maximal frequent set

B. border set.

C. upward closure property.

D. downward closure property.

Correct : B. border set.

55. The Apriori algorithm is otherwise called __________.

A. width-wise algorithm.

B. level-wise algorithm.

C. pincer-search algorithm.

D. fp growth algorithm.

Correct : B. level-wise algorithm.

56. The A Priori algorithm is a ___________.

A. top-down search.

B. breadth first search.

C. depth first search.

D. bottom-up search.

Correct : D. bottom-up search.

57. The first phase of A Priori algorithm is _______.

A. candidate generation.

B. itemset generation.

C. pruning.

D. partitioning.

Correct : A. candidate generation.

58. The second phase of A Priori algorithm is ____________.

A. candidate generation.

B. itemset generation.

C. pruning.

D. partitioning.

Correct : C. pruning.

59. The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent, from being considered for counting support.

A. candidate generation.

B. pruning.

C. partitioning.

D. itemset eliminations.

Correct : B. pruning.
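
Note: the candidate generation and pruning steps from questions 57 to 59 can be sketched as a small level-wise loop. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy transactions, and the use of an absolute minimum support count are all choices made for this example.

```python
from itertools import combinations

def count_support(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if count_support(s, transactions) >= min_count}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = count_support(s, transactions)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Pruning: drop any candidate that has an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        # Support counting for the surviving candidates only.
        level = {c for c in candidates
                 if count_support(c, transactions) >= min_count}
        k += 1
    return frequent

# Made-up transactions for illustration.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "milk"},
    {"jam", "milk"},
    {"bread", "jam", "milk"},
]
frequent = apriori(transactions, min_count=3)
print(sorted((sorted(s), c) for s, c in frequent.items()))
```

With a minimum count of 3, every single item and every pair is frequent, but the three-item set appears in only two transactions and is rejected, so the loop stops after the level above the largest frequent set.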

60. The a priori frequent itemset discovery algorithm moves _______ in the lattice.

A. upward.

B. downward.

C. breadthwise.

D. both upward and downward.

Correct : A. upward.

61. After the pruning of a priori algorithm, _______ will remain.

A. only candidate set.

B. no candidate set.

C. only border set.

D. no border set.

Correct : B. no candidate set.

62. The number of iterations in a priori ___________.

A. increases with the size of the maximum frequent set.

B. decreases with increase in size of the maximum frequent set.

C. increases with the size of the data.

D. decreases with the increase in size of the data.

Correct : A. increases with the size of the maximum frequent set.

63. MFCS is the acronym of _____.

A. maximum frequency control set.

B. minimal frequency control set.

C. maximal frequent candidate set.

D. minimal frequent candidate set.

Correct : C. maximal frequent candidate set.

64. Dynamic Itemset Counting Algorithm was proposed by ____.

A. brin et al.

B. agrawal et al.

C. toda et al.

D. simon et al.

Correct : A. brin et al.

65. Itemsets in the ______ category of structures have a counter and the stop number with them.

A. dashed.

B. circle.

C. box.

D. solid.

Correct : A. dashed.

66. The itemsets in the _______ category of structures are not subjected to any counting.

A. dashed.

B. box.

C. solid.

D. circle.

Correct : C. solid.

67. Certain itemsets in the dashed circle whose support count reach support value during an iteration move into the ______.

A. dashed box.

B. solid circle.

C. solid box.

D. none of the above.

Correct : A. dashed box.

68. Certain itemsets enter afresh into the system and get into the _______, which are essentially the supersets of the itemsets that move from the dashed circle to the dashed box.

A. dashed box.

B. solid circle.

C. solid box.

D. dashed circle.

Correct : D. dashed circle.

69. The itemsets that have completed one full pass move from dashed circle to ________.

A. dashed box.

B. solid circle.

C. solid box.

D. none of the above.

Correct : B. solid circle.

70. The FP-growth algorithm has ________ phases.

A. one.

B. two.

C. three.

D. four.

Correct : B. two.

71. A frequent pattern tree is a tree structure consisting of ________.

A. an item-prefix-tree.

B. a frequent-item-header table.

C. a frequent-item-node.

D. both a & b.

Correct : D. both a & b.

72. The non-root node of item-prefix-tree consists of ________ fields.

A. two.

B. three.

C. four.

D. five.

Correct : B. three.

73. The frequent-item-header-table consists of __________ fields.

A. only one.

B. two.

C. three.

D. four.

Correct : B. two.

74. The paths from root node to the nodes labelled 'a' are called __________.

A. transformed prefix path.

B. suffix subpath.

C. transformed suffix path.

D. prefix subpath.

Correct : D. prefix subpath.

75. The transformed prefix paths of a node 'a' form a truncated database of patterns that co-occur with 'a'; this is called the _______.

A. suffix path.

B. fp-tree.

C. conditional pattern base.

D. prefix path.

Correct : C. conditional pattern base.
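
Note: the conditional pattern base from question 75 can be illustrated without building the full FP-tree. The sketch below is a shortcut under stated assumptions: it orders the frequent items of each transaction by descending support (ties broken alphabetically, an arbitrary choice) and collects the prefixes that precede the chosen item, which should match the prefix paths an FP-tree built with the same ordering would give. The function name and toy transactions are hypothetical.

```python
from collections import Counter

def conditional_pattern_base(transactions, item, min_count):
    """Map each prefix path (tuple of items) preceding item to its count."""
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in counts.items() if c >= min_count}
    # Global item order: descending support, ties broken alphabetically.
    order = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    base = Counter()
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent), key=rank.get)
        if item in path:
            prefix = tuple(path[:path.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)

# Made-up transactions for illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
base_c = conditional_pattern_base(transactions, "c", min_count=2)
print(base_c)
```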

76. The goal of _____ is to discover both the dense and sparse regions of a data set.

A. association rule.

B. classification.

C. clustering.

D. genetic algorithm.

Correct : C. clustering.

77. Which of the following is a clustering algorithm?

A. a priori.

B. clara.

C. pincer-search.

D. fp-growth.

Correct : B. clara.

78. _______ clustering techniques start with as many clusters as there are records, with each cluster having only one record.

A. agglomerative.

B. divisive.

C. partition.

D. numeric.

Correct : A. agglomerative.

79. __________ clustering techniques start with all records in one cluster and then try to split that cluster into small pieces.

A. agglomerative.

B. divisive.

C. partition.

D. numeric.

Correct : B. divisive.

80. Which of the following is a data set in the popular UCI machine-learning repository?

A. clara.

B. cactus.

C. stirr.

D. mushroom.

Correct : D. mushroom.

81. In ________ algorithm each cluster is represented by the center of gravity of the cluster.

A. k-medoid.

B. k-means.

C. stirr.

D. rock.

Correct : B. k-means.

82. In ___________ each cluster is represented by one of the objects of the cluster located near the center.

A. k-medoid.

B. k-means.

C. stirr.

D. rock.

Correct : A. k-medoid.
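
Note: the difference between the cluster representatives in questions 81 and 82 can be sketched as follows. The centroid (center of gravity) is the coordinate-wise mean and need not be an actual record, while the medoid is the member of the cluster with the smallest total distance to the others. The points and function names are made up for illustration, and Euclidean distance is an assumption.

```python
def centroid(points):
    """Coordinate-wise mean of the points (k-means representative)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def medoid(points):
    """Cluster member with the smallest total Euclidean distance to the rest."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

# Made-up cluster with one far-away point.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (20.0, 20.0)]
center = centroid(cluster)
representative = medoid(cluster)
print(center, representative)
```

The far-away point pulls the centroid to (6.0, 6.0), which is not one of the records, while the medoid stays at the record (3.0, 3.0).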

83. Pick out a k-medoid algorithm.

A. dbscan.

B. birch.

C. pam.

D. cure.

Correct : C. pam.

84. Pick out a hierarchical clustering algorithm.

A. dbscan

B. birch.

C. pam.

D. cure.

Correct : B. birch.

85. CLARANS stands for _______.

A. clara net server.

B. clustering large application range network search.

C. clustering large applications based on randomized search.

D. clustering application randomized search.

Correct : C. clustering large applications based on randomized search.

86. BIRCH is a ________.

A. agglomerative clustering algorithm.

B. hierarchical algorithm.

C. hierarchical-agglomerative algorithm.

D. divisive.

Correct : C. hierarchical-agglomerative algorithm.

87. The cluster features of different subclusters are maintained in a tree called ___________.

A. cf tree.

B. fp tree.

C. fp growth tree.

D. b tree.

Correct : A. cf tree.

88. The ________ algorithm is based on the observation that the frequent sets are normally very few in number compared to the set of all itemsets.

A. a priori.

B. clustering.

C. association rule.

D. partition.

Correct : D. partition.

89. The partition algorithm uses _______ scans of the databases to discover all frequent sets.

A. two.

B. four.

C. six.

D. eight.

Correct : A. two.

90. The basic idea of the Apriori algorithm is to generate ________ itemsets of a particular size and then scan the database to count their support.

A. candidate.

B. primary.

C. secondary.

D. superkey.

Correct : A. candidate.

91. An algorithm called ________ is used to generate the candidate itemsets for each pass after the first.

A. apriori.

B. apriori-gen.

C. sampling.

D. partition.

Correct : B. apriori-gen.

92. The basic partition algorithm divides the database into partitions and reduces the number of database scans to ________.

A. one.

B. two.

C. three.

D. four.

Correct : B. two.

93. ___________ and prediction may be viewed as types of classification.

A. decision.

B. verification.

C. estimation.

D. illustration.

Correct : C. estimation.

94. ___________ can be thought of as classifying an attribute value into one of a set of possible classes.

A. estimation.

B. prediction.

C. identification.

D. clarification.

Correct : B. prediction.

95. Prediction can be viewed as forecasting a _________ value.

A. non-continuous.

B. constant.

C. continuous.

D. variable.

Correct : C. continuous.

96. _________ data consists of sample input data as well as the classification assignment for the data.

A. missing.

B. measuring.

C. non-training.

D. training.

Correct : D. training.

97. Rule-based classification algorithms generate ______ rules to perform the classification.

A. if-then.

B. while.

C. do while.

D. switch.

Correct : A. if-then.

98. ____________ are a different paradigm for computing which draws its inspiration from neuroscience.

A. computer networks.

B. neural networks.

C. mobile networks.

D. artificial networks.

Correct : B. neural networks.

99. The human brain consists of a network of ___________.

A. neurons.

B. cells.

C. tissue.

D. muscles.

Correct : A. neurons.

100. Each neuron is made up of a number of nerve fibres called _____________.

A. electrons.

B. molecules.

C. atoms.

D. dendrites.

Correct : D. dendrites.