1. This clustering algorithm terminates when the mean values computed for the current iteration are identical to the mean values computed for the previous iteration.
Correct : A. k-means clustering
2. Which one of the following is the main reason for pruning a Decision Tree?
Correct : D. to avoid overfitting the training set
3. You've just finished training a decision tree for spam classification, and it is getting abnormally bad performance on both your training and test sets. You know that your implementation has no bugs, so what could be causing the problem?
Correct : A. your decision trees are too shallow.
4. The K-means algorithm:
Correct : C. minimizes the within class variance for a given number of clusters
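Questions 1 and 4 together describe the standard K-Means loop; a minimal 1-D sketch (not part of the quiz, using hypothetical toy data) shows both the termination criterion and the assignment/update steps that locally minimize within-cluster variance:

```python
# Minimal 1-D K-Means sketch (k=2): iterate until the means computed
# in one iteration are identical to those of the previous iteration.
def kmeans_1d(points, means, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest mean.
        clusters = {m: [] for m in means}
        for p in points:
            nearest = min(means, key=lambda m: abs(p - m))
            clusters[nearest].append(p)
        # Update step: recompute each mean from its cluster.
        new_means = [sum(c) / len(c) for c in clusters.values() if c]
        if sorted(new_means) == sorted(means):  # termination criterion from Q1
            break
        means = new_means
    return sorted(means)

print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 12.0]))  # [2.0, 11.0]
```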
5. Which of the following metrics do we have for finding the dissimilarity between two clusters in hierarchical clustering?
1. Single-link
2. Complete-link
3. Average-link
Correct : D. 1, 2 and 3
6. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
Correct : D. 1, 2 and 4
7. True or false: hierarchical clustering is slower than non-hierarchical clustering.
Correct : A. true
8. High entropy means that the partitions in classification are
Correct : B. not pure
9. Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate?
Correct : B. density-based clustering
10. The main disadvantage of maximum likelihood methods is that they are _____
Correct : D. computationally intense
11. The maximum likelihood method can be used to explore relationships among more diverse sequences, conditions that are not well handled by maximum parsimony methods.
Correct : A. true
12. Which statement is not a true statement?
Correct : C. k-nearest neighbor is same as k-means
13. Why is feature scaling done before applying the K-Means algorithm?
Correct : A. in distance calculation it will give the same weights for all features
14. With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as
Correct : B. an a priori probability
15. The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%.
We also know that 3% of the adult population subscribes to automotive magazine.
The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%.
Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
Correct : D. 0.0396
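The stated answer can be verified with Bayes' theorem; the notation below (M = subscribes to the magazine, S = owns a sports car) is introduced here only for the check:

```python
# Worked check of question 15 with Bayes' theorem.
p_s_given_m = 0.40      # P(S | M)
p_m = 0.03              # P(M)
p_s_given_not_m = 0.30  # P(S | not M)

# Total probability: P(S) = P(S|M)P(M) + P(S|~M)P(~M)
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)

# Bayes: P(M | S) = P(S|M)P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))  # 0.0396
```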
16. What is the naïve assumption in a Naïve Bayes Classifier?
Correct : D. all the features of a class are conditionally independent of each other
17. Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either?
Correct : A. 0.32
18. What is the actual number of independent parameters which need to be estimated in P dimensional Gaussian distribution model?
Correct : D. p(p+3)/2
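The count in the answer follows from p mean parameters plus the p(p+1)/2 distinct entries of a symmetric covariance matrix; a quick check:

```python
# Parameter count for a p-dimensional Gaussian:
# p entries for the mean vector, plus p*(p+1)//2 distinct entries
# of the symmetric covariance matrix, giving p*(p+3)/2 in total.
def gaussian_param_count(p):
    mean_params = p
    cov_params = p * (p + 1) // 2  # symmetric p x p matrix
    return mean_params + cov_params

for p in (1, 2, 3, 5):
    assert gaussian_param_count(p) == p * (p + 3) // 2
print(gaussian_param_count(3))  # 9
```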
19. Give the correct answer for the following statements.
1. It is important to perform feature normalization before using the Gaussian kernel.
2. The maximum value of the Gaussian kernel is 1.
Correct : C. 1 is true, 2 is true
20. Which of the following quantities are minimized directly or indirectly during parameter estimation in Gaussian distribution Model?
Correct : A. negative log-likelihood
21. Consider the following dataset, where x, y, z are the features and T is the class label (1/0). Classify the test data (0, 0, 1), given as values of x, y, z respectively.
Correct : B. 1
22. Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
Correct : B. y is true when x is known to be true.
23. Which of the following statements about Naive Bayes is incorrect?
Correct : B. attributes are statistically dependent on one another given the class value.
24. How can the entries in the full joint probability distribution be calculated?
Correct : B. using information
25. How many terms are required for building a bayes model?
Correct : C. 3
26. Skewness of Normal distribution is ___________
Correct : C. 0
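Skewness is the third standardized moment, and it vanishes for any distribution (or sample) that is symmetric about its mean, the normal distribution included; a toy check with an assumed symmetric sample:

```python
# Skewness = third standardized moment. For a sample that is
# symmetric around its mean, the cubed deviations cancel exactly.
data = [-2.0, -1.0, 0.0, 1.0, 2.0]
n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / n
skew = sum((x - mean) ** 3 for x in data) / n / var ** 1.5
print(skew)  # 0.0
```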
27. The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
Correct : C. as the value of one attribute decreases the value of the second attribute increases
28. 8 observations are clustered into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0),(2,5)}
C3: {(5,5), (9,9)}
What will be the cluster centroids if you want to proceed for second iteration?
Correct : D. c1: (4,4), c2: (2,3), c3: (7,7)
29. In the Naive Bayes equation P(C|X) = (P(X|C) * P(C)) / P(X), which part represents the "likelihood"?
Correct : A. p(x/c)
30. Which of the following options is/are correct regarding the benefits of an ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
Correct : D. 1 and 2
31. What is back propagation?
Correct : C. it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
32. Which of the following is an application of NN (Neural Network)?
Correct : D. all of the mentioned
33. Neural Networks are complex ______________ with many parameters.
Correct : A. nonlinear functions
34. Having multiple perceptrons can actually solve the XOR problem satisfactorily: this is because each perceptron can partition off a linear part of the space itself, and they can then combine their results.
Correct : C. true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded
35. Which one of the following is not a major strength of the neural network approach?
Correct : A. neural network learning algorithms are guaranteed to converge to an optimal solution
36. The network that involves backward links from output to the input and hidden layers is called
Correct : C. recurrent neural network
37. Which of the following parameters can be tuned for finding good ensemble model in bagging based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features
Correct : D. 1,2,3&4
38. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) None of the mentioned
Correct : C. c
39. In an election for the head of a college, N candidates are competing against each other and people are voting for one of the candidates. Voters don't communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the discussed election procedure?
Correct : A. bagging
40. What is the sequence of the following tasks in a perceptron?
1. Initialize weights of perceptron randomly
2. Go to the next batch of dataset
3. If the prediction does not match the output, change the weights
4. For a sample input, compute an output
Correct : A. 1, 4, 3, 2
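The order in the answer (initialize weights, compute an output for a sample, adjust weights on mismatch, move on through the data) can be sketched as a minimal perceptron learning the linearly separable AND function; zero initialization is used here for reproducibility instead of random weights:

```python
# Perceptron learning loop in the order from question 40.
w = [0.0, 0.0]   # step 1: initialize weights (zeros for reproducibility)
b = 0.0
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

for _ in range(10):                      # step 2: next pass over the data
    for x, target in data:
        y = 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0  # step 4: compute output
        if y != target:                  # step 3: change weights on mismatch
            w[0] += (target - y) * x[0]
            w[1] += (target - y) * x[1]
            b += (target - y)

print([1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0 for x, _ in data])  # [0, 0, 0, 1]
```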
41. In which neural net architecture, does weight sharing occur?
Correct : D. both a and b
42. Which of the following are correct statement(s) about stacking?
1. A machine learning model is trained on predictions of multiple machine learning models
2. A Logistic regression will definitely work better in the second stage as compared to other classification methods
3. First stage models are trained on full / partial feature space of training data
Correct : C. 1 and 3
43. When does a neural network model become a deep learning model?
Correct : A. when you add more hidden layers and increase depth of neural network
44. What are the steps for using a gradient descent algorithm?
1)Calculate error between the actual value and the predicted value
2)Reiterate until you find the best weights of network
3)Pass an input through the network and get values from output layer
4)Initialize random weight and bias
5)Go to each neuron which contributes to the error and change its respective values to reduce the error
Correct : B. 4, 3, 1, 5, 2
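The sequence 4, 3, 1, 5, 2 can be sketched with a one-parameter model y = w·x on assumed toy data:

```python
# Gradient descent in the order from question 44 for y = w * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # true relationship: y = 2x

w = 0.5                        # step 4: initialize weight
lr = 0.05
for _ in range(200):           # step 2: reiterate
    grad = 0.0
    for x, y_true in zip(xs, ys):
        y_pred = w * x                     # step 3: pass input, get output
        error = y_pred - y_true            # step 1: error vs actual value
        grad += 2 * error * x              # step 5: contribution to the error
    w -= lr * grad / len(xs)               # change weight to reduce the error
print(round(w, 3))  # 2.0
```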
45. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality being equal to 2. The inputs are 4, 10, 10 and 30 respectively. What will be the output?
Correct : D. 348
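The arithmetic behind the answer:

```python
# Question 45 worked out: a linear transfer function with
# proportionality constant 2 applied to the weighted input sum.
weights = [1, 2, 3, 4]
inputs = [4, 10, 10, 30]
net = sum(w * x for w, x in zip(weights, inputs))  # 4 + 20 + 30 + 120 = 174
output = 2 * net                                   # linear transfer, k = 2
print(output)  # 348
```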
46. Increase in size of a convolutional kernel would necessarily increase the performance of a convolutional network.
Correct : B. false
47. The F-test
Correct : C. considers the reduction in error when moving from the reduced model to the complete model
48. What is true about an ensembled classifier?
1. Classifiers that are more “sure” can vote with more conviction
2. Classifiers can be more “sure” about a particular part of the space
3. Most of the times, it performs better than a single classifier
Correct : D. all of the above
49. Which of the following option is / are correct regarding benefits of ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
Correct : C. 1 and 2
50. Which of the following can be true for selecting base learners for an ensemble?
1. Different learners can come from same algorithm with different hyper parameters
2. Different learners can come from different algorithms
3. Different learners can come from different training spaces
Correct : D. 1, 2 and 3
51. True or False: Ensemble learning can only be applied to supervised learning methods.
Correct : B. false
52. True or False: Ensembles will yield bad results when there is significant diversity among the models.
Note: All individual models have meaningful and good predictions.
Correct : B. false
53. Which of the following is / are true about weak learners used in ensemble model?
1. They have low variance and they don’t usually overfit
2. They have high bias, so they can not solve hard learning problems
3. They have high variance and they don’t usually overfit
Correct : A. 1 and 2
54. True or False: Ensemble of classifiers may or may not be more accurate than any of its individual model.
Correct : A. true
55. If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance?
Correct : B. no
56. Generally, an ensemble method works better, if the individual base models have ____________?
Note: Suppose each individual base models have accuracy greater than 50%.
Correct : A. less correlation among predictions
57. In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble method works similar to above-discussed election procedure?
Hint: Persons are like base models of ensemble method.
Correct : A. bagging
58. Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35.
Suppose you are using averaging as ensemble technique. What will be the probabilities that ensemble of above 25 classifiers will make a wrong prediction?
Note: All classifiers are independent of each other
Correct : B. 0.06
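The stated 0.06 is the binomial tail probability that a majority (13 or more) of the 25 independent classifiers are wrong at the same time; a quick check:

```python
# Question 58 worked out: the majority-vote ensemble errs when
# at least 13 of the 25 independent classifiers are wrong.
from math import comb

e, n = 0.35, 25
p_wrong = sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(13, n + 1))
print(round(p_wrong, 2))  # 0.06
```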
59. In machine learning, an algorithm (or learning algorithm) is said to be unstable if a small change in the training data causes a large change in the learned classifier.
True or False: Bagging of unstable classifiers is a good idea
Correct : A. true
60. Which of the following parameters can be tuned for finding good ensemble model in bagging based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features
Correct : D. all of above
61. How is the model capacity affected with dropout rate (where model capacity means the ability of a neural network to approximate complex functions)?
Correct : B. model capacity decreases with an increase in dropout rate
62. True or False: Dropout is a computationally expensive technique w.r.t. bagging.
Correct : B. false
63. Suppose, you want to apply a stepwise forward selection method for choosing the best models for an ensemble model. Which of the following is the correct order of the steps?
Note: You have more than 1000 models predictions
1. Add the models' predictions one by one to the ensemble (or, in other terms, take the average), keeping those which improve the metrics on the validation set.
2. Start with empty ensemble
3. Return the ensemble from the nested set of ensembles that has maximum performance on the validation set
Correct : D. none of above
64. Suppose, you have 2000 different models with their predictions and want to ensemble predictions of best x models. Now, which of the following can be a possible method to select the best x models for an ensemble?
Correct : C. both
65. Below are the two ensemble models:
1. E1(M1, M2, M3) and
2. E2(M4, M5, M6)
Above, Mx is the individual base models.
Which of the following are more likely to choose if following conditions for E1 and E2 are given?
E1: Individual models' accuracies are high but the models are of the same type, or in other terms, less diverse
E2: Individual models' accuracies are high but they are of different types, or in other terms, highly diverse in nature
Correct : B. e2
66. True or False: In boosting, individual base learners can be parallel.
Correct : B. false
67. Which of the following is true about bagging?
1. Bagging can be parallel
2. The aim of bagging is to reduce bias not variance
3. Bagging helps in reducing overfitting
Correct : C. 1 and 3
68. Suppose you are using stacking with n different machine learning algorithms with k folds on data.
Which of the following is true about one level (m base models + 1 stacker) stacking?
Note:
Here, we are working on binary classification problem
All base models are trained on all features
You are using k folds for base models
Correct : B. you will have only m features after the first stage
69. Which of the following is the difference between stacking and blending?
Correct : D. none of these
70. Which of the following can be one of the steps in stacking?
1. Divide the training data into k folds
2. Train k models on each k-1 folds and get the out of fold predictions for remaining one fold
3. Divide the test data set in “k” folds and get individual fold predictions by different algorithms
Correct : A. 1 and 2
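Steps 1 and 2 can be sketched without any ML library by using a deliberately trivial "model" (predicting the mean of its training folds), so only the out-of-fold mechanics are visible:

```python
# Out-of-fold predictions as in steps 1 and 2 of question 70:
# divide training data into k folds, train on k-1 folds, and
# predict the remaining held-out fold.
def out_of_fold_predictions(y, k):
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # step 1: k folds
    preds = [None] * n
    for fold in folds:
        train = [y[i] for i in range(n) if i not in fold]  # other k-1 folds
        model = sum(train) / len(train)                    # "train" the model
        for i in fold:                                     # step 2: predict
            preds[i] = model                               # the held-out fold
    return preds

print(out_of_fold_predictions([1.0, 2.0, 3.0, 4.0], k=2))  # [3.0, 2.0, 3.0, 2.0]
```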
71. Which of the following are advantages of stacking?
1) More robust model
2) better prediction
3) Lower time of execution
Correct : A. 1 and 2
72. Which of the following are correct statement(s) about stacking?
1. A machine learning model is trained on the predictions of multiple machine learning models
2. A Logistic regression will definitely work better in the second stage as compared to other classification methods
3. First stage models are trained on full / partial feature space of training data
Correct : C. 1 and 3
73. Which of the following is true about weighted majority votes?
1. We want to give higher weights to better performing models
2. Inferior models can overrule the best model if collective weighted votes for inferior models is higher than best model
3. Voting is special case of weighted voting
Correct : D. 1, 2 and 3
74. Which of the following is true about averaging ensemble?
Correct : C. it can be used in both classification as well as regression
75. How can we assign the weights to output of different models in an ensemble?
1. Use an algorithm to return the optimal weights
2. Choose the weights using cross validation
3. Give high weights to more accurate models
Correct : D. all of above
76. Suppose you are given ‘n’ predictions on test data by ‘n’ different models (M1, M2, …. Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models?
Note: We are working on a regression problem
1. Median
2. Product
3. Average
4. Weighted sum
5. Minimum and Maximum
6. Generalized mean rule
Correct : D. all of above
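A small illustration of several of the listed combination rules on assumed predictions (the weights are hypothetical):

```python
# Combining the predictions of n models for one regression test point.
from statistics import median

preds = [2.0, 3.0, 4.0, 7.0]      # predictions from 4 models
weights = [0.4, 0.3, 0.2, 0.1]    # assumed per-model weights

average = sum(preds) / len(preds)                 # simple average
med = median(preds)                               # median
weighted = sum(w * p for w, p in zip(weights, preds))  # weighted sum
lo, hi = min(preds), max(preds)                   # minimum and maximum

print(average, med, round(weighted, 2), lo, hi)  # 4.0 3.5 3.2 2.0 7.0
```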
77. In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble method works similar to above-discussed election procedure?
Hint: Persons are like base models of ensemble method.
Correct : A. bagging
78. If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance?
Correct : B. no
79. Which of the following is NOT supervised learning?
Correct : A. pca
80. According to ______, it's a key success factor for the survival and evolution of all species.
Correct : C. darwin's theory
81. How can you avoid overfitting?
Correct : A. by using a lot of data
82. What are the popular algorithms of Machine Learning?
Correct : D. all
83. What is Training set?
Correct : B. a set of data is used to discover the potentially predictive relationship.
84. Common deep learning applications include
Correct : D. all above
85. What is the function of supervised learning?
Correct : C. both a & b
86. Common unsupervised applications include
Correct : D. all above
87. Reinforcement learning is particularly efficient when ______.
Correct : D. all above
88. If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.
Correct : B. classification.
89. Which of the following are supervised learning applications?
Correct : A. spam detection, pattern detection, natural language processing
90. During the last few years, many algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
Correct : D. none of above
91. Which of the following sentence is correct?
Correct : C. both a & b
92. What is Overfitting in Machine learning?
Correct : A. when a statistical model describes random error or noise instead of underlying relationship overfitting occurs.
93. What is Test set?
Correct : A. test set is used to test the accuracy of the hypotheses generated by the learner.
94. ______ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value.
Correct : B. creating sub-model to predict those features
95. It's possible to use a different placeholder through the parameter ______.
Correct : D. missing_values
96. If you need a more powerful scaling feature, with superior control on outliers and the possibility to select a quantile range, there's also the class ______.
Correct : A. robustscaler
97. scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ______ to each element of a dataset.
Correct : B. max, l1 and l2 norms
98. There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ______.
Correct : A. f-tests and p-values
99. Which of the following selects only a subset of features belonging to a certain percentile?
Correct : A. selectpercentile
100. ______ performs a PCA with non-linearly separable data sets.