1. The average squared difference between classifier predicted output and actual output.
Correct : A. mean squared error
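As a quick illustration of question 1, a minimal NumPy sketch (hypothetical values, not from the source) of the mean squared error computation:

    import numpy as np

    y_true = np.array([3.0, 2.5, 4.0])      # actual outputs (hypothetical)
    y_pred = np.array([2.8, 2.9, 3.6])      # predicted outputs (hypothetical)
    mse = np.mean((y_pred - y_true) ** 2)   # average squared difference
    print(mse)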
2. Simple regression assumes a __________ relationship between the input attribute and output
attribute.
Correct : A. linear
3. Regression trees are often used to model _______ data.
Correct : B. nonlinear
4. The leaf nodes of a model tree are
Correct : C. linear regression equations.
5. Logistic regression is a ________ regression technique that is used to model data having a _____ outcome.
Correct : D. nonlinear, binary
6. This technique associates a conditional probability value with each data instance.
Correct : B. logistic regression
7. This supervised learning technique can process both numeric and categorical input attributes.
Correct : A. linear regression
8. With Bayes classifier, missing data items are
Correct : B. treated as unequal compares.
9. This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
Correct : D. k-means clustering
10. This clustering algorithm initially assumes that each data instance represents a single cluster.
Correct : C. k-means clustering
11. This unsupervised clustering algorithm terminates when mean values computed for the current
iteration of the algorithm are identical to the computed mean values for the previous iteration.
Correct : C. k-means clustering
12. Machine learning techniques differ from statistical techniques in that machine learning methods
Correct : B. are better able to deal with missing and noisy data.
13. In reinforcement learning, if the feedback is negative one (-1), it is defined as ____.
Correct : A. Penalty
14. According to ____, it's a key success factor for the survival and evolution of all species.
Correct : C. Darwin’s theory
15. What is ‘Training set’?
Correct : B. A set of data is used to discover the potentially predictive relationship.
16. Common deep learning applications include____
Correct : D. All above
17. Reinforcement learning is particularly efficient when______________.
Correct : D. All above
18. If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.
Correct : B. Classification.
19. Which of the following are supervised learning applications
Correct : A. Spam detection, pattern detection, Natural Language Processing
20. During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
Correct : D. None of above
21. What is ‘Overfitting’ in Machine learning?
Correct : A. 'Overfitting' occurs when a statistical model describes random error or noise instead of the underlying relationship.
22. What is ‘Test set’?
Correct : A. Test set is used to test the accuracy of the hypotheses generated by the learner.
23. ________ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value.
Correct : B. Creating sub-model to predict those features
24. It is possible to use a different placeholder through the parameter _______.
Correct : D. missing_values
25. If you need a more powerful scaling feature, with a superior control on outliers and the possibility to select a quantile range, there's also the class ________.
Correct : A. RobustScaler
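A minimal sketch of question 25's answer, assuming the class is scikit-learn's RobustScaler (hypothetical data):

    import numpy as np
    from sklearn.preprocessing import RobustScaler

    X = np.array([[1.0], [2.0], [3.0], [100.0]])        # hypothetical data with one outlier
    scaler = RobustScaler(quantile_range=(25.0, 75.0))  # scale by the interquartile range
    X_scaled = scaler.fit_transform(X)                  # the outlier barely affects the scaling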
26. scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ________ to each element of a dataset.
Correct : B. max, l1 and l2 norms
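A short sketch of question 26, assuming scikit-learn's Normalizer (hypothetical samples); each row is rescaled independently with the chosen norm:

    import numpy as np
    from sklearn.preprocessing import Normalizer

    X = np.array([[1.0, 2.0, 2.0], [4.0, 0.0, 3.0]])    # hypothetical per-sample data
    for norm in ('max', 'l1', 'l2'):                    # the three supported norms
        print(norm, Normalizer(norm=norm).fit_transform(X))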
27. There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ________.
Correct : A. F-tests and p-values
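For question 27, a minimal sketch of univariate selection using scikit-learn's SelectKBest with an ANOVA F-test (which exposes F-scores and p-values):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)
    selector = SelectKBest(score_func=f_classif, k=2)   # keep the 2 best features by F-test
    X_new = selector.fit_transform(X, y)
    print(selector.scores_, selector.pvalues_)          # F-scores and p-values per feature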
28. ________ performs a PCA with non-linearly separable data sets.
Correct : B. KernelPCA
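A sketch of question 28's KernelPCA on a non-linearly separable toy dataset (hypothetical parameters):

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05)   # two concentric rings
    kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10.0)   # RBF kernel handles the nonlinearity
    X_kpca = kpca.fit_transform(X)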
29. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of students from a college.
Which of the following statements is true in this case?
Correct : B. Feature F1 is an example of ordinal variable.
30. The parameter ______ allows specifying the percentage of elements to put into the test/training set.
Correct : C. All above
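The options for question 30 are not listed here, but the description matches scikit-learn's train_test_split, whose test_size and train_size parameters both set split percentages; a minimal sketch:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    # test_size / train_size set the fraction of elements in each split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, train_size=0.75, random_state=0)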
31. In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
Correct : B. dataset
32. _______adopts a dictionary-oriented approach, associating to each category label a progressive integer number.
Correct : A. LabelEncoder class
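A minimal sketch of question 32's LabelEncoder, which maps each category label to a progressive integer (hypothetical labels):

    from sklearn.preprocessing import LabelEncoder

    labels = ['bird', 'cat', 'dog', 'dog']   # hypothetical categorical labels
    encoder = LabelEncoder()
    encoded = encoder.fit_transform(labels)  # e.g. array([0, 1, 2, 2])
    print(encoder.classes_, encoded)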
33. The function used for linear regression in R is __________.
Correct : A. lm(formula, data)
34. In the syntax of the linear model lm(formula, data, ...), data refers to ______.
Correct : B. Vector
35. Which of the following methods do we use to find the best fit line for data in Linear Regression?
Correct : A. Least Square Error
36. Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?
Correct : D. Mean-Squared-Error
37. Which of the following is true about residuals?
Correct : A. Lower is better
38. Naive Bayes classifiers are a collection of ------------------ algorithms.
Correct : A. Classification
39. The Naive Bayes classifier is a _______________ learning method.
Correct : A. Supervised
40. The features being classified are independent of each other in the Naïve Bayes classifier.
Correct : B. true
41. The features being classified are __________ of each other in the Naïve Bayes classifier.
Correct : A. Independent
42. Conditional probability is a measure of the probability of an event given that another event has already occurred.
Correct : A. True
43. Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Correct : A. True
44. The Bernoulli Naïve Bayes classifier assumes a ___________ distribution.
Correct : C. Binary
45. The Multinomial Naïve Bayes classifier assumes a ___________ distribution.
Correct : B. Discrete
46. The Gaussian Naïve Bayes classifier assumes a ___________ distribution.
Correct : A. Continuous
47. The binarize parameter in scikit-learn's BernoulliNB sets the threshold for binarizing the sample features.
Correct : A. True
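For question 47, a sketch showing the binarize threshold of scikit-learn's BernoulliNB (hypothetical data):

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    X = np.array([[0.2, 0.9], [0.8, 0.1], [0.7, 0.6], [0.1, 0.3]])  # continuous features (hypothetical)
    y = np.array([0, 1, 1, 0])
    clf = BernoulliNB(binarize=0.5)     # values above 0.5 are treated as 1, the rest as 0
    clf.fit(X, y)
    print(clf.predict([[0.9, 0.2]]))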
48. Gaussian distribution when plotted, gives a bell shaped curve which is symmetric about the _______ of the feature values.
Correct : A. Mean
49. SVMs directly give us the posterior probabilities P(y = 1 | x) and P(y = −1 | x).
Correct : B. false
50. Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
Correct : A. True
51. Solving a nonlinear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting.
Correct : A. True
52. SVM is a ------------------ algorithm
Correct : A. Classification
53. SVM is a ------------------ learning technique.
Correct : A. Supervised
54. The linear SVM classifier works by drawing a straight line between two classes
Correct : A. True
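For questions 52-54, a minimal sketch of a linear SVM classifier in scikit-learn (hypothetical toy data); the fitted coefficients define the separating straight line:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # two separable classes
    clf = SVC(kernel='linear', C=1.0)    # C trades off misclassification against model simplicity
    clf.fit(X, y)
    print(clf.coef_, clf.intercept_)     # parameters of the separating line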
55. What is Model Selection in Machine Learning?
Correct : A. The process of selecting models among different mathematical models, which are used to describe the same data set
56. Which are two techniques of Machine Learning?
Correct : A. Genetic Programming and Inductive Learning
57. Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment.
Correct : B. Reinforcement
58. It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.
Correct : A. Overfitting
59. Techniques that involve the usage of both labeled and unlabeled data are called ___.
Correct : B. Semi-supervised
60. A supervised scenario is characterized by the concept of a _____.
Correct : B. Teacher
61. Overlearning occurs due to an excessive ______.
Correct : A. Capacity
62. Which of the following are models for feature extraction?
Correct : C. None of the above
63. _____ provides some built-in datasets that can be used for testing purposes.
Correct : A. scikit-learn
64. While using _____, all labels are turned into sequential numbers.
Correct : A. LabelEncoder class
65. _______ produce sparse matrices of real numbers that can be fed into any machine learning model.
Correct : C. Both A & B
66. scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
Correct : D. Imputer
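For questions 24 and 66, a sketch of hole filling; the Imputer class belongs to older scikit-learn releases, and newer versions expose the same idea as SimpleImputer, with a missing_values placeholder and a strategy:

    import numpy as np
    from sklearn.impute import SimpleImputer   # successor of the older Imputer class

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])      # hypothetical data with holes
    imp = SimpleImputer(missing_values=np.nan, strategy='mean')   # also 'median' or 'most_frequent'
    X_filled = imp.fit_transform(X)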
67. Which of the following scales data by removing elements that don't belong to a given range or by considering a maximum absolute value?
Correct : C. Both A & B
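The answer options for question 67 are not shown here, but the description matches scikit-learn's MinMaxScaler (range-based) and MaxAbsScaler (maximum absolute value); a hedged sketch:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

    X = np.array([[-2.0, 4.0], [1.0, -8.0], [3.0, 2.0]])            # hypothetical feature matrix
    X_range = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)   # rescale into a given range
    X_abs = MaxAbsScaler().fit_transform(X)                         # divide by the max absolute value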
68. scikit-learn also provides a class for per-sample normalization, _____.
Correct : A. Normalizer
69. An ______ dataset with many features contains information proportional to the independence of all features and their variance.
Correct : B. unnormalized
70. In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the_____.
Correct : D. Covariance matrix
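For question 70, a minimal NumPy sketch of the covariance matrix of a dataset (hypothetical data):

    import numpy as np

    X = np.random.RandomState(0).normal(size=(100, 3))   # 100 samples, 3 features (hypothetical)
    cov = np.cov(X, rowvar=False)   # 3x3 matrix: variances on the diagonal, covariances elsewhere
    print(cov)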
71. The _____ parameter can assume different values which determine how the data matrix is initially processed.
Correct : C. init
72. ______ allows exploiting the natural sparsity of data while extracting principal components.
Correct : A. SparsePCA
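A sketch of question 72's SparsePCA (hypothetical parameters); the alpha penalty enforces sparse components:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import SparsePCA

    X, _ = load_digits(return_X_y=True)
    spca = SparsePCA(n_components=10, alpha=1.0, random_state=0)  # alpha controls sparsity
    X_spca = spca.fit_transform(X[:200])    # a subset keeps the example fast
    print(spca.components_.shape)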
73. Which of the following statement is true about outliers in Linear regression?
Correct : A. Linear regression is sensitive to outliers
74. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw about this situation?
Correct : A. Since there is a relationship, it means our model is not good
75. Let’s say a “Linear regression” model perfectly fits the training data (training error is zero). Now, which of the following statements is true?
Correct : C. None of the above
76. In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?
Correct : C. Individually R squared cannot tell about variable importance. We can’t say anything about it right now.
77. To test a linear relationship between y (dependent) and x (independent) continuous variables, which of the following plots is best suited?
Correct : A. Scatter plot
78. Which of the following steps/assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
Correct : A. The polynomial degree
79. Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?
Correct : B. Lasso regression uses subset selection of features
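For questions 79 and 94, a sketch contrasting Lasso and Ridge on synthetic data; the L1 penalty drives some coefficients exactly to zero, which is why Lasso performs a form of feature subset selection:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 10))                     # only the first 2 features are informative
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
    lasso = Lasso(alpha=0.1).fit(X, y)
    ridge = Ridge(alpha=0.1).fit(X, y)
    print(np.sum(lasso.coef_ == 0), np.sum(ridge.coef_ == 0))   # Lasso zeroes coefficients, Ridge does not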
80. Which of the following statement(s) can be true post adding a variable in a linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
Correct : A. 1 and 2
81. What is/are true about kernels in SVM?
1. Kernel functions map low-dimensional data to a high-dimensional space
2. A kernel is a similarity function
Correct : C. 1 and 2
82. Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model that has a quadratic kernel function (polynomial of degree 2) and uses the slack variable C as one of its hyperparameters. What would happen when you use a very small C (C~0)?
Correct : A. Misclassification would happen
83. The cost parameter in the SVM means:
Correct : C. The tradeoff between misclassification and simplicity of the model
84. How do you handle missing or corrupted data in a dataset?
Correct : D. All of the above
85. Which of the following statements about Naive Bayes is incorrect?
Correct : B. Attributes are statistically dependent on one another given the class value.
86. SVMs are less effective when:
Correct : C. The data is noisy and contains overlapping points
87. If there is only a discrete number of possible outcomes, these are called _____.
Correct : B. Categories
88. Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
Correct : A. Inference
89. The term _____ can be freely used, but with the same meaning adopted in physics or system theory.
Correct : D. Prediction
90. Common deep learning applications / problems can also be solved using ____.
Correct : B. Classic approaches
91. What is the function of ‘Unsupervised Learning’?
Correct : D. All
92. What are the two methods used for the calibration in Supervised Learning?
Correct : A. Platt Calibration and Isotonic Regression
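For question 92, a sketch of both calibration methods via scikit-learn's CalibratedClassifierCV (method='sigmoid' is Platt scaling, method='isotonic' is isotonic regression):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import load_breast_cancer
    from sklearn.svm import LinearSVC

    X, y = load_breast_cancer(return_X_y=True)
    platt = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=3)      # Platt calibration
    isotonic = CalibratedClassifierCV(LinearSVC(), method='isotonic', cv=3)  # isotonic regression
    platt.fit(X, y)
    print(platt.predict_proba(X[:2]))   # calibrated class probabilities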
93. Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2, ..., X100). Now we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Now, which of the following options will be correct?
Correct : B. It is more likely for X1 to be included in the model
94. Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?
Correct : B. Lasso regression uses subset selection of features
95. Which of the following statement(s) can be true post adding a variable in a linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
Correct : A. 1 and 2
96. We can also compute the coefficients of linear regression with the help of an analytical method called the “Normal Equation”. Which of the following is/are true about the “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. No need to iterate
Correct : D. 1,2 and 3.
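For question 96, a minimal NumPy sketch of the normal equation on synthetic data; note there is no learning rate and no iteration, but it requires inverting a matrix that grows with the number of features:

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.normal(size=(50, 3))                              # hypothetical design matrix
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
    Xb = np.hstack([np.ones((50, 1)), X])                     # prepend the intercept column
    theta = np.linalg.inv(Xb.T @ Xb) @ (Xb.T @ y)             # closed-form least-squares solution
    print(theta)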
97. If two variables are correlated, is it necessary that they have a linear relationship?
Correct : B. No
98. When the C parameter is set to infinity, which of the following holds true?
Correct : A. The optimal hyperplane, if it exists, will be the one that completely separates the data
99. Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model that has a quadratic kernel function (polynomial of degree 2) and uses the slack variable C as one of its hyperparameters. What would happen when you use a very large value of C (C -> infinity)?
Correct : A. We can still classify data correctly for the given setting of the hyperparameter C