1. A feature F1 can take the values A, B, C, D, E, and F, and represents the grades of students at a college.
Which of the following statements is true in this case?
Correct : B. feature f1 is an example of ordinal variable.
2. What would you do in PCA to get the same projection as SVD?
Correct : A. transform data to zero mean
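The PCA/SVD equivalence above can be checked numerically: scikit-learn's PCA centers the data internally, so the SVD of the raw matrix matches PCA only after subtracting the column means yourself. A minimal sketch (random data, no claims beyond the centering step):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 3)

# PCA internally centers the data; a plain SVD matches it only after
# we subtract the column means ourselves.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_projection = X_centered @ Vt.T      # project onto right singular vectors

pca = PCA(n_components=3)
pca_projection = pca.fit_transform(X)

# The two projections agree up to the sign of each component.
assert np.allclose(np.abs(svd_projection), np.abs(pca_projection))
```

The sign ambiguity is expected: singular vectors are only defined up to sign, and scikit-learn applies its own sign convention.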
3. What are PCA, KPCA and ICA used for?
Correct : D. all above
4. Can a model trained for item-based similarity also choose from a given set of items?
Correct : A. yes
5. What are common feature selection methods in a regression task?
Correct : C. all above
6. The ______ parameter allows specifying the percentage of elements to put into the test/training set
Correct : C. all above
7. In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
Correct : B. dataset
8. ______ adopts a dictionary-oriented approach, associating a progressive integer number with each category label.
Correct : A. labelencoder class
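The LabelEncoder behavior described above is easy to demonstrate: it sorts the unique labels and assigns each a progressive integer. A short sketch using the grade labels from question 1:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
grades = ['B', 'A', 'C', 'A', 'B']
encoded = le.fit_transform(grades)

print(list(le.classes_))   # ['A', 'B', 'C'] - sorted unique labels
print(list(encoded))       # [1, 0, 2, 0, 1]
```

`inverse_transform` reverses the mapping, which is handy for turning predictions back into the original labels.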
9. If a linear regression model fits the training data perfectly, i.e., train error is zero, then
Correct : C. couldn't comment on test error
10. Which of the following metrics can be used for evaluating regression models?
i) R Squared
ii) Adjusted R Squared
iii) F Statistics
iv) RMSE / MSE / MAE
Correct : D. i, ii, iii and iv
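All four metric families listed above are available in scikit-learn (adjusted R-squared must be derived from R-squared by hand). A small worked example where the expected values can be checked by hand:

```python
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]

r2 = r2_score(y_true, y_pred)              # 1 - SS_res/SS_tot = 1 - 1/2 = 0.5
mse = mean_squared_error(y_true, y_pred)   # (0 + 0 + 1) / 3
mae = mean_absolute_error(y_true, y_pred)  # (0 + 0 + 1) / 3
rmse = mse ** 0.5
```

Adjusted R-squared would then be `1 - (1 - r2) * (n - 1) / (n - p - 1)` for `n` samples and `p` predictors.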
11. In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
Correct : D. by its slope
12. The function used for linear regression in R is
Correct : A. lm(formula, data)
13. In the syntax of the linear model lm(formula, data, ...), data refers to
Correct : B. vector
14. In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to
Correct : C. (y-intercept, slope)
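The equation Y = β1 + β2X + ϵ and the slope interpretation from question 11 can be verified with a tiny fit. This sketch uses noise-free data generated from y = 1 + 2x, so the fit recovers the y-intercept (β1 = 1) and slope (β2 = 2) exactly:

```python
import numpy as np

# Noise-free data from y = 1 + 2x: the degree-1 fit should recover
# intercept 1 and slope 2 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

slope, intercept = np.polyfit(x, y, 1)   # highest-degree coefficient first
```

Changing x by one unit then changes the prediction by exactly `slope`, which is the answer to question 11.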
15. Linear Regression is a supervised machine learning algorithm.
Correct : A. true
16. Is it possible to design a linear regression algorithm using a neural network?
Correct : A. true
17. Is overfitting more likely when you have a huge amount of data to train on?
Correct : B. false
18. Which of the following statements is true about outliers in linear regression?
Correct : A. linear regression is sensitive to outliers
19. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw about this situation?
Correct : A. since there is a relationship, it means our model is not good
20. Naive Bayes classifiers are a collection of ______ algorithms
Correct : A. classification
21. A Naive Bayes classifier is a ______ learning algorithm
Correct : A. supervised
22. The features being classified are independent of each other in a Naive Bayes classifier
Correct : B. true
23. The features being classified are ______ of each other in a Naive Bayes classifier
Correct : A. independent
24. Bayes' Theorem is given by P(H|E) = P(E|H) · P(H) / P(E), where
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
Correct : A. true
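Questions 24-26 can be tied together with one worked computation of Bayes' theorem. The numbers below are hypothetical (a 1% prevalence, 90% sensitivity, 8% false-positive rate), chosen only to make the prior/posterior roles concrete:

```python
# Hypothetical numbers: hypothesis H with prior 1%, evidence E observed
# with P(E|H) = 0.90 and P(E|not H) = 0.08.
p_h = 0.01                 # P(H): prior probability
p_e_given_h = 0.90         # P(E|H): likelihood of the evidence under H
p_e_given_not_h = 0.08     # P(E|not H)

# Total probability of the evidence, P(E)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)  -> the posterior
p_h_given_e = p_e_given_h * p_h / p_e
```

Note how the posterior P(H|E) (about 0.10 here) stays far below the likelihood P(E|H) because the prior P(H) is small.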
25. In the given image, P(H|E) is the ______ probability.
Correct : A. posterior
26. In the given image, P(H) is the ______ probability.
Correct : B. prior
27. Conditional probability is a measure of the probability of an event given that another event has already occurred.
Correct : A. true
28. Bayes theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Correct : A. true
29. The Bernoulli Naive Bayes classifier is used for a ______ distribution
Correct : C. binary
30. The Multinomial Naive Bayes classifier is used for a ______ distribution
Correct : B. discrete
31. The Gaussian Naive Bayes classifier is used for a ______ distribution
Correct : A. continuous
32. The binarize parameter in scikit-learn's BernoulliNB sets the threshold for binarizing the sample features.
Correct : A. true
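The `binarize` parameter can be shown directly: values above the threshold become 1, the rest 0, before the Bernoulli model is fit. A sketch with made-up toy data:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import binarize

X = np.array([[0.2, 0.9],
              [0.8, 0.1],
              [0.1, 0.7],
              [0.9, 0.3]])
y = np.array([1, 0, 1, 0])

# binarize=0.5: every feature value above 0.5 becomes 1, else 0,
# before the Bernoulli likelihoods are estimated.
clf = BernoulliNB(binarize=0.5)
clf.fit(X, y)

# The same thresholding applied explicitly:
print(binarize(X, threshold=0.5))   # [[0 1] [1 0] [0 1] [1 0]] pattern
```

A sample like `[0.95, 0.05]` binarizes to `[1, 0]`, matching the class-0 pattern in the training data.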
33. A Gaussian distribution, when plotted, gives a bell-shaped curve which is symmetric about the ______ of the feature values.
Correct : A. mean
34. SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = −1|x)
Correct : B. false
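The answer to question 34 is visible in scikit-learn's API: a plain SVC only exposes signed distances to the hyperplane, and probabilities require an extra calibration step enabled with `probability=True` (internally, Platt scaling). A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# A plain SVM outputs only signed distances to the separating hyperplane.
svc = SVC(kernel='linear', random_state=0).fit(X, y)
scores = svc.decision_function(X)

# Posterior probabilities require an extra calibration step,
# enabled with probability=True.
svc_prob = SVC(kernel='linear', probability=True, random_state=0).fit(X, y)
proba = svc_prob.predict_proba(X)
```

The calibrated probabilities per sample sum to 1, unlike the raw decision scores.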
35. Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
Correct : A. true
36. Solving a non-linear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting
Correct : A. true
37. SVM is a ______ algorithm
Correct : A. classification
38. SVM is a ______ learning algorithm
Correct : A. supervised
39. The linear SVM classifier works by drawing a straight line between two classes
Correct : A. true
40. Which of the following functions provides unsupervised prediction?
Correct : D. none of the mentioned
41. Which of the following is a characteristic of the best machine learning method?
Correct : D. all above
42. What are the different Algorithm techniques in Machine Learning?
Correct : C. both a & b
43. What is the standard approach to supervised learning?
Correct : A. split the set of example into the training set and the test
44. Which of the following is not Machine Learning?
Correct : B. rule based inference
45. What is Model Selection in Machine Learning?
Correct : A. the process of selecting models among different mathematical models, which are used to describe the same data set
46. Which are two techniques of machine learning?
Correct : A. genetic programming and inductive learning
47. Even if there are no actual supervisors, ______ learning is also based on feedback provided by the environment
Correct : B. reinforcement
48. What does learning exactly mean?
Correct : C. learning is the ability to change according to external stimuli and remembering most of all previous experiences.
49. It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.
Correct : A. overfitting
50. Techniques that involve the usage of both labeled and unlabeled data are called ______.
Correct : B. semi-supervised
51. In reinforcement learning, if the feedback is negative, it is defined as a ______.
Correct : A. penalty
52. According to ______, it's a key success factor for the survival and evolution of all species.
Correct : C. darwin's theory
53. A supervised scenario is characterized by the concept of a ______.
Correct : B. teacher
54. Overlearning occurs due to an excessive ______.
Correct : A. capacity
55. Which of the following is an example of a deterministic algorithm?
Correct : A. pca
56. Which of the following models includes a backwards elimination feature selection routine?
Correct : B. mars
57. Can we extract knowledge without applying feature selection?
Correct : A. yes
58. When using feature selection on the data, does the number of features decrease?
Correct : B. yes
59. Which of the following are several models for feature extraction?
Correct : C. none of the above
60. ______ provides some built-in datasets that can be used for testing purposes.
Correct : A. scikit-learn
61. While using ______, all labels are turned into sequential numbers.
Correct : A. labelencoder class
62. ______ produce sparse matrices of real numbers that can be fed into any machine learning model.
Correct : C. both a & b
63. scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency
Correct : D. imputer
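The `Imputer` class named above belongs to older scikit-learn versions (`sklearn.preprocessing.Imputer`); in current releases the same role is played by `SimpleImputer` in `sklearn.impute`, with the same mean/median/most-frequent strategies. A sketch using the modern name:

```python
import numpy as np
from sklearn.impute import SimpleImputer   # formerly sklearn.preprocessing.Imputer

X = np.array([[1.0], [np.nan], [3.0]])

# strategy='mean' fills holes with the column mean;
# 'median' and 'most_frequent' are the other strategies the question names.
imp = SimpleImputer(strategy='mean')
X_filled = imp.fit_transform(X)
print(X_filled.ravel())                    # the NaN is replaced by the mean, 2.0
```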
64. Which of the following scale data by removing elements that don't belong to a given range or by considering a maximum absolute value?
Correct : C. both a & b
65. scikit-learn also provides a class for per-sample normalization, ______
Correct : A. normalizer
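Per-sample normalization means each row is rescaled to unit norm independently of the other rows, which Normalizer makes easy to verify on a row whose norm is known by hand:

```python
from sklearn.preprocessing import Normalizer

# Per-sample normalization: each ROW is scaled to unit norm independently.
norm = Normalizer(norm='l2')
X_normed = norm.fit_transform([[3.0, 4.0],
                               [0.0, 5.0]])
print(X_normed)   # [[0.6, 0.8], [0.0, 1.0]]
```

Contrast this with StandardScaler, which works per feature (per column) across all samples.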
66. An ______ dataset with many features contains information proportional to the independence of all features and their variance.
Correct : B. unnormalized
67. In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the ______.
Correct : D. covariance matrix
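The covariance matrix in question 67 can be computed directly with NumPy; the diagonal holds each feature's variance and the off-diagonal entries the covariances between feature pairs. A sketch on random data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)            # 200 samples, 3 features

# np.cov expects variables on rows, so pass the transpose.
C = np.cov(X.T)

# Diagonal: per-feature variances. Off-diagonal: pairwise covariances.
print(C.shape)                   # (3, 3)
print(np.allclose(C, C.T))       # a covariance matrix is always symmetric
```

PCA diagonalizes exactly this matrix: its eigenvectors are the principal components and its eigenvalues their explained variances.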
68. The ______ parameter can assume different values which determine how the data matrix is initially processed.
Correct : C. init
69. ______ allows exploiting the natural sparsity of data while extracting principal components.
Correct : A. sparsepca
70. Which of the following is true about residuals?
Correct : A. lower is better
71. Is overfitting more likely when you have a huge amount of data to train on?
Correct : B. false
72. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw about this situation?
Correct : A. since there is a relationship, it means our model is not good
73. Let's say a linear regression model perfectly fits the training data (train error is zero). Now, which of the following statements is true?
Correct : C. none of the above
74. In a linear regression problem, we are using R-squared to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?
Correct : C. individually r squared cannot tell about variable importance. we can't say anything about it right now.
75. Which of the following is true about heteroskedasticity?
Correct : A. linear regression with varying error terms
76. Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free
Correct : D. all of above.
77. To test the linear relationship between y (dependent) and x (independent) continuous variables, which of the following plots is best suited?
Correct : A. scatter plot
78. Which of the following steps/assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
Correct : A. the polynomial degree
79. Can we calculate the skewness of variables based on mean and median?
Correct : B. false
80. Which of the following is true about Ridge or Lasso regression methods in the case of feature selection?
Correct : B. lasso regression uses subset selection of features
81. Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
Correct : A. 1 and 2
82. How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?
Correct : B. 2
83. Conditional probability is a measure of the probability of an event given that another event has already occurred.
Correct : A. true
84. What is/are true about kernels in SVM?
1. A kernel function maps low-dimensional data to a high-dimensional space
2. It is a similarity function
Correct : C. 1 and 2
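Both statements in question 84 can be illustrated with the RBF kernel, k(a, b) = exp(−γ‖a − b‖²): it acts as a similarity function (identical points score 1, distant points near 0) while implicitly mapping into an infinite-dimensional space. A sketch:

```python
from sklearn.metrics.pairwise import rbf_kernel

# The RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) is a similarity
# function: identical points score exactly 1, distant points near 0.
close = rbf_kernel([[0.0, 0.0]], [[0.0, 0.0]])[0, 0]
far = rbf_kernel([[0.0, 0.0]], [[10.0, 10.0]])[0, 0]
print(close, far)
```

The implicit high-dimensional mapping is never computed explicitly; the SVM only ever needs these pairwise similarity values (the "kernel trick").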
85. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel function (polynomial degree 2) that uses the slack variable C as one of its hyperparameters. What would happen when you use a very small C (C ~ 0)?
Correct : A. misclassification would happen
86. The cost parameter in the SVM means:
Correct : C. the tradeoff between misclassification and simplicity of the model
87. If you remove the non-red circled points from the data, will the decision boundary change?
Correct : B. false
88. How do you handle missing or corrupted data in a dataset?
Correct : D. all of the above
89. The SVMs are less effective when:
Correct : C. the data is noisy and contains overlapping points
90. If there is only a discrete number of possible outcomes, they are called ______.
Correct : B. categories
91. Some people use the term ______ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
Correct : A. inference
92. The term ______ can be freely used, but with the same meaning adopted in physics or system theory.
Correct : D. prediction
93. Common deep learning applications/problems can also be solved using ______
Correct : B. classic approaches
94. Identify the various approaches for machine learning.
Correct : D. all above
95. What is the function of unsupervised learning?
Correct : D. all
96. What are the two methods used for the calibration in Supervised Learning?
Correct : A. platt calibration and isotonic regression
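Both calibration methods named in the answer are available through scikit-learn's `CalibratedClassifierCV`: `method='sigmoid'` is Platt calibration and `method='isotonic'` fits a monotone non-parametric regression instead. A sketch wrapping a LinearSVC (which has no native probabilities) on synthetic data:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# method='sigmoid' is Platt calibration; method='isotonic' would fit
# an isotonic (monotone, non-parametric) regression instead.
calibrated = CalibratedClassifierCV(LinearSVC(random_state=0),
                                    method='sigmoid', cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)
```

Isotonic calibration is more flexible but needs more data to avoid overfitting; Platt's sigmoid is the safer default on small sets.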
97. Which of the following are several models for feature extraction?
Correct : C. none of the above
98. Let's say a linear regression model perfectly fits the training data (train error is zero). Now, which of the following statements is true?
Correct : C. none of the above
99. Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free
Correct : D. all of above.
100. Suppose we fit Lasso regression to a data set which has 100 features (X1, X2, ..., X100). Now we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Now, which of the following options will be correct?
Correct : B. it is more likely for x1 to be included in the model
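The intuition behind this answer: multiplying X1 by 10 shrinks the coefficient the L1 penalty "sees" by a factor of 10, so the same alpha is less likely to zero it out. A single-feature sketch (synthetic data, alpha chosen to sit between the two thresholds):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
x1 = rng.randn(200, 1)
y = 0.5 * x1[:, 0]                    # weak true signal on X1

# With a strong penalty, the small coefficient on X1 is shrunk to zero.
lasso = Lasso(alpha=1.0).fit(x1, y)
print(lasso.coef_)                    # [0.] - X1 excluded

# After multiplying X1 by 10, the same alpha no longer zeroes it out:
# X1 is now included in the model.
lasso_scaled = Lasso(alpha=1.0).fit(10 * x1, y)
print(lasso_scaled.coef_)             # non-zero coefficient
```

This is exactly why features are usually standardized before Lasso: otherwise the penalty treats large-scale features more leniently.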