
Machine Learning (ML) | Set 7

1. The average squared difference between the classifier's predicted output and the actual output.

Correct : A. mean squared error
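For reference, the mean squared error is just the average of the squared residuals. A minimal NumPy sketch (the toy arrays y_true and y_pred are made up for illustration):

```python
import numpy as np

# Toy values, purely illustrative
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # matches sklearn.metrics.mean_squared_error(y_true, y_pred)
```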

2. Simple regression assumes a __________ relationship between the input attribute and output attribute.

Correct : A. linear

3. Regression trees are often used to model _______ data.

Correct : B. nonlinear

4. The leaf nodes of a model tree are

Correct : C. linear regression equations.

5. Logistic regression is a ________ regression technique that is used to model data having a _____ outcome.

Correct : D. nonlinear, binary

6. This technique associates a conditional probability value with each data instance.

Correct : B. logistic regression
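As a quick illustration of how logistic regression attaches a conditional probability to every instance, here is a minimal scikit-learn sketch (the synthetic data is an assumption for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data, illustrative only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)

# predict_proba returns the conditional probability P(class | x) for each instance
print(clf.predict_proba(X[:3]))
```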

7. This supervised learning technique can process both numeric and categorical input attributes.

Correct : A. linear regression

8. With a Bayes classifier, missing data items are

Correct : B. treated as unequal compares.

9. This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.

Correct : D. k-means clustering

10. This clustering algorithm initially assumes that each data instance represents a single cluster.

Correct : C. k-means clustering

11. This unsupervised clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration.

Correct : C. k-means clustering
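A short scikit-learn sketch of the termination behaviour described in item 11: in practice the implementation stops when the centroid shift falls below the tolerance tol (or when max_iter is reached). The toy data and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data, illustrative only

# tol=0.0 asks for (numerically) identical centroids between iterations
km = KMeans(n_clusters=3, n_init=10, tol=0.0, random_state=0).fit(X)
print(km.n_iter_)           # iterations until the means stopped changing
print(km.cluster_centers_)  # final mean vectors
```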

12. Machine learning techniques differ from statistical techniques in that machine learning methods

Correct : B. are better able to deal with missing and noisy data.

13. In reinforcement learning, if the feedback is negative, it is defined as ____.

Correct : A. Penalty

14. According to ____, it’s a key success factor for the survival and evolution of all species.

Correct : C. Darwin’s theory

15. What is ‘Training set’?

Correct : B. A set of data is used to discover the potentially predictive relationship.

16. Common deep learning applications include ____.

Correct : D. All above

17. Reinforcement learning is particularly efficient when ______________.

Correct : D. All above

18. If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.

Correct : B. Classification.

19. Which of the following are supervised learning applications?

Correct : A. Spam detection, Pattern detection, Natural Language Processing

20. During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.

Correct : D. None of above

21. What is ‘Overfitting’ in Machine learning?

Correct : A. When a statistical model describes random error or noise instead of the underlying relationship, ‘overfitting’ occurs.

22. What is ‘Test set’?

Correct : A. A test set is used to test the accuracy of the hypotheses generated by the learner.

23. ________ is much more difficult because it is necessary to determine a supervised strategy to train a model for each feature and, finally, to predict its value.

Correct : B. Creating a sub-model to predict those features

24. It is possible to use a different placeholder through the parameter _______.

Correct : D. missing_values
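In scikit-learn this placeholder is selected with the missing_values parameter; the older Imputer class (see item 66) has been replaced by SimpleImputer in current versions. A minimal sketch, where the -1.0 placeholder and the toy matrix are assumptions:

```python
import numpy as np
from sklearn.impute import SimpleImputer  # successor of the older Imputer class

# Suppose -1.0 was used as the placeholder for missing entries (illustrative)
X = np.array([[1.0, 2.0], [-1.0, 3.0], [4.0, -1.0]])

imp = SimpleImputer(missing_values=-1.0, strategy='mean')
print(imp.fit_transform(X))  # placeholders replaced by the column means
```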

25. If you need a more powerful scaling feature, with superior control over outliers and the possibility to select a quantile range, there's also the class ________.

Correct : A. RobustScaler
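A minimal RobustScaler sketch with an explicit quantile range (the data and the 25-75 range are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100.0 acts as an outlier

# Centers on the median and scales by the selected quantile range (the IQR by default)
scaler = RobustScaler(quantile_range=(25.0, 75.0))
print(scaler.fit_transform(X).ravel())
```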

26. scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ________ to each element of a dataset.

Correct : B. max, l1 and l2 norms
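A small sketch of the three norms Normalizer supports; the sample matrix is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0], [1.0, -2.0]])

for norm in ('max', 'l1', 'l2'):
    # Each row (sample) is rescaled so that its chosen norm equals 1
    print(norm, Normalizer(norm=norm).fit_transform(X))
```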

27. There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ________.

Correct : A. F-tests and p-values

28. ________ performs a PCA on non-linearly separable data sets.

Correct : B. KernelPCA
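For illustration, a minimal KernelPCA sketch on a data set that is not linearly separable (the circles data and the RBF kernel settings are assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles are not linearly separable in the original space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10.0)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # projected coordinates in the kernel feature space
```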

29. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of students from a college. Which of the following statements is true in this case?

Correct : B. Feature F1 is an example of ordinal variable.

30. The parameter ______ allows specifying the percentage of elements to put into the test/training set.

Correct : C. All above
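The parameters in question are likely test_size and train_size of scikit-learn's train_test_split; a minimal sketch (the toy arrays and the 0.3 split are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Either test_size or train_size (or both) can be given as a fraction
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(X_tr), len(X_te))  # 7 training samples, 3 test samples
```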

31. In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.

Correct : B. dataset

32. _______ adopts a dictionary-oriented approach, associating a progressive integer number with each category label.

Correct : A. LabelEncoder class
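A minimal LabelEncoder sketch showing this dictionary-like mapping (the label list is illustrative):

```python
from sklearn.preprocessing import LabelEncoder

labels = ['red', 'green', 'blue', 'green', 'red']  # illustrative categories

le = LabelEncoder()
codes = le.fit_transform(labels)
print(dict(zip(le.classes_, range(len(le.classes_)))))  # label -> integer mapping
print(codes)                        # encoded labels
print(le.inverse_transform(codes))  # back to the original strings
```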

33. The function used for linear regression in R is __________.

Correct : A. lm(formula, data)

34. In the syntax of the linear model lm(formula, data, ...), data refers to ______.

Correct : B. Vector

35. Which of the following methods do we use to find the best fit line for data in Linear Regression?

Correct : A. Least Square Error
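For reference, the least-squares criterion chooses the coefficients that minimize the sum of squared residuals:

\[ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 = \arg\min_{\beta} \lVert y - X\beta \rVert^2 \]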

36. Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?

Correct : D. Mean-Squared-Error

37. Which of the following is true about residuals?

Correct : A. Lower is better

38. Naive Bayes classifiers are a collection of ______ algorithms.

Correct : A. Classification

39. Naive Bayes classifiers use _______________ learning.

Correct : A. Supervised

40. The features being classified are independent of each other in a Naïve Bayes classifier.

Correct : B. true

41. The features being classified are __________ of each other in a Naïve Bayes classifier.

Correct : A. Independent

42. Conditional probability is a measure of the probability of an event given that another event has already occurred.

Correct : A. True

43. Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Correct : A. True

44. The Bernoulli Naïve Bayes classifier assumes a ___________ distribution.

Correct : C. Binary

45. The Multinomial Naïve Bayes classifier assumes a ___________ distribution.

Correct : B. Discrete

46. The Gaussian Naïve Bayes classifier assumes a ___________ distribution.

Correct : A. Continuous

47. The binarize parameter in scikit-learn's BernoulliNB sets the threshold for binarizing sample features.

Correct : A. True
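A minimal sketch of the binarize parameter in scikit-learn's BernoulliNB (the 0.5 threshold and the toy data are assumptions):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Continuous-looking features; values above the threshold become 1, the rest 0
X = np.array([[0.2, 0.9], [0.7, 0.1], [0.8, 0.8], [0.1, 0.3]])
y = np.array([0, 1, 1, 0])

clf = BernoulliNB(binarize=0.5).fit(X, y)
print(clf.predict(X))
```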

48. The Gaussian distribution, when plotted, gives a bell-shaped curve which is symmetric about the _______ of the feature values.

Correct : A. Mean

49. SVMs directly give us the posterior probabilities P(y = 1 | x) and P(y = −1 | x).

Correct : B. false

50. Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.

Correct : A. True

51. Solving a non-linear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting.

Correct : A. True
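In scikit-learn a hard margin can be approximated by a very large C; the sketch below shows how an RBF-kernel SVM can then memorise noisy training data (the data set and the C and gamma values are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Noisy, non-linearly separable data
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Very large C approximates a hard margin: the model is pushed to fit every point
hard = SVC(kernel='rbf', C=1e6, gamma=10.0).fit(X, y)
soft = SVC(kernel='rbf', C=1.0, gamma=10.0).fit(X, y)
print(hard.score(X, y), soft.score(X, y))  # the hard-margin fit hugs the training noise
```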

52. SVM is a ________ algorithm.

Correct : A. Classification

53. SVM is a ________ learning method.

Correct : A. Supervised

54. The linear SVM classifier works by drawing a straight line between two classes

Correct : A. True

55. What is Model Selection in Machine Learning?

Correct : A. The process of selecting models among different mathematical models, which are used to describe the same data set

56. Which are two techniques of Machine Learning?

Correct : A. Genetic Programming and Inductive Learning

57. Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment.

Correct : B. Reinforcement

58. It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.

Correct : A. Overfitting

59. Techniques that involve the usage of both labeled and unlabeled data are called ___.

Correct : B. Semi-supervised

60. A supervised scenario is characterized by the concept of a _____.

Correct : B. Teacher

61. Overlearning occurs due to an excessive ______.

Correct : A. Capacity

62. Which of the following are models for feature extraction?

Correct : C. None of the above

63. _____ provides some built-in datasets that can be used for testing purposes.

Correct : A. scikit-learn

64. While using _____, all labels are turned into sequential numbers.

Correct : A. LabelEncoder class

65. _______ produce sparse matrices of real numbers that can be fed into any machine learning model.

Correct : C. Both A & B

66. scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.

Correct : D. Imputer

67. Which of the following scale data by removing elements that don't belong to a given range or by considering a maximum absolute value?

Correct : C. Both A & B

68. scikit-learn also provides a class for per-sample normalization, _____.

Correct : A. Normalizer

69. ______ dataset with many features contains information proportional to the independence of all features and their variance.

Correct : B. unnormalized

70. In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the _____.

Correct : D. Covariance matrix
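A quick NumPy sketch of the covariance matrix (the random data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features

# rowvar=False: each column is treated as a variable (feature)
cov = np.cov(X, rowvar=False)
print(cov.shape)     # (3, 3)
print(np.diag(cov))  # the diagonal holds the per-feature variances
```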

71. The _____ parameter can assume different values which determine how the data matrix is initially processed.

Correct : C. init

72. ______ allows exploiting the natural sparsity of data while extracting principal components.

Correct : A. SparsePCA
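A minimal SparsePCA sketch; n_components and alpha (the sparsity penalty) are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # toy data, illustrative only

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
X_sp = spca.fit_transform(X)
print(np.mean(spca.components_ == 0))  # fraction of exactly-zero loadings
```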

73. Which of the following statement is true about outliers in Linear regression?

Correct : A. Linear regression is sensitive to outliers

74. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and you found that there is a relationship between them. Which of the following conclusions do you make about this situation?

Correct : A. Since there is a relationship, it means our model is not good

75. Let’s say a “Linear regression” model perfectly fits the training data (the training error is zero). Which of the following statements is true?

Correct : C. None of the above

76. In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?

Correct : C. Individually, R-squared cannot tell us about variable importance. We can’t say anything about it right now.

77. To test the linear relationship between y (dependent) and x (independent) continuous variables, which of the following plots is best suited?

Correct : A. Scatter plot

78. Which of the following steps/assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?

Correct : A. The polynomial degree

79. Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?

Correct : B. Lasso regression uses subset selection of features

80. Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-squared and Adjusted R-squared both increase
2. R-squared increases and Adjusted R-squared decreases
3. R-squared decreases and Adjusted R-squared decreases
4. R-squared decreases and Adjusted R-squared increases

Correct : A. 1 and 2
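The reason only statements 1 and 2 are possible follows from the adjusted R-squared formula, which penalizes the number of predictors p for a sample of size n:

\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

Plain R-squared never decreases when a variable is added, while adjusted R-squared increases only if the new variable improves the fit by more than chance would.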

81. Which of the following is/are true about kernels in SVM?
1. Kernel functions map low-dimensional data to a high-dimensional space
2. A kernel is a similarity function

Correct : C. 1 and 2

82. Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. Now think that you want to build an SVM model which has a quadratic kernel function of polynomial degree 2 and uses the slack variable C as one of its hyperparameters. What would happen when you use a very small C (C ~ 0)?

Correct : A. Misclassification would happen

83. The cost parameter in the SVM means:

Correct : C. The tradeoff between misclassification and simplicity of the model

84. How do you handle missing or corrupted data in a dataset?

Correct : D. All of the above

85. Which of the following statements about Naive Bayes is incorrect?

Correct : B. Attributes are statistically dependent on one another given the class value.

86. SVMs are less effective when:

Correct : C. The data is noisy and contains overlapping points

87. If there is only a discrete number of possible outcomes, they are called _____.

Correct : B. Categories

88. Some people use the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.

Correct : A. Inference

89. The term _____ can be freely used, but with the same meaning adopted in physics or system theory.

Correct : D. Prediction

90. Common deep learning applications/problems can also be solved using ____.

Correct : B. Classic approaches

91. What is the function of ‘Unsupervised Learning’?

Correct : D. All

92. What are the two methods used for the calibration in Supervised Learning?

Correct : A. Platt Calibration and Isotonic Regression
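Both calibration methods are available in scikit-learn through CalibratedClassifierCV; a minimal sketch (the LinearSVC base model and the synthetic data are assumptions):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)

# method='sigmoid' is Platt scaling; method='isotonic' is isotonic regression
cal = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=3)
cal.fit(X, y)
print(cal.predict_proba(X[:3]))  # calibrated class probabilities
```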

93. Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2, …, X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Which of the following options will be correct?

Correct : B. It is more likely for X1 to be included in the model

94. Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?

Correct : B. Lasso regression uses subset selection of features

95. Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-squared and Adjusted R-squared both increase
2. R-squared increases and Adjusted R-squared decreases
3. R-squared decreases and Adjusted R-squared decreases
4. R-squared decreases and Adjusted R-squared increases

Correct : A. 1 and 2

96. We can also compute the coefficients of linear regression with the help of an analytical method called the “Normal Equation”. Which of the following is/are true about the “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. There is no need to iterate

Correct : D. 1,2 and 3.
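For reference, the Normal Equation gives the coefficients in closed form, which is why no learning rate or iteration is needed, while the cost of forming and solving the system grows quickly with the number of features. A minimal NumPy sketch (the toy data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = np.column_stack([np.ones(len(X)), X])   # add an intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Normal equation: solve (X^T X) beta = X^T y rather than inverting explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```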

97. If two variables are correlated, is it necessary that they have a linear relationship?

Correct : B. No

98. When the C parameter is set to infinity, which of the following holds true?

Correct : A. The optimal hyperplane, if it exists, will be the one that completely separates the data

99. Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. Now think that you want to build an SVM model which has a quadratic kernel function of polynomial degree 2 and uses the slack variable C as one of its hyperparameters. What would happen when you use a very large value of C (C → infinity)?

Correct : A. We can still classify the data correctly for the given setting of the hyperparameter C

100. SVM can solve linear and non-linear problems

Correct : A. true