Quiznetik

Machine Learning (ML) | Set 1

1. Application of machine learning methods to large databases is called

Correct : A. data mining.

2. If a machine learning model's output involves a target variable, then that model is called a

Correct : B. predictive model

3. In what type of learning is labelled training data used?

Correct : B. supervised learning

4. In which of the following feature selection methods do we start with an empty feature set?

Correct : A. forward feature selection
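
A minimal sketch of greedy forward selection, to make the "start with an empty feature set" idea concrete. The function names and the toy scoring function are hypothetical; in practice the score would usually be a cross-validated model metric.

```python
def forward_select(features, score):
    """Greedy forward selection: start empty, add the best feature each round."""
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


# Hypothetical scoring function: "age" and "income" help, anything else hurts.
def toy_score(subset):
    useful = {"age": 0.5, "income": 0.3}
    return sum(useful.get(f, -0.1) for f in subset)


chosen = forward_select(["age", "height", "income"], toy_score)
print(chosen)
```

Backward elimination would run the same loop in reverse, starting from the full set and removing one feature per round.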

5. In PCA, the number of input dimensions is equal to the number of principal components

Correct : A. true

6. PCA can be used for projecting and visualizing data in lower dimensions.

Correct : A. true

7. Which of the following is the best machine learning method?

Correct : D. all of the above

8. What characterizes unlabeled examples in machine learning?

Correct : D. there is plenty of confusing knowledge

9. What does dimensionality reduction reduce?

Correct : B. collinearity

10. Data used to build a data mining model.

Correct : A. training data

11. The problem of finding hidden structure in unlabeled data is called…

Correct : B. unsupervised learning

12. Of the following examples, which would you address using a supervised learning algorithm?

Correct : A. given email labeled as spam or not spam, learn a spam filter

13. Dimensionality Reduction Algorithms are one of the possible ways to reduce the computation time required to build a model

Correct : A. true

14. You are given reviews of a few Netflix series marked as positive, negative and neutral. Classifying reviews of a new Netflix series is an example of

Correct : A. supervised learning

15. Which of the following is a good test dataset characteristic?

Correct : C. both a and b

16. Following are the types of supervised learning

Correct : D. all of the above

17. A matrix decomposition model is which type of model?

Correct : A. descriptive model

18. Which of the following is a powerful distance metric used by geometric models?

Correct : C. both a and b

19. The output of training process in machine learning is

Correct : A. machine learning model

20. A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college. Here the feature type is

Correct : B. ordinal

21. PCA is

Correct : C. feature extraction

22. Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model.

Correct : A. true

23. Which of the following techniques would perform better for reducing dimensions of a data set?

Correct : A. removing columns which have too many missing values

24. Supervised learning and unsupervised clustering both require which of the following?

Correct : C. input attribute.

25. What characterizes a hyperplane in the geometrical model of machine learning?

Correct : B. a plane with 2 fewer dimensions than the number of input attributes

26. Like the probabilistic view, the ________ view allows us to associate a probability of membership with each classification.

Correct : D. inductive

27. Database query is used to uncover this type of knowledge.

Correct : D. multidimensional

28. A person trained to interact with a human expert in order to capture their knowledge.

Correct : D. knowledge extractor

29. A telecommunication company wants to segment its customers into distinct groups. This is an example of

Correct : C. unsupervised learning

30. In the example of predicting the number of babies based on the stork population, the number of babies is

Correct : A. outcome

31. Which type of learning requires self-assessment to identify patterns within data?

Correct : A. unsupervised learning

32. Select the correct answers for following statements. 1. Filter methods are much faster compared to wrapper methods. 2. Wrapper methods use statistical methods for evaluation of a subset of features while Filter methods use cross validation.

Correct : B. 1 is true and 2 is false

33. The "curse of dimensionality" refers to

Correct : A. all the problems that arise when working with data in the higher dimensions, that did not exist in the lower dimensions.

34. In simple terms, machine learning is

Correct : C. both a and b

35. If a machine learning model's output does not involve a target variable, then that model is called a

Correct : A. descriptive model

36. Following are the descriptive models

Correct : D. both a and c

37. Which of the following is not a learning method?

Correct : D. introduction

38. A measurable property or parameter of the data-set is

Correct : B. feature

39. A feature can be used as a

Correct : C. both a and b

40. It is not necessary to have a target variable for applying dimensionality reduction algorithms

Correct : A. true

41. The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an unsupervised method 2. It searches for the directions in which the data have the largest variance 3. Maximum number of principal components <= number of features 4. All principal components are orthogonal to each other

Correct : D. all of the above

42. Which of the following is a reasonable way to select the number of principal components "k"?

Correct : A. choose k to be the smallest value so that at least 99% of the variance is retained.
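
The rule in this answer can be sketched as follows, assuming the eigenvalues of the covariance matrix (the variance along each principal component) are already available. The function name `choose_k` and the example eigenvalues are made up for illustration.

```python
def choose_k(eigenvalues, retain=0.99):
    """Smallest k whose top-k eigenvalues keep at least `retain` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= retain:
            return k
    return len(ordered)


# Illustrative eigenvalues only, not taken from any real dataset.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
k = choose_k(eigenvalues)
print(k)
```

With these values the first two components keep 90% of the variance and the first three keep 99.5%, so three components are enough.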

43. Which of the following is an example of feature extraction?

Correct : B. applying pca to project high dimensional data

44. Prediction is

Correct : A. the result of application of specific theory or rule in a specific case

45. You are given seismic data and you want to predict the next earthquake. This is an example of

Correct : A. supervised learning

46. PCA works better if 1. there is a linear structure in the data 2. the data lies on a curved surface and not on a flat surface 3. the variables are scaled in the same unit

Correct : C. 1 and 3

47. A student's grade is a variable F1 which takes a value from A, B, C and D. Which of the following is true in this case?

Correct : B. variable f1 is an example of ordinal variable

48. What can be a major issue in leave-one-out cross-validation (LOOCV)?

Correct : B. high variance

49. Imagine a newborn starting to learn to walk. It will try to find a suitable policy for walking after repeatedly falling and getting up. Which type of machine learning is best suited to this?

Correct : D. reinforcement learning

50. Support Vector Machine is

Correct : C. geometric model

51. In multiclass classification number of classes must be

Correct : C. greater than two

52. Which of the following can only be used when training data are linearly separable?

Correct : A. linear hard-margin svm

53. What is the impact of high variance on the training set?

Correct : A. overfitting

54. What do you mean by a hard margin?

Correct : A. the svm allows very low error in classification

55. The effectiveness of an SVM depends upon:

Correct : A. selection of kernel

56. What are support vectors?

Correct : C. all of the above

57. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.

Correct : A. true
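
The thresholding behaviour described in this statement can be sketched in a few lines. The function name and the AND-gate weights below are illustrative assumptions, not part of the quiz.

```python
def perceptron_output(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Illustrative choice: weights (1, 1) with threshold 1.5 behave like a logical AND.
weights = [1.0, 1.0]
threshold = 1.5
for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, perceptron_output(inputs, weights, threshold))
```

Only the input [1, 1] has a weighted sum (2.0) above the threshold, so it is the only case that outputs 1.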

58. What is the purpose of the Kernel Trick?

Correct : A. to transform the data from nonlinearly separable to linearly separable

59. Which of the following can only be used when training data are linearly separable?

Correct : A. linear hard-margin svm

60. The firing rate of a neuron

Correct : B. is more analogous to the output of a unit in a neural net than the output voltage of the neuron

61. Which of the following evaluation metrics can not be applied in case of logistic regression output to compare with target?

Correct : D. mean-squared-error

62. The cost parameter in the SVM means:

Correct : C. the tradeoff between misclassification and simplicity of the model

63. The kernel trick

Correct : D. exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points

64. How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?

Correct : C. ridge has larger bias, smaller variance

65. Which of the following are real world applications of the SVM?

Correct : D. all of the above

66. How can SVM be classified?

Correct : C. it is a model trained using supervised learning. it can be used for classification and regression.

67. Which of the following can help to reduce overfitting in an SVM classifier?

Correct : A. use of slack variables

68. Suppose you have trained an SVM with a linear decision boundary and, after training, you correctly infer that your SVM model is underfitting. Which of the following options would you most likely consider when iterating on the SVM next time?

Correct : C. you will try to calculate more variables

69. What is/are true about kernels in SVM? 1. A kernel function maps low-dimensional data to a high-dimensional space 2. It’s a similarity function

Correct : C. 1 and 2

70. You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following is false?

Correct : B. this is an instance of underfitting

71. Suppose your model is demonstrating high variance across the different training sets. Which of the following is NOT a valid way to try to reduce the variance?

Correct : B. improve the optimization algorithm being used for error minimization.

72. Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?

Correct : B. the model would consider only the points close to the hyperplane for modeling
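
One way to see the effect of a high gamma is to evaluate the RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2), at a few distances. This is a sketch with made-up points and gamma values. It only illustrates how quickly the similarity decays and does not train an SVM.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


origin = [0.0, 0.0]
near_point = [0.1, 0.0]
far_point = [1.0, 0.0]

near_high = rbf_kernel(origin, near_point, gamma=100.0)  # about exp(-1)
far_high = rbf_kernel(origin, far_point, gamma=100.0)    # exp(-100), effectively zero
far_low = rbf_kernel(origin, far_point, gamma=0.1)       # exp(-0.1), still close to 1
print(near_high, far_high, far_low)
```

With a high gamma, a point at distance 1 contributes essentially nothing, so only very close points shape the decision function. With a low gamma the same point still has a similarity near 1.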

73. We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that the new feature will dominate the others 2. Sometimes, feature normalization is not feasible in the case of categorical variables 3. Feature normalization always helps when we use the Gaussian kernel in SVM

Correct : B. 1 and 2

74. Wrapper methods are hyper-parameter selection methods that

Correct : C. are useful mainly when the learning machines are “black boxes”

75. Which of the following methods can not achieve zero training error on any linearly separable dataset?

Correct : B. 15-nearest neighbors

76. Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?

Correct : D. n+1

77. Let S1 and S2 be the set of support vectors and w1 and w2 be the learnt weight vectors for a linearly separable problem using hard and soft margin linear SVMs respectively. Which of the following are correct?

Correct : B. s1 may not be a subset of s2

78. Which statement about outliers is true?

Correct : C. the nature of the problem determines how outliers are used

79. If TP=9, FP=6, FN=26 and TN=70, then the error rate will be

Correct : C. 28 percent
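
The arithmetic behind this answer can be checked with a short sketch. The function name is illustrative. The error rate is the share of misclassified examples, (FP + FN) / (TP + FP + FN + TN).

```python
def error_rate(tp, fp, fn, tn):
    """Fraction of all examples that were misclassified."""
    return (fp + fn) / (tp + fp + fn + tn)


rate = error_rate(tp=9, fp=6, fn=26, tn=70)  # 32 misclassified out of 111
print(f"{rate:.3f}")
```

Here 32 / 111 is roughly 0.288, which matches the 28% answer when the fractional part is dropped.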

80. Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking the predictions on test data. Which of the following is true in such a case? 1. Accuracy metric is not a good idea for imbalanced class problems. 2. Accuracy metric is a good idea for imbalanced class problems. 3. Precision and recall metrics are good for imbalanced class problems. 4. Precision and recall metrics aren’t good for imbalanced class problems.

Correct : A. 1 and 3
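
A small sketch of why accuracy misleads here: a model that always predicts the majority class reaches 99% accuracy while finding none of the minority examples. The labels are synthetic, and returning 0.0 when a denominator is zero is a convention assumed for this example.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic data: 99 majority-class examples (0) and 1 minority example (1).
y_true = [0] * 99 + [1]
y_pred = [0] * 100  # a "model" that always predicts the majority class

acc = accuracy(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
print(acc, precision, recall)
```

Accuracy comes out at 0.99, yet recall on the minority class is 0.0, which is the information that precision and recall expose and accuracy hides.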

81. The minimum time complexity for training an SVM is O(n^2). According to this fact, what sizes of datasets are not best suited for SVMs?

Correct : A. large datasets

82. Perceptron Classifier is

Correct : C. supervised learning algorithm

83. Type of dataset available in Supervised Learning is

Correct : B. labeled dataset

84. Which among the following is the most appropriate kernel that can be used with SVM to separate the classes?

Correct : B. gaussian rbf kernel

85. The SVMs are less effective when

Correct : C. the data is noisy and contains overlapping points

86. Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?

Correct : B. the model would consider only the points close to the hyperplane for modeling

87. What is the precision value for following confusion matrix of binary classification?

Correct : B. 0.09

88. Which of the following are components of generalization Error?

Correct : C. both of them

89. Which of the following is not a kernel method in SVM?

Correct : A. linear kernel

90. During the treatment of cancer patients, the doctor needs to be very careful about which patients need to be given chemotherapy. Which metric should we use in order to decide which patients should be given chemotherapy?

Correct : A. precision

91. Which one of the following is suitable? 1. When the hypothesis space is richer, overfitting is more likely. 2. When the feature space is larger, overfitting is more likely.

Correct : C. true,true

92. Which of the following is categorical data?

Correct : A. branch of bank

93. The soft-margin SVM is preferred over the hard-margin SVM when

Correct : B. the data is noisy and contains overlapping points

94. Consider an SVM with a quadratic kernel function (polynomial of degree 2) that has the slack penalty C as one hyperparameter. What would happen if we use a very large value for C?

Correct : A. we can still classify the data correctly for given setting of hyper parameter c

95. An SVM uses an RBF kernel with appropriate parameters to perform binary classification where the data is non-linearly separable. In this scenario

Correct : B. the decision boundary in the transformed feature space is linear

96. Which of the following is true about SVM? 1. A kernel function maps low-dimensional data to a high-dimensional space. 2. It is a similarity function

Correct : C. 1 is true, 2 is true

97. What is the accuracy in percentage based on the following confusion matrix of a three-class classification? Confusion matrix C = [14 0 0] [1 15 0] [0 0 6]

Correct : B. 0.97
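
The value in this answer can be reproduced with a short sketch: accuracy is the sum of the diagonal of the confusion matrix (the correct predictions) divided by the sum of all entries. The function name is illustrative.

```python
def accuracy_from_confusion(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


C = [
    [14, 0, 0],
    [1, 15, 0],
    [0, 0, 6],
]
acc = accuracy_from_confusion(C)  # 35 correct out of 36
print(round(acc, 2))
```

The diagonal sums to 35 and the whole matrix to 36, so the accuracy is 35/36, about 0.97 (97%).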

98. Which of the following method is used for multiclass classification?

Correct : A. one vs rest

99. Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either?

Correct : C. 0.44
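
The answer follows from inclusion-exclusion: P(serials or Netflix) = P(serials) + P(Netflix) - P(both), and the probability of liking neither is the complement. A short sketch with the numbers from the question (variable names are illustrative):

```python
p_serials = 0.25
p_netflix = 0.43
p_both = 0.12

# Inclusion-exclusion: probability of liking at least one of the two.
p_either = p_serials + p_netflix - p_both
# Complement: probability of liking neither.
p_neither = 1 - p_either
print(round(p_neither, 2))
```

That is 1 - (0.25 + 0.43 - 0.12) = 1 - 0.56 = 0.44.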

100. A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. What is the maximum possible number of different examples?

Correct : D. 72
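
The count is the product of the number of values each attribute and the class can take: 3 x 2 x 2 x 2 x 3 = 72. The sketch below computes the product and, as a cross-check, enumerates every combination. Variable names are illustrative.

```python
import math
from itertools import product

attribute_sizes = [3, 2, 2, 2]  # possible values per attribute
class_size = 3                  # possible class labels

sizes = attribute_sizes + [class_size]
count = math.prod(sizes)

# Cross-check by enumerating every (attribute values, class) combination.
enumerated = sum(1 for _ in product(*(range(n) for n in sizes)))
print(count, enumerated)
```

Both the product and the explicit enumeration give 72 distinct examples.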