1. Data measured in ___________bytes is called Big Data.
Correct : C. Peta
2. How many V's of Big Data are there?
Correct : D. 5
3. Bank transaction data is an example of which type of data?
Correct : A. structured data
4. In how many forms can Big Data be found?
Correct : B. 3
5. Which of the following are Benefits of Big Data Processing?
Correct : D. All of the above
6. Which of the following is not a Big Data technology?
Correct : D. Apache Pytarch
7. What percentage of the world’s total data was created within just the past two years?
Correct : C. 90%
8. Apache Kafka is an open-source platform that was created by?
Correct : A. LinkedIn
9. What was Hadoop named after?
Correct : C. The toy elephant of Cutting’s son
10. What are the main components of Big Data?
Correct : D. All of the above
11. All of the following accurately describe Hadoop, EXCEPT ____________
Correct : B. Real-time
12. __________ has the world’s largest Hadoop cluster.
Correct : C. Facebook
13. Facebook Tackles Big Data With _______ based on Hadoop.
Correct : A. Project Prism
14. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
Correct : A. MapReduce
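The MapReduce model in question 14 splits a job into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The sketch below is a minimal single-process illustration using word count; the function names are hypothetical and this is not Hadoop's API, which runs these phases in parallel across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (the framework does this in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big ideas", "data flows"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'flows': 1}
```

Because each reduce call only sees the values for one key, the reduce step can run independently per key, which is what lets a real framework distribute it.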
15. The examination of large amounts of data to see what patterns or other
useful information can be found is known as
Correct : C. Big data analytics
16. Big data analysis does all of the following EXCEPT?
Correct : D. Analyzes data
17. What makes Big Data analysis difficult to optimize?
Correct : B. Both data and cost-effective ways to mine data to make business sense out of it
18. The new source of big data that will trigger a Big Data revolution in the
years to come is?
Correct : C. Transactional data and sensor data
19. The unit of data that flows through a Flume agent is
Correct : D. Event
20. Which of the following is NOT one of the three steps followed to deploy a Big Data solution?
Correct : B. Data dissemination
21. Who popularized the term Big Data?
Correct : B. John Mashey
22. Data that comes as numbers, text, images, audio, and video illustrates ____.
Correct : D. Variety
23. Real-time data is ______.
Correct : C. unique
24. ______ is the term used to describe data that is high-volume, high-velocity, and/or high-variety.
Correct : B. Big Data
25. According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like Hadoop?
Correct : A. Big data management and data mining
26. Point out the wrong statement.
Correct : C. The programming model, MapReduce, used by Hadoop is difficult to write and test
27. __________ can best be described as a programming model used to develop
Hadoop-based applications that can process massive amounts of data.
Correct : A. MapReduce
28. __________ has the world’s largest Hadoop cluster.
Correct : C. Facebook
29. Facebook Tackles Big Data With _______ based on Hadoop.
Correct : A. ‘Project Prism’
30. Data science handles a diverse set of data through which of the following?
Correct : D. All of the above
31. The modern conception of data science as an independent discipline is
sometimes attributed to?
Correct : A. William S.
32. Which of the following languages is used in data science?
Correct : C. R
33. Which of the following is false?
Correct : B. Raw data should be processed only one time.
34. What is the role of a data architect?
Correct : C. build data solutions that are optimized for performance and design applications
35. Which of the following are correct skills for a data scientist?
Correct : D. All of the above
36. Which of the following are correct components of data science?
Correct : D. All of the above
37. Which of the following is not a part of the data science process?
Correct : C. Communication Building
38. Which of the following are data sources in data science?
Correct : C. Both A and B
39. Which of the following is not an application of data science?
Correct : D. Privacy Checker
40. Point out the correct statement.
Correct : A. Raw data is original source of data
41. Which of the following is one of the key data science skills?
Correct : D. All of the above
42. Which of the following is a key characteristic of a hacker?
Correct : B. Willing to find answers on their own
43. Raw data should be processed only one time.
Correct : B. False
44. Which of the following is the common goal of statistical modelling?
Correct : A. Inference
45. Causal analysis is commonly applied to census data.
Correct : B. False
46. Which of the following models is usually the gold standard for data analysis?
Correct : C. Causal
47. Which of the following is a revision control system?
Correct : A. Git
48. Which of the following steps is performed by a data scientist after acquiring the data?
Correct : A. Data Cleaning
49. Which of the following focuses on the discovery of (previously) unknown
properties on the data?
Correct : A. Data mining
50. Which of the following can be used to create sub-samples using a maximum dissimilarity approach?
Correct : B. maxDissim
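maxDissim (question 50) is a function in R's caret package. The Python sketch below illustrates the greedy maximum-dissimilarity idea behind it: starting from an initial set, repeatedly add the candidate whose distance to its nearest already-selected point is largest. Scoring candidates by that minimum Euclidean distance is an assumption (we believe it mirrors caret's default, but check the caret documentation), and `max_dissim_sample` is a hypothetical name, not caret's API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def max_dissim_sample(start: List[Point], pool: List[Point], n: int) -> List[Point]:
    """Greedily pick up to n points from `pool` that are maximally dissimilar
    to the points selected so far. `start` must be non-empty."""
    selected = list(start)
    remaining = list(pool)
    picked: List[Point] = []
    for _ in range(min(n, len(remaining))):
        # Dissimilarity of a candidate = distance to its closest selected point.
        best = max(remaining, key=lambda p: min(math.dist(p, s) for s in selected))
        picked.append(best)
        selected.append(best)
        remaining.remove(best)
    return picked


pool = [(1.0, 0.0), (5.0, 5.0), (0.5, 0.5), (-4.0, 4.0)]
print(max_dissim_sample([(0.0, 0.0)], pool, n=2))
# [(5.0, 5.0), (-4.0, 4.0)]
```

The nearby points (1, 0) and (0.5, 0.5) are skipped because they add little coverage beyond the starting point.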
51. Which of the following can be used to impute data sets based only on information
in the training set?
Correct : B. preProcess
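Question 51 is about imputing missing values using only training-set information, so that nothing about the test set leaks into preprocessing. caret's preProcess handles this in R; the Python sketch below shows the same principle with median imputation. The function names are hypothetical, and missing values are assumed to be encoded as None.

```python
from statistics import median
from typing import List, Optional


def fit_median_imputer(train: List[Optional[float]]) -> float:
    """Learn the fill value from the observed training values only."""
    observed = [x for x in train if x is not None]
    return median(observed)


def apply_imputer(column: List[Optional[float]], fill: float) -> List[float]:
    """Replace missing entries with the value learned on the training set."""
    return [fill if x is None else x for x in column]


train = [1.0, None, 3.0, 10.0]
test = [None, 4.0]
fill = fit_median_imputer(train)       # median of [1.0, 3.0, 10.0] is 3.0
print(apply_imputer(test, fill))       # [3.0, 4.0]
```

The test column is never inspected when computing `fill`, which is the point of fitting the preprocessing on the training set alone.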
52. Which of the following models includes a backward elimination feature selection routine?
Correct : B. MARS
53. Which of the following is a performance metric for a categorical outcome?
Correct : C. Accuracy
54. Which of the following functions provides unsupervised prediction?
Correct : D. None of the Mentioned
55. What is true about Machine Learning?
Correct : D. All of the above
56. ML is a field of AI consisting of learning algorithms that?
Correct : D. All of the above
57. p → 0q is not a?
Correct : B. horn clause
58. The robot-arm action _______ specifies placing block A on block B.
Correct : A. STACK(A,B)
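Question 58 refers to the classic blocks-world planning operators. A common STRIPS-style reading of STACK(A,B) is: the arm must be holding A and B must be clear; afterward A is on B, A is clear, and the arm is empty. The Python sketch below encodes that reading with states as sets of fact strings. Precondition and effect lists vary between textbooks, so treat the ones used here as assumptions.

```python
from typing import FrozenSet, Optional

State = FrozenSet[str]


def stack(state: State, x: str, y: str) -> Optional[State]:
    """STRIPS-style STACK(x, y): place the held block x on the clear block y.
    Returns the new state, or None if the preconditions do not hold."""
    preconditions = {f"HOLDING({x})", f"CLEAR({y})"}
    if not preconditions <= state:
        return None
    delete_list = preconditions                      # x is no longer held; y is covered
    add_list = {"ARMEMPTY", f"ON({x},{y})", f"CLEAR({x})"}
    return frozenset((state - delete_list) | add_list)


s0 = frozenset({"HOLDING(A)", "CLEAR(B)", "ONTABLE(B)"})
print(sorted(stack(s0, "A", "B")))
# ['ARMEMPTY', 'CLEAR(A)', 'ON(A,B)', 'ONTABLE(B)']
```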
59. A __________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written.
Correct : C. top-down parser
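Question 59 describes top-down parsing: start from S and keep predicting lower-level constituents until preterminals match the input words. The Python sketch below is a small backtracking recursive-descent parser for a toy grammar invented for illustration (the rules and lexicon are assumptions, not a standard grammar). Like any naive top-down parser, it would loop forever on a left-recursive rule such as NP -> NP PP.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Tree = tuple

# Toy grammar (hypothetical): phrase rules and a lexicon of preterminals.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON: Dict[str, Set[str]] = {
    "Det": {"the", "a"},
    "N": {"dog", "cat"},
    "V": {"sees", "sleeps"},
}


def expand(symbol: str, words: List[str], i: int) -> Iterator[Tuple[Tree, int]]:
    """Hypothesize `symbol` starting at word i; yield (subtree, next position)."""
    if symbol in LEXICON:  # preterminal: it must match the next input word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # predict each possible expansion in turn
        for children, j in expand_seq(rhs, words, i):
            yield (symbol, *children), j


def expand_seq(symbols: List[str], words: List[str], i: int) -> Iterator[Tuple[List[Tree], int]]:
    """Expand a sequence of symbols left to right, backtracking on failure."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], words, i):
        for rest, k in expand_seq(symbols[1:], words, j):
            yield [first] + rest, k


def parse(sentence: str) -> Optional[Tree]:
    words = sentence.split()
    for tree, end in expand("S", words, 0):
        if end == len(words):  # accept only parses that consume every word
            return tree
    return None


print(parse("the cat sleeps"))
# ('S', ('NP', ('Det', 'the'), ('N', 'cat')), ('VP', ('V', 'sleeps')))
```

For "the cat sleeps", the parser first tries VP -> V NP, fails to find an NP after "sleeps", and backtracks to VP -> V.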
60. A model of language consists of categories, which do not include ________.
Correct : B. structural units.
61. Different learning methods do not include?
Correct : A. Introduction
62. Training a model with the data in a single batch is known as?
Correct : C. Both A and B
63. Which of the following are ML methods?
Correct : A. based on human supervision
64. In model-based learning methods, an iterative process takes place on ML models built from various model parameters, which are called?
Correct : C. hyperparameters
65. Which of the following is a widely used and effective machine learning
algorithm based on the idea of bagging?
Correct : D. Random Forest
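Random Forest (question 65) combines bagging, where each model is trained on a bootstrap sample drawn with replacement and the models vote, with random feature selection at each tree split. The Python sketch below shows only the bagging part, using a hypothetical one-feature threshold stump as the base learner instead of a full decision tree, so it illustrates bagging rather than implementing a Random Forest.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]      # (feature value, class label 0 or 1)
Model = Callable[[float], int]


def fit_stump(data: Sequence[Sample]) -> Model:
    """Hypothetical weak learner: choose the threshold (taken from the data) that
    misclassifies the fewest points when predicting 1 for x >= threshold."""
    best_t, best_err = data[0][0], len(data) + 1
    for t, _ in data:
        err = sum((x >= t) != (y == 1) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x >= best_t)


def bagging(data: Sequence[Sample], n_models: int, seed: int = 0) -> Model:
    """Train n_models stumps on bootstrap samples and predict by majority vote."""
    rng = random.Random(seed)
    models: List[Model] = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in range(len(data))]  # with replacement
        models.append(fit_stump(bootstrap))

    def predict(x: float) -> int:
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
bagged = bagging(data, n_models=25)
print(bagged(-1.0), bagged(10.0))   # 0 1
```

An odd number of models avoids ties in the two-class vote.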
66. To find the minimum or the maximum of a function, we set the gradient to
zero because:
Correct : A. The value of the gradient at extrema of a function is always zero
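For question 66, a one-dimensional example helps: f(x) = (x - 3)^2 + 1 has derivative f'(x) = 2(x - 3), which is zero only at x = 3, the minimum. A zero gradient is necessary at an interior extremum of a differentiable function but not sufficient: x^3 has zero derivative at 0 without an extremum there. The Python sketch below runs plain gradient descent with an assumed learning rate of 0.1 and stops once the gradient is numerically zero.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], x0: float,
                     lr: float = 0.1, tol: float = 1e-8, max_iter: int = 10_000) -> float:
    """Step against the gradient until it is (numerically) zero."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= lr * g
    return x


# f(x) = (x - 3)**2 + 1, so f'(x) = 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0
```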
67. Which of the following is a disadvantage of decision trees?
Correct : C. Decision trees are prone to overfitting
68. How do you handle missing or corrupted data in a dataset?
Correct : D. All of the above
69. When performing regression or classification, which of the following is the
correct way to preprocess the data?
Correct : A. Normalize the data -> PCA -> training
70. Which of the following statements about regularization is not correct?
Correct : D. None of the above
71. Which of the following techniques cannot be used for normalization in text mining?
Correct : C. Stop Word Removal
72. In which of the following cases will K-means clustering fail to give good results?
1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
Correct : D. All of the above
73. Which of the following is a reasonable way to select the number of principal components "k"?
Correct : A. Choose k to be the smallest value so that at least 99% of the variance is retained.
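For question 73, the variance captured by each principal component equals the corresponding eigenvalue of the data's covariance matrix, so k can be chosen from the cumulative share of those eigenvalues. The Python sketch below assumes the eigenvalues have already been computed; the 0.99 default matches the answer, and the example values are made up.

```python
from typing import Sequence


def choose_k(eigenvalues: Sequence[float], threshold: float = 0.99) -> int:
    """Smallest k whose k largest eigenvalues retain at least `threshold`
    of the total variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    retained = 0.0
    for k, value in enumerate(values, start=1):
        retained += value
        if retained / total >= threshold:
            return k
    return len(values)  # guard against floating-point shortfall at the end


print(choose_k([50.0, 30.0, 15.0, 4.5, 0.5]))  # 4
```

Here the first three components keep 95% of the variance, so a fourth is needed to reach 99.5%.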
74. What is a sentence parser typically used for?
Correct : B. It is used to parse sentences to derive their most likely syntax tree structures.
75. Data Analysis is a process of?
Correct : D. All of the above
76. Which of the following is not a major data analysis approach?
Correct : B. Predictive Intelligence
77. How many main statistical methodologies are used in data analysis?
Correct : A. 2
78. In descriptive statistics, data from the entire population or a sample is
summarized with ?
Correct : C. numerical descriptors
79. Data analysis was defined by which statistician?
Correct : D. John Tukey
80. Which of the following is true about hypothesis testing?
Correct : A. answering yes/no questions about the data
81. The goal of business intelligence is to allow easy interpretation of large
volumes of data to identify new opportunities.
Correct : A. TRUE
82. The branch of statistics that deals with the development of particular statistical methods is classified as
Correct : D. applied statistics
83. Which of the following is true about regression analysis?
Correct : C. modeling relationships within the data
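Regression analysis (question 83) models relationships within the data. The simplest case is ordinary least squares for a straight line y ≈ a + b·x, with slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. The Python sketch below computes these two coefficients directly; the data points are illustrative.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```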
84. Text analytics is also referred to as text mining.
Correct : A. TRUE
85. What is true about Data Visualization?
Correct : D. All of the above
86. Data can be visualized using?
Correct : D. All of the above
87. Data visualization is also an element of the broader _____________.
Correct : B. data presentation architecture
88. Which method shows hierarchical data in a nested format?
Correct : A. Treemaps
89. Which R function is used for inference on one proportion using the normal approximation?
Correct : D. prop.test()
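prop.test() (question 89) is an R function. The Python sketch below shows the normal-approximation z-test for one proportion that underlies it, which also shows the yes/no character of hypothesis testing: reject H0: p = p0 when the p-value falls below a chosen level. As we understand it, R's prop.test applies a continuity correction by default and reports the squared statistic, so this uncorrected version should correspond to prop.test(x, n, p, correct = FALSE); the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def one_prop_z_test(successes: int, n: int, p0: float) -> Tuple[float, float]:
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)     # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = one_prop_z_test(60, 100, 0.5)
print(round(z, 3), round(p, 4))   # 2.0 0.0455
```

At the 5% level, 60 successes out of 100 would lead to rejecting p = 0.5.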
90. Which R function is used to find factor congruence coefficients?
Correct : C. factor.congruence
91. Which of the following is a tool for checking normality?
Correct : A. qqline()
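qqline() (question 91) is an R function normally used together with qqnorm(): the sorted sample is plotted against standard-normal quantiles, and a reference line is drawn through the first and third quartiles; points lying close to that line suggest approximate normality. The Python sketch below computes the same quantities without plotting. The plotting positions (i - 0.5)/n and the inclusive quartile method are assumptions that we believe approximate R's defaults (R uses a slightly different plotting rule for small n).

```python
from statistics import NormalDist, quantiles
from typing import List, Sequence, Tuple

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def qq_points(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each sorted observation with a theoretical standard-normal quantile."""
    n = len(sample)
    return [(STD_NORMAL.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(sample), start=1)]


def qq_line(sample: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the line through the sample's and the normal
    distribution's first and third quartiles."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    t1, t3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    slope = (q3 - q1) / (t3 - t1)
    return q1 - slope * t1, slope


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
intercept, slope = qq_line(data)
print(intercept, slope)
```

For a symmetric sample like this one, the intercept lands at the median (5).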
92. Which of the following is false?
Correct : C. Data visualization decreases insights and leads to slower decisions
93. Common use cases for data visualization include?
Correct : D. All of the above
94. Which of the following plots are often used for checking randomness in
time series?
Correct : C. Autocorrelation
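An autocorrelation plot (question 94) shows the sample autocorrelation r_k at each lag k. For a random series, r_k at nonzero lags should be close to zero, commonly judged against bounds of roughly ±1.96/√n. The Python sketch below computes r_k with the usual estimator (lag-k autocovariance divided by the lag-0 autocovariance, both about the overall mean); the series is made up and deliberately non-random.

```python
import math
from typing import List, Sequence


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0 .. r_max_lag about the overall mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print([round(r, 3) for r in acf(series, 3)])   # [1.0, -0.833, 0.667, -0.5]
print(round(1.96 / math.sqrt(len(series)), 3))  # 0.8
```

The lag-1 value exceeds the approximate bound, flagging the alternating pattern as non-random.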
95. To find the minimum or the maximum of a function, we set the gradient to zero because:
Correct : A. The value of the gradient at extrema of a function is always zero
96. Which of the following techniques cannot be used for normalization in text mining?
Correct : C. Stop Word Removal
97. In which of the following cases will K-means clustering fail to give good
results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
Correct : D. All of the above
98. Which of the following is a reasonable way to select the number of
principal components "k"?
Correct : A. Choose k to be the smallest value so that at least 99% of the variance is retained.
99. Which of the following is false?
Correct : B. Raw data should be processed only one time.
100. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?