1. A CUDA program is comprised of two primary components: a host and a _____.
Correct : A. gpu kernel
2. The kernel code is identified by the ________ qualifier with a void return type.
Correct : B. __global__
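For illustration (not part of the original question set), a minimal kernel declaration using the __global__ qualifier and a void return type; the name mykernel is a placeholder:

    __global__ void mykernel(void) {
        // device code, executed by every thread that is launched
    }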
3. The kernel code is only callable by the host
Correct : A. true
4. The kernel code is executable on the device and host
Correct : B. false
5. Calling a kernel is typically referred to as _________.
Correct : D. kernel invocation
6. Host codes in a CUDA application can Initialize a device
Correct : A. true
7. Host codes in a CUDA application can Allocate GPU memory
Correct : A. true
8. Host codes in a CUDA application can not Invoke kernels
Correct : B. false
9. CUDA offers the Chevron Syntax to configure and execute a kernel.
Correct : A. true
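A minimal sketch of the chevron (execution configuration) syntax, assuming the placeholder kernel mykernel declared above and purely illustrative launch parameters:

    dim3 blocksPerGrid(4);       // 4 blocks in the grid
    dim3 threadsPerBlock(256);   // 256 threads per block
    mykernel<<<blocksPerGrid, threadsPerBlock>>>();   // configure and launch the kernel
    cudaDeviceSynchronize();     // host waits for the kernel to finish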
10. The BlockPerGrid and ThreadPerBlock parameters are related to the ________ model supported by CUDA.
Correct : C. thread abstraction
11. _________ is Callable from the device only
Correct : C. __device__
12. ______ is Callable from the host
Correct : B. __global__
13. ______ is Callable from the host
Correct : A. __host__
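A small sketch of the three function qualifiers behind questions 11-13; the function names are placeholders:

    __device__ int dev_helper(int x)  { return 2 * x; }   // callable from device code only
    __host__   int host_helper(int x) { return 2 * x; }   // callable from and runs on the host
    __global__ void entry(int *out) {                     // callable from the host, runs on the device
        *out = dev_helper(1);
    }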
14. CUDA supports ____________ in which code in a single thread is executed by all other threads.
Correct : C. thread abstraction
15. In CUDA, a single invoked kernel is referred to as a _____.
Correct : C. grid
16. A grid is comprised of ________ of threads.
Correct : A. block
17. A block is comprised of multiple _______.
Correct : A. threads
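To picture the grid -> block -> thread hierarchy of questions 15-17, a common idiom (a sketch, not quiz material) computes a global thread index inside a kernel:

    __global__ void index_demo(int *out) {
        // a grid is made of blocks; each block is made of threads
        int global_id = blockIdx.x * blockDim.x + threadIdx.x;
        out[global_id] = global_id;
    }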
18. A solution to the problem of representing the parallelism in an algorithm is
Correct : D. cuda
19. Host codes in a CUDA application can not Reset a device
Correct : B. false
20. Host codes in a CUDA application can Transfer data to and from the device
Correct : A. true
21. Host codes in a CUDA application can not Deallocate memory on the GPU
Correct : B. false
22. Any condition that causes a processor to stall is called a _____.
Correct : A. hazard
23. The time lost due to branch instruction is often referred to as _____.
Correct : C. branch penalty
24. _____ method is used in centralized systems to perform out of order execution.
Correct : B. score boarding
25. The computer cluster architecture emerged as an alternative for ____.
Correct : C. super computers
26. NVIDIA CUDA Warp is made up of how many threads?
Correct : D. 32
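A brief sketch, assuming the usual warp size of 32, showing how a thread can derive its warp number and its lane within the warp:

    __global__ void warp_info(int *warp_id, int *lane_id) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        warp_id[tid] = threadIdx.x / warpSize;   // warpSize is 32 on NVIDIA GPUs
        lane_id[tid] = threadIdx.x % warpSize;   // position of the thread inside its warp
    }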
27. Out-of-order execution of instructions is not possible on GPUs.
Correct : B. false
28. CUDA supports programming in ....
Correct : C. c, c++, third party wrappers for java, python, and more
29. FADD, FMAD, FMIN, FMAX are ----- supported by Scalar Processors of NVIDIA GPU.
Correct : A. 32-bit ieee floating point instructions
30. Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SP).
Correct : D. 8
31. Each NVIDIA GPU has ------ Streaming Multiprocessors
Correct : D. 16
32. CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.
Correct : B. “zero-overhead”, 1 clock
33. Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.
Correct : B. simt (single instruction multiple thread)
34. Limitations of CUDA Kernel
Correct : B. no recursion, no call stack, no static variable declarations
35. What is the Unified Virtual Machine?
Correct : A. it is a technique that allows both cpu and gpu to read from a single virtual machine simultaneously.
36. _______ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.
Correct : C. cuda c, gpus.
37. The CUDA architecture consists of --------- for parallel computing kernels and functions.
Correct : D. ptx instruction set architecture
38. CUDA stands for --------, designed by NVIDIA.
Correct : C. compute unified device architecture
39. The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false.
Correct : A. true
40. The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.
Correct : A. 128, 256, 512
41. NVIDIA 8-series GPUs offer -------- .
Correct : A. 50-200 gflops
42. IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by Scalar Processors of NVIDIA GPU.
Correct : B. 32-bit integer instructions
43. CUDA Hardware programming model supports:
a) fully general data-parallel architecture;
b) General thread launch;
c) Global load-store;
d) Parallel data cache;
e) Scalar architecture;
f) Integer and bit operations
Correct : D. a,b,c,d,e,f
44. In CUDA memory model there are following memory types available:
a) Registers;
b) Local Memory;
c) Shared Memory;
d) Global Memory;
e) Constant Memory;
f) Texture Memory.
Correct : C. a, b, c, d, e, f
45. What is the equivalent of general C program with CUDA C: int main(void) { printf("Hello, World!\n"); return 0; }
Correct : B. __global__ void kernel( void ) { } int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
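The same answer reformatted as a complete compilable file (a sketch; the behavior matches the one-line answer above):

    #include <stdio.h>

    __global__ void kernel(void) { }   // empty kernel, runs on the device

    int main(void) {
        kernel<<<1, 1>>>();            // launch one block with one thread
        printf("Hello, World!\n");     // printed by the host
        return 0;
    }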
46. Which function runs on Device (i.e. GPU): a) __global__ void kernel (void ) { } b) int main ( void ) { ... return 0; }
Correct : A. a
47. A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } where __global__ is a CUDA C keyword which indicates that:
Correct : A. add() will execute on device, add() will be called from host
48. If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:
Correct : C. cudaMalloc( (void**) &dev_a, sizeof( int ) )
49. If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from variable a to variable dev_a:
Correct : B. cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
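Putting questions 47-49 together, a minimal end-to-end sketch (error checking omitted; the values 2 and 7 are just examples):

    #include <stdio.h>

    __global__ void add(int *a, int *b, int *c) { *c = *a + *b; }    // kernel from question 47

    int main(void) {
        int a = 2, b = 7, c;
        int *dev_a, *dev_b, *dev_c;
        cudaMalloc((void **)&dev_a, sizeof(int));                    // allocate GPU memory (question 48)
        cudaMalloc((void **)&dev_b, sizeof(int));
        cudaMalloc((void **)&dev_c, sizeof(int));
        cudaMemcpy(dev_a, &a, sizeof(int), cudaMemcpyHostToDevice);  // host -> device copy (question 49)
        cudaMemcpy(dev_b, &b, sizeof(int), cudaMemcpyHostToDevice);
        add<<<1, 1>>>(dev_a, dev_b, dev_c);                          // kernel invocation
        cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);  // device -> host copy
        printf("2 + 7 = %d\n", c);
        cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);           // deallocate GPU memory
        return 0;
    }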
50. Triple angle brackets appear in a statement inside the main function; what do they indicate?
Correct : A. a call from host code to device code
51. What makes CUDA code run in parallel?
Correct : D. the first parameter value inside the triple angle brackets (n) indicates execution of the kernel n times in parallel
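A sketch of how the first launch parameter yields n parallel copies of a kernel; N, dev_a, dev_b, and dev_c are assumed to have been set up as in the previous example, only with arrays of length N:

    __global__ void add_vec(int *a, int *b, int *c) {
        int i = blockIdx.x;      // each block handles one array element
        c[i] = a[i] + b[i];
    }

    // launching N blocks of 1 thread each runs the kernel body N times in parallel
    add_vec<<<N, 1>>>(dev_a, dev_b, dev_c);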
52. In ___________, the number of elements to be sorted is small enough to fit into the process's main memory.
Correct : A. internal sorting
53. ______________ algorithms use auxiliary storage (such as tapes and hard disks) for sorting because the number of elements to be sorted is too large to fit into memory.
Correct : C. external sorting
54. ______ can be comparison-based or noncomparison-based.
Correct : B. sorting
55. The fundamental operation of comparison-based sorting is ________.
Correct : A. compare-exchange
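A plain C sketch of the compare-exchange primitive named in question 55; the helper name is hypothetical:

    /* after the call, *lo holds the smaller value and *hi the larger one */
    void compare_exchange(int *lo, int *hi) {
        if (*lo > *hi) {
            int tmp = *lo;
            *lo = *hi;
            *hi = tmp;
        }
    }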
56. The complexity of bubble sort is Θ(n²).
Correct : A. true
57. Bubble sort is difficult to parallelize since the algorithm has no concurrency.
Correct : A. true
58. Quicksort is one of the most common sorting algorithms for sequential computers because of its simplicity, low overhead, and optimal average complexity.
Correct : A. true
59. The performance of quicksort depends critically on the quality of the ______.
Correct : B. pivot
60. The complexity of quicksort is O(n log n).
Correct : A. true
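A compact C sketch of quicksort with a last-element pivot, relating to questions 58-60; the quality of the pivot is what keeps the average cost at O(n log n):

    void quicksort(int *a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi], i = lo;                  // last element chosen as pivot
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) {                     // move smaller elements to the left
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;      // place the pivot in its final position
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }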
61. The main advantage of ______ is that its storage requirement is linear in the depth of the state space being searched.
Correct : B. dfs
62. _____ algorithms use a heuristic to guide search.
Correct : A. bfs
63. If the heuristic is admissible, the BFS finds the optimal solution.
Correct : A. true
64. The search overhead factor of the parallel system is defined as the ratio of the work done by the parallel formulation to that done by the sequential formulation
Correct : A. true
65. The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors.
Correct : A. true
66. Graph search involves a closed list, where the major operation is a _______
Correct : C. lookup
67. Breadth First Search is equivalent to which of the traversal in the Binary Trees?
Correct : C. level-order traversal
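A short C sketch showing that breadth-first search on a binary tree is exactly a level-order traversal, using a simple array-based queue; the node structure and the max_nodes bound are assumptions:

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int val; struct node *left, *right; };

    void level_order(struct node *root, int max_nodes) {
        struct node **queue = malloc(max_nodes * sizeof *queue);
        int head = 0, tail = 0;
        if (root) queue[tail++] = root;
        while (head < tail) {                      // visit nodes level by level
            struct node *n = queue[head++];
            printf("%d ", n->val);
            if (n->left)  queue[tail++] = n->left;
            if (n->right) queue[tail++] = n->right;
        }
        free(queue);
    }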
68. Time Complexity of Breadth First Search is? (V – number of vertices, E – number of edges)
Correct : A. O(V + E)
69. Which of the following is not an application of Breadth First Search?
Correct : B. when the graph is a linked list
70. In BFS, how many times a node is visited?
Correct : C. equivalent to number of indegree of the node
71. Is Best First Search a searching algorithm used on graphs?
Correct : A. true
72. Which of the following is not a stable sorting algorithm in its typical implementation.
Correct : C. quick sort
73. Which of the following is not true about comparison based sorting algorithms?
Correct : D. heap sort is not a comparison based sorting algorithm.
74. Mathematically, efficiency is
Correct : A. E = S / p
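A numeric illustration of questions 74 and 75 with made-up timings (Ts and Tp are the serial and parallel runtimes):

    double Ts = 100.0, Tp = 20.0;   /* hypothetical serial and parallel runtimes */
    int    p  = 8;                  /* number of processing elements */
    double S  = Ts / Tp;            /* speedup     S = Ts / Tp = 5.0  */
    double E  = S / p;              /* efficiency  E = S / p  = 0.625 */
    double C  = p * Tp;             /* cost, the processor-time product = 160 */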
75. The cost of a parallel system is sometimes referred to as ____.
Correct : C. both
76. In the scaling characteristics of parallel programs, Ts is
Correct : B. constant
77. Speedup tends to saturate and efficiency _____ as a consequence of Amdahl’s law.
Correct : C. decreases
78. Speedup obtained when the problem size is _______ linearly with the number of processing elements.
Correct : A. increased
79. The n × n matrix is partitioned among n processors, with each processor storing a complete ___ of the matrix.
Correct : A. row
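A serial sketch of the rowwise 1-D partitioning behind question 79: conceptually, processor i owns row i of A and computes one element of y = A x (here the processors are simulated by the outer loop; n, A, x, and y are assumed to exist):

    for (int i = 0; i < n; i++) {            // one iteration per (simulated) processor
        double local = 0.0;
        for (int j = 0; j < n; j++)
            local += A[i][j] * x[j];         // local dot product of the owned row with x
        y[i] = local;
    }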
80. Cost-optimal parallel systems have an efficiency of ___
Correct : A. 1
81. The n × n matrix is partitioned among n² processors such that each processor owns a _____ element.
Correct : C. single
82. How many basic communication operations are used in matrix-vector multiplication?
Correct : C. 3
83. The DNS algorithm for matrix multiplication uses
Correct : C. 3d partition
84. In pipelined execution, the steps contain
Correct : D. all
85. The cost of the parallel algorithm is higher than the sequential run time by a factor of __
Correct : A. 3/2
86. The load imbalance problem in Parallel Gaussian Elimination: can be alleviated by using a ____ mapping
Correct : B. cyclic
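A sketch of the cyclic (round-robin) mapping from question 86; owner() is a hypothetical helper that shows why later elimination steps stay spread over all p processes:

    /* block mapping puts rows 0..n/p-1 on process 0, the next n/p rows on process 1, and so on,
       so processes owning the top rows fall idle as elimination proceeds;
       cyclic mapping assigns row k to process k mod p, keeping the work balanced */
    int owner(int row, int p) {
        return row % p;
    }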
87. A parallel algorithm is evaluated by its runtime as a function of
Correct : D. all
88. For a problem consisting of W units of work, p ___ W processors can be used optimally.
Correct : A. <=
89. C(W) ___ Θ(W) for optimality (necessary condition).
Correct : D. equals
90. Many interactions in practical parallel programs occur in _____ patterns.
Correct : A. well defined
91. Efficient implementation of basic communication operations can improve
Correct : A. performance
92. Efficient use of basic communication operations can reduce
Correct : A. development effort and
93. Group communication operations are built using _____ messaging primitives.
Correct : A. point-to-point
94. One processor has a piece of data that it needs to send to everyone; this is
Correct : A. one-to-all
95. The dual of one-to-all is
Correct : A. all-to-one reduction
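A hedged sketch of the recursive-doubling structure behind a one-to-all broadcast from node 0 on p nodes (p assumed to be a power of two); it only prints who would send to whom, no communication library is used:

    #include <stdio.h>

    void broadcast_schedule(int p) {
        /* in the round with distance "mask", every node that already holds the data
           forwards it to the node whose id is larger by mask; after log2(p) rounds
           all p nodes have the data */
        for (int mask = 1; mask < p; mask <<= 1)
            for (int id = 0; id < p; id++)
                if (id < mask && id + mask < p)
                    printf("round (mask=%d): node %d -> node %d\n", mask, id, id + mask);
    }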
96. Data items must be combined piece-wise and the result made available at
Correct : A. target processor finally
97. The simplest way to send p-1 messages from the source to the other p-1 processors
Correct : C. concurrency
98. In an eight-node ring, node ____ is the source of the broadcast.
Correct : D. 0
99. The processors compute the ______ product of the vector element and the local matrix