These are the lecture notes of the Aalto University course CS-E4580 Programming Parallel Computers. The exercises and practical instructions are available here.
All modern computers have massively parallel processors. The CPU of a normal desktop or laptop computer may easily have hundreds of arithmetic operations in progress simultaneously.
However, the programmer has to be aware of these opportunities for parallelism and know how to exploit them. Otherwise your program may easily waste more than 99% of the computing power of your CPU.
The following figure highlights the importance of exploiting parallelism in practice, on a normal desktop computer with a modern 4-core Intel CPU. Here “V0” is a baseline solution that would look perfectly reasonable if we were programming an old-fashioned sequential computer. However, on a modern CPU it turns out that we are using only 2% of the theoretical single-core performance, and only 0.6% of the theoretical multi-core performance. We are leaving 99.4% of the computing power unused!
We will walk through this example in our lectures and develop a sequence of faster solutions (V1, V2, …) that make much better use of the vast computing resources of a modern computer. V7 is 42 times faster than the baseline solution even on a single core, and 151 times faster when we use all 4 cores.
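As a rough sanity check, the speedups can be combined with the baseline utilization figures to estimate how close V7 gets to the theoretical peak. The sketch below is only back-of-the-envelope arithmetic: the inputs are the rounded numbers quoted above, the helper name `estimated_utilization` is made up for this illustration, and the resulting percentages are estimates rather than measurements reported by the course.

```python
# Figures quoted in the text (rounded, so the results are approximate).
baseline_single_core = 0.02    # V0 uses about 2% of single-core peak
baseline_multi_core = 0.006    # V0 uses about 0.6% of 4-core peak
speedup_single_core = 42       # V7 vs. V0 on one core
speedup_multi_core = 151       # V7 vs. V0 on all 4 cores


def estimated_utilization(baseline_fraction, speedup):
    """Hypothetical helper: fraction of peak reached after a given speedup."""
    return baseline_fraction * speedup


single = estimated_utilization(baseline_single_core, speedup_single_core)
multi = estimated_utilization(baseline_multi_core, speedup_multi_core)

print(f"V7, single core: about {single:.0%} of theoretical peak")
print(f"V7, all 4 cores: about {multi:.0%} of theoretical peak")
```

Under these assumptions the estimate comes out somewhere around 85 to 90% of the theoretical peak in both cases, which is the sense in which the later versions make much better use of the hardware than the baseline.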