Introduction
|
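Parallel computing is much better suited than serial computing for modelling, simulating and understanding complex, real-world phenomena.
Modern computers have several levels of parallelism (e.g., vector units within a core, multiple cores per CPU, and multiple nodes in a cluster).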
|
Parallel Computer Architecture
|
Sequential computers, including modern CPU cores, closely resemble von Neumann’s 1945 design.
Parallel computers can be characterized as collections of von Neumann CPUs.
Parallel computers may be shared-memory, distributed-memory, or both.
|
Parallel Programming Models
|
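There are many layers of parallelism in modern computer systems.
An application can implement any or all of vectorization (SIMD instructions within a core), multithreading (several threads sharing one memory space), and message passing (separate processes exchanging data, e.g. with MPI).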
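To make the message-passing model concrete, here is a toy sketch in Python that uses threads and a `queue.Queue` to stand in for separate processes and their communication channel. Each hypothetical "rank" works only on its own block of the data and sends its partial sum to rank 0, which combines them. This illustrates the pattern only, under that assumption: real distributed-memory programs use a library such as MPI, and in standard CPython, threads running pure-Python code do not execute in parallel because of the global interpreter lock.

```python
import queue
import threading


def worker(rank, n_ranks, local_block, inbox_rank0, results):
    # Each hypothetical rank computes on its own block only (its "private memory").
    partial = sum(local_block)
    if rank != 0:
        # "Send" the partial result to rank 0 as a message.
        inbox_rank0.put(partial)
    else:
        # Rank 0 "receives" one message from every other rank and reduces them.
        total = partial
        for _ in range(n_ranks - 1):
            total += inbox_rank0.get()
        results["total"] = total


def message_passing_sum(data, n_ranks=4):
    # Block-distribute the data so each rank owns one contiguous slice.
    chunk = (len(data) + n_ranks - 1) // n_ranks
    blocks = [data[r * chunk:(r + 1) * chunk] for r in range(n_ranks)]
    inbox_rank0 = queue.Queue()  # stands in for the communication channel
    results = {}
    threads = [
        threading.Thread(target=worker, args=(r, n_ranks, blocks[r], inbox_rank0, results))
        for r in range(n_ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


print(message_passing_sum(list(range(1, 101))))  # 5050
```

The key design point is that no rank reads another rank's data directly; the only exchange is the explicit "send" and "receive" of partial results.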
|
Performance and Scalability
|
For a fixed problem size, increasing the number of processors usually decreases parallel efficiency.
For a fixed number of processors, increasing the problem size usually increases parallel efficiency.
A parallel problem can often be solved efficiently by increasing the number of processors and the problem size simultaneously. This is called “weak scaling”.
Not every problem is amenable to weak scaling.
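These trends can be illustrated numerically. The sketch below uses the standard definitions of speedup, S = T(1)/T(p), and parallel efficiency, E = S/p, where T(p) is the run time on p processors. The run times come from a hypothetical cost model (`model_time`, with made-up parameters `t_serial` and `t_comm`), not from measurements: a fixed serial part, work that divides perfectly among the processors, and a communication cost that grows like log2(p). For weak scaling, the problem size grows in proportion to p, so the efficiency is T(n, 1)/T(n*p, p).

```python
import math


def speedup(t1, tp):
    """Strong-scaling speedup: serial run time divided by run time on p processors."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Parallel efficiency: speedup per processor (1.0 is ideal)."""
    return speedup(t1, tp) / p


def model_time(n, p, t_serial=1.0, t_comm=0.5):
    """HYPOTHETICAL cost model in arbitrary time units: a fixed serial part,
    work proportional to problem size n split evenly over p processors,
    and a communication cost that grows like log2(p)."""
    return t_serial + n / p + t_comm * math.log2(p)


def strong_efficiency(n, p):
    # Strong scaling: fixed problem size n, more processors.
    return efficiency(model_time(n, 1), model_time(n, p), p)


def weak_efficiency(n_per_proc, p):
    # Weak scaling: problem size grows with p, so ideally the run time stays constant.
    return model_time(n_per_proc, 1) / model_time(n_per_proc * p, p)


for p in (1, 4, 16, 64):
    print(f"p={p:3d}  strong E(n=100)={strong_efficiency(100, p):.2f}  "
          f"strong E(n=10000)={strong_efficiency(10000, p):.2f}  "
          f"weak E={weak_efficiency(100, p):.2f}")
```

In this model, the strong-scaling efficiency for n = 100 falls well below 1 as p grows, stays much higher for n = 10000, and remains close to 1 under weak scaling. Real codes need measured timings; a problem whose serial or communication cost grows with the problem size will not scale this well.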
|
Independent Tasks and Job Schedulers
|
|
Input and Output
|
|
Analyzing Performance Using a Profiler
|
Don’t start parallelizing or optimizing your code before you have run it through a profiler.
A programmer can easily spend many hours “optimizing” a part of the code that, in the end, speeds up the program by only a minuscule amount.
When viewing the profiler report, start with the parts of the code where the most CPU time is spent and work your way down.
Pay special attention to areas that you didn’t expect to be slow.
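As a minimal example, assuming Python as the language, the standard-library `cProfile` and `pstats` modules can profile a toy program and list the functions by cumulative time, so you can read the report from the top down. The functions `slow_part`, `fast_part` and `program` are hypothetical stand-ins for real code; other languages have their own profilers, but the workflow is the same.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical hotspot: a quadratic double loop.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def fast_part(n):
    # Hypothetical cheap part: a single linear pass.
    return sum(range(n))


def program():
    return slow_part(300) + fast_part(300)


profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first,
# then print the top entries of the report.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("cumulative")
stats.print_stats(10)
print(report.getvalue())
```

In the printed report, `slow_part` should dominate the cumulative time, which tells you where optimization or parallelization effort would pay off.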
|
Thinking in Parallel
|
Adapting a sequential code so that it runs efficiently in parallel requires both planning and experimentation.
It is vital to first understand both the problem and the sequential algorithm.
Splitting the work into many short independent tasks increases the overall amount of communication.
Longer tasks and large variations in task length can leave resources idle while some tasks are still running.
Domain decomposition can be used in many cases to reduce communication by processing short-range interactions locally.
There are many textbooks and publications that describe different parallel algorithms. Try finding existing solutions for similar problems.
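Domain decomposition can be sketched with a simple 1D example, assuming a three-point averaging stencil with fixed end values as the "short-range interaction" (a hypothetical stand-in for a real physics kernel). The grid is split into contiguous subdomains; each one receives a single ghost cell from each neighbour (a "halo exchange") and then updates its own cells using only local data. Here the subdomains are processed one after another in a single process; in a real code each would live on a separate process, and only the ghost cells would be communicated, two values per subdomain per step regardless of subdomain size.

```python
def stencil_step(u):
    """Reference global update: 3-point average, end values held fixed."""
    if len(u) < 3:
        return list(u)
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def decompose(n, n_parts):
    """Split indices 0..n-1 into n_parts contiguous (start, stop) ranges."""
    base, extra = divmod(n, n_parts)
    bounds, start = [], 0
    for k in range(n_parts):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decomposed_step(u, n_parts):
    """Same update, computed subdomain by subdomain with one ghost cell per side."""
    n = len(u)
    new_u = [0.0] * n
    for start, stop in decompose(n, n_parts):
        # "Halo exchange": owned cells plus one ghost cell from each neighbour.
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        local = u[lo:hi]
        for i in range(start, stop):
            j = i - lo  # index of global cell i within the local array
            if i == 0 or i == n - 1:
                new_u[i] = local[j]  # fixed boundary value
            else:
                new_u[i] = (local[j - 1] + local[j] + local[j + 1]) / 3
    return new_u


u = [float((i * i) % 7) for i in range(20)]
a, b = u, u
for _ in range(5):
    a = stencil_step(a)
    b = decomposed_step(b, n_parts=3)
print(a == b)  # True
```

Because each cell only depends on its immediate neighbours, the decomposed update reproduces the global one exactly while each subdomain touches only its own data plus two ghost values.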
|