Parallel Computer Architecture

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How is a typical CPU organized?

  • How are parallel computers organized?

Objectives

To make full use of computing resources, the programmer needs a working knowledge of the parallel programming concepts and tools available for writing parallel applications. To build a basic understanding of how parallel computing works, let’s start with an explanation of the components that make up a computer and what each of them does.

Von Neumann’s Architecture

John von Neumann published a general description of a computer architecture in 1945. It describes the stored-program principle where computer memory is used to store both instructions and data. The von Neumann architecture is still an excellent model for the computer processors we use today.
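The stored-program principle can be illustrated with a toy machine. This is only a minimal sketch: the four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator, and the `run` function are invented for this example and do not correspond to any real processor. The point is that one memory holds both the program and the data it works on.

```python
# A toy stored-program machine: ONE memory list holds instructions AND data.
# The instruction set below is hypothetical, made up for this illustration.

def run(memory):
    """Fetch, decode, and execute instructions one at a time until HALT."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the machine's single working register
    while True:
        op, addr = memory[pc]        # fetch the instruction from memory
        pc += 1
        if op == "LOAD":
            acc = memory[addr]       # read data from the same memory
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc       # write the result back to memory
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {op}")

memory = [
    ("LOAD", 4),    # address 0: acc = memory[4]
    ("ADD", 5),     # address 1: acc = acc + memory[5]
    ("STORE", 6),   # address 2: memory[6] = acc
    ("HALT", 0),    # address 3: stop
    2,              # address 4: data
    3,              # address 5: data
    0,              # address 6: the result is stored here
]

run(memory)
print(memory[6])    # prints 5
```

Note that the loop handles exactly one instruction per step, in order. That sequential behaviour is what a single core provides.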

Parallel computers still follow this basic design, just multiplied in units. A physical CPU chip in the 2020s may contain several von Neumann CPUs, which we usually call cores when we need to make the distinction. An individual computer might also contain more than one of these multi-core CPU chips. But the essential layout of a modern core is the same as the von Neumann architecture described above.

For the remainder of this course we may sometimes say “CPU” when we should more precisely say “core”, because the physical CPU chip is not the important component when thinking about parallel programming. The core executes instructions in sequential order, not in parallel. The core is the basic building block of parallel computing hardware.

Classifications of parallel computers

There are various academic classifications of parallel computer architectures. Here are two you can look up later if you’re interested:

  1. Flynn’s taxonomy (1966) is based on whether there are one or more instruction streams and one or more data streams.
  2. Feng’s classification (1972) is based on the number of bits that can be processed at one time.

Flynn’s taxonomy is probably the most widely known. It gives rise to terms like “Single-Instruction, Multiple-Data” (SIMD) and “Multiple-Instruction, Multiple-Data” (MIMD).

Memory organization

We can see from the von Neumann architecture that a processing core needs to get both data and instructions from memory, and write data back to that same memory. Engineering constraints mean that the processor can typically carry out instructions far faster than data can be read from or written to main memory, so how memory is organized has important consequences for the design and execution of parallel programs.

There are currently two main ways to organize memory for parallel computing: shared memory and distributed memory.

A hybrid of these two could be considered a third way.

Shared memory

Shared memory computers can be further sub-divided into two categories:

In a shared memory system it is only necessary to build a data structure in memory and pass references to the data structure to parallel subroutines. For example, a matrix multiplication routine that breaks matrices into a set of blocks only needs to pass the indices of each block to the parallel subroutines.
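The block-index idea can be sketched in Python with threads, which all see the same memory. The function names `multiply_block` and `parallel_matmul`, the block size, and the worker count are assumptions made for this illustration. In standard CPython the threads do not actually run Python code simultaneously, so treat this as a picture of the data-sharing pattern, not as a way to get a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_block(a, b, c, row_start, row_end, col_start, col_end):
    """Compute one block of c = a * b.

    a, b and c are references to matrices that every worker shares;
    only the block indices differ from one task to the next.
    """
    inner = len(b)
    for i in range(row_start, row_end):
        for j in range(col_start, col_end):
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))

def parallel_matmul(a, b, block=2, workers=4):
    """Multiply two matrices (lists of lists) block by block."""
    n_rows, n_cols = len(a), len(b[0])
    c = [[0] * n_cols for _ in range(n_rows)]   # built once, shared by all tasks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for r in range(0, n_rows, block):
            for s in range(0, n_cols, block):
                # Pass references plus indices; no matrix data is copied.
                futures.append(pool.submit(
                    multiply_block, a, b, c,
                    r, min(r + block, n_rows),
                    s, min(s + block, n_cols)))
        for f in futures:
            f.result()   # wait for the task and re-raise any error from it
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))   # prints [[19, 22], [43, 50]]
```

Each task writes to its own block of `c`, so the tasks never write to the same element and no locking is needed in this sketch.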

Advantages

Disadvantages

Distributed memory

Advantages

Disadvantages

Hybrid Distributed-Shared Memory

Practically all HPC computer systems today employ both shared and distributed memory architectures.

The important advantage is increased scalability. Increased complexity of programming is an important disadvantage.

Programming models are memory models

When designing a parallel solution to your problem, the major decision is usually “which memory model should we follow?”

You’ll learn more about these two major models in the OpenMP programming and MPI programming segments of this Summer School.
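To give a feel for the distributed-memory style before those segments, here is a rough sketch in which workers never touch a common data structure and communicate only by sending messages. It is an imitation: the workers are threads in one process, and Python queues stand in for the network between separate machines. The names `worker` and `message_passing_sum` are hypothetical, and this is not MPI code.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive a private chunk of data, sum it, and send the result back."""
    chunk = inbox.get()        # receive: blocks until a message arrives
    outbox.put(sum(chunk))     # send: only the small result travels back

def message_passing_sum(data, n_workers=4):
    """Sum a list by scattering chunks to workers and gathering partial sums."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for inbox in inboxes]
    for t in threads:
        t.start()
    # Scatter: every worker is sent its own copy of part of the data.
    for rank, inbox in enumerate(inboxes):
        inbox.put(list(data[rank::n_workers]))
    for t in threads:
        t.join()
    # Gather: combine one partial sum per worker.
    return sum(outbox.get() for _ in range(n_workers))

print(message_passing_sum(list(range(1, 101))))   # prints 5050
```

The design choice to notice is that the data must be explicitly divided and sent, and the results explicitly collected. In a shared memory program those steps are unnecessary because every worker can already reach the data.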

Counting CPUs

Imagine you’re looking at a product description from a computer vendor. You’ve determined that the computer has two CPU slots, and that each slot holds a 4-core Intel i7-6700 processor. How many “von Neumann CPUs” does this machine have?

Solution

The machine has 8 von Neumann CPUs, or more briefly, 8 cores.

The term “CPU” these days has two distinct meanings, depending on context. A hardware CPU is a physical object (e.g. an i7-6700) that will usually have several “cores”. A “core” corresponds to the “von Neumann CPU” we described earlier in this lesson. Consequently, when reading about software and parallel programming, you may see the term “CPU” used where the term “core” would be more correct.

If you followed the link to the chip description, you might have seen the term “hyperthreading”, which further confuses the matter. Hyperthreading is Intel’s proprietary implementation of simultaneous multithreading, a hardware feature that presents each physical core to the operating system as two “logical cores” so that two threads can share the core’s execution units. This works well for many low-intensity workloads, but for high-intensity parallel computing, performance is usually limited by the number of physical cores. So we don’t count the extra “logical cores” supposedly provided by hyperthreading.
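The arithmetic behind the answer, and the way logical cores inflate the count, can be written out as a short sketch. The helper names `physical_cores` and `logical_cpus` are made up for this example. `os.cpu_count()` is a real Python standard-library call; it reports logical CPUs, so on a machine with hyperthreading enabled it will usually be larger than the number of physical cores.

```python
import os

def physical_cores(sockets, cores_per_socket):
    """Number of 'von Neumann CPUs' (cores) in the machine."""
    return sockets * cores_per_socket

def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of CPUs the operating system sees when each core
    presents several hardware threads."""
    return sockets * cores_per_socket * threads_per_core

# The machine from the exercise: 2 slots, 4 cores in each.
print(physical_cores(2, 4))      # prints 8

# The same machine with two hardware threads per core.
print(logical_cpus(2, 4, 2))     # prints 16

# Logical CPUs on whatever machine runs this script (may be None
# if Python cannot determine it).
print(os.cpu_count())
```

When choosing how many parallel workers to start for compute-heavy work, the first number is usually the more useful guide.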

Key Points

  • Sequential computers, including modern CPU cores, closely resemble von Neumann’s 1945 design.

  • Parallel computers can be characterized as collections of von Neumann CPUs.

  • Parallel computers may be shared-memory, distributed-memory, or both.