Skip to content

Shared Memory Architecture

Tommy Gorham edited this page May 18, 2022 · 3 revisions

In the previous speedup example, we used OpenMP, an API for writing multithreaded applications.

An important note when designing high performance applications is that the parallel algorithm needs to effectively utilize the compute resources that derive from the memory architecture of the physical machine. OpenMP helps us effectively utilize the shared memory compute resources on the CPU by allowing us to tell the computer to "execute this part of the code in parallel." In essence, OpenMP is a parallel programming model for CPU parallel execution.

In order to facilitate the development of appropriate solutions on numerous system architectures, parallel programming models exist as an abstraction for expressing algorithms with respect to the hardware. There are several parallel programming models that are currently used HPC systems.

programmingmodels

Both OpenMP and Kokkos are shared memory programming models designed for shared memory machines.

Shared memory machines are computers that are composed of multiple processing elements that share an address space. These are further divided into two classes:

  • Symmetric multiprocessor (SMP): a shared address space with “equal-time” access for each processor, and the OS treats every processor the same way.

uma

  • Non Uniform address space multiprocessor (NUMA): different memory regions have different access costs … think of memory segmented into “Near” and “Far” memory

numa

For this project, I ran my code on a NUMA shared memory compute node with two numa regions. You can view some of the machine specs and these numa regions from the Linux Shell Command lscpu as seen here

Features of a shared memory program

  • One process and lots of threads that communicate by sharing variables

  • Threads interact through reads/writes to a shared address space

  • OS scheduler decides when to run which threads

  • A race condition happens when the program's outcome changes as the threads are scheduled differently

  • Synchronization can be used to protect such conflicts and assure correct results, but it is expensive

  • Memory access patterns can be optimized to reduce the need for synchronization

Next: Heterogenous System Architectures

Clone this wiki locally