Shared Memory Architecture

In the previous speedup example, we used OpenMP, an API for writing multithreaded applications.

When designing high-performance applications, the parallel algorithm must make effective use of the compute resources that the machine's memory architecture provides. OpenMP helps us use the shared memory compute resources of the CPU by letting us tell the computer to "execute this part of the code in parallel." In essence, OpenMP is a parallel programming model for parallel execution on the CPU.
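To make "execute this part of the code in parallel" concrete, here is a minimal sketch of an OpenMP parallel loop. The function name `axpy` and its parameters are illustrative and do not come from the speedup example; the only OpenMP feature used is the standard `parallel for` directive.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original project): y[i] = a * x[i] + y[i].
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    const long n = static_cast<long>(x.size());

    // Ask OpenMP to divide the loop iterations among the threads of a team.
    // If the compiler is not run with OpenMP enabled (for example, without
    // -fopenmp on GCC or Clang), the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Each iteration writes a distinct element, so no synchronization is needed.
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(3.0, x, y)` with every `x[i] == 2.0` and every `y[i] == 1.0` leaves every `y[i]` equal to 7.0, regardless of how many threads run the loop.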

To make it easier to develop appropriate solutions across many system architectures, parallel programming models serve as an abstraction for expressing algorithms with respect to the hardware. Several parallel programming models are currently used on HPC systems.

[Figure: parallel programming models]

Both OpenMP and Kokkos are shared memory programming models designed for shared memory machines.

Shared memory machines are computers composed of multiple processing elements that share an address space. They fall into two classes:

Symmetric multiprocessor (SMP): a shared address space with “equal-time” access for each processor, and the OS treats every processor the same way.

[Figure: uniform memory access (UMA) architecture]

Non-uniform memory access (NUMA) multiprocessor: different memory regions have different access costs. Think of memory as segmented into “near” and “far” memory.

[Figure: non-uniform memory access (NUMA) architecture]
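One common way to keep data in "near" memory is to initialize it with the same threads, and the same loop schedule, that will later process it. The sketch below assumes the operating system places a memory page on the NUMA node of the thread that first writes to it (the usual default policy on Linux) and that threads stay on the same cores between loops. Neither assumption is guaranteed on every system. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocate n doubles and initialize them in parallel.
// Under a first-touch page placement policy (an assumption, see above), each
// page ends up near the thread that writes it first.
std::unique_ptr<double[]> allocate_first_touch(std::size_t n, double value) {
    // new double[n] leaves the elements uninitialized, so the first write to
    // each element happens in the parallel loop below.
    std::unique_ptr<double[]> data(new double[n]);
    const long count = static_cast<long>(n);

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < count; ++i) {
        data[i] = value;
    }
    return data;
}

// Hypothetical helper: sum the array using the same static schedule, so each
// thread mostly reads the elements it initialized.
double sum_static(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double total = 0.0;
    const long count = static_cast<long>(n);

    #pragma omp parallel for schedule(static) reduction(+ : total)
    for (long i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

The design choice is to use `schedule(static)` in both loops. With the same thread count, the same iteration range, and no chunk size, a static schedule assigns each thread the same block of indices in both loops, so the thread that initialized a block is the one that later reads it.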

For this project, I ran my code on a NUMA shared memory compute node with two NUMA regions. You can view some of the machine specs and these NUMA regions with the Linux shell command lscpu, whose output lists the number of NUMA nodes and the CPUs that belong to each one.

Features of a shared memory program

One process and lots of threads that communicate by sharing variables
Threads interact through reads/writes to a shared address space
OS scheduler decides when to run which threads
A race condition happens when the program's outcome changes as the threads are scheduled differently
Synchronization can be used to prevent such conflicts and ensure correct results, but it is expensive
Memory access patterns can be optimized to reduce the need for synchronization
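The last three points can be sketched with a shared counter. This is an illustrative example rather than code from the project, and the function names are hypothetical. The first version protects every increment with `#pragma omp atomic`. The second changes the access pattern so that each thread accumulates into a private copy, which OpenMP combines once at the end through a `reduction` clause.

```cpp
#include <cassert>
#include <vector>

// Version 1: every thread updates the same shared variable.
// Without the atomic directive, concurrent increments could be lost and the
// result would depend on thread scheduling (a race condition).
long count_even_atomic(const std::vector<int>& values) {
    long count = 0;
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (values[i] % 2 == 0) {
            #pragma omp atomic
            ++count;  // correct, but synchronized on every increment
        }
    }
    assert(count >= 0 && count <= n);  // sanity check on the result
    return count;
}

// Version 2: each thread counts into its own private copy of `count`;
// the copies are added together once when the loop finishes.
long count_even_reduction(const std::vector<int>& values) {
    long count = 0;
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for reduction(+ : count)
    for (long i = 0; i < n; ++i) {
        if (values[i] % 2 == 0) {
            ++count;  // private to the thread, so no per-increment synchronization
        }
    }
    return count;
}
```

Both functions return the same count for the same input. The reduction version usually needs far less synchronization because the threads only coordinate when their partial counts are combined.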

Next: Heterogeneous System Architectures
