Tommy Gorham edited this page May 20, 2022 · 18 revisions

Project Description

Welcome to the Modern CPU-GPU Wiki Page, an effort to delineate contemporary high-performance computing (HPC) practices on heterogeneous CPU-GPU system architectures. The goal of this project is to centralize the knowledge that is needed for programming manycore computer clusters with GPU accelerators using a C++ parallel programming model for performance portability called Kokkos.

This wiki page exists to aid in understanding the relevant topics and the real-world significance of this project. In the main directory I solve and analyze the performance of four principal parallel programming algorithms using C++ and Kokkos. Ideally, this project will result in a useful code base and reference that demonstrates how to write single-source parallel programs that compile for the CPU or the GPU without rewriting the code.

To Make Use Of This Repository

For Intermediate Users: After reading through this page, start here to go through the fundamental concepts in the wiki page.

For Advanced Users: If you do not need an exposition, you can hop right into the program source code here, the performance analysis here, or you can jump to the build instructions here to reproduce this analysis on a cluster of your own. Lastly, please do not hesitate to let me know how your results compare!

Essential Prerequisite

  • A firm understanding of C++ or C

Useful Prerequisites

  • Preferably you have a device with a GPU or access to a cluster with GPU(s) to run the example code and compare performance
  • SSH and basic Linux commands if you are using a cluster
  • Parallel programming on shared-memory architectures with OpenMP (Kokkos is a shared-memory programming model and its syntax is similar to that of OpenMP)
  • Experience with a GPU programming model such as CUDA, OpenCL, or HIP

The Problem

As modern HPC systems, namely supercomputers, increase in scale and heterogeneity, so does the difficulty of efficiently leveraging a broader range of diversified compute resources. The growing number of heterogeneous system architectures (meaning those that contain at least one accelerator such as a GPU), along with the variety of GPU manufacturers (e.g., NVIDIA, AMD, and Intel), has created new obstacles for large-scale scientific applications. These obstacles primarily concern the applications' development, maintenance, and ability to exploit diverse system architectures so that they approach theoretical peak performance in a hardware-agnostic way.

In Other Words

There is a need for portability: High-performance computing applications currently have a large demand for solutions that can perform parallel execution on CPUs and/or GPUs without having to rewrite thousands of lines of code to cope with the architecture of the system they are running on.

There is a need for easily obtainable performance: "A major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices". Link to the source of this quote

An Intriguing Goal

Create a unified programming model that simplifies modern HPC codes and solves the above two challenges by expressing generic parallelism in C++ code that can target any current or future heterogeneous system at compile time.
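To make the compile-time idea concrete, here is a minimal sketch in standard C++ only. It is not Kokkos and does not use any Kokkos API: the names `SerialBackend`, `ThreadsBackend`, `DefaultBackend`, `parallel_for`, `axpy`, and the `USE_THREADS` macro are all hypothetical and chosen for illustration. The point is only that the loop body is written once, while the backend that executes it is picked when the program is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical backend tags. A real performance-portability model would
// also provide GPU backends; this toy only has two CPU ones.
struct SerialBackend {};
struct ThreadsBackend {};

// The backend is fixed at compile time (e.g. build with -DUSE_THREADS).
#ifdef USE_THREADS
using DefaultBackend = ThreadsBackend;
#else
using DefaultBackend = SerialBackend;
#endif

// Serial backend: a plain loop over [0, n).
template <class Body>
void parallel_for(SerialBackend, std::size_t n, Body body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

// Threads backend: indices are dealt out round-robin to worker threads.
// The body must be safe to run concurrently for distinct indices.
template <class Body>
void parallel_for(ThreadsBackend, std::size_t n, Body body) {
  std::size_t workers = std::thread::hardware_concurrency();
  if (workers == 0) workers = 2;  // the hint may be unavailable
  std::vector<std::thread> pool;
  for (std::size_t w = 0; w < workers; ++w) {
    pool.emplace_back([=] {
      for (std::size_t i = w; i < n; i += workers) body(i);
    });
  }
  for (auto& t : pool) t.join();
}

// Single-source entry point: user code never names a backend.
template <class Body>
void parallel_for(std::size_t n, Body body) {
  parallel_for(DefaultBackend{}, n, body);
}

// Example kernel written once: returns a * x + y, element by element.
std::vector<double> axpy(double a, const std::vector<double>& x,
                         std::vector<double> y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  double* yp = y.data();
  parallel_for(x.size(), [=](std::size_t i) { yp[i] = a * xp[i] + yp[i]; });
  return y;
}
```

Calling `axpy(2.0, x, y)` from a `main` function runs the same source under either backend; only the build flag changes. On some toolchains the threaded build may also need `-pthread`.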

A Modern Solution

By utilizing Kokkos, a C++ performance portability programming model designed by Sandia National Laboratories, we can address the problem of increasing heterogeneity seen in modern systems by abstracting diverse CPU and GPU targets without losing performance.

Next: What is HPC?
