Overview of Kokkos

The Goal for Modern Heterogeneous CPU-GPU Computing Is Performance Portability

Kokkos lets us write portable, single-source parallel implementations of scientific algorithms that aim for high performance across systems with different kinds of compute units.

Why?

As modern HPC systems, namely supercomputers, grow in scale and heterogeneity, it becomes harder to use their increasingly diverse compute resources efficiently. The growing number of heterogeneous system architectures (those that contain at least one accelerator, such as a GPU), together with the variety of GPU vendors (e.g., NVIDIA, AMD, Intel), has created new obstacles for large-scale scientific applications. These obstacles mainly concern developing and maintaining the applications, and getting them to exploit diverse system architectures well enough to approach theoretical peak performance in a hardware-agnostic way.

How?

Kokkos is a C++ performance portability programming model originally developed at Sandia National Laboratories. It addresses the increasing heterogeneity of modern systems by abstracting diverse CPU and GPU targets behind a single programming interface, with the aim of doing so without sacrificing performance. Kokkos helps optimize memory access by mapping parallel work indices and multidimensional array layouts in a way that suits the target architecture.
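
To make the layout remark concrete, here is a minimal plain C++ sketch (not Kokkos code) of the two index mappings that Kokkos names LayoutRight (row-major) and LayoutLeft (column-major). The helper names `index_layout_right` and `index_layout_left` are hypothetical and exist only for illustration; in Kokkos the layout is a property of a `Kokkos::View`, and a default is chosen per memory space.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, not part of Kokkos: flat offset of element (i, j)
// in an n0 x n1 array under the two common layouts.

// Row-major (what Kokkos calls LayoutRight): the rightmost index j is contiguous.
std::size_t index_layout_right(std::size_t i, std::size_t j,
                               std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    return i * n1 + j;
}

// Column-major (what Kokkos calls LayoutLeft): the leftmost index i is contiguous.
std::size_t index_layout_left(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    return i + j * n0;
}
```

If the parallel work index is i, neighbouring GPU threads touch neighbouring addresses under the column-major mapping, which favours coalesced access. A single CPU thread that walks j for a fixed i instead stays within a cache line longer under the row-major mapping. This is why a layout that suits one architecture can hurt on another.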

No really, HOW?! (E.g., Going from OpenMP To Kokkos)

OpenMP parallel for

#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    /* loop body */
}

Kokkos parallel for

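// Assumes #include <Kokkos_Core.hpp> and a prior call to Kokkos::initialize().
// KOKKOS_LAMBDA replaces [=] so the same lambda can also be compiled for GPU backends.
Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
    /* loop body */
});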
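
One way to read the Kokkos version: the loop body becomes a callable that is invoked once per index, and the library, not the user, decides how the indices are distributed. The plain C++ sketch below imitates that shape with a hypothetical `serial_for` helper (not a Kokkos function) that simply runs the indices in order; a real backend would spread the same calls over threads or GPU lanes. The `axpy` function is likewise only an illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a parallel dispatch: calls body(i) for i in [0, n).
// A real backend may run the calls in any order and concurrently, so the body
// must not depend on iteration order.
template <class Body>
void serial_for(int n, const Body& body) {
    for (int i = 0; i < n; ++i) {
        body(i);
    }
}

// Example use: y[i] = a * x[i] + y[i], written as a per-index body.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const double* xp = x.data();
    double* yp = y.data();
    // Capture by value, as in the snippet above; the pointers are copied, not the data.
    serial_for(n, [=](const int i) { yp[i] = a * xp[i] + yp[i]; });
}
```

Each index writes only its own element, so the body stays correct no matter how the calls are scheduled. That independence is the property both the OpenMP and the Kokkos form rely on.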

OpenMP Compute Pi

#pragma omp parallel // begin parallel section
{
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        const double x = (i + 0.5) * step; // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x * x);
    }
}
est = step * sum;

Kokkos Compute Pi

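// begin parallel section
// KOKKOS_LAMBDA replaces [=] so the lambda can also be compiled for GPU backends.
Kokkos::parallel_reduce("compute_pi", N, KOKKOS_LAMBDA(const int i, double& update) {
    const double x = (i + 0.5) * step; // midpoint of the i-th interval
    update += 4.0 / (1.0 + x * x);
}, sum);
est = step * sum;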
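
Both snippets evaluate the midpoint rule for the integral of 4/(1+x^2) over [0, 1], which equals pi. They leave `N`, `step`, and `sum` undefined. The serial sketch below fills those in under the assumption that `step = 1.0 / N` and that `sum` starts at zero, which is the usual setup for this exercise but is not stated in the snippets. The function name `estimate_pi` is hypothetical. A serial reference like this is handy for checking the parallel versions.

```cpp
#include <cassert>

// Serial reference for the midpoint-rule estimate of pi.
// Assumption: step = 1.0 / n, matching the usual form of this exercise.
double estimate_pi(int n) {
    assert(n > 0);
    const double step = 1.0 / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = (i + 0.5) * step;  // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

A parallel reduction may add the terms in a different order than this loop, so its result can differ from the serial one in the last few bits. Comparing with a small tolerance is safer than testing for exact equality.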

The Kokkos wiki is also an excellent resource. It enabled me to:

Build the environment
Initialize the environment
Express hardware-agnostic parallelism
Solve common parallel algorithms with single-source implementations
Compile for parallel execution on the CPU, the GPU, or both
Achieve portable speedup for Program1, Program2, Program4, and particularly Program3

Explicit build instructions for these programs can be found in my README.md.

Further details of each program are explained via comments within the source code.
