Overview of Kokkos
Kokkos lets us write portable, single-source parallel implementations of scientific algorithms that achieve high performance on systems combining different kinds of compute units.
As modern HPC systems, namely supercomputers, grow in scale and heterogeneity, it becomes harder to use their increasingly diverse compute resources efficiently. The growing number of heterogeneous system architectures (those that contain at least one accelerator, such as a GPU), along with the variety of GPU vendors (e.g., NVIDIA, AMD, Intel), has created new obstacles for large-scale scientific applications. These obstacles mainly concern developing and maintaining the applications, and exploiting diverse system architectures well enough to approach theoretical peak performance in a hardware-agnostic way.
By using Kokkos, a C++ performance portability programming model developed at Sandia National Laboratories, we can address the increasing heterogeneity of modern systems: it abstracts diverse CPU and GPU targets behind a single interface while aiming to preserve performance. By default, Kokkos maps parallel work indices and multidimensional array layouts in a way suited to the target architecture, which improves memory access patterns without per-platform code.
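// OpenMP: parallel loop
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    /* loop body */
}

// Kokkos: the same loop as a parallel_for over [0, N).
// KOKKOS_LAMBDA is a by-value capture that also compiles for GPU backends.
Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
    /* loop body */
});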
// OpenMP: midpoint-rule sum with a reduction
#pragma omp parallel  // begin parallel section
{
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        sum += 4.0 / (1.0 + ((i + 0.5) * step) * ((i + 0.5) * step));
    }
}
est = step * sum;

// Kokkos: the same sum as a parallel_reduce; the result is written to sum
// begin parallel section
Kokkos::parallel_reduce("compute_pi", N, KOKKOS_LAMBDA(const int i, double& update) {
    update += 4.0 / (1.0 + ((i + 0.5) * step) * ((i + 0.5) * step));
}, sum);
est = step * sum;
Kokkos also has an incredible wiki that enabled me to:
- Build the environment
- Initialize the environment
- Express hardware-agnostic parallelism
- Solve common parallel algorithms with single-source implementations
- Compile for parallel execution on the CPU, the GPU, or both
- Achieve portable speedup for Program1, Program2, and Program4, and particularly Program3
Explicit build instructions for these programs can be found in my README.md.
Wiki
Fundamental Concepts
- What is HPC?
- How Do Computers Solve Problems?
- Serial to Parallel speedup example
- Shared Memory Architecture
- Heterogeneous Architectures
Getting Started with Kokkos