Serial To Parallel Speedup Example

Tommy Gorham edited this page May 18, 2022 · 3 revisions

In this example we use OpenMP (an API for writing multi-threaded applications) to go from one thread of CPU execution to two. The goal is to show how using more cores of a CPU in parallel helps us solve problems faster.

CPU Threads

CPU threads (also called hardware threads or logical cores) are the virtual execution units that a physical core exposes, so one physical core can appear to the operating system as multiple logical cores. Recall from the previous section that spreading the workload across multiple cores lets the CPU finish it sooner, as long as we tell the program to run in parallel with something like OpenMP.
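To make the idea of logical cores concrete, here is a minimal sketch that asks the OpenMP runtime how many logical processors it can see. The function name `logical_processors` is a hypothetical helper, not part of this wiki's code, and the fallback value of 1 when OpenMP is disabled is an assumption made so the sketch still compiles without `-fopenmp`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of logical processors visible to OpenMP.
 * Returns 1 (an assumed fallback) when compiled without OpenMP support. */
int logical_processors(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}
```

Built with OpenMP enabled (for example `gcc -fopenmp`), a 4-core CPU with 2 threads per core would typically report 8 here, although the exact value depends on the machine and on any CPU affinity restrictions.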

Why OpenMP for this example?

I use OpenMP in this example because Kokkos is a shared-memory programming model, and a firm grasp of OpenMP shared-memory parallelism is very beneficial when writing and understanding Kokkos code.

Serial Compute Pi

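...
    static long num_steps = 100000000;

    start_time = omp_get_wtime();              // wall-clock time before the loop
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;                  // midpoint of the i-th subinterval
        sum = sum + 4.0 / (1.0 + x * x);       // accumulate 4/(1+x^2) at the midpoint
    }
    pi = step * sum;
    run_time = omp_get_wtime() - start_time;   // elapsed wall-clock seconds
}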
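The excerpt above omits its setup code, so the following is only a self-contained sketch of the same loop, not the original program. It assumes that `step` is `1.0 / num_steps` and that `sum` starts at `0.0`, which is the usual setup for approximating the integral of 4/(1+x^2) from 0 to 1 (whose exact value is pi) with the midpoint rule. The function name `compute_pi_serial` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: midpoint-rule approximation of pi on one thread.
 * Assumes step = 1/num_steps and sum = 0, which the excerpt does not show. */
double compute_pi_serial(long num_steps)
{
    assert(num_steps > 0);
    const double step = 1.0 / (double)num_steps;  /* width of each subinterval */
    double sum = 0.0;

    for (long i = 1; i <= num_steps; i++) {
        double x = (i - 0.5) * step;              /* midpoint of subinterval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling `compute_pi_serial(100000000)` mirrors the step count used in the excerpt. Smaller counts such as 1000000 already agree with pi to many decimal places.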

Parallel with 2 Threads

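...
#define NUM_THREADS 2
    omp_set_num_threads(NUM_THREADS);          // request 2 threads for parallel regions
    start_time = omp_get_wtime();
    // Split the iterations across threads; each thread gets its own x and a
    // private partial sum, and the partial sums are added together at the end.
    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }

    pi = step * sum;
    run_time = omp_get_wtime() - start_time;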
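As a self-contained sketch of the same idea, the function below parallelizes the loop with a `reduction(+:sum)` clause. It differs from the excerpt in two labeled ways: it passes the thread count through the `num_threads` clause instead of calling `omp_set_num_threads`, and it declares `x` inside the loop so that no `private(x)` clause is needed. The function name `compute_pi_parallel` and the setup (`step = 1.0 / num_steps`, `sum = 0.0`) are assumptions, since the excerpt does not show them.

```c
#include <assert.h>

/* Hypothetical helper: midpoint-rule approximation of pi on several threads.
 * Compile with OpenMP enabled (e.g. gcc -fopenmp); without it the pragma is
 * ignored and the loop simply runs on one thread. */
double compute_pi_parallel(long num_steps, int num_threads)
{
    assert(num_steps > 0 && num_threads > 0);
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; OpenMP adds the
     * partial sums into sum when the loop finishes. */
    #pragma omp parallel for num_threads(num_threads) reduction(+:sum)
    for (long i = 1; i <= num_steps; i++) {
        double x = (i - 0.5) * step;   /* declared in the loop, so private per thread */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling `compute_pi_parallel(100000000, 2)` corresponds to the 2-thread case above. Because the reduction adds the partial sums in a different order than the serial loop, the result can differ from the serial one in the last few digits.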

Results on a 4 Core CPU with 8 Threads (2 threads per core)


Number of Physical Cores * Number of Threads per core = Number of Logical Cores

4 * 2 = 8

[Image: 2threads_speedup]

Speed up


speedup = serial run time / parallel run time = 1.90 (almost twice as fast with 2 threads as with 1)
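The ratio above can be written as a small helper. This is only a sketch: the function name `speedup` is hypothetical, and the run times in the usage note are made-up values chosen to reproduce a ratio of 1.90, not the timings measured for this page.

```c
#include <assert.h>

/* Hypothetical helper: speedup = serial run time / parallel run time.
 * Both times must be positive and in the same unit (e.g. seconds). */
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}
```

For example, with a hypothetical serial time of 1.90 s and a parallel time of 1.00 s, `speedup(1.90, 1.00)` evaluates to 1.90. A value of 2.0 would mean the 2-thread run took exactly half the serial time.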

Next: Shared Memory Architecture
