3. Software
Version 4.0 of the OpenMP standard introduced directives that offload computation to heterogeneous computing devices (e.g., GPUs or DSPs). We use this programming model to offload computation to the HARP2 platform or, for debugging, to a HARP2 simulator. Two modes are available, as described below: (1) offloading a pre-synthesized module; and (2) using HardCloud to synthesize C code and offload the resulting module.
#pragma omp target device(HARPSIM) map(to: A) map(from: B)
#pragma omp parallel for use(hrw) module(loopback)
// Software version of the loopback hardware module.
for (int i = 0; i < NI; i++) {
    B[i] = A[i];
}
The example above shows the adopted syntax. The map(to:) clause lists the data sent to the accelerator, while the map(from:) clause lists the data received from the accelerator as a result. The use(hrw) clause specifies that the annotated code block is computed by pre-designed hardware, here the module named in module(loopback), instead of by the C code that follows the annotation. The device(HARPSIM) clause indicates that execution takes place on the HARP2 simulator. Alternatively, the HARP device instructs HardCloud to generate code for the real HARP instead of for the simulator.
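The copy-in and copy-out roles of map(to:) and map(from:) described above can be pictured with a small host-only model. This is a conceptual sketch, not HardCloud's implementation: the names `offload_loopback_model` and `device_loopback` are hypothetical, the element type `int` is an assumption, and the "device" is simply a pair of heap buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the accelerator: copies input to output. */
static void device_loopback(const int *dev_in, int *dev_out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dev_out[i] = dev_in[i];
    }
}

/* Conceptual model of a region annotated with
 *   map(to: A) map(from: B)
 * Returns 0 on success, -1 if a buffer cannot be allocated. */
int offload_loopback_model(const int *A, int *B, size_t n) {
    assert(A != NULL && B != NULL);
    if (n == 0) {
        return 0; /* nothing to transfer */
    }

    int *dev_A = malloc(n * sizeof *dev_A);
    int *dev_B = malloc(n * sizeof *dev_B);
    if (dev_A == NULL || dev_B == NULL) {
        free(dev_A);
        free(dev_B);
        return -1;
    }

    memcpy(dev_A, A, n * sizeof *dev_A); /* map(to: A): host to device   */
    device_loopback(dev_A, dev_B, n);    /* region body runs on "device" */
    memcpy(B, dev_B, n * sizeof *dev_B); /* map(from: B): device to host */

    free(dev_A);
    free(dev_B);
    return 0;
}
```

Calling `offload_loopback_model(A, B, n)` leaves A untouched and fills B with a copy of A, which is the observable effect the loopback example is expected to have.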
#pragma omp target device(HARPSIM) map(to: A, B, C) map(from: D, E, F, G, H)
#pragma omp parallel for use(hrw) module(loopback)
for (int i = 0; i < NI; i++) {
    D[i] = A[i];
    E[i] = B[i];
    F[i] = C[i];
    G[i] = A[i] + B[i] + C[i];
    H[i] = A[i] * B[i] * C[i];
}
This example shows that multiple variables can be mapped. Here, map(to:) lists three variables to be offloaded to the FPGA, and the results are mapped back to the five variables listed in map(from:).
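When experimenting with several mapped variables, it can help to have a host-side reference to compare the accelerator's outputs against. The sketch below uses only standard OpenMP array-section syntax and deliberately omits the HardCloud-specific clauses (device(HARPSIM), use(hrw), module(...)), so a stock compiler accepts it. The function name `multi_map_reference`, the pointer-based signature, and the `int` element type are assumptions made for illustration.

```c
#include <assert.h>

/* Host-side reference for the three-input, five-output kernel.
 * Hypothetical helper: name, signature and int element type are assumed. */
void multi_map_reference(const int *a, const int *b, const int *c,
                         int *d, int *e, int *f, int *g, int *h, int n) {
    assert(n >= 0);

    /* Standard OpenMP only; without an offload device (or without
     * OpenMP enabled) the loop simply runs on the host. */
    #pragma omp target map(to: a[0:n], b[0:n], c[0:n]) \
                       map(from: d[0:n], e[0:n], f[0:n], g[0:n], h[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        d[i] = a[i];                 /* pass-through of each input */
        e[i] = b[i];
        f[i] = c[i];
        g[i] = a[i] + b[i] + c[i];   /* element-wise sum     */
        h[i] = a[i] * b[i] * c[i];   /* element-wise product */
    }
}
```

The arrays produced by this function can then be compared element by element with D through H as returned from the simulator or the FPGA.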
#pragma omp target device(HARPSIM) map(to: A) map(from: B)
#pragma omp parallel use(hrw) module(loopback)
{
    loopback(A, &B);
}
HardCloud also supports the OpenMP parallel directive. Instead of a for loop, the code block above contains a call to a C function that performs the loopback operation, writing the values of variable A to B.
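The source does not show the definition of the loopback function called above. A minimal sketch of what a software version might look like follows, assuming A and B are fixed-size `int` arrays so that `&B` is a pointer to an array. The signature, the element type, and the value of `NI` are all assumptions, not taken from HardCloud.

```c
#include <assert.h>
#include <stddef.h>

#define NI 16 /* assumed array length; the source does not give a value */

/* Hypothetical software loopback matching the call loopback(A, &B):
 * copies every element of A into the array that B points to. */
void loopback(const int A[NI], int (*B)[NI]) {
    assert(A != NULL && B != NULL);
    for (size_t i = 0; i < NI; i++) {
        (*B)[i] = A[i];
    }
}
```

With `int A[NI], B[NI];` declared by the caller, the call `loopback(A, &B);` type-checks against this signature and leaves B equal to A.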
[This feature is not supported by this release]
Instead of using the module clause to specify a pre-designed hardware module, a programmer can use the HardCloud synthesize clause to generate a new bitstream from C code. For example, with the synthesize clause in the annotated code below, a matrix multiplication written in C is converted to OpenCL, then to Verilog, and finally synthesized into a hardware bitstream using the Intel FPGA SDK for OpenCL. HardCloud takes the resulting bitstream, automatically configures the HARP2 FPGA, and runs the application.
#pragma omp target device(HARP) map(to: A[:N*N], B[:N*N]) map(from: C[:N*N])
// Convert loop to OpenCL and then to Verilog and synthesize
#pragma omp parallel for use(hrw) synthesize(matmul)
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
        C[i * N + j] = 0;
        for (int k = 0; k < N; ++k) {
            C[i * N + j] += A[i * N + k] * B[k * N + j];
        }
    }
}
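Because the synthesized hardware replaces the loop nest above, a plain host version of the same computation is useful for checking the result that comes back in C. The sketch below is one way to do that. The names `matmul_reference` and `matrices_equal` are hypothetical, and `int` elements are assumed because the source does not state the element type.

```c
#include <assert.h>

/* Reference multiplication of two row-major n x n matrices: C = A * B.
 * Hypothetical helper; int elements are an assumption. */
void matmul_reference(const int *A, const int *B, int *C, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            int acc = 0;
            for (int k = 0; k < n; ++k) {
                acc += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

/* Returns 1 if the two n x n matrices are element-wise equal, else 0. */
int matrices_equal(const int *X, const int *Y, int n) {
    for (int i = 0; i < n * n; ++i) {
        if (X[i] != Y[i]) {
            return 0;
        }
    }
    return 1;
}
```

A simple sanity check is to multiply by the identity matrix and confirm that the output equals the input, before comparing against the values returned by the accelerator.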