ngaloppo

Based on Ben's concurrent execution sample code, I created a simplified version that submits a time-sink kernel from two threads concurrently. With two threads, each kernel instance takes about twice as long as it does on a single thread (roughly 0.062 s versus 0.032 s in the run below).

@bashbaug Any idea what could be going on here? Is there an issue perhaps with sharing a context / device across multiple threads?

[@tgl:~/code/simple-sycl-samples/build] [intelpython-python3.9] concurrent-execution(+1/-0)* ± ../install/Release/thread_concurrency -p 2
Running on SYCL platform: Intel(R) OpenCL HD Graphics
Running on SYCL device: Intel(R) Iris(R) Xe Graphics [0x9a49]
Initializing tests...
... done!
Testing without threads
                                      go (i=  0): Average time: 0.031794 seconds
Testing with threads
                                      go (i=  1): Average time: 0.062274 seconds
                                      go (i=  0): Average time: 0.063263 seconds
Cleaning up...
... done!

@bashbaug
Owner

Brief notes in case this helps somebody else in the future:

This device is able to execute kernels concurrently by batching them together into one submission. One way to do this is to put kernels into the same out-of-order queue without any dependencies. The driver will also batch submissions from multiple queues - both in-order and out-of-order queues - though the submissions need to be close enough together to batch.

In this particular case, one of the threads gets started slightly before the other, so the calls go:

// Thread 1:
clEnqueueNDRangeKernel
clFinish

// Thread 2, a little later:
clEnqueueNDRangeKernel
clFinish

// Repeat

Because the clFinish from Thread 1 happens before the clEnqueueNDRangeKernel from Thread 2, the driver will not batch these two submissions together, and they won't run concurrently.

If the calls happened to occur very close together in time, or if this is enforced with e.g. a thread barrier, the kernels should execute concurrently. In other words, there is no inherent reason why kernels from multiple threads cannot run concurrently; they just happen not to run concurrently in this case.
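
As a rough illustration of the thread-barrier idea, here is a minimal sketch in standard C++ only. It does not call OpenCL or SYCL: the "enqueue" and "finish" trace entries merely stand in for clEnqueueNDRangeKernel and clFinish. The names `ThreadBarrier`, `Event`, and `run_aligned` are hypothetical and not part of the sample code. Placing a second barrier between the enqueue and the finish is an assumption about one way to guarantee that no thread's finish precedes another thread's enqueue; whether the driver then actually batches the submissions is still up to the driver.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: a small reusable barrier.
// (C++20 code could use std::barrier instead.)
class ThreadBarrier {
public:
    explicit ThreadBarrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive releases the others and resets for reuse.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// One trace entry: which thread made which (stand-in) call in which iteration.
struct Event {
    int thread;
    int iteration;
    std::string call;
};

// Runs num_threads workers for the given number of iterations and returns
// the order in which the stand-in calls were made.
std::vector<Event> run_aligned(int num_threads, int iterations) {
    ThreadBarrier barrier(static_cast<std::size_t>(num_threads));
    std::mutex trace_mutex;
    std::vector<Event> trace;

    auto record = [&](int t, int i, const char* call) {
        std::lock_guard<std::mutex> guard(trace_mutex);
        trace.push_back({t, i, call});
    };

    auto worker = [&](int t) {
        for (int i = 0; i < iterations; ++i) {
            barrier.arrive_and_wait();  // line up the submissions
            record(t, i, "enqueue");    // stand-in for clEnqueueNDRangeKernel
            barrier.arrive_and_wait();  // assumption: all enqueues before any finish
            record(t, i, "finish");     // stand-in for clFinish
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, t);
    }
    for (auto& th : threads) {
        th.join();
    }
    return trace;
}
```

Calling `run_aligned(2, 3)` yields a trace in which, for every iteration, both "enqueue" entries come before either "finish" entry. That avoids the Thread 1 clFinish before Thread 2 clEnqueueNDRangeKernel ordering described above. In a real test, the two `record` calls would be replaced by the actual kernel submission and wait.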
