-
Notifications
You must be signed in to change notification settings - Fork 36
Add a SplitKernels example #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this example19, please, following the existing naming scheme?
We purposely chose something different and there is precedence for it with And generally, the name |
|
|
As I pointed out, it is not the naming scheme for all examples. And IMO it is also not a great naming scheme.
Easy, if the change is strictly better we change |
Yes, and as I pointed out in the past, this was not supposed to happen. |
By now I don't care anymore whether this is called SplitKernels or example19. I would just really like to see it merged. @hahnjo, @hageboeck and me would like to continue to work on that example and merge multiple PRs on top, including very experimental changes. So it is important for us to have freedom in that example. |
|
I stand by my previous comments:
|
d71dc03 to
c6fdedb
Compare
I believe all of the above are addressed now. @bernhardmgruber @hahnjo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a code point of view; some minor last comments inline.
FWIW here are the results for -particles 10000 (10k) on my RTX 2070 SUPER:
example18 |
example19 |
|
|---|---|---|
-batch 52 (default) |
102s | 117s |
-batch 209 |
54.6s | 63.8s |
-batch 419 |
45.5s | 49.5s |
In addition: Remove old NVTX tracing code.
Co-authored by Guilherme Amadio <[email protected]>
Jonas pointed out the advance happens anyway when RNG states are branched, so no need to do it here.
Jonas asked for a batch size that's comparable with other examples. Using the default capacity of the containers in example18, one arrives at 52 primaries per batch.
On Jonas' request, use transportBlocks blocks as a starting point for all interaction kernels.
c6fdedb to
7be02b1
Compare
|
From #203 (comment)
I gave this a try, and it seems to be equivalent in terms of performance but we don't need that awfully templated |
@bernhardmgruber