Add a SplitKernels example #216

hageboeck · 2022-07-07T16:46:18Z

New example based on example 18
Discrete interactions moved to dedicated kernels for tests, profiling and R&D

@bernhardmgruber

phsft-bot · 2022-07-07T16:46:22Z

Can one of the admins verify this patch?

hahnjo

Can we call this example19, please, following the existing naming scheme?

examples/SplitKernels/gammas.cu

examples/SplitKernels/main.cu

bernhardmgruber · 2022-07-08T10:32:40Z

> Can we call this example19, please, following the existing naming scheme?

We purposely chose something different, and there is precedent for it with ECS, FisherPrice_*, Raytracer_Benchmark, TestEm3, etc. It also differs from the other examples, which seem rather finished, in that we want to continuously apply patches and improvements to SplitKernels. We want a playground for fast evolution.

And generally, the name example19 does not carry any semantic meaning. It is already confusing enough that many other examples also carry no names indicating what they do, and they do not necessarily share a chronology either. I would even propose renaming the existing examples as well so that they include names like Geant4_integration or standalone, etc.

hahnjo · 2022-07-08T10:36:08Z

example19 is the naming scheme for the next example, and I'd like to ask that you follow that. What would you do if you have another idea, to implement on top of split kernels? Creating a new example20 or whatever the number will be is easy and cheap.

bernhardmgruber · 2022-07-08T11:04:37Z

> example19 is the naming scheme for the next example, and I'd like to ask that you follow that.

As I pointed out, it is not the naming scheme for all examples. And IMO it is also not a great naming scheme.

> What would you do if you have another idea, to implement on top of split kernels?

Easy: if the change is strictly better, we change SplitKernels. If it is a variant that is not better, like a compacted or double-buffered version, we can name it e.g. SplitKernels_compact. We also have precedent for that with TestEm3 and TestEm3Compact, which are definitely better names than exampleX.

hahnjo · 2022-07-08T11:23:49Z

> As I pointed out, it is not the naming scheme for all examples.

Yes, and as I pointed out in the past, this was not supposed to happen.

examples/SplitKernels/README.md

examples/SplitKernels/electrons.cu

bernhardmgruber · 2022-08-18T07:22:16Z

> Can we call this example19, please, following the existing naming scheme?

By now I don't care anymore whether this is called SplitKernels or example19. I would just really like to see it merged.

@hahnjo, @hageboeck and I would like to continue working on that example and merge multiple PRs on top, including very experimental changes. So it is important for us to have freedom in that example.

hahnjo · 2022-08-18T12:29:46Z

I stand by my previous comments:

1. This should be called example19, following the established naming scheme. If there are smaller tweaks to the example maintaining the workflow, they can be done in this example in follow-up PRs. For bigger changes, just add a new example.
2. IMO at least the default batch size should be adjusted to provide comparable results to the existing examples (on second thought, I don't care so much about the capacity, but right now these two are linked).
3. The arbitrary choices in the launch configurations for the split kernels should take the number of particles currently in flight into account. For an easy start, I suggested just defaulting to transportBlocks.
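
The last point can be sketched on the host side as follows. This is only a minimal C++ illustration under assumptions, not the code in this PR: `blocksForInFlight`, `threadsPerBlock`, and `maxBlocks` are hypothetical names, and the choice of one thread per track with a clamp to `[1, maxBlocks]` is assumed for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of the example's code): derive the number of
// blocks from the number of tracks currently in flight instead of using a
// hard-coded constant. Assumes one thread per track.
inline int blocksForInFlight(int numInFlight, int threadsPerBlock, int maxBlocks)
{
  assert(numInFlight >= 0 && threadsPerBlock > 0 && maxBlocks > 0);
  // Ceiling division: enough blocks to cover every track once.
  const int needed = (numInFlight + threadsPerBlock - 1) / threadsPerBlock;
  // Launch at least one block and never more than the allowed maximum.
  return std::max(1, std::min(needed, maxBlocks));
}
```

In a CUDA launch, the returned value would take the place of a fixed block count in the `<<<blocks, threads>>>` configuration of each split kernel.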

hageboeck · 2022-08-22T16:16:57Z

> I stand by my previous comments:
>
> 1. This should be called example19, following the established naming scheme. If there are smaller tweaks to the example maintaining the workflow, they can be done in this example in follow-up PRs. For bigger changes, just add a new example.
> 2. IMO at least the default batch size should be adjusted to provide comparable results to the existing examples (on second thought, I don't care so much about the capacity, but right now these two are linked).
> 3. The arbitrary choices in the launch configurations for the split kernels should take the number of particles currently in flight into account. For an easy start, I suggested just defaulting to transportBlocks.

I believe all of the above are addressed now. @bernhardmgruber @hahnjo
For the batch sizes I went for 52 in the end, as that's what example18 would do by default.

hahnjo

Looks good from a code point of view; some minor last comments inline.

FWIW here are the results for -particles 10000 (10k) on my RTX 2070 SUPER:

| | `example18` | `example19` |
|---|---|---|
| `-batch 52` (default) | 102s | 117s |
| `-batch 209` | 54.6s | 63.8s |
| `-batch 419` | 45.5s | 49.5s |
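
Read as relative overhead, these timings put example19 at roughly 15%, 17% and 9% slower than example18 for batch sizes 52, 209 and 419. The arithmetic is sketched below; `overheadPercent` is a hypothetical helper introduced here for illustration only.

```cpp
#include <cassert>

// Hypothetical helper: relative overhead of a run compared to a baseline,
// in percent. A positive result means the run was slower than the baseline.
inline double overheadPercent(double baselineSeconds, double seconds)
{
  assert(baselineSeconds > 0.0);
  return (seconds / baselineSeconds - 1.0) * 100.0;
}
```

For example, `overheadPercent(102.0, 117.0)` gives the figure for the default batch size.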

examples/Example19/README.md

examples/Example19/SplitKernels.cuh

hahnjo · 2022-09-02T09:08:57Z

From #203 (comment)

> Another approach (that I think we had in the past? not sure) is having queues for the discrete processes. That would even save us from determining which Track need to run in which of the (split) kernels.

I gave this a try, and it seems to be equivalent in terms of performance but we don't need that awfully templated InteractionLoop. Let me know if you want me to submit this as a PR...
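
The queue idea can be illustrated with a host-only C++ sketch. It is not the implementation referred to above: the `Process` enum, the `Track` struct with its `winnerProcess` field, and `fillQueues` are hypothetical names, and a `std::vector` stands in for what would be a device-side queue filled with atomic appends.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical process identifiers, for illustration only.
enum Process : int { kNone = -1, kPairCreation = 0, kCompton = 1, kPhotoelectric = 2, kNumProcesses = 3 };

struct Track {
  int winnerProcess; // discrete process selected during transport, or kNone
};

// One queue of track slots per discrete process.
using Queues = std::array<std::vector<std::size_t>, kNumProcesses>;

// Transport step (the actual stepping is omitted): record every track that
// needs a discrete interaction in the queue of the process that was selected.
inline Queues fillQueues(const std::vector<Track> &tracks)
{
  Queues queues;
  for (std::size_t slot = 0; slot < tracks.size(); ++slot) {
    const int p = tracks[slot].winnerProcess;
    if (p == kNone) continue; // no discrete interaction for this track
    assert(p >= 0 && p < kNumProcesses);
    queues[p].push_back(slot); // on a GPU this would be an atomic append
  }
  return queues;
}
```

Each interaction kernel would then loop over its own queue only, so no kernel has to inspect every track to find the ones it is responsible for, and its launch size can follow the queue length.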

bernhardmgruber mentioned this pull request Jul 7, 2022

Automatically adjust buffer capacity in TestEm3 #214

Closed

hahnjo requested changes Jul 8, 2022

hageboeck force-pushed the SplitKernels branch from 8932e38 to d71dc03 Compare July 8, 2022 12:10

hahnjo reviewed Jul 12, 2022

examples/SplitKernels/README.md (outdated, resolved)

examples/SplitKernels/electrons.cu (outdated, resolved)

Add automatic occupancy annotations to NVTX tracer.

c855f40

hageboeck force-pushed the SplitKernels branch from d71dc03 to c6fdedb Compare August 22, 2022 16:16

hahnjo approved these changes Aug 23, 2022

examples/Example19/README.md (outdated, resolved)

examples/Example19/README.md (outdated, resolved)

examples/Example19/SplitKernels.cuh (outdated, resolved)

hageboeck added 9 commits August 23, 2022 11:27

Add example19 as a copy of example18.

0483d54

In addition: Remove old NVTX tracing code.

[ex19] Update README.

3822f79

[ex19] Use cudaDeviceProp to allocate more memory.

40724c2

Co-authored by Guilherme Amadio <[email protected]>

[ex19] Add InteractionKernel template.

48f5063

[ex19] Move discrete interactions into dedicated kernels.

80f8944

[ex19] Increase number of blocks.

70fb3e9

[ex19] Don't advance the RNG before discrete interactions.

c15a11d

Jonas pointed out the advance happens anyway when RNG states are branched, so no need to do it here.

[ex19] Fix default batch size to 52.

bc938cf

Jonas asked for a batch size that's comparable with other examples. Using the default capacity of the containers in example18, one arrives at 52 primaries per batch.

[ex19] Launch all interaction kernels with transportBlocks blocks.

7be02b1

On Jonas' request, use transportBlocks blocks as a starting point for all interaction kernels.

hageboeck force-pushed the SplitKernels branch from c6fdedb to 7be02b1 Compare August 23, 2022 09:28

hahnjo merged commit ebd96f0 into apt-sim:master Aug 23, 2022

hageboeck deleted the SplitKernels branch August 23, 2022 16:26

hageboeck restored the SplitKernels branch August 23, 2022 16:51

This was referenced Aug 24, 2022

Allow inlining FlatWrapper()/FlatArrayWrapper(), but not Advance() #215

Closed

Example19 with split kernels is not reproducible #220

Closed

hahnjo mentioned this pull request Oct 10, 2022

Put RNG in shared memory where beneficial #229

Draft

1 task
