Skip to content

Conversation

@bernhardmgruber
Copy link
Contributor

@bernhardmgruber bernhardmgruber commented Sep 30, 2022

Adds a lambda function to mark a tracks survival more explicit. Then keeps more track state in local registers.

example19 -batch 1000 -gdml_file ../examples/Example14/macros/testEm3.gdml, 5 time average:

master + #225:
Mean: 0.200212
Stddev: 0.00043419

with this PR:
Mean: 0.18434
Stddev: 0.000701656

Depends on:

@bernhardmgruber bernhardmgruber changed the title Local vars ex19 Keep more track members in registers in example19 Sep 30, 2022
@phsft-bot
Copy link

Can one of the admins verify this patch?

@bernhardmgruber bernhardmgruber force-pushed the local_vars_ex19 branch 2 times, most recently from 822af79 to 71fbce9 Compare September 30, 2022 13:14
@hahnjo
Copy link
Contributor

hahnjo commented Oct 3, 2022

The first commit is the same as #209, right? The second looks similar to #213 which unexpectedly changed results - was this ever understood?

@bernhardmgruber
Copy link
Contributor Author

The first commit is the same as #209, right?

Yes, at least similar in spirit!

The second looks similar to #213 which unexpectedly changed results - was this ever understood?

Yes it is the same refactoring. Unfortunately, I still don't understand why it makes a difference in #213 :/

@hahnjo
Copy link
Contributor

hahnjo commented Oct 4, 2022

The second looks similar to #213 which unexpectedly changed results - was this ever understood?

Yes it is the same refactoring. Unfortunately, I still don't understand why it makes a difference in #213 :/

Hm, I'm feeling a bit uneasy merging something which gives different results when applied to a different example. I think we really must make an effort to understand what's going on there before we proceed with changing other examples...

@bernhardmgruber bernhardmgruber force-pushed the local_vars_ex19 branch 2 times, most recently from dceb689 to 5db0d5d Compare October 7, 2022 09:41
@bernhardmgruber bernhardmgruber marked this pull request as ready for review October 7, 2022 09:53
@hahnjo hahnjo mentioned this pull request Nov 1, 2022
8 tasks
@hahnjo
Copy link
Contributor

hahnjo commented Nov 1, 2022

Also causes differences when activating magnetic field, cf #213 (comment)

@bernhardmgruber
Copy link
Contributor Author

I am starting to believe there is a compiler bug involved. When computing the safety in TransportElectrons, it makes a difference whether the argument is provided from a local variable or a reference to global memory:

    double safety = 0;
    if (!navState.IsOnBoundary()) {
      const auto localPos = currentTrack.pos;
      const double safety1 = BVHNavigator::ComputeSafety(localPos, navState);
      const double safety2 = BVHNavigator::ComputeSafety(currentTrack.pos, navState);
      if (safety1 != safety2) {
        printf("safety: %.20f vs.\nsafety: %.20f\n", safety1, safety2);
        printf("pos: %.20f %.20f %.20f vs.\npos: %.20f %.20f %.20f\n", localPos.x(), localPos.y(), localPos.z(), currentTrack.pos.x(),
               currentTrack.pos.y(), currentTrack.pos.z());
        __trap();
      }
      safety = safety2;
    }

The kernel traps and produces the following output:

safety: 0.21831958555947394984 vs.
safety: 0.21831958555947403311
safety: 0.01544092245763720173 vs.
safety: 0.01544092245763728499
safety: 0.06332618586926033744 vs.
safety: 0.06332618586926039295
pos: 28.70628184657631720711 0.08269122271328140095 -0.00006273496558749841 vs.
pos: 28.70628184657631720711 0.08269122271328140095 -0.00006273496558749841
pos: 28.49718803731910909960 0.08297612356319541971 0.00014061084553511203 vs.
pos: 28.49718803731910909960 0.08297612356319541971 0.00014061084553511203
pos: 28.54689026516190963889 0.08430407572358261659 0.00013250864783830081 vs.
pos: 28.54689026516190963889 0.08430407572358261659 0.00013250864783830081

You can see that the inputs are (obviously) identical, but the result computed by ComputeSafety has a tiny difference.

@hahnjo do you have any idea on what we could do here? The result is marginally affected by this difference, as you observed.

@bernhardmgruber
Copy link
Contributor Author

The last commit introduces two workarounds to reproduce the old behavior, but I would like to get rid of them. How can I proceed?

@hahnjo
Copy link
Contributor

hahnjo commented Nov 7, 2022

Okay, that's not a compiler bug but a feature / more optimizations happening. I can get identical results before / with only the first commit by building with --fmad=false. I'm obviously fine if different rounding of individual floating point operations changes the result, but we need to verify the same for the other examples touched by #213.

And same question as there: Do you want to merge the changes in this PR into #213 because they are essentially the same modification?

@bernhardmgruber
Copy link
Contributor Author

Wow, that is awesome! Thx for finding that out! How did you discover the difference comes from fma?

Do you want to merge the changes in this PR into #213 because they are essentially the same modification?

I split off the changes for example19 on purpose, because I needed the change for my LLAMA heatmaps to be more accurate and I needed the change for my ACAT22 presentation. So I hoped by just aiming in example19 I could get them in faster. Now that ACAT22 is in the past, I don't mind to combine both of them. But since you now tested that the changeset is fine, I would like to remove my workaround commit and have it merged sooner, if that is fine for you!

The track variables are loaded at the beginning of the kernel and
written back when the track survives.
The signature of InitAsSecondary() had to be changed to not load from
the track again.

The results of the simulation change slighly because nvcc now generates
a few fma instructions in different places (thx to hahnjo for finding
out!).
@hahnjo
Copy link
Contributor

hahnjo commented Nov 7, 2022

Do you want to merge the changes in this PR into #213 because they are essentially the same modification?

I split off the changes for example19 on purpose, because I needed the change for my LLAMA heatmaps to be more accurate and I needed the change for my ACAT22 presentation. So I hoped by just aiming in example19 I could get them in faster. Now that ACAT22 is in the past, I don't mind to combine both of them. But since you now tested that the changeset is fine, I would like to remove my workaround commit and have it merged sooner, if that is fine for you!

No no, that's not how I want to do this. I will test all examples before merging any of this, there's no "overtaking on the right" for a single example...

@bernhardmgruber
Copy link
Contributor Author

No no, that's not how I want to do this. I will test all examples before merging any of this, there's no "overtaking on the right" for a single example...

In that case, I can merge the PRs.

@bernhardmgruber bernhardmgruber deleted the local_vars_ex19 branch November 7, 2022 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants