Timers for Benchmarking (related to issue #1471 "Basic Instrumentation") #1778
Conversation
…imers Use cmake option with-detailed-timers
…on-connect Additional timer stuff from suku248
@hakonsbm could you please check that all high-level connect functions contain timers?
hakonsbm
left a comment
Looks pretty good, thanks for implementing this! There are some TODOs that should be fixed or removed, see my comments below.
@suku248 There are essentially only three connection functions in the C++ API, where all high-level connect functions converge, and these all contain timers. I have also checked that timing works with the high-level functions. However, it would be nice to have a proper unit test of the instrumentation.
```cpp
// Needs to be called /after/ set_end_and_invalid_markers_.
set_complete_marker_spike_data_( assigned_ranks, send_buffer_position, send_buffer );
#pragma omp barrier
// TODO: Discuss, if the barrier above should be better placed after if-clause
```
A decision should be made here and the todo should be removed.
All threads should come to the same result for gather_completed_checker_.all_true().
Hence, the barrier can stay inside the if-clause. The comment has now been removed.
```cpp
// TODO_SDR: Following function call commented out because it causes real runs to hang!
// kernel().mpi_manager.synchronize(); // to get an accurate time measurement
//                                     // across ranks
```
Can this be removed?
I agree, we shouldn't introduce an MPI synchronization point. This would add significant overhead, which is currently also not accounted for in any timer.
Agreed. Now removed synch point and comment.
```cpp
// TODO_SDR: Following function call commented out because it causes real runs to hang!
// kernel().mpi_manager.synchronize(); // to get an accurate time measurement
//                                     // across ranks
```
Can this be removed?
I agree, we shouldn't introduce an MPI synchronization point. This would add significant overhead, which is currently also not accounted for in any timer.
Agreed. Now removed synch point and comment.
Thank you very much for implementing these! The timers and their placement are comparable with what we have used so far. I am wondering about some of the new omp barriers, though. I think their purpose is to make sure every thread takes the same amount of time to finish a timed interval. But won't a bunch of new barriers considerably slow down heavily multithreaded (100+ threads) measurements and consequently distort them? Would it make sense either not to introduce new barriers at all, thus assuming the work is balanced evenly across all threads, or to let every thread have its own stopwatch and take either the mean or the max across one MPI process as the measured time?
```cpp
}

#ifdef TIMER_DETAILED
#pragma omp barrier
```
I guess this barrier is placed here in order to guarantee that all threads take the same amount of time, so that tid = 0 represents all other threads. But I think every new barrier introduces significant synchronization overhead and distorts the overall measurement. This would be especially severe if we used these timers on architectures with high core counts that allow a huge number of threads.
Agreed. Now removed barrier.
```cpp
kernel().connection_manager.clean_source_table( tid );

#ifdef TIMER_DETAILED
#pragma omp barrier
```
Same as above.
Agreed. Now removed barrier.
```cpp
kernel().event_delivery_manager.gather_target_data( tid );

#ifdef TIMER_DETAILED
#pragma omp barrier
```
Same as above.
Yes, it's not frequently called. Keep this barrier.
Remove barriers and synch points as suggested by reviewers
@wschenk please check the removed barriers and MPI synch points and ping the reviewers (@hakonsbm @jarsi @terhorstd) if you are happy with the changes.
@hakonsbm Thanks for checking the connect functions! I don't think it's possible to create a unit test for this, but including the detailed-timers case in CI might make sense.
```cpp
kernel().connection_manager.clean_source_table( tid );

#ifdef TIMER_DETAILED
#pragma omp barrier
```
Meanwhile, I think it is more logical at this point to put `sw_communicate_target_data.start()/stop()` into the `omp single` region.
Otherwise I am happy with the changes, thank you! (@hakonsbm @jarsi @terhorstd @suku248)
@hakonsbm, @jarsi, @terhorstd, @suku248: From my perspective, the pull request is ready to go. If you have no further objections, please pull the code into master.
hakonsbm
left a comment
Looks good to me now.
We agreed in the last Hackathon that with the fixes to …

@jarsi @jasperalbers Could you review this soon?

@heplesser I will be on it soon, hopefully at the end of the week, or at the latest by the start of next week.
jasperalbers
left a comment
I did a test with the multi-area model which confirms that the timing data is reasonably similar to the old timers implemented by @jarsi, so I don't see any necessary changes in the implementation.
I would suggest adding information about
- how to set the correct compilation flag
- how to access the timers
- the precise definition of the measured times
somewhere in the documentation (sorry if this already exists, I wasn't able to find it).
Thanks @jasperalbers! There is a separate issue for the documentation, #1977.
UPDATE: Previous TODOs are all done!
This PR is a suggestion how to meet the requirements stated in issue #1471 ("Basic Instrumentation") regarding timers for profiling and benchmarking.
In the current version, the "basic timers" are always enabled (without any switch; this design decision is the result of a discussion between suku248, jarsi and me). The "detailed timers" can be enabled by the cmake option `-Dwith-detailed-timers=ON`. To give an example of a cmake command with detailed timers enabled:

```
cmake ../.. -DCMAKE_INSTALL_PREFIX:PATH=~/opt/NEST/zam884_nest3-timers -Dwith-mpi=ON -Dwith-detailed-timers=ON
```

In contrast to the requirements in issue #1471, even more detailed timers are provided. The reason is that these timers are required for the dry-run development for 5G to compare real and dry runs of NEST simulations. But in the end, we think that these additional timers are useful for many additional benchmarking purposes. The complete list is:

- `time_construction_create` (basic timer, always enabled)
- `time_construction_connect` (basic timer, always enabled)
- `time_communicate_prepare` (basic timer, always enabled)
- `time_communicate_prepare:time_gather_target_data`
- `time_gather_target_data:time_communicate_target_data`
- `time_simulate` (basic timer, always enabled)
- `time_simulate:time_update`
- `time_gather_spike_data`
- `time_gather_spike_data:time_collocate_spike_data`
- `time_communicate_spike_data`
- `time_deliver_spike_data`