"Mark-delay" performance improvement to major GC #13580
Force-pushed from 31fc961 to 72500e9.
Thanks! I'll have a look tomorrow.
Force-pushed from 72500e9 to 9af1835.
Force-pushed from b341dd4 to 4a6c7af.
I've addressed @kayceesrk's review comments and rebased.
Thanks to @kayceesrk and @stedolan, I think this can come out of draft now.
The code looks ok now. MSVC 32-bit has a failing test. I'll wait until the test is fixed before I approve the PR.
Force-pushed from 4a6c7af to 7d91cc4.
Any insights on this so far?
Force-pushed from 7d91cc4 to 295543e.
The Win32 problem is due to an accounting error, observed across platforms, which causes the […]
(Note: this accounting problem is specific to this PR, and is not observed on trunk.)
Force-pushed from 7a17a33 to af5fb77.
I have further diagnosed the accounting problem and put in a dirty hack to prove my hypothesis. The hack, which should not be merged, artificially consumes the rest of the slice budget at the point at which sweeping is first completed, and it demonstrates that addressing this accounting issue fixes the problem.
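To make the idea concrete, here is a minimal sketch of the kind of hack being described, under invented names (`work_done`, `slice_budget` and `sweep_just_completed` are all hypothetical; this is not the PR's actual code): the moment a domain first finishes sweeping, the remainder of the slice budget is treated as spent, so the suspected accounting discrepancy cannot be carried forward into marking.

```c
/* Hypothetical sketch only; not the PR's code.  When sweeping first
   completes, pretend the rest of the slice budget was consumed, ending the
   slice there rather than carrying the unspent budget forward. */
static void consume_remaining_budget_on_sweep_end(long *work_done,
                                                  long slice_budget,
                                                  int sweep_just_completed)
{
  if (sweep_just_completed && *work_done < slice_budget) {
    *work_done = slice_budget;  /* artificially exhaust the slice budget */
  }
}
```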
…unter at the start of any slice when it falls very far behind alloc_counter.
Co-authored-by: Stephen Dolan <[email protected]>
Force-pushed from d372edb to bf509d2.
Yes, as an interim fix, we shouldn't require available work to update the finaliser tables. This isn't ideal, and I've added a TODO about it.
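A rough sketch of that interim behaviour, with hypothetical names (this is not the PR's code): the finaliser-table update is no longer gated on the slice having work available.

```c
/* Hypothetical sketch only.  The finaliser tables are updated regardless of
   the remaining work budget; a work-based condition could be reinstated
   later (the PR leaves a TODO to that effect). */
static void update_finalisers_interim(long available_work,
                                      void (*update_finaliser_tables)(void))
{
  (void) available_work;        /* interim fix: no longer a precondition */
  update_finaliser_tables();
}
```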
Force-pushed from bf509d2 to 8dac6a5.
Force-pushed from 8dac6a5 to 1e4b55c.
@jmid, the multicoretests failure seems to be an internal error. Is that correct?
Apart from the unrelated failure in multicoretests, the usual compiler testsuite passes cleanly. I'm satisfied with the PR. I've already left an approval.
Sort of; I explained it earlier here: #13580 (comment). I kicked off a fresh test run here: https://github.com/ocaml-multicore/multicoretests/tree/target-mark-delay-13580 If you want to give this a go, I've previously shared instructions for building and running the test suite locally here: #13580 (comment).
I've been able to narrow this down a bit, both in the CI with […] and locally, where I just commented out the debug message line:

```diff
diff --git a/runtime/minor_gc.c b/runtime/minor_gc.c
index 247382c227..6e2b42f339 100644
--- a/runtime/minor_gc.c
+++ b/runtime/minor_gc.c
@@ -881,7 +881,7 @@ caml_stw_empty_minor_heap_no_major_slice(caml_domain_state* domain,
   }
   CAML_EV_BEGIN(EV_MINOR_MEMPROF_CLEAN);
-  CAML_GC_MESSAGE(MINOR, "Updating memprof.\n");
+  /* CAML_GC_MESSAGE(MINOR, "Updating memprof.\n"); */
   caml_memprof_after_minor_gc(domain);
   CAML_EV_END(EV_MINOR_MEMPROF_CLEAN);
```

Here I was able to reproduce the assertion error a couple of times. It triggers once every 15 runs or so (at the beginning of a PBT run, so […]
Resulting in this stack trace:

```text
Program terminated with signal SIGILL, Illegal instruction.
#0 caml_abort () at runtime/caml/misc.h:370
370 __builtin_trap();
[Current thread is 1 (Thread 0x7f0f691bb640 (LWP 2935090))]
(gdb) thread apply all bt
Thread 6 (Thread 0x7f0f627fc640 (LWP 2935093)):
#0 futex_wait (private=0, expected=2, futex_word=0x559dec5c0000) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x559dec5c0000, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007f0f69f41002 in lll_mutex_lock_optimized (mutex=0x559dec5c0000) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=mutex@entry=0x559dec5c0000) at ./nptl/pthread_mutex_lock.c:93
#4 0x0000559db367462f in caml_plat_lock_blocking (m=0x559dec5c0000) at runtime/caml/platform.h:457
#5 backup_thread_func (v=0x559dec5bff60) at runtime/domain.c:1095
#6 0x00007f0f69f3dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f0f69fcf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 5 (Thread 0x7f0f63fff640 (LWP 2935091)):
#0 futex_wait (private=0, expected=2, futex_word=0x559dec5bfef0) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x559dec5bfef0, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007f0f69f41002 in lll_mutex_lock_optimized (mutex=0x559dec5bfef0) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=mutex@entry=0x559dec5bfef0) at ./nptl/pthread_mutex_lock.c:93
#4 0x0000559db367462f in caml_plat_lock_blocking (m=0x559dec5bfef0) at runtime/caml/platform.h:457
#5 backup_thread_func (v=0x559dec5bfe50) at runtime/domain.c:1095
#6 0x00007f0f69f3dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f0f69fcf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 4 (Thread 0x7f0f69ea6740 (LWP 2934676)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x559dec623918) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x559dec623918) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x559dec623918, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f0f69f3ca41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x559dec623930, cond=0x559dec6238f0) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=cond@entry=0x559dec6238f0, mutex=mutex@entry=0x559dec623930) at ./nptl/pthread_cond_wait.c:627
#5 0x0000559db369de23 in sync_condvar_wait (m=0x559dec623930, c=0x559dec6238f0) at runtime/sync_posix.h:116
#6 caml_ml_condition_wait (wcond=<optimized out>, wmut=<optimized out>) at runtime/sync.c:193
#7 <signal handler called>
#8 0x0000559db36266fe in camlStdlib__Domain$loop_769 () at domain.ml:294
#9 0x0000559db36257ee in camlStdlib__Mutex$protect_280 () at mutex.ml:28
#10 0x0000559db362669c in camlStdlib__Domain$join_766 () at domain.ml:299
#11 0x0000559db35a5d7c in camlLin_domain$run_parallel_788 () at lib/lin_domain.ml:24
#12 0x0000559db35a6147 in camlLin_domain$stress_prop_938 () at lib/lin_domain.ml:43
#13 0x0000559db35acfba in camlUtil$fun_2564 () at lib/util.ml:5
#14 0x0000559db35c43fd in camlQCheck2$loop_4296 () at src/core/QCheck2.ml:1805
#15 0x0000559db35c4344 in camlQCheck2$run_law_4291 () at src/core/QCheck2.ml:1810
#16 0x0000559db35c5104 in camlQCheck2$check_state_input_4359 () at src/core/QCheck2.ml:1933
#17 0x0000559db35c555e in camlQCheck2$check_cell_inner_11009 () at src/core/QCheck2.ml:2003
#18 0x0000559db35b53b7 in camlQCheck_base_runner$aux_map_1464 () at src/runner/QCheck_base_runner.ml:425
#19 0x0000559db360153d in camlStdlib__List$map_340 () at list.ml:85
#20 0x0000559db35b4c73 in camlQCheck_base_runner$run_tests_inner_2470 () at src/runner/QCheck_base_runner.ml:434
#21 0x0000559db35b5a4d in camlQCheck_base_runner$run_tests_main_inner_3102 () at src/runner/QCheck_base_runner.ml:478
#22 0x0000559db35a582f in camlDune__exe__Lin_tests$entry () at src/ephemeron/lin_tests.ml:35
#23 0x0000559db359f48c in caml_program ()
#24 <signal handler called>
#25 0x0000559db36a3b9a in caml_startup_common (pooling=<optimized out>, argv=0x7ffc2f6cc2a8) at runtime/startup_nat.c:127
#26 caml_startup_common (argv=0x7ffc2f6cc2a8, pooling=<optimized out>) at runtime/startup_nat.c:86
#27 0x0000559db36a3c2f in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:134
#28 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:139
#29 caml_main (argv=<optimized out>) at runtime/startup_nat.c:146
#30 0x0000559db359f022 in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37
Thread 3 (Thread 0x7f0f689ba640 (LWP 2934691)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x559dec5bfdb4) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x559dec5bfdb4) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x559dec5bfdb4, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f0f69f3ca41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x559dec5bfd60, cond=0x559dec5bfd88) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=cond@entry=0x559dec5bfd88, mutex=mutex@entry=0x559dec5bfd60) at ./nptl/pthread_cond_wait.c:627
#5 0x0000559db3695beb in caml_plat_wait (cond=cond@entry=0x559dec5bfd88, mut=mut@entry=0x559dec5bfd60) at runtime/platform.c:146
#6 0x0000559db367479a in backup_thread_func (v=0x559dec5bfd40) at runtime/domain.c:1087
#7 0x00007f0f69f3dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8 0x00007f0f69fcf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 2 (Thread 0x7f0f62ffd640 (LWP 2935092)):
#0 0x0000559db36948ad in caml_gc_log (msg=msg@entry=0x559db36b0490 "requesting stw empty_minor_heap") at runtime/misc.c:95
#1 0x0000559db36943af in caml_try_empty_minor_heap_on_all_domains () at runtime/minor_gc.c:952
#2 0x0000559db3694445 in caml_empty_minor_heaps_once () at runtime/minor_gc.c:978
#3 0x0000559db36753fa in caml_domain_terminate (last=last@entry=false) at runtime/domain.c:2088
#4 0x0000559db3675a79 in domain_thread_func (v=<optimized out>) at runtime/domain.c:1274
#5 0x00007f0f69f3dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6 0x00007f0f69fcf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 1 (Thread 0x7f0f691bb640 (LWP 2935090)):
#0 caml_abort () at runtime/caml/misc.h:370
#1 caml_failed_assert (expr=expr@entry=0x559db36aeaec "Has_status_val(v, status)", file_os=file_os@entry=0x559db36ae991 "runtime/major_gc.c", line=line@entry=491) at runtime/misc.c:51
#2 0x0000559db3687c3c in orph_ephe_list_verify_status (status=<optimized out>) at runtime/major_gc.c:491
#3 0x0000559db368afd4 in caml_mark_roots_stw (participant_count=participant_count@entry=3, barrier_participants=barrier_participants@entry=0x559dec5c8550) at runtime/major_gc.c:1573
#4 0x0000559db3693bd6 in caml_stw_empty_minor_heap_no_major_slice (domain=<optimized out>, mark_requested_p=<optimized out>, participating_count=<optimized out>, participating=<optimized out>) at runtime/minor_gc.c:890
#5 0x0000559db36744d4 in stw_handler (domain=0x7f0f64002b80) at runtime/domain.c:1535
#6 handle_incoming (s=0x559dec5bfe68) at runtime/domain.c:360
#7 0x0000559db36750cc in caml_handle_incoming_interrupts () at runtime/domain.c:373
#8 caml_handle_gc_interrupt () at runtime/domain.c:1967
#9 0x0000559db369b983 in caml_do_pending_actions_res () at runtime/signals.c:344
#10 0x0000559db36944c5 in caml_alloc_small_dispatch (dom_st=0x7f0f64002b80, wosize=3, flags=3, nallocs=1, encoded_alloc_lens=0x559db3785047 <camlStdlib__Ephemeron$frametable+4431> "\002\070\v") at runtime/minor_gc.c:1003
#11 <signal handler called>
#12 0x0000559db365afc2 in camlStdlib__Ephemeron$replace_793 () at ephemeron.ml:307
#13 0x0000559db35abef6 in camlLin$run_2234 () at lib/lin.ml:515
#14 0x0000559db3607e96 in camlStdlib__Array$map_355 () at array.ml:126
#15 0x0000559db35a5b76 in camlLin_domain$interp_644 () at lib/lin_domain.ml:10
#16 0x0000559db35a5e9e in camlLin_domain$main_818 () at lib/lin_domain.ml:20
#17 0x0000559db36265c6 in camlStdlib__Domain$body_759 () at domain.ml:268
#18 <signal handler called>
#19 0x0000559db367037a in caml_callback_exn (closure=<optimized out>, closure@entry=139704173766104, arg=<optimized out>, arg@entry=1) at runtime/callback.c:206
#20 0x0000559db3670ced in caml_callback_res (closure=closure@entry=139704173766104, arg=arg@entry=1) at runtime/callback.c:321
#21 0x0000559db3675952 in domain_thread_func (v=0x7ffc2f6cc050) at runtime/domain.c:1269
#22 0x00007f0f69f3dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#23 0x00007f0f69fcf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
```

(I am supposed to be on holiday, so I'll let someone else debug from here 🙂)
The failure was surprisingly easy to recreate using […]
@NickBarnes @jmid, the fix for the multicoretests failure is here: NickBarnes#5.
I've given your fix a spin on the CI (incl. repeating the above failing test) and it held up nicely 👍
In ocaml#13580 (comment), jmid reports that he needed to tweak the GC verbosity setting to avoid being spammed by minor-GC messages while debugging an assertion failure. The other sub-phases of the minor GC all use `caml_gc_log` rather than `CAML_GC_MESSAGE`, and do not seem to cause similar spamming. Making the code consistent will avoid inconsistent verbosity levels in end-user scripts.
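A sketch of the direction of that change, assuming only the runtime's internal logging declaration (`caml_gc_log` in `caml/misc.h`), so it would only compile inside the runtime tree; the wrapper function is invented for illustration.

```c
#define CAML_INTERNALS
#include "caml/misc.h"   /* caml_gc_log; assumes the runtime's internal headers */

/* Hypothetical wrapper illustrating the change: log via caml_gc_log, like
   the other minor-GC sub-phases, instead of the categorised macro. */
static void log_memprof_update(void)
{
  /* was: CAML_GC_MESSAGE(MINOR, "Updating memprof.\n"); */
  caml_gc_log("Updating memprof");
}
```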
Move the orphaned ephemerons GC colour check inside the barrier.
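A hypothetical sketch of what that one-liner means in practice; the names and types here are invented stand-ins, not the runtime's API. The point is simply that the orphaned-ephemeron colour check runs only after every participating domain has reached the stop-the-world barrier.

```c
/* Hypothetical sketch; not the runtime's code. */
typedef struct stw_barrier stw_barrier_t;          /* stand-in type */
void stw_barrier_wait(stw_barrier_t *b, int n);    /* stand-in primitive */
void verify_orphaned_ephemeron_colours(void);      /* stand-in check */

void mark_roots_stw(stw_barrier_t *b, int participants)
{
  /* ... per-domain root marking ... */

  stw_barrier_wait(b, participants);
  /* Moved to after the barrier, so no domain can still be orphaning
     ephemerons while their colours are verified. */
  verify_orphaned_ephemeron_colours();
}
```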
The multicoretests run passes now: https://github.com/ocaml/ocaml/actions/runs/16492771907/job/46633963474?pr=13580. The other multicoretests failure is unrelated to this PR's changes; see earlier comment: #13580 (comment). I believe the PR is good to merge.
Merging the PR now. Thanks @NickBarnes. Looking forward to the GC pacing improvement PRs next.
This is the upstreaming of oxcaml/oxcaml#2348, oxcaml/oxcaml#2358 (minor) and oxcaml/oxcaml#3029 by @stedolan. It introduces a new sweep-only phase at the start of each major GC cycle. This reduces the "latent garbage delay" - the time between a block becoming unreachable and it becoming available for allocation - by approximately half a major GC cycle.

Because marking, including root marking, doesn't take place until part-way through the GC cycle (when we move from sweep-only to mark-and-sweep), the allocation colour is not always `MARKED` but changes from `UNMARKED` to `MARKED` at that point. Effectively we switch from a grey mutator allocating white to a black mutator allocating black.

This PR is in draft because I've just done a fairly mechanical (although manual) patch application; I'm publishing it so that @stedolan and perhaps @kayceesrk can take a look. It passes the whole testsuite on my machine, including the new test (`parallel/churn.ml`) written by @stedolan for the `flambda-backend` mark-delay work.
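To make the colour switch concrete, here is a small self-contained sketch; the enums and function are invented for illustration and are not the runtime's actual representation. During the sweep-only phase the (grey) mutator allocates white (`UNMARKED` in the PR's terms); once marking starts it allocates black (`MARKED`), so blocks allocated after that point survive the current cycle.

```c
#include <assert.h>

/* Invented names for illustration only. */
typedef enum { Phase_sweep_only, Phase_mark_and_sweep } major_phase;
typedef enum { Colour_unmarked, Colour_marked } gc_colour;

/* Grey mutator allocating white during sweep-only; black mutator
   allocating black once marking has started. */
static gc_colour allocation_colour(major_phase phase)
{
  return phase == Phase_sweep_only ? Colour_unmarked : Colour_marked;
}

int main(void)
{
  assert(allocation_colour(Phase_sweep_only) == Colour_unmarked);
  assert(allocation_colour(Phase_mark_and_sweep) == Colour_marked);
  return 0;
}
```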