Commit abeecee

d-netto, gbaraldi, and vchuravy authored
Implement parallel marking (#48600)
Using a work-stealing queue after Chase and Lev, optimized for weak memory models by Le et al.

Default number of GC threads is half the number of compute threads.

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
1 parent 4158640 commit abeecee

30 files changed, +722 -156 lines changed
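
For readers unfamiliar with the data structure the commit message names, the following is a minimal C11 sketch of a Chase-Lev style deque using the memory orderings suggested by Le et al. It is not the code in `work-stealing-queue.h`: the type `wsq_t`, the functions `wsq_init`, `wsq_push`, `wsq_pop`, and `wsq_steal`, and the fixed `WSQ_CAPACITY` are all hypothetical, and the sketch omits buffer resizing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity deque; names are illustrative, not Julia's. */
#define WSQ_CAPACITY 1024
static_assert((WSQ_CAPACITY & (WSQ_CAPACITY - 1)) == 0,
              "capacity must be a power of two so indices can be masked");

typedef struct {
    _Atomic int64_t top;    /* thieves take from this end */
    _Atomic int64_t bottom; /* the owner pushes and pops at this end */
    _Atomic(void *) buffer[WSQ_CAPACITY];
} wsq_t;

static void wsq_init(wsq_t *q)
{
    atomic_init(&q->top, 0);
    atomic_init(&q->bottom, 0);
}

/* Owner only. Returns 1 on success, 0 if the deque is full. */
static int wsq_push(wsq_t *q, void *item)
{
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t >= WSQ_CAPACITY)
        return 0;
    atomic_store_explicit(&q->buffer[b & (WSQ_CAPACITY - 1)], item,
                          memory_order_relaxed);
    /* Publish the slot before publishing the new bottom. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return 1;
}

/* Owner only. Takes the most recently pushed item, or NULL if empty. */
static void *wsq_pop(wsq_t *q)
{
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed) - 1;
    atomic_store_explicit(&q->bottom, b, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t t = atomic_load_explicit(&q->top, memory_order_relaxed);
    void *item = NULL;
    if (t <= b) {
        item = atomic_load_explicit(&q->buffer[b & (WSQ_CAPACITY - 1)],
                                    memory_order_relaxed);
        if (t == b) {
            /* Last element: compete with thieves through a CAS on top. */
            if (!atomic_compare_exchange_strong_explicit(
                    &q->top, &t, t + 1,
                    memory_order_seq_cst, memory_order_relaxed))
                item = NULL;
            atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
        }
    }
    else {
        /* Deque was empty: restore bottom. */
        atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    }
    return item;
}

/* Any thread. Takes the oldest item, or NULL if empty or the race was lost. */
static void *wsq_steal(wsq_t *q)
{
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    void *item = NULL;
    if (t < b) {
        item = atomic_load_explicit(&q->buffer[t & (WSQ_CAPACITY - 1)],
                                    memory_order_relaxed);
        if (!atomic_compare_exchange_strong_explicit(
                &q->top, &t, t + 1,
                memory_order_seq_cst, memory_order_relaxed))
            return NULL;
    }
    return item;
}
```

The owner works at `bottom` in LIFO order while thieves take from `top` in FIFO order, so the two sides only contend when one element is left. For example, after the owner pushes `x`, `y`, and `z`, a call to `wsq_steal` yields `x` and a call to `wsq_pop` yields `z`.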

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -17,11 +17,15 @@ Language changes
 
 Compiler/Runtime improvements
 -----------------------------
+
 * The `@pure` macro is now deprecated. Use `Base.@assume_effects :foldable` instead ([#48682]).
+* The mark phase of the Garbage Collector is now multi-threaded ([#48600]).
 
 Command-line option changes
 ---------------------------
 
+* New option `--gcthreads` to set how many threads will be used by the Garbage Collector ([#48600]).
+  The default is set to `N/2` where `N` is the amount of worker threads (`--threads`) used by Julia.
 
 Multi-threading changes
 -----------------------

base/options.jl

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ struct JLOptions
     cpu_target::Ptr{UInt8}
     nthreadpools::Int16
     nthreads::Int16
+    ngcthreads::Int16
     nthreads_per_pool::Ptr{Int16}
     nprocs::Int32
     machine_file::Ptr{UInt8}

base/threadingconstructs.jl

Lines changed: 7 additions & 0 deletions
@@ -99,6 +99,13 @@ function threadpooltids(pool::Symbol)
     end
 end
 
+"""
+    Threads.ngcthreads() -> Int
+
+Returns the number of GC threads currently configured.
+"""
+ngcthreads() = Int(unsafe_load(cglobal(:jl_n_gcthreads, Cint))) + 1
+
 function threading_run(fun, static)
     ccall(:jl_enter_threaded_region, Cvoid, ())
     n = threadpoolsize()

doc/man/julia.1

Lines changed: 5 additions & 0 deletions
@@ -118,6 +118,11 @@ supported (Linux and Windows). If this is not supported (macOS) or
 process affinity is not configured, it uses the number of CPU
 threads.
 
+.TP
+--gcthreads <n>
+Enable n GC threads; If unspecified is set to half of the
+compute worker threads.
+
 .TP
 -p, --procs {N|auto}
 Integer value N launches N additional local worker processes `auto` launches as many workers

doc/src/base/multi-threading.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Base.Threads.nthreads
 Base.Threads.threadpool
 Base.Threads.nthreadpools
 Base.Threads.threadpoolsize
+Base.Threads.ngcthreads
 ```
 
 See also [Multi-Threading](@ref man-multithreading).

doc/src/manual/command-line-interface.md

Lines changed: 1 addition & 0 deletions
@@ -107,6 +107,7 @@ The following is a complete list of command-line switches available when launchi
 |`-E`, `--print <expr>` |Evaluate `<expr>` and display the result|
 |`-L`, `--load <file>` |Load `<file>` immediately on all processors|
 |`-t`, `--threads {N\|auto}` |Enable N threads; `auto` tries to infer a useful default number of threads to use but the exact behavior might change in the future. Currently, `auto` uses the number of CPUs assigned to this julia process based on the OS-specific affinity assignment interface, if supported (Linux and Windows). If this is not supported (macOS) or process affinity is not configured, it uses the number of CPU threads.|
+| `--gcthreads {N}` |Enable N GC threads; If unspecified is set to half of the compute worker threads.|
 |`-p`, `--procs {N\|auto}` |Integer value N launches N additional local worker processes; `auto` launches as many workers as the number of local CPU threads (logical cores)|
 |`--machine-file <file>` |Run processes on hosts listed in `<file>`|
 |`-i` |Interactive mode; REPL runs and `isinteractive()` is true|

doc/src/manual/environment-variables.md

Lines changed: 8 additions & 0 deletions
@@ -316,6 +316,14 @@ then spinning threads never sleep. Otherwise, `$JULIA_THREAD_SLEEP_THRESHOLD` is
 interpreted as an unsigned 64-bit integer (`uint64_t`) and gives, in
 nanoseconds, the amount of time after which spinning threads should sleep.
 
+### [`JULIA_NUM_GC_THREADS`](@id env-gc-threads)
+
+Sets the number of threads used by Garbage Collection. If unspecified is set to
+half of the number of worker threads.
+
+!!! compat "Julia 1.10"
+    The environment variable was added in 1.10
+
 ### [`JULIA_IMAGE_THREADS`](@id env-image-threads)
 
 An unsigned 32-bit integer that sets the number of threads used by image

doc/src/manual/multi-threading.md

Lines changed: 9 additions & 0 deletions
@@ -72,6 +72,15 @@ julia> Threads.threadid()
 three processes have 2 threads enabled. For more fine grained control over worker
 threads use [`addprocs`](@ref) and pass `-t`/`--threads` as `exeflags`.
 
+### Multiple GC Threads
+
+The Garbage Collector (GC) can use multiple threads. The amount used is either half the number
+of compute worker threads or configured by either the `--gcthreads` command line argument or by using the
+[`JULIA_NUM_GC_THREADS`](@ref env-gc-threads) environment variable.
+
+!!! compat "Julia 1.10"
+    The `--gcthreads` command line argument requires at least Julia 1.10.
+
 ## [Threadpools](@id man-threadpools)
 
 When a program's threads are busy with many tasks to run, tasks may experience
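
The documentation changes above name three sources for the GC thread count: the `--gcthreads` flag, the `JULIA_NUM_GC_THREADS` environment variable, and a default of half the compute threads. A rough sketch of how such a resolution could look is below. The function `resolve_gc_threads`, its parameters, the precedence of the flag over the environment variable, and the floor of one thread are assumptions made for illustration; the hunks shown here do not include the runtime code that actually makes this decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the Julia runtime.
 *   cli_value: value passed to --gcthreads, or 0 if the flag was absent
 *   env_value: contents of JULIA_NUM_GC_THREADS, or NULL if unset
 *   nthreads:  number of compute worker threads
 * Returns the total number of threads that take part in garbage collection. */
static int resolve_gc_threads(int cli_value, const char *env_value, int nthreads)
{
    assert(nthreads >= 1);
    if (cli_value > 0)
        return cli_value; /* assumption: the flag wins over the environment */
    if (env_value != NULL) {
        long n = strtol(env_value, NULL, 10);
        if (n > 0)
            return (int)n; /* unparsable or non-positive values fall through */
    }
    int half = nthreads / 2; /* documented default: half the compute threads */
    return half > 0 ? half : 1; /* assumption: at least one thread collects */
}
```

With 8 compute threads and nothing configured, this sketch returns 4. The `+ 1` in `Threads.ngcthreads()` suggests that `jl_n_gcthreads` excludes one thread that always takes part in collection, which would also be consistent with the debug code treating `jl_n_gcthreads != 0` as multi-threaded GC.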

src/Makefile

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ ifeq ($(USE_SYSTEM_LIBUV),0)
 UV_HEADERS += uv.h
 UV_HEADERS += uv/*.h
 endif
-PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
+PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,work-stealing-queue.h julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
 ifeq ($(OS),WINNT)
 PUBLIC_HEADERS += $(addprefix $(SRCDIR)/,win32_ucontext.h)
 endif

src/gc-debug.c

Lines changed: 30 additions & 34 deletions
@@ -198,12 +198,21 @@ static void restore(void)
 
 static void gc_verify_track(jl_ptls_t ptls)
 {
+    // `gc_verify_track` is limited to single-threaded GC
+    if (jl_n_gcthreads != 0)
+        return;
     do {
         jl_gc_markqueue_t mq;
-        mq.current = mq.start = ptls->mark_queue.start;
-        mq.end = ptls->mark_queue.end;
-        mq.current_chunk = mq.chunk_start = ptls->mark_queue.chunk_start;
-        mq.chunk_end = ptls->mark_queue.chunk_end;
+        jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
+        ws_queue_t *cq = &mq.chunk_queue;
+        ws_queue_t *q = &mq.ptr_queue;
+        jl_atomic_store_relaxed(&cq->top, 0);
+        jl_atomic_store_relaxed(&cq->bottom, 0);
+        jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
+        jl_atomic_store_relaxed(&q->top, 0);
+        jl_atomic_store_relaxed(&q->bottom, 0);
+        jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
+        arraylist_new(&mq.reclaim_set, 32);
         arraylist_push(&lostval_parents_done, lostval);
         jl_safe_printf("Now looking for %p =======\n", lostval);
         clear_mark(GC_CLEAN);
@@ -214,7 +223,7 @@ static void gc_verify_track(jl_ptls_t ptls)
             gc_mark_finlist(&mq, &ptls2->finalizers, 0);
         }
         gc_mark_finlist(&mq, &finalizer_list_marked, 0);
-        gc_mark_loop_(ptls, &mq);
+        gc_mark_loop_serial_(ptls, &mq);
         if (lostval_parents.len == 0) {
             jl_safe_printf("Could not find the missing link. We missed a toplevel root. This is odd.\n");
             break;
@@ -248,11 +257,22 @@ static void gc_verify_track(jl_ptls_t ptls)
 
 void gc_verify(jl_ptls_t ptls)
 {
+    // `gc_verify` is limited to single-threaded GC
+    if (jl_n_gcthreads != 0) {
+        jl_safe_printf("Warn. GC verify disabled in multi-threaded GC\n");
+        return;
+    }
     jl_gc_markqueue_t mq;
-    mq.current = mq.start = ptls->mark_queue.start;
-    mq.end = ptls->mark_queue.end;
-    mq.current_chunk = mq.chunk_start = ptls->mark_queue.chunk_start;
-    mq.chunk_end = ptls->mark_queue.chunk_end;
+    jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
+    ws_queue_t *cq = &mq.chunk_queue;
+    ws_queue_t *q = &mq.ptr_queue;
+    jl_atomic_store_relaxed(&cq->top, 0);
+    jl_atomic_store_relaxed(&cq->bottom, 0);
+    jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
+    jl_atomic_store_relaxed(&q->top, 0);
+    jl_atomic_store_relaxed(&q->bottom, 0);
+    jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
+    arraylist_new(&mq.reclaim_set, 32);
     lostval = NULL;
     lostval_parents.len = 0;
     lostval_parents_done.len = 0;
@@ -265,7 +285,7 @@ void gc_verify(jl_ptls_t ptls)
         gc_mark_finlist(&mq, &ptls2->finalizers, 0);
     }
     gc_mark_finlist(&mq, &finalizer_list_marked, 0);
-    gc_mark_loop_(ptls, &mq);
+    gc_mark_loop_serial_(ptls, &mq);
     int clean_len = bits_save[GC_CLEAN].len;
     for(int i = 0; i < clean_len + bits_save[GC_OLD].len; i++) {
         jl_taggedvalue_t *v = (jl_taggedvalue_t*)bits_save[i >= clean_len ? GC_OLD : GC_CLEAN].items[i >= clean_len ? i - clean_len : i];
@@ -1268,30 +1288,6 @@ int gc_slot_to_arrayidx(void *obj, void *_slot) JL_NOTSAFEPOINT
     return (slot - start) / elsize;
 }
 
-// Print a backtrace from the `mq->start` of the mark queue up to `mq->current`
-// `offset` will be added to `mq->current` for convenience in the debugger.
-NOINLINE void gc_mark_loop_unwind(jl_ptls_t ptls, jl_gc_markqueue_t *mq, int offset)
-{
-    jl_jmp_buf *old_buf = jl_get_safe_restore();
-    jl_jmp_buf buf;
-    jl_set_safe_restore(&buf);
-    if (jl_setjmp(buf, 0) != 0) {
-        jl_safe_printf("\n!!! ERROR when unwinding gc mark loop -- ABORTING !!!\n");
-        jl_set_safe_restore(old_buf);
-        return;
-    }
-    jl_value_t **start = mq->start;
-    jl_value_t **end = mq->current + offset;
-    for (; start < end; start++) {
-        jl_value_t *obj = *start;
-        jl_taggedvalue_t *o = jl_astaggedvalue(obj);
-        jl_safe_printf("Queued object: %p :: (tag: %zu) (bits: %zu)\n", obj,
-                       (uintptr_t)o->header, ((uintptr_t)o->header & 3));
-        jl_((void*)(jl_datatype_t *)(o->header & ~(uintptr_t)0xf));
-    }
-    jl_set_safe_restore(old_buf);
-}
-
 static int gc_logging_enabled = 0;
 
 JL_DLLEXPORT void jl_enable_gc_logging(int enable) {
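
Both verification routines in `src/gc-debug.c` build a private mark queue that reuses the thread's backing arrays but starts with `top` and `bottom` at zero. The sketch below isolates that pattern with C11 atomics. `demo_array_t`, `demo_queue_t`, `demo_queue_borrow`, and `demo_queue_size` are hypothetical stand-ins, not Julia's `ws_queue_t` API. Relaxed ordering is only reasonable here because, as the added guards state, the verifier is limited to single-threaded GC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a queue's backing storage and its indices. */
typedef struct {
    void **buffer;
    size_t capacity;
} demo_array_t;

typedef struct {
    _Atomic int64_t top;
    _Atomic int64_t bottom;
    _Atomic(demo_array_t *) array;
} demo_queue_t;

/* Make `scratch` an empty queue that shares the backing array of `owner`.
 * The owner's own indices are left untouched. */
static void demo_queue_borrow(demo_queue_t *scratch, demo_queue_t *owner)
{
    assert(scratch != owner);
    atomic_store_explicit(&scratch->top, 0, memory_order_relaxed);
    atomic_store_explicit(&scratch->bottom, 0, memory_order_relaxed);
    atomic_store_explicit(
        &scratch->array,
        atomic_load_explicit(&owner->array, memory_order_relaxed),
        memory_order_relaxed);
}

/* Number of queued items, valid only while no other thread touches `q`. */
static int64_t demo_queue_size(demo_queue_t *q)
{
    return atomic_load_explicit(&q->bottom, memory_order_relaxed)
         - atomic_load_explicit(&q->top, memory_order_relaxed);
}
```

Because the scratch queue writes into the same storage as the owner, anything the owner still had queued could be overwritten, which is one reason to run such a pass only when no other GC thread is using the queue.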
