
Conversation

@AntonLapounov
Member

@AntonLapounov AntonLapounov commented Aug 30, 2022

Under some conditions we may either create or try to create too many GC heaps, which may lead to buffer overruns in GC code. For example, a user may override the number of processors available to the process (e.g., by setting DOTNET_PROCESSOR_COUNT=100 and DOTNET_GCNoAffinitize=1). Or an app may be running under Windows 11 on a machine with asymmetric processor groups, where GetSystemInfo and GetProcessAffinityMask calls may return inconsistent results¹. In those cases, g_num_active_processors (the number of heaps we create if not additionally overridden) may become greater than g_num_processors (the number of slots we allocate for handle tables). That leads to buffer overruns when a thread whose home heap number is greater than or equal to g_num_processors tries to create a GC handle.
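
To make the hazard concrete, here is a minimal self-contained sketch. The two global names mirror the runtime's globals, but the handle-table sizing is a hypothetical simplification, not the actual runtime code:

    // Minimal sketch of the mismatch: tables are sized from one count but
    // indexed by home heap numbers derived from the other.
    #include <cstdint>
    #include <vector>

    static uint32_t g_num_processors = 8;          // slots allocated for handle tables
    static uint32_t g_num_active_processors = 100; // e.g., DOTNET_PROCESSOR_COUNT=100

    int main()
    {
        // Slots are sized from g_num_processors...
        std::vector<void*> handle_table_slots(g_num_processors);

        // ...but a thread's home heap number can reach
        // g_num_active_processors - 1, which indexes far past the end.
        uint32_t home_heap_number = g_num_active_processors - 1; // 99 >= 8
        return (home_heap_number < handle_table_slots.size()) ? 0 : 1;
    }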

Additionally, the heap_select::init function in gc.cpp may overrun the proc_no_to_numa_node array at this line:

                proc_no_to_numa_node[proc_no[i]] = cur_node_no;

if the previous loop exited early and left the proc_no[i] value uninitialized.

The fix is to ensure that g_num_active_processors is never greater than g_num_processors and to iterate only through the initialized part of the proc_no array.
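
A minimal sketch of the shape of both fixes, with a stub standing in for GCToOSInterface::GetProcessorForHeap and simplified types and counts (the real loops live in heap_select::init in gc.cpp):

    #include <cstdint>
    #include <cstdio>

    const int MAX_SUPPORTED_CPUS = 1024; // stand-in value for the sketch

    static uint32_t g_num_processors = 8;
    static uint32_t g_num_active_processors = 100;

    // Stub for GCToOSInterface::GetProcessorForHeap, which can fail and
    // make the first loop below exit early.
    static bool GetProcessorForHeap(uint16_t heap, uint16_t* proc, uint16_t* node)
    {
        if (heap >= 4) return false; // simulate an early failure
        *proc = heap;
        *node = 0;
        return true;
    }

    int main()
    {
        // Fix 1: never let the active count exceed the number of slots we
        // allocate based on g_num_processors.
        if (g_num_active_processors > g_num_processors)
            g_num_active_processors = g_num_processors;

        uint16_t proc_no[MAX_SUPPORTED_CPUS];
        uint16_t node_no[MAX_SUPPORTED_CPUS];
        uint16_t proc_no_to_numa_node[MAX_SUPPORTED_CPUS] = {};
        int n_heaps = 8;

        // First loop: track how many entries were actually initialized.
        int initialized = 0;
        for (int i = 0; i < n_heaps; i++)
        {
            if (!GetProcessorForHeap((uint16_t)i, &proc_no[i], &node_no[i]))
                break; // leaves proc_no[i..n_heaps) uninitialized
            initialized++;
        }

        // Fix 2: bound the second loop by `initialized`, not n_heaps, so we
        // never read an uninitialized proc_no[i] and overrun
        // proc_no_to_numa_node.
        for (int i = 0; i < initialized; i++)
            proc_no_to_numa_node[proc_no[i]] = node_no[i];

        printf("initialized %d of %d heaps\n", initialized, n_heaps);
        return 0;
    }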

Footnotes

  1. We may want to address that separately.

@AntonLapounov AntonLapounov added this to the 7.0.0 milestone Aug 30, 2022
@AntonLapounov AntonLapounov requested a review from Maoni0 August 30, 2022 23:17
@ghost ghost assigned AntonLapounov Aug 30, 2022
@ghost

ghost commented Aug 30, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


      uint16_t procIndex = 0;
      size_t cnt = heap_number;
    - for (uint16_t i = 0; i < GCToOSInterface::GetTotalProcessorCount(); i++)
    + for (uint16_t i = 0; i < MAX_SUPPORTED_CPUS; i++)
@AntonLapounov
Member Author

For consistency with the implementation in gcenv.os.cpp; see also #206 for context.

@AntonLapounov AntonLapounov requested a review from cshung August 30, 2022 23:29
@AntonLapounov
Member Author

AntonLapounov commented Aug 31, 2022

The failure in the Linux_musl arm leg seems unrelated; it has been reported (#69927 (comment), #69204 (comment)) in the past.

Assert failure(PID 22 [0x00000016], Thread: 59 [0x003b]): RawGetMethodTable()
    File: /__w/1/s/src/coreclr/gc/gc.cpp Line: 4539
    Image: /root/helix/work/correlation/dotnet

@jkotas
Member

jkotas commented Aug 31, 2022

> The failure in the Linux_musl arm leg seems unrelated; it has been reported (#69927 (comment), #69204 (comment)) in the past.

This assert is a generic sign of GC heap corruption that can have many different root causes. All past instances of this crash are believed to be fixed. We do not have an active issue on it.

If you believe that this is unrelated, you should get a new active issue created on it.

@AntonLapounov
Member Author

AntonLapounov commented Aug 31, 2022

I have opened #74895, but I cannot get the call stack from the dump. LLDB on Ubuntu ARM64 displays "error: Don't know how to parse core file. Unsupported OS.", and WinDbg displays just these two frames:

00  ld_musl_armhf_so!sigsetjmp+0x35
01  ld_musl_armhf_so!raise+0x2e

@janvorli
Member

janvorli commented Sep 1, 2022

You cannot open a dump taken on a musl-based distro on a glibc-based one. And in fact, to get a meaningful stack trace, you need to use exactly the same distro and version of the distro (it also has to have the same version of glibc) where the core was taken.

@janvorli
Member

janvorli commented Sep 1, 2022

Docker is what I use in these cases.

Member

@janvorli janvorli left a comment

LGTM, thank you!

@AntonLapounov AntonLapounov merged commit 2184b18 into dotnet:main Sep 2, 2022
@AntonLapounov AntonLapounov deleted the FixBufferOverrunsInGC branch September 2, 2022 00:00
@AntonLapounov
Member Author

/backport to release/7.0

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/2975767222

@AntonLapounov
Member Author

> And in fact, to get a meaningful stack trace, you need to use exactly the same distro and version of the distro (it also has to have the same version of glibc) where the core was taken.

Do we have instructions written anywhere? If not, we should document them.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 2, 2022