-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix getting affinity set on MUSL on Jetson TX2 #206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the number of processors reported as configured in the system is always larger than the maximum CPU index. However, it turns out that it is not true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and 5 in the affinity mask and the 1 and 2 are never reported. GLIBC reports 6 as the number of configured CPUs, however MUSL reports just 4. The PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as configured as the upper bound for scanning affinity set, so on Jetson TX2, the affinity mask returned had just two bits set while there were 4 CPUs. That triggered an assert in the GCToOSInterface::Initialize. This change fixes that by reading the maximum CPU index from the /proc/cpuinfo. It falls back to using the number of processors configured when the /proc/cpuinfo is not available (on macOS, FreeBSD, ...)
|
cc: @jashook |
| { | ||
| const SIZE_T BitsPerBitsetEntry = 8 * sizeof(UINT_PTR); | ||
| int nrcpus = PAL_GetTotalCpuCount(); | ||
| int maxCpuIndex = GetMaxCpuIndex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use CPU_COUNT macro instead of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU_COUNT counts number of set bits in the set, that's not what we want. We could possibly use the CPU_SETSIZE macro, however it would mean that we would scan 1024 bits (the current value of the macro). But maybe you are right, we call this function just once in the whole process lifetime, so it wouldn't hurt. And if we ever need to use it more, we can always add this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU_COUNT counts number of set bits in the set
You can potentially use it to exit the loop once all set bits are processed, without going all the way to 1024.
But I agree that looping till CPU_SETSIZE should be ok for one-time initialization like this.
|
Also add matching change to |
|
|
||
| if (maxCpuIndex == -1) | ||
| { | ||
| FILE* cpuInfoFile = fopen("/proc/cpuinfo", "r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be under #ifdef __linux__ or something like this. (procfs exists in other Unices but cpuinfo is Linux-only.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intentionally not like that. I try to avoid checking for specific platform if possible. This code is run just once and on non-linux platforms, it fails to open the file and uses the fallback path.
| char *line = nullptr; | ||
| size_t lineLen = 0; | ||
|
|
||
| while (getline(&line, &lineLen, cpuInfoFile) != -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not use fscanf() directly? Saves a malloc roundtrip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run this just once, so couple of mallocs don't hurt. And this is a pattern that we use at quite a number of places, so I wanted to be consistent. Also, for fscanf, I would need to use a fixed line buffer, assume max line size etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No need to pass a buffer to fscanf() if all you're looking for is an integer with %d.)
| while (getline(&line, &lineLen, cpuInfoFile) != -1) | ||
| { | ||
| int cpuIndex; | ||
| int fieldsParsed = sscanf(line, "processor : %d", &cpuIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I looked at this, but the format of /proc/cpuinfo is completely different for each and every architecture supported by Linux. It just so happens that both x86 (32 and 64-bit) and ARM (32 and 64 bit) have similar structure there as far as lines matching that scanf() format string, but this is quite fragile.
A better way on Linux is to read /sys/devices/system/cpu/online (for the online CPUs) or /sys/devices/system/cpu/possible (for the possible CPUs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I didn't know the format is so architecture specific that it may not contain the "processor:" line. Thank for pointing it out.
Right, we need it there too. |
Remove getting the max CPU index and use CPU_SETSIZE instead. Since the function is executed just once during startup, it is fine.
|
Do we also need to fix GetProcessorForHeap ? It also loops till |
You are right, we do. |
The loop was scanning only bits below the total processor count
|
Nit: We are going to actively discourage creating PR branches under dotnet/runtime. |
|
@jkotas oops, that has happened by accident. I've cloned runtime into a new folder locally and forgotten to update the remotes appropriately. |
Ports dotnet/runtime#206 to release/3.1. The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the number of processors reported as configured in the system is always larger than the maximum CPU index. However, it turns out that it is not true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and 5 in the affinity mask and the 1 and 2 are never reported. GLIBC reports 6 as the number of configured CPUs, however MUSL reports just 4. The PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as configured as the upper bound for scanning affinity set, so on Jetson TX2, the affinity mask returned had just two bits set while there were 4 CPUs. That triggered an assert in the GCToOSInterface::Initialize. This change fixes that by reading the maximum CPU index from the /proc/cpuinfo. It falls back to using the number of processors configured when the /proc/cpuinfo is not available (on macOS, FreeBSD, ...) Fixes dotnet/runtime#170
Ports dotnet/runtime#206 to release/3.1. The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the number of processors reported as configured in the system is always larger than the maximum CPU index. However, it turns out that it is not true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and 5 in the affinity mask and the 1 and 2 are never reported. GLIBC reports 6 as the number of configured CPUs, however MUSL reports just 4. The PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as configured as the upper bound for scanning affinity set, so on Jetson TX2, the affinity mask returned had just two bits set while there were 4 CPUs. That triggered an assert in the GCToOSInterface::Initialize. This change fixes that by reading the maximum CPU index from the /proc/cpuinfo. It falls back to using the number of processors configured when the /proc/cpuinfo is not available (on macOS, FreeBSD, ...) Fixes dotnet/runtime#170
The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the
number of processors reported as configured in the system is always
larger than the maximum CPU index. However, it turns out that it is not
true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and
5 in the affinity mask and the 1 and 2 are never reported. GLIBC reports
6 as the number of configured CPUs, however MUSL reports just 4. The
PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as
configured as the upper bound for scanning affinity set, so on Jetson
TX2, the affinity mask returned had just two bits set while there were
4 CPUs. That triggered an assert in the GCToOSInterface::Initialize.
This change fixes that by reading the maximum CPU index from the
/proc/cpuinfo. It falls back to using the number of processors
configured when the /proc/cpuinfo is not available (on macOS, FreeBSD, ...)
Fixes #170