Conversation

@janvorli
Member

The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the
number of processors reported as configured in the system is always
larger than the maximum CPU index. However, it turns out that this is not
true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and
5 in the affinity mask, and CPUs 1 and 2 are never reported. glibc reports
6 as the number of configured CPUs, but musl reports just 4.
PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as
configured as the upper bound for scanning the affinity set, so on Jetson
TX2 the returned affinity mask had just two bits set while there were
4 CPUs. That triggered an assert in GCToOSInterface::Initialize.

This change fixes that by reading the maximum CPU index from
/proc/cpuinfo. It falls back to using the number of processors reported
as configured when /proc/cpuinfo is not available (on macOS, FreeBSD, ...)

Fixes #170
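
For illustration, here is a minimal sketch of the approach described above (simplified; the actual code in this PR may differ): scan /proc/cpuinfo for the highest "processor" index and return -1 when the file cannot be opened, so the caller can fall back to the configured processor count.

#include <stdio.h>
#include <stdlib.h>

// Sketch only: returns the highest CPU index listed in /proc/cpuinfo,
// or -1 when the file is not available (macOS, FreeBSD, ...), in which
// case the caller falls back to the configured processor count.
static int GetMaxCpuIndex()
{
    int maxCpuIndex = -1;

    FILE* cpuInfoFile = fopen("/proc/cpuinfo", "r");
    if (cpuInfoFile != nullptr)
    {
        char* line = nullptr;
        size_t lineLen = 0;

        while (getline(&line, &lineLen, cpuInfoFile) != -1)
        {
            int cpuIndex;
            if (sscanf(line, "processor : %d", &cpuIndex) == 1 && cpuIndex > maxCpuIndex)
            {
                maxCpuIndex = cpuIndex;
            }
        }

        free(line);
        fclose(cpuInfoFile);
    }

    return maxCpuIndex;
}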

@janvorli janvorli added this to the 5.0 milestone Nov 22, 2019
@janvorli janvorli requested review from jkotas and lpereira November 22, 2019 00:28
@janvorli janvorli self-assigned this Nov 22, 2019
@janvorli
Member Author

cc: @jashook

{
    const SIZE_T BitsPerBitsetEntry = 8 * sizeof(UINT_PTR);
    int nrcpus = PAL_GetTotalCpuCount();
    int maxCpuIndex = GetMaxCpuIndex();
Member

Can we just use the CPU_COUNT macro instead of this?

Member Author

CPU_COUNT counts the number of set bits in the set; that's not what we want. We could possibly use the CPU_SETSIZE macro instead, but that would mean scanning 1024 bits (the current value of the macro). But maybe you are right: we call this function just once in the whole process lifetime, so it wouldn't hurt. And if we ever need to call it more often, we can always add this optimization.

Member

CPU_COUNT counts the number of set bits in the set

You can potentially use it to exit the loop once all set bits are processed, without going all the way to 1024.

But I agree that looping up to CPU_SETSIZE should be OK for a one-time initialization like this.
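
A minimal sketch of that early exit (hypothetical function name; illustration only, not code from this PR):

#include <sched.h>

// Sketch only: scan the whole cpu_set_t up to CPU_SETSIZE, but stop as
// soon as all bits reported by CPU_COUNT have been processed.
// Building this may require _GNU_SOURCE for the CPU_* macros.
static void ProcessAffinitySet(cpu_set_t* cpuSet)
{
    int remaining = CPU_COUNT(cpuSet);

    for (int cpu = 0; cpu < CPU_SETSIZE && remaining > 0; cpu++)
    {
        if (CPU_ISSET(cpu, cpuSet))
        {
            // ... record CPU `cpu` in the returned affinity set ...
            remaining--;
        }
    }
}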

@jkotas
Member

jkotas commented Nov 22, 2019

Also add a matching change to src\gc\unix\gcenv.unix.cpp?


if (maxCpuIndex == -1)
{
    FILE* cpuInfoFile = fopen("/proc/cpuinfo", "r");
Contributor

This should be under #ifdef __linux__ or something like this. (procfs exists in other Unices but cpuinfo is Linux-only.)

Member Author

It is intentionally not like that. I try to avoid checking for a specific platform when possible. This code runs just once, and on non-Linux platforms it fails to open the file and uses the fallback path.
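
For clarity, this is the kind of compile-time guard being suggested (sketch only, with a hypothetical helper name; the PR intentionally does not add it and relies on the failed open instead):

#include <stdio.h>

// Sketch only: restrict the /proc/cpuinfo scan to Linux at compile time.
static FILE* OpenCpuInfoFile()
{
#ifdef __linux__
    return fopen("/proc/cpuinfo", "r");
#else
    return nullptr; // non-Linux: callers take the fallback path directly
#endif
}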

char *line = nullptr;
size_t lineLen = 0;

while (getline(&line, &lineLen, cpuInfoFile) != -1)
Contributor

Any reason to not use fscanf() directly? Saves a malloc roundtrip.

Member Author

We run this just once, so a couple of mallocs don't hurt. And this is a pattern we use in quite a number of places, so I wanted to be consistent. Also, for fscanf I would need to use a fixed line buffer, assume a maximum line size, etc.

Contributor

(No need to pass a buffer to fscanf() if all you're looking for is an integer with %d.)
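
For reference, a sketch of the fscanf()-based variant being suggested (hypothetical helper name; illustration only, the PR kept the getline() pattern): match the "processor : %d" prefix directly from the stream and discard the rest of each line by hand, so no heap-allocated line buffer is needed.

#include <stdio.h>

// Sketch only: fscanf()-based scan of an already opened /proc/cpuinfo.
static int GetMaxCpuIndexWithFscanf(FILE* cpuInfoFile)
{
    int maxCpuIndex = -1;

    while (!feof(cpuInfoFile))
    {
        int cpuIndex;
        if (fscanf(cpuInfoFile, "processor : %d", &cpuIndex) == 1 && cpuIndex > maxCpuIndex)
        {
            maxCpuIndex = cpuIndex;
        }

        // Discard the remainder of the current line (or the whole line
        // when the format did not match).
        int ch;
        while ((ch = fgetc(cpuInfoFile)) != '\n' && ch != EOF)
        {
        }
    }

    return maxCpuIndex;
}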

while (getline(&line, &lineLen, cpuInfoFile) != -1)
{
    int cpuIndex;
    int fieldsParsed = sscanf(line, "processor : %d", &cpuIndex);
Contributor

It's been a while since I looked at this, but the format of /proc/cpuinfo is completely different for each architecture supported by Linux. It just so happens that both x86 (32- and 64-bit) and ARM (32- and 64-bit) have a similar structure there as far as lines matching that scanf() format string go, but this is quite fragile.

A better way on Linux is to read /sys/devices/system/cpu/online (for the online CPUs) or /sys/devices/system/cpu/possible (for the possible CPUs).

Member Author

OK, I didn't know the format is so architecture-specific that it may not contain the "processor:" line. Thanks for pointing it out.
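
For what it's worth, a minimal sketch of that suggestion (hypothetical helper name; not what was merged): /sys/devices/system/cpu/possible holds ranges such as "0-5" or "0-3,5-7", so the last number on the line is the highest possible CPU index.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Sketch only: read the highest possible CPU index from sysfs; returns -1
// when the file is not available.
static int GetMaxPossibleCpuIndex()
{
    int maxCpuIndex = -1;

    FILE* file = fopen("/sys/devices/system/cpu/possible", "r");
    if (file != nullptr)
    {
        char buffer[256];
        if (fgets(buffer, sizeof(buffer), file) != nullptr)
        {
            // The last '-' or ',' separated token holds the highest index.
            const char* last = strrchr(buffer, '-');
            if (last == nullptr)
            {
                last = strrchr(buffer, ',');
            }
            maxCpuIndex = atoi(last != nullptr ? last + 1 : buffer);
        }
        fclose(file);
    }

    return maxCpuIndex;
}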

@janvorli
Member Author

Also add a matching change to src\gc\unix\gcenv.unix.cpp?

Right, we need it there too.

Remove getting the max CPU index and use CPU_SETSIZE instead. Since the
function is executed just once during startup, it is fine.
@jkotas
Member

jkotas commented Nov 22, 2019

Do we also need to fix GetProcessorForHeap? It also loops only up to GetTotalProcessorCount.

@janvorli
Member Author

Do we also need to fix GetProcessorForHeap? It also loops only up to GetTotalProcessorCount.

You are right, we do.

The loop was scanning only bits below the total processor count
@janvorli
Member Author

@jkotas, I've fixed GetProcessorForHeap too.
@lpereira In the end, I've removed getting the max CPU index and now scan the whole set instead.

@janvorli janvorli merged commit 82d1049 into master Nov 22, 2019
@janvorli janvorli deleted the fix-getcurrenthreadaffinityset-jetsontx2 branch November 22, 2019 16:53
@jkotas
Member

jkotas commented Nov 22, 2019

Nit: We are going to actively discourage creating PR branches under dotnet/runtime.

https://github.com/dotnet/consolidation/blob/master/Documentation/issues-pr-management.md#what-will-be-dotnetruntimes-branch-policy

@janvorli
Member Author

@jkotas oops, that happened by accident. I cloned runtime into a new folder locally and forgot to update the remotes appropriately.

janvorli added a commit to janvorli/coreclr that referenced this pull request Nov 29, 2019
Ports dotnet/runtime#206 to release/3.1.

Anipik pushed a commit to dotnet/coreclr that referenced this pull request Jan 14, 2020
Ports dotnet/runtime#206 to release/3.1.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Successfully merging this pull request may close these issues.

[Linux][Musl][Arm64] Unable to run on testing on ubuntu.1804.arm64.armarch queues
