
[NO MERGE - FOR NET 11] Updating the managed baselines to x86-64-v2 and armv8-a + lse #118101


Draft
wants to merge 8 commits into
base: main

tannergooding
Member

This updates the baseline ISA targeted by NAOT and the JIT to more modern baselines, allowing a significant simplification of the logic.

This is set up in a staged manner to help reduce support and troubleshooting costs. Namely, while it updates what the JIT and NAOT output defaults to, it doesn't update what the native code of the runtime itself is compiled against. This allows us to more readily output an appropriate warning that the hardware is out of support and why. We will be able to migrate the native side forward in a release or two and remove the error message at that time.

For xarch, the baseline is changing from SSE2 (x86-64-v1, an ~2004 baseline) to SSE4.2+POPCNT (x86-64-v2, an ~2008 baseline). The last CPUs that didn't provide x86-64-v2 support were discontinued (went out of support) around 2013 (Bonnell microarchitecture, which was part of the Intel Atom lineup for low end machines). Windows 11 has correspondingly required SSE4.1 or later since at least 2021 (https://learn.microsoft.com/en-us/windows-hardware/design/minimum/minimum-hardware-requirements-overview).
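
As a rough illustration of what the x86-64 levels mean, the sketch below models each level as a set of required CPU features and reports the highest level a given feature set satisfies. The feature groupings follow the x86-64 psABI level definitions; the lowercase feature spellings and the function names `highest_level` and `meets_baseline` are assumptions made for this example and are not taken from the runtime.

```python
# Features added by each x86-64 microarchitecture level (per the x86-64 psABI).
# Each level also requires everything in the levels before it.
# The lowercase spellings are illustrative, not an official naming scheme.
LEVEL_FEATURES = [
    ("x86-64-v1", {"cmov", "cx8", "fpu", "fxsr", "mmx", "sce", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_sahf", "popcnt", "sse3", "ssse3", "sse4.1", "sse4.2"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe", "osxsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_features):
    """Return the highest level whose cumulative requirements are met, or None."""
    have = set(cpu_features)
    best = None
    for name, required in LEVEL_FEATURES:
        if not required <= have:
            break  # a missing feature here also rules out every higher level
        best = name
    return best


def meets_baseline(cpu_features, baseline="x86-64-v2"):
    """True if the feature set satisfies the given baseline level."""
    names = [name for name, _ in LEVEL_FEATURES]
    level = highest_level(cpu_features)
    return level is not None and names.index(level) >= names.index(baseline)
```

Under this model, a CPU that reports only the v1 features fails `meets_baseline`, which corresponds to the out-of-support hardware described above.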

For Arm64, the baseline is changing from armv8-a (which includes NEON) to also include LSE (built-in atomics support). This feature is already required on Windows (see the doc linked above) and on macOS.

It would be possible to raise the baseline only for specific OSes, but at the potential cost of increased complexity in the testing matrix. We notably have this to some degree already, as macOS defaults to armv8.5-a while Windows and Linux default to armv8-a. These new baselines are still low enough that we should wait on customer feedback before considering anything further in this space. Most notably, raising the Arm64 baseline on Linux to include LSE would drop support for the Raspberry Pi, which may be undesirable.

For xarch, there aren't really many concerns about the newer baseline requirement, given that Windows 10 goes out of support on Oct 14, 2025. Azure exclusively provides x86-64-v3 and later CPUs, as do most other major cloud providers. Likewise, resources like the Steam Hardware Survey (https://store.steampowered.com/hwsurvey) show that 99.79% of reporting hardware supports the new baseline. It is expected that only a very small percentage of client users on CPUs that are over a decade old and out of support might be impacted.

@tannergooding tannergooding added this to the 11.0.0 milestone Jul 27, 2025
@tannergooding tannergooding added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jul 27, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 27, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding
Member Author

tannergooding commented Jul 27, 2025

CC. @jkotas, @MichalStrehovsky

I've put this up for early review/feedback so that when approved, it can go in as early as possible in .NET 11 and allow us to get the most feedback. There is likely still more cleanup possible; the initial PR here was focused on getting the bulk obvious cleanup/simplification done without looking deeper into possible refactorings.

@risc-vv

risc-vv commented Jul 27, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@jkotas
Member

jkotas commented Jul 27, 2025

For Arm64, we need to worry about Android too. Android baseline is not anywhere near including LSE as far as I can tell. We treat Android as a Linux flavor wrt codegen. If we start diverging codegen between Linux and Android, things will get more complicated.

I think the x64 and Windows/macOS arm64 part of this change is fine, Linux Arm64 baseline should stay as is.

@tannergooding
Member Author

I think the x64 and Windows/macOS arm64 part of this change is fine, Linux Arm64 baseline should stay as is.

👍, looks like even Android 16 still presumes 8.0-a.

Do you think it is worth bumping Windows fully to armv8.1-a (roughly equivalent to armv8.0-a + lse + crc + rdm) or armv8.2-a + rcpc (what was indicated was required for Win11 24H2) for that case then? -- MacOS is already targeting apple-m1, which is armv8.5-a.

We notably won't get any of the JIT simplifications for Arm64 while Linux still targets older. But it may improve the NAOT codegen, minimally. For Windows, just the LSE baseline would allow us to remove some of the branching or explicit barriers in various helpers.


There's also the consideration for crossgen/ready-to-run. This has effectively targeted SSE4.2+POPCNT on x64 for a while. Do we want to keep that as is, just targeting the baseline, or do we feel that having it target x86-64-v3 (AVX2 capable hardware) is worthwhile? -- This one of course won't restrict what hardware users can run against, it just means slower startup for devs without AVX2 support.

@jkotas
Member

jkotas commented Jul 27, 2025

Do you think it is worth bumping Windows fully to armv8.1-a (roughly equivalent to armv8.0-a + lse + crc + rdm) or armv8.2-a + rcpc (what was indicated was required for Win11 24H2) for that case then?

It is fine to bump the baseline up to the oldest supported Windows Arm64 hardware. Now, I am not sure what the oldest supported Windows Arm64 hardware is going to be for .NET 11. https://github.com/dotnet/core/blob/main/release-notes/10.0/supported-os.md#windows lists 10 1809 (E) that I believe runs on the first-generation Windows Arm64 laptops. If this line stays as is and includes Arm64 support, it should be our baseline.

do we feel that having it target x86-64-v3 (AVX2 capable hardware) is worthwhile?

I would check whether changing the baseline reduces number of JITed methods for a simple ASP.NET app. If it does by say 10 methods or more, I would say that it is worth it. We use R2R as first tier, so the number of JITed methods is what matters the most, the code quality does not matter as much.

@tannergooding
Member Author

It is fine to bump the baseline up to the oldest supported Windows Arm64 hardware. Now, I am not sure what the oldest supported Windows Arm64 hardware is going to be for .NET 11. https://github.com/dotnet/core/blob/main/release-notes/10.0/supported-os.md#windows lists 10 1809 (E) that I believe runs on the first-generation Windows Arm64 laptops. If this line stays as is and includes Arm64 support, it should be our baseline.

It looks to be Snapdragon 850 (armv8.2-a) and later (https://learn.microsoft.com/en-us/windows-hardware/design/minimum/supported/windows-10-1809-supported-qualcomm-processors). Only 1803 and earlier had support for the "1st gen", which was the Snapdragon 835 (armv8.0-a). This matches the internal thread we had last December (which covered requirements and recommendations from Arm SBSA). Formally, per Arm SBSA, Win 11 22621+ (22H2) requires armv8.0-a + LSE and 25188+ (24H2) requires armv8.2-a + LRCPC. The actual CPUs supported in practice are all armv8.2-a or later.

Given that, I'll leave it at just armv8.0-a + LSE for now, to be on the safe side. While there are a few nice APIs in armv8.2-a, the big things most likely to benefit the runtime and simplify typical user apps are LSE and LRCPC. The latter we can't strictly guarantee for the oldest versions we're still supporting.
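
To summarize where the Arm64 discussion stands at this point, the sketch below records the per-OS baselines mentioned in the thread as feature sets and answers whether codegen could assume a feature without a runtime check. The table name, the feature spellings, and the helper `can_assume` are hypothetical, and the macOS entry lists only those armv8.5-a features that are named in this thread.

```python
# Per-OS Arm64 baselines as discussed in this thread (illustrative only).
# Linux stays at armv8.0-a; Android is treated as a Linux flavor for codegen.
# Windows moves to armv8.0-a + LSE; macOS already targets apple-m1 (armv8.5-a).
ARM64_BASELINE = {
    "linux": {"neon"},
    "android": {"neon"},
    "windows": {"neon", "lse"},
    "macos": {"neon", "lse", "crc", "rdm", "rcpc"},
}


def can_assume(os_name, feature):
    """True if the feature is part of the baseline for the given OS."""
    return feature in ARM64_BASELINE[os_name]


def needs_runtime_check(os_name, feature):
    """True if code using the feature must still be guarded on the given OS."""
    return not can_assume(os_name, feature)
```

With this table, LSE can be assumed on Windows and macOS but still needs a guard on Linux and Android, and LRCPC needs a guard everywhere except macOS.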

@tannergooding
Member Author

tannergooding commented Jul 28, 2025

I would check whether changing the baseline reduces number of JITed methods for a simple ASP.NET app. If it does by say 10 methods or more, I would say that it is worth it. We use R2R as first tier, so the number of JITed methods is what matters the most, the code quality does not matter as much.

I get 8 methods difference for the JittedMethodsCountingTest. We no longer JIT the following methods:

   JIT compiled System.Guid:FormatGuidVector128Utf8(System.Guid,bool) [Tier0, IL size=304, code size=531]
   JIT compiled System.HexConverter:AsciiToHexVector128(System.Runtime.Intrinsics.Vector128`1[byte],System.Runtime.Intrinsics.Vector128`1[byte]) [Tier0, IL size=76, code size=285]
   JIT compiled System.Buffers.AsciiCharSearchValues`2[System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst]:IndexOfAny(System.ReadOnlySpan`1[char]) [Tier0, IL size=30, code size=98]
   JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAny[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](byref,int,byref) [Tier0, IL size=9, code size=45]
   JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyCore[int,System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst,System.Buffers.IndexOfAnyAsciiSearcher+IndexOfAnyResultMapper`1[short]](byref,int,byref) [Instrumented Tier0, IL size=572, code size=1387]
   JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookup[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[byte]) [Tier0, IL size=41, code size=181]
   JIT compiled System.Buffers.IndexOfAnyAsciiSearcher+Default:PackSources(System.Runtime.Intrinsics.Vector256`1[ushort],System.Runtime.Intrinsics.Vector256`1[ushort]) [Tier0, IL size=18, code size=49]
   JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookupCore[System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[byte],System.Runtime.Intrinsics.Vector256`1[byte]) [Tier0, IL size=77, code size=189]

Namely, the methods that use Vector<T> or Vector256<T> no longer need to be jitted on startup for typical hardware. The same goes for methods that were opportunistically using SSE3-SSE4.2 ISAs (mostly due to a dependency on Ssse3.Shuffle).

What is still jitted is mainly the (dynamicClass) stuff for InvokeStub_EventAttribute.set_*(...). For this test in particular, we also end up jitting System.Buffers.SearchValues:TryGetSingleRange, which is listed as [Tier-0 switched to FullOpts].
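
A comparison like the one above can be approximated by diffing two captured logs. The following is a minimal sketch that assumes each log contains lines in the `JIT compiled <method> [<tier>, IL size=N, code size=M]` shape shown in this comment; the function names and the way the logs are captured are assumptions, not part of JittedMethodsCountingTest.

```python
import re

# Matches lines such as:
#   JIT compiled Type:Method(args) [Tier0, IL size=304, code size=531]
# The method signature may itself contain '[' and ']' (generic arguments), so
# the trailing bracketed group is anchored on the ", IL size=" marker.
JIT_LINE = re.compile(
    r"^\s*JIT compiled (?P<method>.+) "
    r"\[(?P<tier>[^\[\]]*), IL size=(?P<il>\d+), code size=(?P<code>\d+)\]\s*$"
)


def parse_jit_log(text):
    """Map method signature -> (tier, IL size, code size) for each matching line."""
    methods = {}
    for line in text.splitlines():
        match = JIT_LINE.match(line)
        if match:
            methods[match["method"]] = (
                match["tier"],
                int(match["il"]),
                int(match["code"]),
            )
    return methods


def no_longer_jitted(before_log, after_log):
    """Return the sorted methods present in the first log but absent from the second."""
    before = parse_jit_log(before_log)
    after = parse_jit_log(after_log)
    return sorted(set(before) - set(after))
```

Calling `len(no_longer_jitted(old_log, new_log))` on the two captures would give the method-count difference that the R2R baseline decision is based on.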

@jkotas
Member

jkotas commented Jul 28, 2025

I get 8 methods difference for the JittedMethodsCountingTest.

I think it is worth changing the default baseline for crossgen2 then. I assume that the default is overridable, and self-contained apps published with R2R can lower it. Is that right?

@tannergooding
Member Author

I think it is worth changing the default baseline for crossgen2 then. I assume that the default is overridable, and self-contained apps published with R2R can lower it. Is that right?

Correct, the way it would work is essentially specifying --instruction-set=-avx, which will turn off the VEX encoding space and would be equivalent to targeting the new baseline.

There's probably some kind of "better" UX we could provide here, if that were desirable. For example, we might special-case --instruction-set=base to mean precisely that and not something deeper. We could also just say that we only default to specifying our own ISA if no user-specified one was provided.

@jkotas
Member

jkotas commented Jul 30, 2025

I think it is worth changing the default baseline for crossgen2 then.

We may want to keep the lower baseline for macOS x64 to avoid regressing emulation on Apple Silicon.

@EgorBo
Member

EgorBo commented Jul 30, 2025

I think it is worth changing the default baseline for crossgen2 then.

We may want to keep the lower baseline for macOS x64 to avoid regressing emulation on Apple Silicon.

Do x64 emulators on windows-arm64 and linux-arm64 support AVX so we don't need to do the same for them?

@jkotas
Member

jkotas commented Jul 30, 2025

x64 emulators on windows-arm64

Yes: https://winbuzzer.com/2024/11/07/windows-11-build-27744-expands-prism-emulator-for-arm-pcs-xcxwbn/

linux-arm64

We do not support QEMU, but yes: https://www.phoronix.com/news/QEMU-7.2-Released
