
Conversation

@agocke (Member) commented on Oct 2, 2025:

This could get more complicated, but I want to make it simple to start

@Copilot (Copilot AI) left a review:

Pull Request Overview

This PR introduces a new documentation file that defines a platform testing policy for the .NET repository. The policy aims to optimize testing coverage across different platforms while managing costs by strategically selecting which platform versions to test for different types of code changes.

Key changes:

  • Establishes a testing policy where main branch PRs test on latest supported platforms and servicing branch PRs test on oldest supported platforms
  • Documents the assumption that intermediate platform versions have sufficient coverage through testing the extremes
  • Defines the scope of .NET lifecycle maintenance covering three versions: current development, previous release, and the release before that

1. Latest supported
2. Oldest supported

We assume that all supported platform versions in between have sufficient coverage based on the latest and the oldest. We currently have no defined strategy for pre-release versions.
@jkotas (Member) commented on Oct 3, 2025:

I think this is an oversimplification. We have 3 categories of tests:

  • src\tests... - core runtime tests, JIT tests for the most part. The architecture matrix matters for these; the OS flavor and OS version matrix does not matter much.
  • src\libraries...*Tests - libraries (API) tests. The OS and OS version matrix matters for these. Globalization, crypto, I/O, networking, ... are known to have many differences between OS flavors and versions. The architecture matrix does not matter a lot.
  • Other - IL linker tests, host tests, ... . Platform-neutral code for the most part. OS version or architecture specifics do not matter a lot.

We have 3 different strategies among these 3 categories today:

  • Core runtime tests: Matrix is focused on architecture coverage.
  • Libraries tests: Matrix is focused on OS variety coverage.
  • Other: the matrix does not matter a whole lot. Also, these tests are very cheap, so testing more than strictly necessary is not a big deal.

The OS version mix strategy that you are proposing is fine for core runtime tests. (It is fine since OS versions do not matter for core runtime tests.)

I am not convinced that it is a good tradeoff for libraries tests. I expect that we would see quite a few OS flavor/version-specific breaks sneak through over time. It will create work for engineers on the libraries teams: they will need to remember to trigger optional legs for changes in sensitive areas, and they will need to deal with the breaks that sneak through. If we go with this plan, I would like to see explicit ack from @artl93 that the extra manual work is worth the saved machine costs. (IIRC, libraries tests are well structured and running them does not cost much. My mental picture is that Wasm/Browser testing costs about as much as all libraries testing on the many different OSes that we test libraries on.)

@agocke (Member Author) replied:

I agree that libraries tests care more about OS breadth. However, my proposal would still include running an arbitrarily large set of OSes, but limiting PR runs to the latest version of each. My thinking there is that the likelihood of breaking a version that's not the latest, or of a servicing release of an OS causing a break not seen in the latest, is fairly low. Do you disagree?

@agocke (Member Author) added:

Also, I would be fine with an addendum that adds, say, a rolling build of the runtime tests against the oldest supported versions. That way people ideally would not need to queue manual runs to catch these things.

A Member replied:

> My thinking there is that the likelihood of breaking a version that's not the latest

Taking an accidental dependency on a new API that's unavailable in the oldest supported version is plausible. I signed off on #120358, where we did exactly that earlier today. It manifested as a build break since the dependency was from C code. If the dependency was via P/Invoke, it would be a runtime failure that would not be caught if we were testing on the newest version only. I do not have a good idea about how many breaks would sneak through; my guess is about one per month.

A Member commented:

Here is the set of 3 Linux x64 flavors we run for libraries in CI in main today:

- Ubuntu.2204.Amd64.Open
- (AzureLinux.3.0.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64
- (Centos.10.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-stream-10-helix-amd64

This maximizes coverage (variety) while minimizing costs:

  • Redhat-based distribution vs. Debian-based distribution vs. special Azure Linux 3
  • Physical OS vs. containers
  • OpenSSL crypto provider vs. SymCrypt provider
  • Older (Ubuntu 22 is 2022) vs. newer

If we were to go with the plan to bump everything, AzureLinux3 and CentOS10 are latest available so no change there. Ubuntu 22 would need to be replaced by Ubuntu 25. We would give up testing on older distros by doing that. I expect that we would give up testing on physical OS too since we would not want to pay for creating Ubuntu 25 OS images with short shelf life (Ubuntu 25 is not LTS). At that point, we may give up Ubuntu completely since it is not differentiated enough from Azure Linux 3 and CentOS anymore, and we can reduce the set down to just AzureLinux3 and CentOS10. I am sharing this thought process to show that there is interplay between the different dimensions of the matrix.
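For concreteness, the kind of queue list being discussed could be written down roughly as follows. This is an illustrative sketch only: the key name and layout are assumed for the example (not the actual dotnet/runtime pipeline schema), and the per-queue comments simply restate the coverage dimensions listed above.

```yaml
# Illustrative sketch of the current Linux x64 libraries matrix described above.
linux_x64_libraries_queues:
  - Ubuntu.2204.Amd64.Open      # Debian-based, physical OS, OpenSSL, older (2022)
  - AzureLinux.3.0.Amd64.Open   # Azure Linux 3 container, SymCrypt provider
  - Centos.10.Amd64.Open        # Red Hat-based container (CentOS Stream 10), newer
```

Changing or dropping any single entry shifts several of those dimensions at once, which is the interplay being described.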

@agocke (Member Author) replied:

I like this example.

> AzureLinux3 and CentOS10 are latest available so no change there.

Agreed, I think those should stay as-is.

> Ubuntu 22 would need to be replaced by Ubuntu 25. We would give up testing on older distros by doing that

Yup. I think we have a choice here between the absolute latest Ubuntu, regardless of LTS status, and the latest LTS. I would be fine with either one. I don't particularly like the current choice of 22.04, as it is the oldest supported and doesn't give us coverage of 24.04, which is probably the most commonly used Ubuntu by now. I'd rather main catch problems on the leading edge than the trailing edge.

> I expect that we would give up testing on physical OS too since we would not want to pay for creating Ubuntu 25 OS images with short shelf life (Ubuntu 25 is not LTS)

I'm fine with giving up testing physical images entirely.

> At that point, we may give up Ubuntu completely since it is not differentiated enough from Azure Linux 3 and CentOS anymore

I think we should still have a Debian distribution, since they carry their own patches to common base libraries.

> I am sharing this thought process to show that there is interplay between the different dimensions of the matrix.

Agreed. Different distributions mean different things by their version numbers, so it's hard to pick just one policy. Nevertheless, "pick the latest" seems like a decent rule of thumb for main.

@agocke (Member Author) added:

> Taking an accidental dependency on a new API that's unavailable in the oldest supported version is plausible. I signed off on #120358, where we did exactly that earlier today. It manifested as a build break since the dependency was from C code. If the dependency was via P/Invoke, it would be a runtime failure that would not be caught if we were testing on the newest version only. I do not have a good idea about how many breaks would sneak through; my guess is about one per month.

This sounds like a +1 to "rolling build of oldest version" for main. This doesn't sound common enough that I feel the need to check it every PR, but common enough that I wouldn't want this to slip too far without us noticing. Weekly seems reasonable, although we could see what "daily" does to the budget.
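For reference, such a rolling build could be expressed with a scheduled trigger. The sketch below assumes Azure Pipelines cron syntax; the cadence, branch, and display name are illustrative choices, not an agreed configuration.

```yaml
# Hypothetical sketch: a scheduled (rolling) run against the oldest supported
# platform versions, so that per-PR legs can stay on the latest versions.
schedules:
- cron: "0 8 * * 1"              # weekly, Mondays 08:00 UTC; "0 8 * * *" would be daily
  displayName: Rolling run on oldest supported platforms
  branches:
    include:
    - main
  always: false                  # only run when new changes have been committed
```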

@jkotas (Member) commented on Oct 5, 2025:

> I think we should still have a Debian distribution, since they carry their own patches to common base libraries.

Should it be rolling based on the same reasoning that pushes older versions to rolling?

Debian-specific patches are minor compared to several years' worth of changes in the Linux ecosystem. Assuming we are testing on some latest-era Linux distro, it is more likely for us to introduce an issue that is specific to older Linux than an issue specific to Debian. If we are moving the former to rolling as not worth having in per-PR CI, we should move the latter to rolling as well.

@agocke (Member Author) replied:

Works for me. That is, at least for runtime. I'll defer to libraries on what they think is important for their test suite.

agocke added 2 commits October 3, 2025 10:35
Expanded the definition of a platform to include architectures, OSes, OS flavors, and crypto stacks. Updated the testing policy to clarify versioning strategy for platform coverage.
We want to mix and match platform versions and .NET versions to produce good platform coverage without too much cost. This means we want to catch breaks on each platform as quickly as possible, and prioritize catching the type of platform breaks that are most likely to affect the specific version being tested.

* `main` - PRs run on the *latest supported* platform.
* `servicing` - PRs run on the *oldest supported* platform.
A Member commented:

If we change what we test during servicing vs. main, it is likely that we would need to spend time around every release to stabilize the servicing tree on a new set of OSes.

@agocke (Member Author) replied:

Yup, I think that's desirable since we want to watch the oldest supported versions somewhere, and servicing is where it matters the most. That's because servicing releases can have older platforms go out of support in the middle of the servicing lifetime.

@agocke (Member Author) added:

Added some language below to describe what I think should happen in very broad terms whenever we release a new version.

I expect this to be more detailed when we actually refactor the helix queue definitions to follow this policy.

A Member commented:

Seems reasonable for PRs. I wonder how we can make sure they stay clean during development though. We don't want to wait until release to learn about all the ways we broke customers on the oldest supported platforms (who are less likely to try previews).

A Member commented:

Our needs for macOS are arguably inverted here, where we need to be more proactive about testing the newest version of macOS on the servicing releases of .NET. Customers who update macOS or acquire new hardware after the macOS release cannot downgrade, but their apps will still be running on the previous STS and LTS releases.

For example, customers are currently running .NET 8 apps on macOS 26, so getting .NET 8 coverage was at least as valuable as .NET 10 coverage, and arguably more so.

To @ericstj's point, this could be handled differently between PRs and rolling builds: rolling builds of the servicing branches could run on the latest supported platform (alongside PRs into main), while servicing PRs and the rolling builds for main use the oldest supported platform.

* `main` - PRs run on the *latest supported* platform.
* `servicing` - PRs run on the *oldest supported* platform.

The above policy only applies to PRs. Scheduled or incidental runs can be queued against other platform definitions, if deemed necessary.
A Member commented:

Should we have automatic triggers of the optional legs that cover more versions for areas that are known to have significant differences between OS versions, such as crypto, so that devs working in these areas do not have to think about it?

@agocke (Member Author) replied:

Added as an open question. I don't have an opinion either way.

A Member commented:

We should also have rolling builds with someone responsible for ensuring they stay clean. Otherwise they will accumulate debt and the folks who need them will be the only ones sorting through that noise.

agocke added 2 commits October 5, 2025 19:32
Added an open question regarding area paths triggering additional version testing.
Added instructions for handling servicing releases in PR configuration.

@agocke (Member Author) commented on Oct 21, 2025:

@jkotas Anything else? We're being forced to upgrade versions again and I'd rather move based on a settled policy than bump to an arbitrary version.

@jkotas (Member) commented on Oct 21, 2025:

This proposal impacts the libraries teams the most compared to where we are today. I expect the libraries owners (@artl93, @jeffhandley, @ericstj) to sign off on it.

I am not sure whether it is a good idea to switch how we test when going from main to servicing. The fewer ship-related activities that we have to do, the better. Again, I expect this is going to be a ship-related activity mostly for libraries. I do not have a strong opinion.

@jeffhandley (Member) commented:

Thanks; acknowledged on needing my review/sign-off. I chatted with @agocke a bit in a tangential discussion. I've reviewed, but still have some comments/suggestions to leave.

@agocke (Member Author) commented on Oct 21, 2025:

> I am not sure whether it is a good idea to switch how we test when going from main to servicing

I don't have a strong reason why servicing needs to be on oldest instead of newest. My thinking is that servicing is more worried about breaking compatibility for existing platforms, rather than seeing the latest breaks coming in from newer platforms. And that we would want to catch leading-edge breaks in main first.

If this is too difficult we can stick with the same plan for main and servicing.

@jkotas (Member) commented on Oct 22, 2025:

> My thinking is that servicing is more worried about breaking compatibility for existing platforms, rather than seeing the latest breaks coming in from newer platforms. And that we would want to catch leading-edge breaks in main first.

For servicing, we care about both the oldest and the newest, and we want to err on the side of running more in CI so that we find out about any breaks as soon as possible, not only after a while when somebody happens to look at the results of the weekend full-matrix run. Servicing queues are low traffic, so running extra configurations in them does not add significant costs. This is the same reasoning behind having build-analysis disabled in servicing, so that failures do not sneak through unnoticed. We pay extra to reduce risk.

For main, it is about CI cost management. It would be useful to see the current breakdown of the costs to make an informed decision.


Each of the above combinations is considered a single "platform". New versions are not considered new platforms, but different versions of the same platform. Only if a new version modifies one of the above elements would it be considered a new platform.

Testing all versions of all platforms all the time is too expensive. This document defines the testing policy for each branch of the product.
A Member commented:

This document mentions PRs, but not rolling builds. In the past we've used rolling builds to help make the matrix sparser for PRs while still filling in the coverage elsewhere.

It also doesn't mention internal vs. public builds, or whether we have differences between them.

@agocke (Member Author) replied:

I don’t plan any differences for internal vs public.

Regarding rolling builds, it sounds like we’re converging on the oldest version being almost as important as the latest, if not equally so. Should rolling builds be on the oldest?

A Member replied:

> any differences for internal vs public.

Why do the differences for internal vs. public exist in the first place? If they exist for a good reason (e.g. to avoid giving anonymous access to images that cannot be exposed publicly), we need to keep them.


The above policy only applies to PRs. Scheduled or incidental runs can be queued against other platform definitions, if deemed necessary.

- [ ] **Open question** Should certain area paths trigger additional version testing?
A Member commented:

Additional version testing based on area labels is a good idea. For area-System.Security, we should probably always run what's defined now as extra-platforms.
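One hypothetical way to wire up such an automatic trigger is a dedicated extra-platforms pipeline whose PR trigger is limited to security-sensitive paths. The sketch below uses standard Azure Pipelines path filters; the path globs are assumptions for illustration, and this is not how dotnet/runtime's change detection is actually implemented.

```yaml
# Hypothetical "extra platforms for crypto changes" trigger: the pipeline only
# runs for PRs that touch security-sensitive paths.
pr:
  branches:
    include:
    - main
  paths:
    include:
    - src/libraries/System.Security.Cryptography*/*
    - src/native/libs/System.Security.Cryptography.Native*/*
```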

@jeffhandley (Member) left a review:

I made suggestions that capture my thoughts from the Libraries perspective. As @jkotas described, for Libraries the OS versions affect us more, and we can also justify longer/costlier CI runs in servicing compared to main. I don't know if it's too much, but I think it would offer a good balance of comprehensiveness and confidence.

Comment on lines +21 to +40
We want to mix and match platform versions and .NET versions to produce good platform coverage without too much cost. This means we want to catch breaks on each platform as quickly as possible, and prioritize catching the type of platform breaks that are most likely to affect the specific version being tested.

* `main` - PRs run on the *latest supported* platform.
* `servicing` - PRs run on the *oldest supported* platform.

The above policy only applies to PRs. Scheduled or incidental runs can be queued against other platform definitions, if deemed necessary.

- [ ] **Open question** Should certain area paths trigger additional version testing?

## Details

### Platform version

There are two platform versions we want to test on, depending on the .NET support lifecycle:

1. Latest supported
2. Oldest supported

We assume that all supported platform versions in between have sufficient coverage based on the latest and the oldest. We currently have no defined strategy for pre-release versions.

A Member left a suggested change, replacing the section quoted above with the following:
We want to mix and match platform versions and .NET versions to produce good platform coverage without too much cost. This means we want to catch breaks on each platform as quickly as possible, and prioritize catching the type of platform breaks that are most likely to affect the specific version being tested.
For `main`, PR traffic is very high and there is ample time for stabilizing on every platform definition. For `servicing`, PR traffic is much lower and the time available to react to platform breaks is much shorter. Additionally, for `servicing`, platform breaks introduced in the newest or upcoming platform versions should be detected as early as possible.
* `main`
* PRs run on the *latest supported* platform.
* Daily scheduled builds run on the *oldest supported* platform.
* Weekly scheduled builds run on *every supported* platform.
* Select *pre-release versions* can be configured with scheduled builds at the appropriate frequency.
* `servicing`
* PRs run on the *oldest supported* and *latest supported* platforms.
* Daily scheduled builds run on *every supported* platform if there have been changes committed, including changes to platform definitions.
* Weekly scheduled builds run on *every supported* platform, even if there have not been any changes committed.
* Select *pre-release versions* can be configured with scheduled builds at the appropriate frequency.
For both `main` and `servicing` branches, incidental runs can be queued against other platform definitions on demand. Automatic runs of additional platforms can be configured by area label or file path globs.
## Details

@agocke (Member Author) replied:

> Weekly scheduled builds run on every supported platform.

We don’t currently have weekly runs as far as I know. I also think we’re close to load capacity on our pools. I’d like to set a ground rule for this PR that we don’t add significant new load, since we might not actually have the capacity for it. I’m very willing to try that next, but let’s start by just defining the metric for what we have.

@jkotas (Member) replied on Oct 22, 2025:

> I’d like to set a ground rule for this PR that we don’t add significant new load

You may want to include the nuance that we have developed to optimize the costs then.

For example, if we cover linux-x64, linux-arm64, and linux-musl-x64 and the run is on the expensive side (e.g. runs on every PR), we skip the coverage for linux-musl-arm64 since it is very unlikely for a bug to exist only on musl-arm64.
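Expressed as a template parameter, that nuance might look roughly like the sketch below. The parameter, template, and platform names are illustrative, not the actual dotnet/runtime template contract.

```yaml
# Hypothetical sketch: linux-musl-arm64 is only added when the full matrix is
# requested (e.g. in scheduled/rolling builds), not in the expensive per-PR runs.
parameters:
- name: isFullMatrix
  type: boolean
  default: false

jobs:
- template: run-tests.yml        # hypothetical job template
  parameters:
    platforms:
    - linux_x64
    - linux_arm64
    - linux_musl_x64
    - ${{ if eq(parameters.isFullMatrix, true) }}:
      - linux_musl_arm64
```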

Comment on lines +53 to +55
1. [ ] Determine the **oldest** supported version of each test platform. The policy to determine this version is currently out of scope of this document. Refer to https://dotnet.microsoft.com/en-us/platform/support/policy/dotnet-core
2. [ ] Change PR definitions in CI to use oldest versions.
3. [ ] Stabilize CI for oldest OS versions.
A Member left a suggested change, replacing the checklist quoted above with the following:
1. [ ] Determine the **oldest** supported version of each test platform. The policy to determine this version is currently out of scope of this document. Refer to https://dotnet.microsoft.com/en-us/platform/support/policy/dotnet-core and https://github.com/dotnet/core/blob/main/os-lifecycle-policy.md.
2. [ ] Change PR definitions in CI to add coverage for the **oldest supported** version alongside the **newest supported** version.
3. [ ] Change the daily scheduled builds to run on **every supported** version if changes have been committed (including platform definition changes).
4. [ ] Stabilize CI for each supported version.
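As an illustration of item 2 in this suggestion, a PR leg list that carries both ends of the support range might look like the following sketch; the queue names are illustrative placeholders rather than a proposed final matrix.

```yaml
# Hypothetical sketch: pairing the oldest and newest supported version of one
# platform in the per-PR definitions.
libraries_linux_x64_pr_queues:
  - Ubuntu.2204.Amd64.Open      # oldest supported Ubuntu LTS at the time of this discussion
  - Ubuntu.2404.Amd64.Open      # newest supported Ubuntu LTS (illustrative queue name)
```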

