Skip to content

Conversation

@BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Oct 31, 2025

A color (SCC) that isn't containing any bridge objects is made visible to client if xrefs_in * xrefs_out is greater than 60. Later on in bridge processing, we need to build the callback to pass for .net android. During this stage, we reduce from the full set of SCCs to SCCs that should be visible to client (containing bridge objects or satisfying the above condition). If an SCC has an xref to a color that is not visible to client, we need to do a recursive traversal to find all neighbors that are visible to client. The problem is that this process can end up making an SCC no longer visible to client, leading to inconsistencies in the computation. Consider a color(C1) that has a neighbor that is not visible to client(C2). In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then, following this computation, C1 would end up with fewer xrefs_out, making the color no longer visible to client. This make future checks incorrect, resulting in building incorrect graph for client.

This scenario seems rare in practice, we should have gotten way more reports otherwise. We fix this by pinning the visible_to_client property for a color once it first satisfies it, so it will no longer matter how many actual xrefs the color has.

Fixes assertions like:

* Assertion at /home/vbrezae/Xamarin/repos/runtime/src/mono/mono/metadata/sgen-tarjan-bridge.c:1151, condition `color_visible_to_client (cd)' not met

A color (SCC) that isn't containing any bridge objects is made visible to client if xrefs_in * xrefs_out is greater than 60. Later on in bridge processing, we need to build the callback to pass for .net android. During this stage, we reduce from the full set of SCCs to SCCs that should be visible to client (containing bridge objects or satisfying the above condition). If an SCC has an xref to a color that is not visible to client, we need to do a recursive traversal to find all neighbors that are visible to client. The problem is that this process can end up making an SCC no longer visible to client, leading to inconsistencies in the computation. Consider a color(C1) that has a neighbor that is not visible to client(C2). In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then, following this computation, C1 would end up with fewer xrefs_out, making the color no longer visible to client. This make future checks incorrect, resulting in building incorrect graph for client.

This scenario seems rare in practice, we should have gotten way more reports otherwise. We fix this by pinning the visible_to_client property for a color once it first satisfies it, so it will no longer matter how many actual xrefs the color has.
@BrzVlad BrzVlad requested a review from steveisok as a code owner October 31, 2025 14:29
Copilot AI review requested due to automatic review settings October 31, 2025 14:29
@BrzVlad BrzVlad requested a review from vitek-karas as a code owner October 31, 2025 14:29
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 31, 2025
@BrzVlad
Copy link
Member Author

BrzVlad commented Oct 31, 2025

graph TD;
    BL0-->NBMID;
    BL1-->NBMID;
    BL2-->NBMID;
    BL3-->NBMID;
    BL4-->NBMID;
    BL5-->NBMID;
    BL6-->NBMID;
    BL7-->NBMID;
    NBMID-->BR0;
    NBMID-->BR1;
    NBMID-->BR2;
    NBMID-->BR3;
    NBMID-->BR4;
    NBMID-->BR5;
    NBMID-->NBR7;
    NBMID-->BR6;
    NBR7-->BR5;
    NBR7-->BR6;
Loading

Consider the following graph, that is identical to the one from the added testcase. B prefix is for bridge objects, NB is for normal objects. Every object is an SCC in this scenario. The optimization passing SCCs containing non bridge objects is meant to prevent the addition of excessive links on the java side. This graph leads to the addition of 8+7 = 15 refs on java. If we didn't allow to pass NBMID SCC over to the java side, we would need to add 7 refs for each one of the BLx objects, totalling 56 refs!

NBMID is initially considered to be a visible to client color, because it has 8 xrefs_in and 8 xrefs_out (8x8 > 60). However, when computing the final xrefs for this color, NBR7 is not included (because it is a color that doesn't contain any bridges and it doesn't have enough links). It will instead be traversed, ending up with BR5 and BR6 xrefs that were already present. This means that NBMID will only have 7 xrefs_out and it would no longer satisfy the condition of being a color visible to client.

@BrzVlad BrzVlad added area-GC-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 31, 2025
@BrzVlad BrzVlad requested a review from filipnavara October 31, 2025 14:29
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug in the GC bridge processing where a color's visibility to the client could change during processing, causing incorrect behavior. The fix introduces a visibleToClient flag that pins a color as visible once detected, preventing it from becoming invisible later even if it loses xrefs.

Key changes:

  • Added a visibleToClient flag to ColorData structures in both CoreCLR and Mono runtimes
  • Modified ColorVisibleToClient/color_visible_to_client functions to cache visibility status
  • Added a test case BridgelessHeavyColorChanging to verify the fix using inline arrays

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/gc/gcbridge.cpp Added visibleToClient flag to ColorData and updated ColorVisibleToClient to cache visibility determination
src/mono/mono/metadata/sgen-tarjan-bridge.c Added visible_to_client flag to ColorData and updated color_visible_to_client to cache visibility determination
src/tests/GC/Features/Bridge/Bridge.cs Added InlineData struct, updated NonBridge14 to use it, and added BridgelessHeavyColorChanging test method

// is problematic. We fix this by setting this flag when a color is detected as visible to client.
// Once the flag is set, the color is pinned to being visible to client, even though it could lose
// some xrefs, making it not satisfy the BridgelessColorIsHeavy condition.
unsigned visibleToClient : 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to tweak API_INDEX_BITS/INCOMING_COLORS_BITS to fit into the same 32-bit integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this has any relevance, looks like an old size optimization that was maybe relevant on 32bit and just survived through the years. I was thinking about compacting but then I noticed that on my 64bit machine, the size of ColorData was 40 bytes regardless of my change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous bit sizes added up to 32 bits. I assumed it was relevant, at least on 32-bit platforms.

Copy link
Member

@filipnavara filipnavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM and makes it much easier to reason about the code, just one nit about the data structure layout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants