Skip to content

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Mar 2, 2025

This code path can introduce duplicates into the color->other_colors array:

// Maybe we should make sure we are not adding duplicates here. It is not really a problem
// since we will get rid of duplicates before submitting the SCCs to the client in gather_xrefs
if (color_data)
add_other_colors (color_data, &other->xrefs);

Then we get to the code that counts xref_count:

// Eliminate non-visible SCCs from the SCC list and redistribute xrefs
for (cur = root_color_bucket; cur; cur = cur->next) {
ColorData *cd;
for (cd = &cur->data [0]; cd < cur->next_data; ++cd) {
if (!color_visible_to_client (cd))
continue;
color_merge_array_empty ();
gather_xrefs (cd);
reset_xrefs (cd);
dyn_array_ptr_set_all (&cd->other_colors, &color_merge_array);
xref_count += dyn_array_ptr_size (&cd->other_colors);
}
}

It uses the color_visible_to_client helper method which in turn calls bridgeless_color_is_heavy:

static gboolean
bridgeless_color_is_heavy (ColorData *data) {
if (disable_non_bridge_scc)
return FALSE;
int fanin = data->incoming_colors;
int fanout = dyn_array_ptr_size (&data->other_colors);
return fanin > HEAVY_REFS_MIN && fanout > HEAVY_REFS_MIN
&& fanin*fanout >= HEAVY_COMBINED_REFS_MIN;
}

The fanout metric is computed from the count of other_colors which contain duplicates. Once the references are gathered with gather_xrefs they get deduplicated. This can change the number of fanout nodes just enough that it no longer satisfies the "heavy" property. We still increase the xref_count but in the second loop that iterates over the same data it will get skipped and result in a mismatch:

for (cur = root_color_bucket; cur; cur = cur->next) {
ColorData *src;
for (src = &cur->data [0]; src < cur->next_data; ++src) {
if (!color_visible_to_client (src))
continue;

Repro: https://github.com/user-attachments/files/19044193/Archive.zip

Fixes #106410

@ghost ghost added the area-GC-mono label Mar 2, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 2, 2025
@filipnavara
Copy link
Member Author

FYI @AaronRobinsonMSFT

@filipnavara
Copy link
Member Author

filipnavara commented Mar 2, 2025

The fix is incomplete, we would likely report incorrect SCCs now instead of hitting the assert, and previous XREFs would likely be wrong too.

I'm keeping this as draft until I come up with a fix.

@filipnavara filipnavara closed this Mar 2, 2025
@filipnavara
Copy link
Member Author

The fix needs to be done differently. We basically have three options:

  1. Get rid of the duplicates early. This may be not be too costly if done right.
  2. Lock it the heaviness at certain point and report the nodes anyway even if they are not really heavy in the end.
  3. Do the deduplication late, possibly at the beginning of processing_build_callback_data.

…duplication

Do early deduplication

Fix Windows build

Add test cases to sgen-bridge-pathologies
@filipnavara
Copy link
Member Author

Rebased and updated for the new test infrastructure.

@BrzVlad BrzVlad merged commit 8f94d2e into dotnet:main Apr 2, 2025
112 of 115 checks passed
@filipnavara filipnavara deleted the patch-19 branch April 2, 2025 21:16
filipnavara added a commit to filipnavara/runtime that referenced this pull request Apr 8, 2025
…duplication (dotnet#113044)

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication

Do early deduplication

Fix Windows build

Add test cases to sgen-bridge-pathologies

* Move test code

* Remove old code

* Add extra check (no change to functionality)
akoeplinger added a commit that referenced this pull request Apr 11, 2025
* Fix dump_processor_state debug code to compile and work on Android (#112970)

Fix typo in GC bridge comparison message (SCCS -> XREFS)

* [mono] Add a few bridge tests (#113703)

* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge

* [mono][sgen] Don't create ScanData* during debug dumping of SCCs

It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.

* [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option

* [mono][tests] Add bridge tests

These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.

* Fix an edge case in the Tarjan GC bridge that leads to losing xref information (#112825)

* Fix an edge case in the Tarjan SCC that lead to losing xref information

In the Tarjan SCC bridge processing there's a color graph used to find out
connections between SCCs. There was a rare case which only manifested when
a cycle in the object graph points to another cycle that points to a bridge
object. We only recognized direct bridge pointers but not pointers to other
non-bridge SCCs that in turn point to bridges and where we already calculated
the xrefs. These xrefs were then lost.

* Add test case to sgen-bridge-pathologies and add an assert to catch the original bug

* Add review

---------

Co-authored-by: Vlad Brezae <[email protected]>

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication (#113044)

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication

Do early deduplication

Fix Windows build

Add test cases to sgen-bridge-pathologies

* Move test code

* Remove old code

* Add extra check (no change to functionality)

* Disable test on wasm

---------

Co-authored-by: Vlad Brezae <[email protected]>
Co-authored-by: Alexander Köplinger <[email protected]>
filipnavara added a commit to filipnavara/runtime that referenced this pull request Apr 15, 2025
…duplication (dotnet#113044)

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication

Do early deduplication

Fix Windows build

Add test cases to sgen-bridge-pathologies

* Move test code

* Remove old code

* Add extra check (no change to functionality)
@github-actions github-actions bot locked and limited conversation to collaborators May 3, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-GC-mono community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Android] App crashing with condition `xref_count == xref_index' not met

2 participants