@github-actions
Contributor

@github-actions github-actions bot commented Oct 4, 2023

Backport of #75088 to release/7.0

Fixes #81211

/cc @lambdageek @lateralusX

Customer Impact

Customers targeting Apple platforms using LLVM AOT codegen (the default) in highly concurrent settings (such as firing off multiple simultaneous async HTTP requests) may experience unexpected behavior such as InvalidCastExceptions, NullReferenceExceptions or crashes.

Testing

Manual testing

Risk

Low. This code has been running on .NET 8 main for over a year in CI, as well as on some other non-mobile platforms.

vseanreesermsft and others added 3 commits October 3, 2023 18:35
When using LLVM AOT codegen, init_method updates two GOT slots: the
regular GOT slot and the corresponding LLVM AOT const slot. Both are
initialized as part of init_method, but there is a race between the
two stores. The current implementation can have two threads running
init_method for the same method, and as soon as the store:

got [got_slots [pindex]] = addr

is visible, other threads return from init_method. Since that can
happen before the corresponding LLVM AOT const slot is set, a second
thread can return to the method that called init_method, load the
LLVM AOT const, and crash when trying to use it (since it is still
NULL).

This crash is very rare but has been identified on x86/x64 CPUs,
when one thread is either preempted between updating the regular GOT
slot and the LLVM GOT slot, or its store into the LLVM GOT slot is
delayed in the store buffer. I have also been able to emulate the
scenario in a debugger, triggering the issue and crashing in the
method that loads from the LLVM AOT const slot.
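
The window described above can be stepped through deterministically in a single thread, much like pausing the writer in a debugger between its two stores. The following is only an illustrative sketch: the `demo_*`, `old_order_*` and `other_thread_*` names are hypothetical and are not the Mono runtime's actual data structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two slots; not the Mono data structures. */
static void *demo_got_slot;        /* regular GOT slot: non-NULL means "initialized" */
static void *demo_llvm_const_slot; /* LLVM AOT const slot read by compiled code */

/* Old order, first store: the regular GOT slot becomes visible. */
static void old_order_store_got (void *addr)
{
	demo_got_slot = addr;
}

/* Old order, second store: the LLVM AOT const slot is filled in afterwards. */
static void old_order_store_llvm_const (void *addr)
{
	demo_llvm_const_slot = addr;
}

/* What another thread checks to decide that initialization is done. */
static int other_thread_sees_init_done (void)
{
	return demo_got_slot != NULL;
}

/* What the calling method then loads and tries to use. */
static void *other_thread_loads_llvm_const (void)
{
	return demo_llvm_const_slot;
}
```

If the writer stops after `old_order_store_got`, the other thread already sees initialization as done while `other_thread_loads_llvm_const` still returns NULL, which is the state that leads to the crash.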

The fix changes the order of the updates and makes sure the update of
the LLVM AOT const slot happens before the memory_barrier, since:

got [got_slots [pindex]] = addr;

has release semantics in relation to addr and the update of the LLVM
AOT const slot. The fix also adds acquire/release semantics for
ji->type in init_method, since it is used to guard whether a thread
ignores a patch or not and it must not be reordered with previous
stores; otherwise it can cause similar race conditions with the
updated slots.
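
The publication order described in this commit message can be sketched with C11 atomics. This is a minimal illustration under stated assumptions, not the actual patch: the real change lives in Mono's init_method and uses the runtime's own memory barrier, while `llvm_const_slot`, `got_slot`, `publish_patch` and `try_load_patch` below are hypothetical names introduced only for this example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two slots that init_method fills in. */
static void *llvm_const_slot;    /* plain slot read by LLVM-compiled code */
static _Atomic(void *) got_slot; /* slot other threads check for "init done" */

/* Writer: fill the LLVM const slot first, then publish via the GOT slot.
 * The release store keeps the earlier plain store from being reordered
 * after it. */
static void publish_patch (void *addr)
{
	llvm_const_slot = addr;
	atomic_store_explicit (&got_slot, addr, memory_order_release);
}

/* Reader: returns the LLVM const value once published, or NULL if the
 * GOT slot has not been set yet. The acquire load pairs with the release
 * store, so a non-NULL GOT slot implies the const slot is already set. */
static void *try_load_patch (void)
{
	if (atomic_load_explicit (&got_slot, memory_order_acquire) == NULL)
		return NULL;
	return llvm_const_slot;
}
```

With this order, a thread that observes the GOT slot as set can no longer read a NULL const slot. The same release/acquire pairing is the idea behind guarding ji->type, although the sketch does not model that field.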
@ghost ghost added the area-Codegen-AOT-mono label Oct 4, 2023
@lambdageek lambdageek added this to the 7.0.x milestone Oct 4, 2023
@lambdageek lambdageek added the Servicing-consider Issue for next servicing release review label Oct 4, 2023
@lambdageek lambdageek added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 4, 2023
@lambdageek
Member

Approved by tactics in email

@lambdageek lambdageek changed the base branch from release/7.0 to release/7.0-staging October 5, 2023 13:56
@lambdageek
Member

Ah crud, I forgot that the backports should be aiming at the release/7.0-staging branch. Updated the PR base.

Member

@lateralusX lateralusX left a comment

LGTM!

@lambdageek
Member

Failures don't seem related

@lambdageek lambdageek merged commit 7ee35b9 into release/7.0-staging Oct 6, 2023
@lewing lewing deleted the backport/pr-75088-to-release/7.0 branch October 8, 2023 01:26
@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2023
Labels

area-Codegen-AOT-mono Servicing-approved Approved for servicing release

4 participants