Skip to content

Conversation

@janvorli
Copy link
Member

There were two issues. One is that the SfiNextWorker was incorrectly advancing the stack frame iterator in case the interpreter was called from CallDescrWorkerInternal in some cases.
The second was that the call to MethodDesc::GetMethodDescOfVirtualizedCode that is made by the InterpExecMethod can end up running managed code and throw managed exception. So it needs to have PAL_TRY/PAL_EXCEPT around the invocation, since the main loop of the interpreter can only catch and process C++ exceptions.

This fixes the Loader\classloader\regressions\vsw529206\vsw529206ModuleCctor test.

There were two issues. One is that the SfiNextWorker was incorrectly
advancing the stack frame iterator in case the interpreter was called
from CallDescrWorker in some cases.
The second was that the call to MethodDesc::GetMethodDescOfVirtualizedCode
that is made by the InterpExecMethod can end up running managed code and
throw managed exception. So it needs to have PAL_TRY/PAL_EXCEPT around
the invocation, since the main loop of the interpreter can only catch
and process C++ exceptions.
@janvorli janvorli added this to the 11.0.0 milestone Oct 30, 2025
@janvorli janvorli self-assigned this Oct 30, 2025
Copilot AI review requested due to automatic review settings October 30, 2025 18:51
@janvorli janvorli requested review from BrzVlad and kg as code owners October 30, 2025 18:51
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves exception handling in the CoreCLR interpreter by:

  1. Wrapping calls to GetMethodDescOfVirtualizedCode in a new helper function that properly handles both C++ and managed exceptions
  2. Refining the stack frame iteration logic to correctly distinguish between managed and native callers during exception unwinding

Key changes:

  • Introduces CallGetMethodDescOfVirtualizedCode wrapper function for proper exception handling
  • Updates virtual call and delegate invocation sites to use the new wrapper
  • Improves exception handling logic to check if return addresses point to managed code before advancing the stack frame iterator

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/coreclr/vm/interpexec.cpp Adds CallGetMethodDescOfVirtualizedCode wrapper function and updates two call sites to use it for proper exception handling
src/coreclr/vm/exceptionhandling.cpp Refines stack frame unwinding logic to properly distinguish managed from native callers using ExecutionManager::IsManagedCode

@davidwrighton
Copy link
Member

@janvorli it looks like the stackoverflowtester failure may be related. I saw this fail on the previous iteration of this PR, and my PRs have not been failing with that issue.

@janvorli
Copy link
Member Author

janvorli commented Nov 1, 2025

@janvorli it looks like the stackoverflowtester failure may be related. I saw this fail on the previous iteration of this PR, and my PRs have not been failing with that issue.

This change doesn't touch any code path that is executed without the interpreter. This stack overflow test issue has been occurring for more than a year on regular basis in the ci, but I was never able to repro it locally on any of my devices.

@am11
Copy link
Member

am11 commented Nov 1, 2025

This change doesn't touch any code path that is executed without the interpreter. This stack overflow test issue has been occurring for more than a year on regular basis in the ci, but I was never able to repro it locally on any of my devices.

@janvorli, is it #110173? Just curious, if async handling is the culprit, should we queue and process SIGSEGVs in FIFO order? The end result should still feel organic in both scenarios; either one of the SIGSEGVs will eventually abort the process (the most likely outcome), or it will be handled gracefully and execution will continue.

@janvorli
Copy link
Member Author

janvorli commented Nov 3, 2025

@janvorli, is it #110173? Just curious, if async handling is the culprit, should we queue and process SIGSEGVs in FIFO order? The end result should still feel organic in both scenarios; either one of the SIGSEGVs will eventually abort the process (the most likely outcome), or it will be handled gracefully and execution will continue.

Yes, it is that one. What we do is that we actually handle only the first SIGSEGV and let it dump the stack and then abort the process. Handler for all subsequent SIGSEGVs on other threads are just left spinning in a loop.

The issue that happens is that we actually get SIGSEGV on the same thread again (there is an added logging that shows that). I can see that the secondary thread we start to dump the stack trace dumped all the frames. So the main thread gets SIGSEGV while waiting for this thread to complete or right after the wait completes. I was thinking that the activation async signal might occur during that wait and require more space then there is available for the stack overflow handling. But the issue persisted even after that signal was left disabled for the stack overflow handling.

This issue never reproduces on any of my local devices. I've left it running in a loop for a week with no repro. So it is something that only happens in the CI for some reason.

@janvorli
Copy link
Member Author

janvorli commented Nov 3, 2025

/ba-g the arm64 legs are down

@janvorli janvorli merged commit 60d14f8 into dotnet:main Nov 3, 2025
92 of 98 checks passed
@janvorli janvorli deleted the fix-eh-flowing-to-native-in-interpreter branch November 3, 2025 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants