-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix handling exceptions from native code when interpreted #121212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling exceptions from native code when interpreted #121212
Conversation
There were two issues. One is that the SfiNextWorker was incorrectly advancing the stack frame iterator in case the interpreter was called from CallDescrWorker in some cases. The second was that the call to MethodDesc::GetMethodDescOfVirtualizedCode that is made by the InterpExecMethod can end up running managed code and throw managed exception. So it needs to have PAL_TRY/PAL_EXCEPT around the invocation, since the main loop of the interpreter can only catch and process C++ exceptions.
|
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves exception handling in the CoreCLR interpreter by:
- Wrapping calls to
GetMethodDescOfVirtualizedCodein a new helper function that properly handles both C++ and managed exceptions - Refining the stack frame iteration logic to correctly distinguish between managed and native callers during exception unwinding
Key changes:
- Introduces
CallGetMethodDescOfVirtualizedCodewrapper function for proper exception handling - Updates virtual call and delegate invocation sites to use the new wrapper
- Improves exception handling logic to check if return addresses point to managed code before advancing the stack frame iterator
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/vm/interpexec.cpp | Adds CallGetMethodDescOfVirtualizedCode wrapper function and updates two call sites to use it for proper exception handling |
| src/coreclr/vm/exceptionhandling.cpp | Refines stack frame unwinding logic to properly distinguish managed from native callers using ExecutionManager::IsManagedCode |
|
@janvorli it looks like the stackoverflowtester failure may be related. I saw this fail on the previous iteration of this PR, and my PRs have not been failing with that issue. |
This change doesn't touch any code path that is executed without the interpreter. This stack overflow test issue has been occurring for more than a year on regular basis in the ci, but I was never able to repro it locally on any of my devices. |
@janvorli, is it #110173? Just curious, if async handling is the culprit, should we queue and process SIGSEGVs in FIFO order? The end result should still feel organic in both scenarios; either one of the SIGSEGVs will eventually abort the process (the most likely outcome), or it will be handled gracefully and execution will continue. |
Yes, it is that one. What we do is that we actually handle only the first SIGSEGV and let it dump the stack and then abort the process. Handler for all subsequent SIGSEGVs on other threads are just left spinning in a loop. The issue that happens is that we actually get SIGSEGV on the same thread again (there is an added logging that shows that). I can see that the secondary thread we start to dump the stack trace dumped all the frames. So the main thread gets SIGSEGV while waiting for this thread to complete or right after the wait completes. I was thinking that the activation async signal might occur during that wait and require more space then there is available for the stack overflow handling. But the issue persisted even after that signal was left disabled for the stack overflow handling. This issue never reproduces on any of my local devices. I've left it running in a loop for a week with no repro. So it is something that only happens in the CI for some reason. |
|
/ba-g the arm64 legs are down |
There were two issues. One is that the
SfiNextWorkerwas incorrectly advancing the stack frame iterator in case the interpreter was called fromCallDescrWorkerInternalin some cases.The second was that the call to
MethodDesc::GetMethodDescOfVirtualizedCodethat is made by theInterpExecMethodcan end up running managed code and throw managed exception. So it needs to have PAL_TRY/PAL_EXCEPT around the invocation, since the main loop of the interpreter can only catch and process C++ exceptions.This fixes the
Loader\classloader\regressions\vsw529206\vsw529206ModuleCctortest.