-
-
Notifications
You must be signed in to change notification settings - Fork 679
Fix another libgap segmentation fault #40728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
22c9f77 to
7eeab93
Compare
|
Documentation preview for this PR (built with commit 97f9af0; changes) is ready! 🎉 |
|
Argh, I noticed this while working on #40585 but it's not what was causing that reproducible segfault so I eventually reverted my changes and forgot about it. I think you'll find the same issue in |
|
The CI failures are #40715, you'll probably have to pad the |
|
I suspect the timeout to be more likely #40556 . Anyway… |
|
more mysterious failures, looks worth investigating. (probably not caused by this pull request?) → with the latest commit it becomes less common, but nonetheless still happen sometimes. I'm tempted to conclude it's #40734 but I haven't looked at it yet. although this is also interesting since it may give some clue on how the alarm disappear (looking at polynomial_element.pyx etc.) |
f8e4454 to
97f9af0
Compare
Follow-up to #40613.
Save the following into
b.sagethen run
sage b.sage. It should segmentation fault soon.The cause
As is well known, if Cython generates code of the following form
then the
__Pyx_XDECREF(__pyx_t_8)gets optimized toif (__pyx_t_8 != 0) SEGMENTATION_FAULT;,and if GAP error is raised, we get null pointer dereference.
Before #40613,
KeyboardInterruptorAlarmInterruptjumps tosig_on(),sig_GAP_Enter()is used, first jumps toGAP_Enter(), then asig_error()call jumps tosig_on()to raise a Python exception,GAP_Enter()is used, jumps toGAP_Enter(), which then raises a Python exception.After #40613, in all cases, the outermost
GAP_Enter()is jumped to.Debug guide
There are 3 parts of debugging:
__pyx_t_8,(B),(A).Finding the variable name is easy: it suffices to run
sage --gdbthen run the script.When it segmentation fault, look at the instruction that causes the segmentation fault.
In this case, it is
therefore the variable is
[rbp-0x90].How to map back from assembly to
.cfilecompile_commands.jsonmeson-generated_src_sage_libs_gap_element.pyx.c.ofromgap/element.pyx.c.oto.sthen add-S -fverbose-asm -masm=inteland recompile, then read the.sfile generatedalternatively I believe adding
-gat appropriate places also give you this.How to find the call
Break at either
__pyx_f_4sage_4libs_3gap_4util_gap_interrupt_asap,GAP_THROW, or__longjmp_chk.Then
CYSIGNALS_CRASH_QUIET=1 rr recordit,rr replay -eandreverse-continuefrom the point of segmentation fault.Relevant backtrace:
run
frame 48and see the assembly iswhich correspond to (look at
element.pyx.c:8996and scroll a few lines up)We determine this is the call
(B).How to find the assignment instruction
We want to find
(A).Assume we're in
rr, runcontinuethenreverse-stepia few times until thecmpinstruction is executed, then runwhere the argument to
watchis copied from the output ofp.Then
reverse-continue. Whichever instruction last set the memory address is the instruction(A).Turns out it's the
__Pyx_GetModuleGlobalName(to fetch thelibgapPython global variable).Conclusion
Don't access any Python object between
GAP_Enter...GAP_Leave. Not even accessing the Python global variablelibgap.Side note, if I add
cdef libgapright before thefrom ... import libgapline, then this particular cause is eliminated, but it still segmentation fault because of theenumerate()plus some complex combination of compiler optimization.This fix creates a temporarily variable, but I think the overhead of creating a Python
tupleis small.We're still accessing
gap_elements[i], which is a Python API call (!!!), but as it happens,GAP_AssListnever throws, so we're safe.📝 Checklist
⌛ Dependencies