Integrate interpreter frames into backtraces #23973

Keno · 2017-10-02T22:09:53Z

Before:

julia> let
       error()
       end
ERROR:
Stacktrace:
 [1] error() at ./error.jl:44

After:

julia> let
       error()
       end
ERROR:
Stacktrace:
 [1] error() at ./error.jl:44
 [2] macro expansion at REPL[0]:2 [inlined]
 [3] In toplevel scope

The extra macro expansion frame is a pre-existing julia bug (#23971)
that this PR doesn't address.

The mechanism used here is to add a no-inline enter_interpreter stack frame that has
a known stack layout (since it only has one local variable) and can thus be used
to retrieve the interpreter state. I do not believe that this is guaranteed by the
C standard, so depending on the whims of the compiler we may have to eventually write
this function in assembly, but it seems to work well enough for now.

One significant complication is that the backtrace buffer may now contain pointers
to gc-managed data, so we need to make the gc aware that and can't be as non-chalant
about copying the buffer around anymore.

yuyichao

👍

yuyichao · 2017-10-02T22:22:05Z

base/error.jl

 Get the backtrace of the current exception, for use within `catch` blocks.
 """
-catch_backtrace() = ccall(:jl_get_backtrace, Array{Ptr{Void},1}, ())
+function catch_backtrace()


We should be able to just avoid exposing the bare pointer information?

Yeah, we can probably return two arrays here and keep a sentinel value to decide where to insert them.

yuyichao · 2017-10-02T22:36:32Z

src/interpreter.c

+extern uintptr_t __stop_jl_interpreter_frame;
+
+#define SECT_INTERP JL_SECTION("jl_interpreter_frame")
+#define SECT_ENTER_INTERP JL_SECTION("jl_enter_interpreter_frame")


... I got confused what the linux version is doing here. I assume this is the started but not finished version?

Right, haven't tried on linux yet.

yuyichao · 2017-10-02T22:38:19Z

src/interpreter.c

+#define SECT_INTERP JL_SECTION("__TEXT,__jif")
+#define SECT_ENTER_INTERP JL_SECTION("__TEXT,__jeif")
+#else
+JL_DLLEXPORT int jl_is_interpreter_frame(uintptr_t ip)


Is this function used anywhere? Is this used for removing C frames in the interpreter?

That's the idea, yes. Not implemented yet, though.

Don't we do that already for all functions in libjulia?

There's a mode where we display julia and C frames. I wanted to hide the interpreter frames in those traces, since we insert the reconstructed julia frame.

yuyichao · 2017-10-02T23:24:57Z

src/interpreter.c

+
+JL_DLLEXPORT size_t jl_capture_interp_frame(uintptr_t *data, uintptr_t sp, size_t space_remaining)
+{
+    interpreter_state *s = (interpreter_state *)sp;


Just checked that this may not work all the time. This is actually architecture dependent and AArch64 puts frame pointer and link register below (lower address) local variables. Stack protector also changes this on aarch64 though it seems to not affect x86....

Anyway, I think an asm version will be much better. I'll write the arm and aarch64 version when you have the x86 version.

Actually it seems that on aarch64, gcc follows the aapcs64 recommendation and put local variable above fp and lr while clang/llvm 5.0 put the fp and lr above local variables. Both put stack protection slot at the bottom though..

vtjnash · 2017-10-03T02:12:20Z

src/interpreter.c

+// before the interpreter state. If that's not the case on some platform, that platform
+// will need this function written in assembly.
+void *NOINLINE SECT_ENTER_INTERP enter_interpreter_frame(void *(*callback)(interpreter_state *, void *), void *arg) {
+    interpreter_state state = {};


wouldn't it be a lot faster to represent this explicitly, and teach jl_throw how to unwind it?

struct interpreter_state_chain { uintptr_t sp; // = __builtin_frame_address(0); jl_code_info_t src; uint32_t ip; struct interpreter_state_chain *next; } global;

You don't even need the sp field? (&global would be good enough.) I've thought about this before but don't know how reliable sp is. This PR also rely on that of course....

@JeffBezanson and I thought about it, but it would mean more state to manage everywhere, which didn't seem worth the trade off.

vtjnash · 2017-10-03T02:53:02Z

src/interpreter.c

+
+SECT_INTERP jl_value_t *jl_interpret_toplevel_thunk(jl_module_t *m, jl_code_info_t *src)
+{
+    struct jl_interpret_toplevel_thunk_args args = {.m = m, .src = src};


This C99 syntax is not present in C++

Are we still trying to write code that can be compiled by MSVC?

Yes. Also supports the JULIA_TIMING option.

JeffBezanson · 2017-10-05T16:52:44Z

src/interpreter.c

+    ".type enter_interpreter_frame,@function\n"
+    "enter_interpreter_frame:\n"
+    ".cfi_startproc\n"
+    // sizeof() is 44, but we need to be 8 byte aligned, so subtract 48


sizeof(struct interpreter_state)?

vtjnash · 2017-10-05T16:58:50Z

src/interpreter.c

+    "\tsubq $8, %rsp\n"
+    ".cfi_def_cfa_offset 64\n"
+    "enter_interpreter_frame_start_val:\n"
+    "\tcallq *%rax\n"


I'm not a fan of avoiding a (fast, simple) shadow stack in favor of using this (very slow, complicated) real stack design.

The concern is that you need to add code to maintain the shadow stack in task switching, exception handling, etc. Also, I don't see why this would be slower than the shadow stack mechanism?

One register to load/store instead of 10-20. No jmps, vs. 2 (where one is indirect).

The load/store cost is (nearly) zero (likely to be speculated executed). The jmp cost is high or very high depending on the quality of the processor (Recent Intel processors have optimizations so that this callq *%raq no longer causes a pipeline stall always)

Do you really believe that the cost of one extra jmp is anything but negligible compared to the rest of the cost of function entry in the interpreter?

Two jmps, and both unnecessary and highly complex to maintain

As long as the function being called is not inlined (which is important for predictable value of $sp) that doesn't really save instructions. Calling functions with an address in the register is also in general easier to write....

What would win you this argument is to convince me that the shadow stack solution is simpler

When I implemented this in the past, I used the shadow stack we need for #12205 anyways (e.g. specially marked gc frames). It only took a few minutes to implement, and I hadn't done anything with it (because you and Oscar had Gallium working), so I never made any PRs.

I'll make you a deal. Since this is almost good to go, I'll fix this up and get it merged. Then I'll put up a PR to replace it with the shadow stack approach at which point we can debate which is better.

Still failing on several systems and needs a reviewer, so it appears there's non-trivial work left.

Perhaps, but the backtracing mechanism itself (which is the part that a shadow stack would replace) is implemented and working on all CI tested architectures. Something else still has a bug I believe (though I haven't seen it locally yet), but that part seems to work fine.

Keno · 2017-10-10T17:52:25Z

I understand the win32 failure here (though the lack of rr made this take an unfortunate amount of time). On win32, we set -mincoming-stack-boundary=2 which makes gcc emit a stack realignment prologue. However, under certain circumstances (which may be compiler version specific), it performs the stack realignment before the main prologue [1], which throws off getting the interpreter struct from the stack. What made debugging this worse is that most of the callback do not have the pre-prologue realignment and GCC 6.2.0 doesn't seem to emit it at all for this code. Anyway, should be fairly straightforward to fix.

[1] https://gcc.gnu.org/ml/gcc/2007-12/msg00503.html

Keno · 2017-10-10T21:42:51Z

This is done from my perspective. Once CI finishes running through, I'll push one more commit to fix the C++ compatibility concern. Review away.

Keno · 2017-11-04T01:22:03Z

The Win64 failure looks unrelated to me. @JeffBezanson I think this is good to go at your convenience.

Before: ``` julia> let error() end ERROR: Stacktrace: [1] error() at ./error.jl:44 ``` After: ``` julia> let error() end ERROR: Stacktrace: [1] error() at ./error.jl:44 [2] macro expansion at REPL[0]:2 [inlined] [3] In toplevel scope ``` The extra `macro expansion` frame is a pre-existing julia bug (#23971) that this PR doesn't address. The mechanism used here is to add a no-inline enter_interpreter stack frame that has a known stack layout (since it only has one local variable) and can thus be used to retrieve the interpreter state. I do not believe that this is guaranteed by the C standard, so depending on the whims of the compiler we may have to eventually write this function in assembly, but it seems to work well enough for now. One significant complication is that the backtrace buffer may now contain pointers to gc-managed data, so we need to make the gc aware that and can't be as non-chalant about copying the buffer around anymore.

Keno requested review from JeffBezanson and yuyichao October 2, 2017 22:10

yuyichao reviewed Oct 2, 2017

View reviewed changes

vtjnash reviewed Oct 3, 2017

View reviewed changes

Keno force-pushed the kf/interpretertraces branch from 6aece2d to 4a1cf49 Compare October 3, 2017 20:40

JeffBezanson reviewed Oct 5, 2017

View reviewed changes

vtjnash reviewed Oct 5, 2017

View reviewed changes

JeffBezanson approved these changes Oct 5, 2017

View reviewed changes

Keno force-pushed the kf/interpretertraces branch 5 times, most recently from 8c71c0b to 4476c1b Compare October 9, 2017 19:21

Keno changed the title ~~WIP: Integrate interpreter frames into backtraces~~ Integrate interpreter frames into backtraces Oct 10, 2017

Keno force-pushed the kf/interpretertraces branch from 6e30283 to 3a27f8c Compare October 12, 2017 21:48

Keno force-pushed the kf/interpretertraces branch 4 times, most recently from ddd9b06 to fffc366 Compare November 3, 2017 19:22

JeffBezanson mentioned this pull request Nov 4, 2017

use interpreter when possible in --compile=no mode #24473

Merged

JeffBezanson force-pushed the kf/interpretertraces branch from fffc366 to fa92f86 Compare November 6, 2017 20:04

JeffBezanson merged commit 4fe309a into master Nov 7, 2017

JeffBezanson deleted the kf/interpretertraces branch November 7, 2017 05:08

Uh oh!

Integrate interpreter frames into backtraces #23973

Integrate interpreter frames into backtraces #23973

Uh oh!

Conversation

Keno commented Oct 2, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuyichao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vtjnash Oct 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Keno commented Oct 10, 2017

Uh oh!

Keno commented Oct 10, 2017

Uh oh!

Keno commented Nov 4, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

Keno commented Oct 2, 2017 •

edited

Loading

vtjnash Oct 5, 2017 •

edited

Loading