gh-149202: Fix frame pointer unwinding on s390x and ARM#149362
gh-149202: Fix frame pointer unwinding on s390x and ARM#149362pablogsal wants to merge 1 commit intopython:mainfrom
Conversation
|
!buildbot ARM |
|
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge The command will test the builders whose names match following regular expression: The builders matched are:
|
|
!buildbot S390x |
|
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge The command will test the builders whose names match following regular expression: The builders matched are:
|
Documentation build overview
14 files changed ·
|
|
!buildbot S390x |
|
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge The command will test the builders whose names match following regular expression: The builders matched are:
|
87aa290 to
ef67c5d
Compare
-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder. The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live. On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault. For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1]. The 112-byte offset comes from Linux's s390 debugging documentation: its Stack Frame Layout table shows z/Architecture backchain frames with the backchain at offset 0 and saved r14 of the caller function at offset 112: https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout This helper remains scoped to Linux s390x backchain frames. GNU SFrame's s390x notes state that the s390x ELF ABI does not generally mandate where RA and FP are saved, or whether they are saved at all: https://sourceware.org/binutils/docs/sframe-spec.html#s390x On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].
|
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit c68855d 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
On s390x, it would be best to use only As background, the primary purpose of a "frame pointer" in compiler-generated code is to allow accessing local variables and spill slots in the current stack frame, if variable-sized stack allocations (alloca) is used. Normally, the stack frame is accessed via constant offsets to the stack pointer, but that doesn't work when alloca is in place, so a separate register (the frame pointer) is used instead. Now, on some architectures (like x86), that frame pointer refers to the top of the stack frame (close to the stack pointer at function entry), and therefore happens to be implicitly usable for stack backtracing. But on other platforms (like s390x) the frame pointer instead refers to the bottom of the stack frame (close to the stack pointer after the function prolog), so it is completely useless for stack backtracing. [ The main reason for this choice is that on s390x, some instructions only allow register+displacement addressing for positive values of the displacement, so a pointer to the bottom of the stack frame can be used more efficiently than a pointer to the top of the stack frame. ] So the only thing |
|
I don't have the context to review this properly. In case naive questions are helpful: Adding As PEP 831's impact analysis doesn't cover these platforms, would it be better to skip the test, and defer PEP 831 for Thumb & s390x? |
| ]) | ||
| AS_CASE([$host_cpu], [s390*], [ | ||
| AX_CHECK_COMPILE_FLAG([-mbackchain], [ | ||
| frame_pointer_cflags="$frame_pointer_cflags -mbackchain" |
There was a problem hiding this comment.
-fno-omit-frame-pointer should be avoided on s390 64-bit (s390x), as frame pointer based unwinding is not supported. When using the s390 back chain as alternative to frame pointers the use of -mbackchain should be sufficient. See my talk s390: Stack tracing using Frame Pointer, Back Chain, and SFrame for details.
Therefore the line above should be changed as follows:
frame_pointer_cflags="-mbackchain"
|
Thanks for the questions, they're not naive at all so happy to give context. To the second one first: I'd rather not skip the test and defer. If we skip the test, we don't make the underlying problem go away, we just stop noticing it: a 3.15 built with naive On 32-bit ARM, GCC can target two different instruction encodings: the classic fixed-width 32-bit "ARM" encoding ( The two modes generate materially different prologues. In ARM mode, GCC emits a fixed Forcing This is the only way to make the PEP compliant in these platforms. And if any user really doesn't want |
|
OK, makes sense. Thanks for the explanation! Looks like the Do you have time to work on this today? I can try to do the
One thing that's still not clear: is the layout predictable using in-memory info? IOW, could a more advanced walker be written, or does it need the “full DWARF unwinding” calibre the PEP rejects?
Looking at test_frame_pointer_unwind.py, the advertisement boils down to |
Yes but in the afternoon London time only :(
We don't control the unwinders. This test is just proving that the fp-based unwinder that everyone implements works here. Looks like looking at clang and gcc both assume that DWARF will be in place when ussing thumb mode because where the RA is and how to reconstruct the fp doesn't look regular.
I think is a good idea. Anything in mind? |
|
OK, taking a look. Ping me on Discord for handoff.
Something like |
|
You can ping me (on Discord) if you want me to test changes on s390x. |
|
#149409 is what I got. |
This was particularly hard to get right :(
-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder.
The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live.
On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault. For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1].
On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].