gh-149202: Fix frame pointer unwinding on s390x and ARM by pablogsal · Pull Request #149362 · python/cpython

pablogsal · 2026-05-04T13:10:25Z

This was particularly hard to get right :(

-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder.

The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live.
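
To make the "named offsets" idea concrete, here is a minimal sketch. It is not the actual `_testinternalcapi.c` helper: the slot names `FP_PREV_SLOT` and `FP_RA_SLOT` and the function `walk_generic` are hypothetical. It defaults to the generic layout (fp[0] = previous frame pointer, fp[1] = return address). So that it runs on any host, it walks a synthetic stack built in an array rather than the live machine stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical named slots; the defaults describe the generic two-word record. */
#ifndef FP_PREV_SLOT
#  define FP_PREV_SLOT 0   /* fp[0]: previous frame pointer */
#endif
#ifndef FP_RA_SLOT
#  define FP_RA_SLOT 1     /* fp[1]: return address */
#endif

/* Collect up to `max` return addresses by following the frame-pointer chain
   from `fp`. The walk ends at a NULL frame pointer, or when the chain fails
   to move toward higher addresses (stacks grow down on these targets), which
   guards against loops in a corrupt chain. */
static size_t
walk_generic(uintptr_t *fp, uintptr_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp[FP_RA_SLOT];
        uintptr_t *prev = (uintptr_t *)fp[FP_PREV_SLOT];
        if (prev != NULL && prev <= fp) {
            break;
        }
        fp = prev;
    }
    return n;
}
```

A target with a different record only has to override the two slot constants instead of touching the loop.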

On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault. For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1].
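
The s390x rules can be sketched the same way under some stated assumptions. Frames are 64-bit Linux backchain frames laid out as in the kernel's z/Architecture stack-frame table, with the backchain pointer at byte offset 0 and the caller's saved r14 at byte offset 112. The caller already knows the current thread's stack bounds `[lo, hi)`. The constant and function names are hypothetical, and a byte buffer stands in for the real stack so the example is portable. The bounds check runs before every read, which keeps a bogus final backchain from being followed into unmapped memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical named offsets (bytes) for Linux s390x backchain frames. */
#define S390X_BACKCHAIN_OFFSET 0    /* pointer to the caller's frame */
#define S390X_R14_OFFSET       112  /* caller's saved r14 = return address */
#define S390X_FRAME_MIN        (S390X_R14_OFFSET + sizeof(uintptr_t))

/* Read one word at `addr` without alignment or aliasing assumptions. */
static uintptr_t
load_word(uintptr_t addr)
{
    uintptr_t v;
    memcpy(&v, (const void *)addr, sizeof v);
    return v;
}

/* Follow backchains from `fp`, never reading outside [lo, hi). */
static size_t
walk_backchain(uintptr_t fp, uintptr_t lo, uintptr_t hi,
               uintptr_t *out, size_t max)
{
    assert(lo <= hi);
    assert(out != NULL || max == 0);
    size_t n = 0;
    /* Both slots we read (offset 0 and offset 112) must lie inside the bounds. */
    while (n < max && fp >= lo && fp <= hi && hi - fp >= S390X_FRAME_MIN) {
        out[n++] = load_word(fp + S390X_R14_OFFSET);
        uintptr_t prev = load_word(fp + S390X_BACKCHAIN_OFFSET);
        if (prev <= fp) {
            break;  /* a zero or non-increasing backchain ends the walk */
        }
        fp = prev;
    }
    return n;
}
```

Reading through `memcpy` keeps the synthetic buffer free of alignment concerns. A walker running on real s390x stacks could read the 8-byte slots directly.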

On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].
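
The ARM-mode slots can be sketched as well. This assumes GCC's ARM-mode prologue shape, `push {..., fp, lr}` followed by `add fp, sp, #N`, which leaves fp pointing at the saved lr with the caller's saved fp in the word just below it. The slot names and `walk_arm_mode` are hypothetical. In a synthetic stack, each frame would be a [saved fp, saved lr] pair, and each saved fp would point at the caller's saved-lr word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot names for GCC ARM-mode (-marm) frames. */
#define ARM_FP_PREV_SLOT (-1)  /* fp[-1]: previous frame pointer */
#define ARM_FP_RA_SLOT   0     /* fp[0]:  saved lr (return address) */

static size_t
walk_arm_mode(uintptr_t *fp, uintptr_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp[ARM_FP_RA_SLOT];
        uintptr_t *prev = (uintptr_t *)fp[ARM_FP_PREV_SLOT];
        if (prev != NULL && prev <= fp) {
            break;  /* the chain must move toward older (higher) frames */
        }
        fp = prev;
    }
    return n;
}
```

Compared with the generic two-word walk, only the two slot indices change. That is why a fixed prologue shape matters: a Thumb-2 frame offers no single pair of indices to plug in.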

Issue: Implement PEP 831 – Frame Pointers Everywhere: Enabling System-Level Observability for Python #149202

pablogsal · 2026-05-04T13:10:47Z

!buildbot ARM

bedevere-bot · 2026-05-04T13:10:51Z

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge

The command will test the builders whose names match following regular expression: ARM

The builders matched are:

ARM64 macOS PR
ARM64 MacOS M1 NoGIL PR
ARM Raspbian PR
iOS ARM64 Simulator PR
ARM64 MacOS M1 Refleaks NoGIL PR
ARM64 Windows Non-Debug PR
ARM64 Windows PR
ARM64 Raspbian PR
ARM64 Raspbian Debug PR

pablogsal · 2026-05-04T13:10:57Z

!buildbot S390x

bedevere-bot · 2026-05-04T13:11:00Z

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge

The command will test the builders whose names match following regular expression: S390x

The builders matched are:

s390x Fedora Rawhide NoGIL PR
s390x Fedora Rawhide Clang PR
s390x RHEL9 LTO + PGO PR
s390x RHEL9 PR
s390x Fedora Stable Clang Installed PR
s390x Fedora Rawhide LTO PR
s390x RHEL8 LTO + PGO PR
s390x Fedora Rawhide PR
s390x Fedora Stable PR
s390x Fedora Stable Refleaks PR
s390x Fedora Stable LTO + PGO PR
s390x Fedora Rawhide NoGIL refleaks PR
s390x Fedora Stable Clang PR
s390x RHEL8 Refleaks PR
s390x Fedora Rawhide Refleaks PR
s390x RHEL8 PR
s390x RHEL8 LTO PR
s390x Fedora Rawhide Clang Installed PR
s390x RHEL9 Refleaks PR
s390x RHEL9 LTO PR
s390x Fedora Stable LTO PR
s390x Fedora Rawhide LTO + PGO PR

read-the-docs-community · 2026-05-04T14:06:30Z

Documentation build overview

📚 cpython-previews | 🛠️ Build #32531857 | 📁 Comparing c68855d against main (ce51c18)

🔍 Preview build

14 files changed · ± 14 modified

pablogsal · 2026-05-04T16:57:49Z

!buildbot S390x

bedevere-bot · 2026-05-04T16:57:53Z

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 9dcfaa3 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge

The command will test the builders whose names match following regular expression: S390x

The builders matched are:

s390x Fedora Rawhide NoGIL PR
s390x Fedora Rawhide Clang PR
s390x RHEL9 LTO + PGO PR
s390x RHEL9 PR
s390x Fedora Stable Clang Installed PR
s390x Fedora Rawhide LTO PR
s390x RHEL8 LTO + PGO PR
s390x Fedora Rawhide PR
s390x Fedora Stable PR
s390x Fedora Stable Refleaks PR
s390x Fedora Stable LTO + PGO PR
s390x Fedora Rawhide NoGIL refleaks PR
s390x Fedora Stable Clang PR
s390x RHEL8 Refleaks PR
s390x Fedora Rawhide Refleaks PR
s390x RHEL8 PR
s390x RHEL8 LTO PR
s390x Fedora Rawhide Clang Installed PR
s390x RHEL9 Refleaks PR
s390x RHEL9 LTO PR
s390x Fedora Stable LTO PR
s390x Fedora Rawhide LTO + PGO PR

-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder.

The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live.

On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault. For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1].

The 112-byte offset comes from Linux's s390 debugging documentation: its Stack Frame Layout table shows z/Architecture backchain frames with the backchain at offset 0 and saved r14 of the caller function at offset 112: https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout

This helper remains scoped to Linux s390x backchain frames. GNU SFrame's s390x notes state that the s390x ELF ABI does not generally mandate where RA and FP are saved, or whether they are saved at all: https://sourceware.org/binutils/docs/sframe-spec.html#s390x

On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].

bedevere-bot · 2026-05-04T19:16:07Z

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit c68855d 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149362%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

uweigand · 2026-05-05T08:07:06Z

On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames.

On s390x, it would be best to use only -mbackchain and not -fno-omit-frame-pointer. The latter option will introduce some performance overhead (i.e. generated code will actually maintain a frame pointer at run time even when unnecessary), without providing any benefit for unwinding.

As background, the primary purpose of a "frame pointer" in compiler-generated code is to allow accessing local variables and spill slots in the current stack frame when variable-sized stack allocations (alloca) are used. Normally, the stack frame is accessed via constant offsets from the stack pointer, but that doesn't work when alloca is in use, so a separate register (the frame pointer) is used instead.

Now, on some architectures (like x86), that frame pointer refers to the top of the stack frame (close to the stack pointer at function entry), and therefore happens to be implicitly usable for stack backtracing. But on other platforms (like s390x) the frame pointer instead refers to the bottom of the stack frame (close to the stack pointer after the function prolog), so it is completely useless for stack backtracing. [ The main reason for this choice is that on s390x, some instructions only allow register+displacement addressing for positive values of the displacement, so a pointer to the bottom of the stack frame can be used more efficiently than a pointer to the top of the stack frame. ]

So the only thing -fno-omit-frame-pointer will do on s390x is to force the compiler to allocate and maintain the frame pointer register, even though it is unnecessary for either local variable accesses or backtracing. Just -mbackchain is enough to allow for backtracing (using only fields on the stack, with no extra register being allocated).

encukou · 2026-05-05T08:20:49Z

I don't have the context to review this properly. In case naive questions are helpful:

Adding -marm on all ARM platforms looks like a large change if it's done for tests only. Unwinders not supporting thumb mode surprises me a bit. Is it just the simple manual frame pointer unwinder (a test-only thing, right), or does this affect "real" unwinders, too?

As PEP 831's impact analysis doesn't cover these platforms, would it be better to skip the test, and defer PEP 831 for Thumb & s390x?

jremus · 2026-05-05T07:52:53Z

+      ])
+      AS_CASE([$host_cpu], [s390*], [
+        AX_CHECK_COMPILE_FLAG([-mbackchain], [
+          frame_pointer_cflags="$frame_pointer_cflags -mbackchain"


-fno-omit-frame-pointer should be avoided on s390 64-bit (s390x), as frame-pointer-based unwinding is not supported. When using the s390 back chain as an alternative to frame pointers, -mbackchain alone should be sufficient. See my talk "s390: Stack tracing using Frame Pointer, Back Chain, and SFrame" for details.

Therefore the line above should be changed as follows:

frame_pointer_cflags="-mbackchain"

pablogsal · 2026-05-05T08:50:42Z

Thanks for the questions, they're not naive at all so happy to give context.

To the second one first: I'd rather not skip the test and defer. If we skip the test, we don't make the underlying problem go away, we just stop noticing it: a 3.15 built with a naive --with-frame-pointers on Debian armhf or Fedora s390x would advertise frame-pointer support and then silently fail to unwind through the JIT trampoline and through any tool that walks the FP chain (perf with -Xperf_jit, py-spy, bpftrace, etc.). That's strictly worse than the status quo. python-build-standalone is already shipping -fno-omit-frame-pointer regardless of what we do upstream, so users will hit this whether we cover it or not. I'd rather we cover it.

On 32-bit ARM, GCC can target two different instruction encodings: the classic fixed-width 32-bit "ARM" encoding (-marm), and the mixed 16/32-bit "Thumb-2" encoding (-mthumb). They're the same architecture and the same registers — just two different instruction encodings the CPU can decode. Most modern armhf toolchains (Debian, Ubuntu, Raspbian armhf) default to Thumb-2 because it produces ~25–30% smaller code at near-equivalent performance, which matters a lot for cache footprint on small ARM cores.

The two modes generate materially different prologues. In ARM mode, GCC emits a fixed push {..., fp, lr} followed by add fp, sp, #N, with N chosen so that the saved frame pointer lands at fp[-1] and the saved LR at fp[0], giving a stable, walkable layout. In Thumb-2, the prologue is composed of 16-bit instructions, can be split across IT blocks, and the offsets at which fp and lr end up saved depend on what else the function spills. There is no fixed fp[N] slot you can dereference to get the previous frame and return address; the layout varies per function. That's why the manual FP-chain walker can't traverse Thumb frames: there's no single offset that works.

Forcing -marm for the frame-pointer build sidesteps this by pinning the encoding to the one mode where the FP chain has a fixed shape.

This is the only way to comply with the PEP on these platforms.

And if any user really doesn't want -marm (or -mbackchain) in their frame-pointer build, they can unset the configure option. But given that none of these targets are tier 1 or tier 2 platforms, I think adding the per-arch handling here is clearly the right call.

encukou · 2026-05-05T10:34:00Z

OK, makes sense. Thanks for the explanation!

Looks like the _testinternalcapi.c tests only demonstrate that manual frame walking is possible; they only need to be good enough to make the tests pass. With that, I'll be comfortable reviewing this :)

Do you have time to work on this today? I can try to do the mbackchain-only change, but of course I wouldn't be as effective at it.

the layout varies per function. [...] the manual FP-chain walker can't traverse Thumb frames because there's no single offset that works

One thing that's still not clear: is the layout predictable using in-memory info? IOW, could a more advanced walker be written, or does it need the “full DWARF unwinding” calibre the PEP rejects?

would advertise frame-pointer support and then silently fail to unwind

Looking at test_frame_pointer_unwind.py, the advertisement boils down to "no-omit-frame-pointer" in cflags, is that right? Would it make sense to have a more explicit source of truth?

pablogsal · 2026-05-05T10:41:01Z

Do you have time to work on this today? I can try to do the mbackchain-only change, but of course I wouldn't be as effective at it.

Yes but in the afternoon London time only :(

One thing that's still not clear: is the layout predictable using in-memory info? IOW, could a more advanced walker be written, or does it need the “full DWARF unwinding” calibre the PEP rejects?

We don't control the unwinders. This test just proves that the fp-based unwinder that everyone implements works here. From looking at Clang and GCC, both seem to assume that DWARF unwind info will be available when using Thumb mode, because where the RA is saved and how to reconstruct the fp don't follow a regular pattern.

Looking at test_frame_pointer_unwind.py, the advertisement boils down to "no-omit-frame-pointer" in cflags, is that right? Would it make sense to have a more explicit source of truth?

I think it's a good idea. Anything in mind?

encukou · 2026-05-05T11:27:50Z

OK, taking a look. Ping me on Discord for handoff.

Would it make sense to have a more explicit source of truth?
I think is a good idea. Anything in mind?

Something like #define _Py_WITH_FRAME_POINTERS & expose it in _testinternalcapi

vstinner · 2026-05-05T12:11:46Z

You can ping me (on Discord) if you want me to test changes on s390x.

encukou · 2026-05-05T17:01:58Z

#149409 is what I got.

pablogsal requested review from AA-Turner, corona10, emmatyping and erlend-aasland as code owners May 4, 2026 13:10

bedevere-app Bot mentioned this pull request May 4, 2026

Implement PEP 831 – Frame Pointers Everywhere: Enabling System-Level Observability for Python #149202

Closed

bedevere-app Bot added the awaiting core review label May 4, 2026

hugovk added awaiting core review and removed awaiting core review labels May 4, 2026

pablogsal force-pushed the gh-149202-2 branch 3 times, most recently from 87aa290 to ef67c5d May 4, 2026 18:52

pablogsal force-pushed the gh-149202-2 branch from ef67c5d to c68855d May 4, 2026 19:14

pablogsal added the 🔨 test-with-buildbots label May 4, 2026

bedevere-bot removed the 🔨 test-with-buildbots label May 4, 2026

pablogsal mentioned this pull request May 4, 2026

gh-149202: Implement PEP 831 – Frame Pointers Everywhere: Enabling System-Level Observability for Python #149201

Merged

jremus reviewed May 5, 2026

diegorusso mentioned this pull request May 5, 2026

GH-126910: Add GNU backtrace support for unwinding JIT frames #149104

Merged

encukou mentioned this pull request May 5, 2026

gh-149202: Fix frame pointer unwinding on s390x and ARM #149409

Open

7 participants