Skip to content

GH-126910: Add GNU backtrace support for unwinding JIT frames#149104

Merged
diegorusso merged 8 commits intopython:mainfrom
diegorusso:jit-backtrace-support
May 5, 2026
Merged

GH-126910: Add GNU backtrace support for unwinding JIT frames#149104
diegorusso merged 8 commits intopython:mainfrom
diegorusso:jit-backtrace-support

Conversation

@diegorusso
Copy link
Copy Markdown
Contributor

@diegorusso diegorusso commented Apr 28, 2026

@diegorusso
Copy link
Copy Markdown
Contributor Author

Performance wise, I didn't see any hit. Numbers are in the noise.

@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Apr 28, 2026

@diegorusso diegorusso force-pushed the jit-backtrace-support branch from abc1072 to db967dc Compare May 2, 2026 17:59
@diegorusso
Copy link
Copy Markdown
Contributor Author

Apologies.. I had to force push.

Comment thread Include/internal/pycore_jit_unwind.h Outdated
Comment thread Include/internal/pycore_jit_unwind.h Outdated

#if defined(_Py_JIT) && defined(__linux__) && defined(__ELF__)
# define PY_HAVE_JIT_GDB_UNWIND
# if defined(HAVE_EXECINFO_H) && defined(HAVE_BACKTRACE)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a separate configure link probe. HAVE_BACKTRACE only tells us that backtrace() is available, but this code also calls libgcc-specific symbols: __register_frame and __deregister_frame.

For example, Linux builds with clang -rtlib=compiler-rt can have execinfo.h / backtrace() while those frame registration symbols are not linkable. In that case thismacro gets enabled and the build fails later at link time.

Please add a configure check for __register_frame __deregister_frame and gate PY_HAVE_JIT_GNU_BACKTRACE_UNWIND on that result.

Copy link
Copy Markdown
Member

@pablogsal pablogsal May 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example this can break in:

  • Alpine / musl + libexecinfo (no libgcc_s __register_frame)
  • clang + compiler-rt without libgcc_s
  • LLVM libunwind builds (different FDE table format)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it makes sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link probe implemented here: ef9ea52

@support.requires_gil_enabled("test requires the GIL enabled")
@unittest.skipIf(support.is_wasi, "test not supported on WASI")
@unittest.skipUnless(sys.platform == "linux", "GNU backtrace unwinding test requires Linux")
class GnuBacktraceUnwindTests(unittest.TestCase):
Copy link
Copy Markdown
Member

@pablogsal pablogsal May 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test only asserts python_frames > 0 and jit_frames > 0, which a stub unwinder would pass. Two things that would help:

  • Tighten to python_frames >= 10 (the recursion depth) so the count actually means something.
  • Add a test that JIT frames disappear from backtrace() after the executor is freed: that's the property the deregister code exists to guarantee and nothing covers it today.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed here: 01df239

* A NULL result is valid when publication succeeded only through backends
* with no unregister step, such as perf map output.
*/
_PyJitCodeRegistration *_PyJit_RegisterCode(const void *code_addr,
Copy link
Copy Markdown
Member

@pablogsal pablogsal May 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyJit_RegisterCode returns NULL for three different reasons (perf-only success, calloc failure, all backends failed) and the caller can't tell them apart. Could you rename registered -> any_registered and add a one-liner near the perf branch noting it's intentionally not counted? The deleted comment from jit_record_code about partial-failure being non-fatal would also be nice to restore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed here: cf16da0

Comment thread Python/jit_publish.c Outdated
#if defined(PY_HAVE_JIT_GDB_UNWIND) \
|| defined(PY_HAVE_JIT_GNU_BACKTRACE_UNWIND)
struct _PyJitCodeRegistration {
# if defined(PY_HAVE_JIT_GDB_UNWIND)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to guard every field of the struct?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version is valid, but the per-field guards are probably more defensive than necessary. I added them to keep the registration object matching the enabled backends, but since the struct is private and the fields are opaque handles, we can simplify the struct and keep the feature guards around the actual register/unregister code.

@diegorusso diegorusso requested a review from corona10 as a code owner May 3, 2026 01:06
Copy link
Copy Markdown
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this and tested it locally again and it works in all cases I can find.

Great work @diegorusso 💪

@diegorusso diegorusso merged commit 1e5d942 into python:main May 5, 2026
95 of 96 checks passed
@diegorusso diegorusso deleted the jit-backtrace-support branch May 5, 2026 08:43
@bedevere-bot
Copy link
Copy Markdown

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL9 LTO + PGO 3.x (tier-3) has failed when building commit 1e5d942.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1578/builds/4764) and take a look at the build logs.
  4. Check if the failure is related to this commit (1e5d942) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1578/builds/4764

Failed tests:

  • test_frame_pointer_unwind

Failed subtests:

  • test_manual_unwind_respects_frame_pointers - test.test_frame_pointer_unwind.FramePointerUnwindTests.test_manual_unwind_respects_frame_pointers

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/var/lib/buildbot/worker/cstratak-rhel9-s390x/3.x.cstratak-rhel9-s390x.lto-pgo/build/Lib/test/test_frame_pointer_unwind.py", line 241, in test_manual_unwind_respects_frame_pointers
    self.assertGreaterEqual(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        python_frames,
        ^^^^^^^^^^^^^^
        STACK_DEPTH,
        ^^^^^^^^^^^^
        f"expected to find Python frames on {self.machine} with env {env}",
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
AssertionError: 1 not greater than or equal to 10 : expected to find Python frames on s390x with env {'PYTHON_JIT': '0'}

@diegorusso
Copy link
Copy Markdown
Contributor Author

The buildbot failure will be fixed by: #149362

@encukou
Copy link
Copy Markdown
Member

encukou commented May 5, 2026

Looks like this fails on ppc64le: https://buildbot.python.org/#/builders/451/builds/6989

@encukou
Copy link
Copy Markdown
Member

encukou commented May 5, 2026

Also Raspbian failures went from

{"length": 1, "python_frames": 0, "jit_frames": 0, "other_frames": 1, "jit_backend": null}

to

{"length": 0, "python_frames": 0, "jit_frames": 0, "other_frames": 0, "jit_backend": null, "unwinder": "gnu_backtrace_unwind"}

(even with #149362 applied)

@diegorusso
Copy link
Copy Markdown
Contributor Author

Looks like this fails on ppc64le: https://buildbot.python.org/#/builders/451/builds/6989

No, the GNU backtrace works on that platform. The one failing is test.test_frame_pointer_unwind.FramePointerUnwindTests.test_manual_unwind_respects_frame_pointers and very likely because the way we walk the fp chain.
On PPC64 the stack is defined like this: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html?utm_source=openai#STACK
The ABI stack layout explicitly has Back chain (SP + 0), CR save area (SP + 8), and LR save area (SP + 16).

We need to update @pablogsal PR (#149362) to include the shape for ppc64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants