Skip to content

fix(runtime): #833 — replace(/.../g, fn) honors SSO returns (no more empty replacements)#844

Merged
proggeramlug merged 1 commit into
mainfrom
worktree-fix-833-replace-replacer-fn
May 16, 2026
Merged

fix(runtime): #833 — replace(/.../g, fn) honors SSO returns (no more empty replacements)#844
proggeramlug merged 1 commit into
mainfrom
worktree-fix-833-replace-replacer-fn

Conversation

@proggeramlug
Copy link
Copy Markdown
Contributor

Summary

Root cause

js_string_replace_regex_fn in crates/perry-runtime/src/regex.rs decoded the closure's NaN-boxed return value with a hand-rolled tag switch — it accepted STRING_TAG (0x7FFF) and POINTER_TAG (0x7FFD), but not SHORT_STRING_TAG (0x7FF9, Perry's small-string optimization where ≤ 5-byte strings live inline in the NaN-box payload). When js_string_concat_box's SSO fast path returned an SSO-tagged concat, both arms missed, the handler pushed nothing, and the match was effectively erased.

Why the bug looked weird: (m) => "Y", (m) => m, (m) => m + "!" (6 bytes) all worked because they didn't take the SSO path. Only outputs whose total byte length was ≤ 5 dropped — exactly the boundary the bug report's charAt(0) + slice(1) lands on.

Fix

Route the return value through js_get_string_pointer_unified — the existing centralized decoder that handles all four string representations (heap STRING_TAG, SSO SHORT_STRING_TAG with on-the-fly heap-materialization, POINTER_TAG, raw heap pointer) plus the JS-spec number-to-string coercion. Every other "JSValue → string ptr" call site in the runtime already uses it; the regex-replace handler was the lone holdout. Net change: 16 lines of branchy tag arithmetic → 4 lines that delegate to the unified decoder.

Test plan

  • Issue repro byte-identical to node --experimental-strip-types: [Hello World Foo] / [aYbYc] / [aZcaZc] 1,4.
  • Diagnostic matrix — g/h/i (heap path) unchanged, j/k (SSO path) now match node.
  • Gap suite 35/36 (was 34/36): test_gap_regexp_advanced flips PASS; test_gap_console_methods remains as the unchanged pre-existing console.dir/group gap.
  • cargo test --release -p perry-runtime --lib regex:: green (6/0/0).

Bonus: replacer functions returning numbers now coerce to their string form via the unified decoder's number-to-string arm (was: empty replacement).

Closes #833. Refs #793, #801.

…empty replacements)

`js_string_replace_regex_fn` decoded the replacer-function's NaN-boxed return
value with a hand-rolled tag switch that only accepted `STRING_TAG` (0x7FFF)
and `POINTER_TAG` (0x7FFD). When `js_string_concat_box`'s SSO fast path
returned a ≤ 5-byte concat tagged `SHORT_STRING_TAG` (0x7FF9) — exactly the
boundary `m.charAt(0).toUpperCase() + m.slice(1)` lands on for "hello" /
"world" / "foo" — both `if tag == ...` arms missed, the handler pushed
nothing, and every match was erased.

Route the return value through `js_get_string_pointer_unified`, the existing
centralized decoder that handles all four string representations (heap
STRING_TAG, SSO with on-the-fly heap-materialization, POINTER_TAG, raw
pointer) plus the JS-spec number-to-string coercion. Every other "JSValue →
string ptr" call site already uses it; the regex-replace handler was the lone
holdout doing manual tag dispatch.

Net change: 16 lines of branchy tag arithmetic → 4 lines that delegate to the
unified decoder. The codepath physically cannot omit a string variant if it
doesn't list them itself.

Gap suite improves 34/36 → 35/36 (`test_gap_regexp_advanced` flips PASS).

Closes #833.
@proggeramlug proggeramlug force-pushed the worktree-fix-833-replace-replacer-fn branch from 19ef2fc to 6a04461 Compare May 16, 2026 10:30
@proggeramlug proggeramlug merged commit 9912fc6 into main May 16, 2026
9 checks passed
@proggeramlug proggeramlug deleted the worktree-fix-833-replace-replacer-fn branch May 16, 2026 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

String.prototype.replace(/.../g, fn) — replacer-fn return value is dropped (replaces with empty string)

1 participant