docs: add benchmarking blog posts and performance reference page #254

Draft
SamBarker wants to merge 3 commits into kroxylicious:main from SamBarker:blog/benchmarking-the-proxy
@SamBarker
Member

Summary

  • Adds two blog posts about benchmarking Kroxylicious proxy overhead:
    • [May 1] "Does my proxy look big in this cluster?" — operator-focused: methodology, passthrough and encryption results, sizing guidance
    • [May 8] "Benchmarking a Kafka proxy: the engineering story" — engineer-focused: OMB harness, flamegraphs (interactive iframes), bugs found in own tooling, cluster incident
  • Adds a /performance/ reference page summarising key numbers and linking to both posts
  • Adds interactive async-profiler flamegraphs as self-contained HTML assets
  • Updates overview.markdown with headline performance figures and a link to the reference page

Status

Draft — the posts are first drafts. Known open items:

  • Per-connection scaling section in Post 1 needs the TODO placeholder replaced once 4-core sweep data is available and the scaling picture is better understood
  • Post 2 has a stub section for 4-core validation results (pending sweep completion)
  • Post 2 tone has not yet received the same voice treatment as Post 1

Test plan

  • Run ./run.sh and verify site renders at http://127.0.0.1:4000/
  • Check both blog posts render correctly including flamegraph iframes
  • Check /performance/ page renders with correct tables
  • Check cross-links between posts and to /performance/ work

🤖 Generated with Claude Code

SamBarker added 3 commits May 1, 2026 16:24
Covers methodology, test environment, passthrough proxy results,
encryption latency and throughput ceiling, the per-connection scaling
insight, and sizing guidance. Includes a TODO placeholder for the
connection sweep results before publication.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Covers why we chose OMB over Kafka's own tools, the benchmark harness
we built (Helm chart, orchestration scripts, JBang result processors),
workload design rationale, CPU flamegraphs with embedded interactive
iframes, the per-connection ceiling discovery, bugs found in our own
tooling, and the cluster recovery incident.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds /performance/ as a dedicated quick-reference page with headline
benchmark numbers, comparison tables, and sizing guidance, linked from
both blog posts. Updates the existing Performance section in overview.markdown
with the key headline numbers and a link to the full reference page.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
| Kroxylicious proxy | 1.4% |
| GC | 0.1% |

The proxy is overwhelmingly I/O-bound. 59% of CPU is in `send`/`recv` syscalls — the inherent cost of maintaining two TCP connections (client→proxy, proxy→Kafka) with data flowing through the JVM. The proxy itself accounts for 1.4%. It really is a TCP relay with protocol awareness.
Member

I wonder how much of that is down to the decode predicate thing -- basically we know the filter chain and what each filter in it wants to intercept, and I think we avoid decoding requests and responses when we know nothing is interested. That code has been in there from the beginning, but I don't actually know how relevant it is -- maybe some of the internal filters mean we're always decoding requests and responses, in which case 1.4% is impressive. Or maybe we're acting more like an L4 proxy most of the time, in which case 1.4% is not quite as impressive.
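
For readers unfamiliar with the idea, here is a minimal sketch of what a decode predicate does. It is illustrative only: `DecodePredicateSketch`, `FilterInterest` and `shouldDecode` are hypothetical names, not the Kroxylicious API, and the real implementation may differ. The only assumption taken from the comment above is that each filter declares what it wants to intercept and that decoding is skipped when no filter is interested.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a decode predicate; none of these names are Kroxylicious APIs. */
class DecodePredicateSketch {

    /** A filter declares which Kafka request API keys it wants to intercept. */
    record FilterInterest(String name, Set<Short> requestApiKeys) {}

    /** Decode a request only if at least one filter in the chain asked for its API key. */
    static boolean shouldDecode(List<FilterInterest> chain, short apiKey) {
        return chain.stream().anyMatch(f -> f.requestApiKeys().contains(apiKey));
    }

    public static void main(String[] args) {
        short produce = 0;  // Kafka API key for Produce
        short fetch = 1;    // Kafka API key for Fetch
        short metadata = 3; // Kafka API key for Metadata

        List<FilterInterest> chain =
                List.of(new FilterInterest("encryption", Set.of(produce, fetch)));

        // A filter wants Produce, so the frame has to be decoded.
        System.out.println("decode Produce?  " + shouldDecode(chain, produce));
        // Nothing wants Metadata, so the bytes could be forwarded opaquely.
        System.out.println("decode Metadata? " + shouldDecode(chain, metadata));
    }
}
```

If the passthrough benchmark ran with a chain where nothing was interested, the 1.4% would mostly measure the opaque-forwarding path rather than decode and re-encode.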


The direct crypto cost is 13.3% (11.3% AES-GCM + 2.0% Kroxylicious filter logic). But encryption adds indirect costs too:

- **Buffer management (+5.8%)**: encrypted records need to be read into buffers, encrypted, and written to new buffers — more allocation, more copying
Member

Did we ever figure out how to reuse the buffers more? I think that was a TODO at one point.
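
As a rough illustration of the kind of reuse I mean, here is a sketch that encrypts into a single scratch buffer that only grows when a record needs more room. This is not how the encryption filter is actually written: `ReusableEncryptBuffer` and its layout (12-byte IV followed by ciphertext and tag) are assumptions for the example, and a real version would have to deal with buffer ownership across threads and with Netty's pooled buffers.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: one scratch buffer reused across AES-GCM encrypt calls. */
class ReusableEncryptBuffer {
    private static final int TAG_BITS = 128;
    private static final int IV_BYTES = 12;

    private final SecureRandom random = new SecureRandom();
    private ByteBuffer scratch = ByteBuffer.allocate(4096);

    /**
     * Encrypts plaintext into the reused scratch buffer as IV || ciphertext || tag.
     * The returned buffer is only valid until the next call on this instance.
     */
    ByteBuffer encrypt(SecretKey key, ByteBuffer plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // fresh IV per record; never reuse one with the same key

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));

        // Grow the scratch buffer only when this record does not fit.
        int needed = IV_BYTES + cipher.getOutputSize(plaintext.remaining());
        if (scratch.capacity() < needed) {
            scratch = ByteBuffer.allocate(needed);
        }

        scratch.clear();
        scratch.put(iv);
        cipher.doFinal(plaintext, scratch); // writes ciphertext and tag straight into scratch
        scratch.flip();
        return scratch;
    }
}
```

The caller has to consume or copy the result before the next `encrypt` call, which is the trade-off that makes this harder than it looks in a pipeline where writes complete asynchronously.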


Fix: `kubectl uncordon worker0 worker1 worker2`. Once uncordoned, pods scheduled, operators recovered, and the upgrade completed.

Not a Kroxylicious bug, but it cost several hours of cluster recovery time during an active benchmark campaign. Worth knowing about if you're running OCP on Fyre.
Member

Given Fyre is an IBM-internal thing, this is not terribly useful to readers outside IBM. Can we generalise it to be about OpenShift as a whole?

2 participants