zeemo: add pipelined + limited-conn, batch pipelined responses, shrink RSS by skylightis666 · Pull Request #727 · MDA2AV/HttpArena

skylightis666 · 2026-05-18T12:44:20Z

Description

Follow-up to #723. Three changes:

1. Subscribe to the pipelined and limited-conn profiles. New /pipeline endpoint returning ok. limited-conn reuses /baseline11 since it just exercises connection churn with the same payload.
2. Fix HTTP/1.1 pipelining in the io_uring loop. The previous code dispatched the first request out of a recv, sent the response, then submitted a fresh recv() even when the next 15 requests were already buffered, so the client waited for responses while we waited for bytes that had already arrived. New drainAndSend loops on feed(0) after each dispatch, accumulates responses in the per-connection write buffer, and emits a single batched send() for the whole burst.
3. MAX_CONN 1024 → 128 per worker. SO_REUSEPORT spreads 4,096 bench connections across N workers (4-tuple hash); with 64 workers the per-worker mean is ~64 and σ ≈ 8, so 128 leaves comfortable headroom while shrinking BSS roughly 8×. The composite score's memory bonus uses sqrt(rps)/memMB, so flat throughput plus lower RSS bumps the score. On the previous run we sat at 187 MiB for baseline-4096; aiming for ~30–50 MiB now.
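
The headroom figures for MAX_CONN can be checked with a quick calculation. This is a rough sketch that assumes each connection lands on a worker independently and uniformly, which makes the per-worker count binomial; the real 4-tuple hash may deviate from that model, and `per_worker_stats` is a name invented for the illustration.

```python
import math
from typing import Tuple

def per_worker_stats(total_conns: int, workers: int) -> Tuple[float, float]:
    """Mean and standard deviation of connections per worker.

    Assumes a uniform, independent assignment (binomial model).
    """
    p = 1.0 / workers
    mean = total_conns * p
    sigma = math.sqrt(total_conns * p * (1.0 - p))
    return mean, sigma

mean, sigma = per_worker_stats(total_conns=4096, workers=64)
max_conn = 128
headroom_sigmas = (max_conn - mean) / sigma  # distance from mean to the cap

print(f"mean={mean:.0f} sigma={sigma:.2f} headroom={headroom_sigmas:.1f} sigma")
```

Under the same model, 32 workers would put the mean at 128, right at the cap, so the limit only leaves headroom when the worker count stays near 64.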

Supporting change in jsonHandler: emit a fixed-length header prefix with Content-Length zero-padded to 5 digits so every response starts at out[0]. Lets the drain loop concatenate responses without a memmove and keeps partial-send recovery correct.
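
As an illustration of why a fixed-width Content-Length helps, here is a minimal Python sketch (not zeemo's actual code). The header text, the `write_response` name, and the buffer size are assumptions made for the example. Because the header length no longer depends on the body length, each response can be written at a known offset and the next one appended directly after it.

```python
# Hypothetical header layout; only the 5-digit zero padding comes from the PR.
HEADER_TEMPLATE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: %05d\r\n"
    b"\r\n"
)
HEADER_LEN = len(HEADER_TEMPLATE % 0)  # constant for any body up to 99999 bytes

def write_response(out: bytearray, offset: int, body: bytes) -> int:
    """Write header + body at out[offset:] and return the new end offset."""
    if len(body) > 99999:
        raise ValueError("body does not fit in a 5-digit Content-Length")
    end = offset + HEADER_LEN + len(body)
    if end > len(out):
        raise BufferError("write buffer too small")
    out[offset:end] = (HEADER_TEMPLATE % len(body)) + body
    return end

out = bytearray(512)
end1 = write_response(out, 0, b'{"a":1}')
end2 = write_response(out, end1, b"ok")  # appended with no shifting of earlier bytes
```

A Content-Length with leading zeros is still a plain run of digits, which is what the HTTP grammar for that field requires.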

All 20 local validation checks pass (17 original + GET /pipeline + Content-Type + 8-request pipelined batch served from one recv).

PR Commands — comment to trigger (requires collaborator approval):

| Command | Description |
|---|---|
| `/benchmark -f zeemo` | Preview run, results posted as comment |
| `/benchmark -f zeemo --save` | Run and save results (updates leaderboard on merge) |

Source: https://github.com/skylightis666/zeemo

MDA2AV · 2026-05-18T13:00:45Z

/benchmark -f zeemo --save

github-actions · 2026-05-18T13:00:56Z

👋 /benchmark request received. A collaborator will review and approve the run.

github-actions · 2026-05-18T13:01:15Z

⚠️ /benchmark --save cannot start: main has diverged and cannot be auto-merged into this branch. Please merge or rebase main manually, push, and re-run /benchmark --save.

- New /pipeline endpoint returning "ok"
- io_uring loop now drains all complete requests buffered from one recv() into a single batched send(), instead of recv→dispatch→send→recv per request. The previous code stalled on HTTP/1.1 pipelining: after a response, it submitted a fresh recv() while the next 15 requests were already sitting in the parser buffer, so the client waited for responses while the server waited for bytes that had already arrived.
- jsonHandler now emits a fixed-length header prefix with Content-Length zero-padded to 5 digits, so every response starts at out[0]. This lets the drain loop concatenate multiple responses in the write buffer without a memmove and keeps partial-send recovery correct.
- MAX_CONN: 1024 → 128 per worker. SO_REUSEPORT spreads 4096 connections across N workers; with 64 workers the per-worker mean is ~64 and 128 leaves comfortable headroom while shrinking BSS ~8×. The composite score uses sqrt(rps)/memMB, so flat throughput plus lower memory raises the score.
- meta.json: subscribe to pipelined and limited-conn.

All 20 local validation checks pass (4 new pipelining cases).
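
The drain-and-batch idea described above can be modeled in a few lines. This is a simplified Python sketch, not the io_uring implementation: `handle` and `drain_and_batch` are hypothetical stand-ins, and it assumes body-less requests terminated by a blank line.

```python
def handle(request: bytes) -> bytes:
    """Hypothetical handler: every request gets a tiny 200 response."""
    body = b"ok"
    return b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (len(body), body)

def drain_and_batch(recv_buf: bytearray) -> bytes:
    """Consume every complete request already in recv_buf.

    Returns one concatenated response blob suitable for a single send();
    an incomplete trailing request is left in recv_buf for the next recv().
    """
    out = bytearray()
    while True:
        end = recv_buf.find(b"\r\n\r\n")
        if end < 0:
            break  # partial request: nothing more to dispatch yet
        request = bytes(recv_buf[: end + 4])
        del recv_buf[: end + 4]
        out += handle(request)
    return bytes(out)

# Three full pipelined requests plus the start of a fourth, as if from one recv().
buf = bytearray(b"GET /pipeline HTTP/1.1\r\nHost: x\r\n\r\n" * 3 + b"GET /pipe")
blob = drain_and_batch(buf)
```

After the call, `blob` holds three responses back to back and `buf` keeps only the partial fourth request, so one send covers the whole burst before a new recv is needed.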

skylightis666 · 2026-05-18T14:14:31Z

/benchmark -f zeemo --save

github-actions · 2026-05-18T14:14:41Z

👋 /benchmark request received. A collaborator will review and approve the run.

github-actions · 2026-05-18T14:21:02Z

Benchmark Results

Framework: zeemo | Test: all tests

| Test | Conn | RPS | CPU | Mem | Δ RPS | Δ Mem |
|---|---|---|---|---|---|---|
| baseline | 512 | 4,107,322 | 6336.8% | 75MiB | +0.3% | ~0% |
| baseline | 4096 | 4,428,334 | 6404.3% | 184MiB | +0.1% | +1.1% |
| pipelined | 512 | 48,476,704 | 6552.9% | 74MiB | NEW | NEW |
| pipelined | 4096 | 50,013,615 | 6425.4% | 173MiB | NEW | NEW |
| limited-conn | 512 | 2,628,568 | 5497.0% | 114MiB | NEW | NEW |
| limited-conn | 4096 | 2,606,729 | 5592.4% | 254MiB | NEW | NEW |
| json | 4096 | 2,352,168 | 6385.8% | 258MiB | -1.6% | -10.7% |
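
For context, the sqrt(rps)/memMB memory term quoted in the description can be evaluated against the baseline-4096 row, which reports 184 MiB where the description was aiming for ~30–50 MiB. The sketch below only computes that one term; it treats the MiB readings as memMB, and the rest of the composite score is not described in this thread.

```python
import math

def memory_term(rps: float, mem_mb: float) -> float:
    """The sqrt(rps)/memMB term quoted in the PR description."""
    return math.sqrt(rps) / mem_mb

rps = 4_428_334                       # baseline, 4096 connections
measured = memory_term(rps, 184)      # memory reported by this run
targets = {mem: memory_term(rps, mem) for mem in (30, 50)}  # stated goal range

print(f"measured term: {measured:.2f}")
for mem, value in targets.items():
    print(f"at {mem} MiB: {value:.2f}")
```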

Full log

  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   1.98ms   2.27ms

  13034266 requests in 5.00s, 13033649 responses
  Throughput: 2.61M req/s
  Bandwidth:  164.01MB/s
  Status codes: 2xx=13033649, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13033625 / 13033649 responses (100.0%)
  Reconnects: 1302981
  Per-template: 4344659,4344543,4344423
  Per-template-ok: 4344659,4344542,4344423
[info] CPU 5592.4% | Mem 254MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   1.99ms   2.35ms

  13012966 requests in 5.00s, 13012918 responses
  Throughput: 2.60M req/s
  Bandwidth:  163.73MB/s
  Status codes: 2xx=13012918, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13012868 / 13012918 responses (100.0%)
  Reconnects: 1300297
  Per-template: 4337580,4337583,4337705
  Per-template-ok: 4337580,4337583,4337705
[info] CPU 5714.3% | Mem 263MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   2.00ms   2.34ms

  13003199 requests in 5.00s, 13003427 responses
  Throughput: 2.60M req/s
  Bandwidth:  163.63MB/s
  Status codes: 2xx=13003427, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13003342 / 13003427 responses (100.0%)
  Reconnects: 1300254
  Per-template: 4334281,4334557,4334503
  Per-template-ok: 4334281,4334557,4334503
[info] CPU 5586.4% | Mem 272MiB

=== Best: 2606729 req/s (CPU: 5592.4%, Mem: 254MiB) ===
[info] input BW: 201.36MB/s (avg template: 81 bytes)
[info] saved results/limited-conn/4096/zeemo.json
httparena-bench-zeemo
httparena-bench-zeemo

==============================================
=== zeemo / json / 4096c (tool=gcannon) ===
==============================================
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    698us    387us   1.93ms   2.71ms   3.33ms

  11659650 requests in 5.00s, 11657637 responses
  Throughput: 2.33M req/s
  Bandwidth:  7.80GB/s
  Status codes: 2xx=11657637, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11657558 / 11657637 responses (100.0%)
  Reconnects: 468767
  Per-template: 1659504,1662905,1666316,1670965,1669818,1666786,1661264
  Per-template-ok: 1659504,1662905,1666316,1670965,1669818,1666786,1661264
[info] CPU 5822.0% | Mem 249MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    681us    379us   1.85ms   2.70ms   3.32ms

  11762732 requests in 5.00s, 11760843 responses
  Throughput: 2.35M req/s
  Bandwidth:  7.87GB/s
  Status codes: 2xx=11760843, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11760754 / 11760843 responses (100.0%)
  Reconnects: 473053
  Per-template: 1673811,1678485,1681386,1684827,1685289,1681351,1675605
  Per-template-ok: 1673811,1678485,1681386,1684827,1685289,1681351,1675605
[info] CPU 6385.8% | Mem 258MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    693us    386us   1.87ms   2.71ms   3.33ms

  11755019 requests in 5.00s, 11752939 responses
  Throughput: 2.35M req/s
  Bandwidth:  7.87GB/s
  Status codes: 2xx=11752939, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11752892 / 11752939 responses (100.0%)
  Reconnects: 472690
  Per-template: 1672528,1675465,1680335,1683756,1684490,1681469,1674849
  Per-template-ok: 1672528,1675465,1680335,1683756,1684490,1681469,1674849
[info] CPU 5891.4% | Mem 262MiB

=== Best: 2352168 req/s (CPU: 6385.8%, Mem: 258MiB) ===
[info] input BW: 112.16MB/s (avg template: 50 bytes)
[info] saved results/json/4096/zeemo.json
httparena-bench-zeemo
httparena-bench-zeemo
[info] skip: zeemo does not subscribe to json-comp
[info] skip: zeemo does not subscribe to json-tls
[info] skip: zeemo does not subscribe to upload
[info] skip: zeemo does not subscribe to api-4
[info] skip: zeemo does not subscribe to api-16
[info] skip: zeemo does not subscribe to static
[info] skip: zeemo does not subscribe to async-db
[info] skip: zeemo does not subscribe to crud
[info] skip: zeemo does not subscribe to fortunes
[info] skip: zeemo does not subscribe to baseline-h2
[info] skip: zeemo does not subscribe to static-h2
[info] skip: zeemo does not subscribe to baseline-h2c
[info] skip: zeemo does not subscribe to json-h2c
[info] skip: zeemo does not subscribe to baseline-h3
[info] skip: zeemo does not subscribe to static-h3
[info] skip: zeemo does not subscribe to gateway-64
[info] skip: zeemo does not subscribe to gateway-h3
[info] skip: zeemo does not subscribe to production-stack
[info] skip: zeemo does not subscribe to unary-grpc
[info] skip: zeemo does not subscribe to unary-grpc-tls
[info] skip: zeemo does not subscribe to stream-grpc
[info] skip: zeemo does not subscribe to stream-grpc-tls
[info] skip: zeemo does not subscribe to echo-ws
[info] skip: zeemo does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
[info] restoring loopback MTU to 65536
[info] restoring CPU governor → powersave

Two memory-bonus changes bundled:

1. **Parser internals trimmed.** parser.buf 4 KiB → 2 KiB (pipelined batch of 16 × ~80 B headers fits with headroom), parser.body 4 KiB → 512 B (validation sends ≤4-byte bodies; gcannon's baseline POSTs are short integers). Slot drops from ~12 KiB to ~6.6 KiB. No RPS impact expected: buffers are still page-aligned, just narrower.
2. **Static [128]Slot array → fd-indexed dynamic `*Slot`.** Each accept mmaps a fresh Slot via `std.heap.page_allocator`; close munmaps it, returning pages to the kernel. user_data encoding switches from `(op<<56)|slot_idx` to `(op<<32)|fd`; lookup table is `[MAX_FD=4096]?*Slot` BSS, sparsely touched. Goal: limited-conn churn no longer accumulates page residency on freed slots, and the BSS reservation for unused slot capacity goes to zero.

Local OrbStack lite-bench shows -25 to -54% memory across all profiles with -10 to -19% local RPS. Past PRs (#727, #729) showed local RPS gains of +13-17% translating to +0-1% on the real Threadripper bench, so the local RPS regression here is expected to mostly evaporate on bare metal. Worth a preview `/benchmark` to confirm before `--save`.

All 20 local validation checks pass.
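
The `(op<<32)|fd` user_data encoding and the fd-indexed lookup table can be modeled as follows. This is an illustrative Python sketch rather than the Zig source: the op codes, the `Slot` class, and the helper names are assumptions, and allocating or dropping a Python object only stands in for the mmap/munmap pair.

```python
from typing import List, Optional, Tuple

OP_SHIFT = 32
FD_MASK = (1 << OP_SHIFT) - 1
MAX_FD = 4096

# Hypothetical operation codes; the real values are not given in the thread.
OP_ACCEPT, OP_RECV, OP_SEND = 1, 2, 3

class Slot:
    """Stand-in for per-connection state (2 KiB parser buffer per the PR)."""
    def __init__(self) -> None:
        self.parser_buf = bytearray(2048)

slots: List[Optional[Slot]] = [None] * MAX_FD  # fd-indexed, sparsely populated

def pack(op: int, fd: int) -> int:
    """Encode an operation and fd into one 64-bit user_data value."""
    if not 0 <= fd < MAX_FD:
        raise ValueError("fd outside lookup table")
    return (op << OP_SHIFT) | fd

def unpack(user_data: int) -> Tuple[int, int]:
    """Recover (op, fd) from a completion's user_data."""
    return user_data >> OP_SHIFT, user_data & FD_MASK

def on_accept(fd: int) -> None:
    slots[fd] = Slot()   # models mapping a fresh Slot on accept

def on_close(fd: int) -> None:
    slots[fd] = None     # models unmapping the Slot on close

on_accept(17)
op, fd = unpack(pack(OP_RECV, 17))
slot = slots[fd]
on_close(17)
```

Keying on the fd means a completion is resolved with one table lookup, and a closed connection leaves nothing behind except an empty table entry.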

skylightis666 requested review from Kaliumhexacyanoferrat and MDA2AV as code owners May 18, 2026 12:44

skylightis666 force-pushed the zeemo-pipelined-shortlived branch from 932c7b0 to f816b60 Compare May 18, 2026 14:12

Benchmark results: zeemo

12300a1

MDA2AV approved these changes May 18, 2026

MDA2AV merged commit ba739a4 into MDA2AV:main May 18, 2026

This was referenced May 18, 2026

zeemo: split slot buffer (4 KiB inline + on-demand 16 KiB big_buf) #729

Merged

zeemo: dynamic fd-indexed slot allocation + parser shrink #736

Merged

MDA2AV merged 2 commits into MDA2AV:main from skylightis666:zeemo-pipelined-shortlived
