zeemo: add pipelined + limited-conn, batch pipelined responses, shrink RSS #727

Merged: MDA2AV merged 2 commits into MDA2AV:main from skylightis666:zeemo-pipelined-shortlived on May 18, 2026

@skylightis666
Contributor

Description

Follow-up to #723. Three changes:

  1. Subscribe to pipelined and limited-conn profiles. New /pipeline endpoint returning ok. limited-conn reuses /baseline11 since it just exercises connection churn with the same payload.

  2. Fix HTTP/1.1 pipelining in the io_uring loop. The previous code dispatched the first request out of a recv, sent the response, then submitted a fresh recv() even when the next 15 requests were already buffered — client waited for responses while we waited for bytes that had already arrived. New drainAndSend loops on feed(0) after each dispatch, accumulates responses in the per-connection write buffer, and emits a single batched send() for the whole burst.

  3. MAX_CONN 1024 → 128 per worker. SO_REUSEPORT spreads 4,096 bench connections across N workers (4-tuple hash); with 64 workers the per-worker mean is ~64 and σ ≈ 8, so 128 leaves comfortable headroom while shrinking BSS roughly 8×. The composite score's memory bonus uses sqrt(rps)/memMB, so flat throughput plus lower RSS bumps the score — on the previous run we sat at 187 MiB for baseline-4096; aiming for ~30–50 MiB now.
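
The sizing argument in item 3 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each connection is hashed to a worker independently and uniformly (a real 4-tuple hash is only approximately uniform). The function names are illustrative and do not come from zeemo, and the 40 MiB figure is just a point inside the stated 30–50 MiB target.

```python
import math

def per_worker_stats(total_conns: int, workers: int) -> tuple[float, float]:
    """Mean and standard deviation of connections per worker under a
    binomial model (independent, uniform assignment is an assumption)."""
    p = 1.0 / workers
    mean = total_conns * p
    sigma = math.sqrt(total_conns * p * (1.0 - p))
    return mean, sigma

def memory_term(rps: float, mem_mib: float) -> float:
    """The sqrt(rps)/memMB term quoted above; any scaling applied by the
    real scorer is not modelled here."""
    return math.sqrt(rps) / mem_mib

mean, sigma = per_worker_stats(4096, 64)
headroom = (128 - mean) / sigma  # distance from the mean to MAX_CONN, in sigmas
print(f"mean={mean:.0f} sigma={sigma:.2f} headroom={headroom:.1f} sigma")
print(f"slot array shrink: {1024 // 128}x")

# Flat throughput, RSS dropping from 187 MiB to a hypothetical 40 MiB.
ratio = memory_term(4.4e6, 40) / memory_term(4.4e6, 187)
print(f"memory term ratio: {ratio:.2f}x")
```

Under this model a cap of 128 sits about 8σ above the per-worker mean. At flat throughput the memory term scales inversely with RSS.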

Supporting change in jsonHandler: emit a fixed-length header prefix with Content-Length zero-padded to 5 digits so every response starts at out[0]. Lets the drain loop concatenate responses without a memmove and keeps partial-send recovery correct.
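
To illustrate why a fixed-length prefix helps, here is a hedged Python sketch of the idea, not the actual jsonHandler (which is written in Zig and not shown here). The header text, `write_response`, and the buffer size are assumptions for illustration. Because Content-Length is always five digits, the header length is a constant, so each response can be written front to back at a known offset and the next one appended directly after it.

```python
# Illustrative header; the real header set emitted by zeemo may differ.
HEADER_TEMPLATE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: %05d\r\n"
    b"\r\n"
)
HEADER_LEN = len(HEADER_TEMPLATE % 0)  # constant, independent of body size

def write_response(out: bytearray, offset: int, body: bytes) -> int:
    """Write header + body into `out` starting at `offset`.

    Returns the offset just past the response, which is where the next
    pipelined response can be appended without moving earlier bytes."""
    if len(body) > 99999:
        raise ValueError("body does not fit in a 5-digit Content-Length")
    end = offset + HEADER_LEN + len(body)
    if end > len(out):
        raise ValueError("write buffer too small")
    out[offset:offset + HEADER_LEN] = HEADER_TEMPLATE % len(body)
    out[offset + HEADER_LEN:end] = body
    return end

buf = bytearray(512)
end = write_response(buf, 0, b'{"id":1}')
end = write_response(buf, end, b'{"id":22}')
print(bytes(buf[:end]).decode())
```

Leading zeros are permitted by the HTTP grammar, which defines Content-Length as one or more digits.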

All 20 local validation checks pass (17 original + GET /pipeline + Content-Type + 8-request pipelined batch served from one recv).

PR Commands — comment to trigger (requires collaborator approval):

| Command | Description |
|---|---|
| `/benchmark -f zeemo` | Preview run, results posted as comment |
| `/benchmark -f zeemo --save` | Run and save results (updates leaderboard on merge) |

Source: https://github.com/skylightis666/zeemo

@MDA2AV
Owner

MDA2AV commented May 18, 2026

/benchmark -f zeemo --save

@github-actions
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions
Contributor

⚠️ /benchmark --save cannot start: main has diverged and cannot be auto-merged into this branch. Please merge or rebase main manually, push, and re-run /benchmark --save.

- New /pipeline endpoint returning "ok"
- io_uring loop now drains all complete requests buffered from one recv()
  into a single batched send(), instead of recv→dispatch→send→recv per
  request. The previous code stalled on HTTP/1.1 pipelining: after a
  response, it submitted a fresh recv() while the next 15 requests were
  already sitting in the parser buffer, so the client waited for
  responses while the server waited for bytes that had already arrived.
- jsonHandler now emits a fixed-length header prefix with Content-Length
  zero-padded to 5 digits, so every response starts at out[0]. This lets
  the drain loop concatenate multiple responses in the write buffer
  without a memmove and keeps partial-send recovery correct.
- MAX_CONN: 1024 → 128 per worker. SO_REUSEPORT spreads 4096 connections
  across N workers; with 64 workers the per-worker mean is ~64 and 128
  leaves comfortable headroom while shrinking BSS ~8×. The composite
  score uses sqrt(rps)/memMB, so flat throughput plus lower memory
  raises the score.
- meta.json: subscribe to pipelined and limited-conn.
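
The drain behaviour described above can be modelled in a few lines. This is a simplified Python sketch rather than the Zig io_uring code: `drain_burst` and `handle_request` are hypothetical names, requests are assumed to be bodiless GETs terminated by a blank line, and the single returned byte string stands in for the one batched send().

```python
def handle_request(request: bytes) -> bytes:
    """Stand-in handler: every request gets the same small response."""
    return b"HTTP/1.1 200 OK\r\nContent-Length: 00002\r\n\r\nok"

def drain_burst(recv_buf: bytes, handler) -> tuple[bytes, bytes]:
    """Dispatch every complete request already buffered from one recv.

    Returns (batched_responses, leftover). The leftover is an incomplete
    request that must wait for more bytes; only then is a new recv needed."""
    out = bytearray()
    pos = 0
    while True:
        end = recv_buf.find(b"\r\n\r\n", pos)
        if end == -1:
            break  # nothing complete left in the buffer
        out += handler(recv_buf[pos:end + 4])
        pos = end + 4
    return bytes(out), recv_buf[pos:]

# 16 pipelined requests plus the start of a 17th, all from one recv.
burst = b"GET /pipeline HTTP/1.1\r\nHost: x\r\n\r\n" * 16 + b"GET /pipe"
batched, leftover = drain_burst(burst, handle_request)
print(batched.count(b"HTTP/1.1 200 OK"), leftover)
```

The old behaviour corresponds to breaking out of the loop after the first dispatch, which leaves 15 parsed-but-unanswered requests in the buffer while the server waits on a new recv.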

All 20 local validation checks pass (4 new pipelining cases).

skylightis666 force-pushed the zeemo-pipelined-shortlived branch from 932c7b0 to f816b60 on May 18, 2026 14:12

@skylightis666
Contributor Author

/benchmark -f zeemo --save

@github-actions
Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions
Contributor

Benchmark Results

Framework: zeemo | Test: all tests

| Test | Conn | RPS | CPU | Mem | Δ RPS | Δ Mem |
|---|---|---|---|---|---|---|
| baseline | 512 | 4,107,322 | 6336.8% | 75MiB | +0.3% | ~0% |
| baseline | 4096 | 4,428,334 | 6404.3% | 184MiB | +0.1% | +1.1% |
| pipelined | 512 | 48,476,704 | 6552.9% | 74MiB | NEW | NEW |
| pipelined | 4096 | 50,013,615 | 6425.4% | 173MiB | NEW | NEW |
| limited-conn | 512 | 2,628,568 | 5497.0% | 114MiB | NEW | NEW |
| limited-conn | 4096 | 2,606,729 | 5592.4% | 254MiB | NEW | NEW |
| json | 4096 | 2,352,168 | 6385.8% | 258MiB | -1.6% | -10.7% |

Full log
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   1.98ms   2.27ms

  13034266 requests in 5.00s, 13033649 responses
  Throughput: 2.61M req/s
  Bandwidth:  164.01MB/s
  Status codes: 2xx=13033649, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13033625 / 13033649 responses (100.0%)
  Reconnects: 1302981
  Per-template: 4344659,4344543,4344423
  Per-template-ok: 4344659,4344542,4344423
[info] CPU 5592.4% | Mem 254MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   1.99ms   2.35ms

  13012966 requests in 5.00s, 13012918 responses
  Throughput: 2.60M req/s
  Bandwidth:  163.73MB/s
  Status codes: 2xx=13012918, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13012868 / 13012918 responses (100.0%)
  Reconnects: 1300297
  Per-template: 4337580,4337583,4337705
  Per-template-ok: 4337580,4337583,4337705
[info] CPU 5714.3% | Mem 263MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  10
  Templates: 3
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   1.42ms   1.41ms   1.75ms   2.00ms   2.34ms

  13003199 requests in 5.00s, 13003427 responses
  Throughput: 2.60M req/s
  Bandwidth:  163.63MB/s
  Status codes: 2xx=13003427, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13003342 / 13003427 responses (100.0%)
  Reconnects: 1300254
  Per-template: 4334281,4334557,4334503
  Per-template-ok: 4334281,4334557,4334503
[info] CPU 5586.4% | Mem 272MiB

=== Best: 2606729 req/s (CPU: 5592.4%, Mem: 254MiB) ===
[info] input BW: 201.36MB/s (avg template: 81 bytes)
[info] saved results/limited-conn/4096/zeemo.json
httparena-bench-zeemo
httparena-bench-zeemo

==============================================
=== zeemo / json / 4096c (tool=gcannon) ===
==============================================
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    698us    387us   1.93ms   2.71ms   3.33ms

  11659650 requests in 5.00s, 11657637 responses
  Throughput: 2.33M req/s
  Bandwidth:  7.80GB/s
  Status codes: 2xx=11657637, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11657558 / 11657637 responses (100.0%)
  Reconnects: 468767
  Per-template: 1659504,1662905,1666316,1670965,1669818,1666786,1661264
  Per-template-ok: 1659504,1662905,1666316,1670965,1669818,1666786,1661264
[info] CPU 5822.0% | Mem 249MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    681us    379us   1.85ms   2.70ms   3.32ms

  11762732 requests in 5.00s, 11760843 responses
  Throughput: 2.35M req/s
  Bandwidth:  7.87GB/s
  Status codes: 2xx=11760843, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11760754 / 11760843 responses (100.0%)
  Reconnects: 473053
  Per-template: 1673811,1678485,1681386,1684827,1685289,1681351,1675605
  Per-template-ok: 1673811,1678485,1681386,1684827,1685289,1681351,1675605
[info] CPU 6385.8% | Mem 258MiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  25
  Templates: 7
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    693us    386us   1.87ms   2.71ms   3.33ms

  11755019 requests in 5.00s, 11752939 responses
  Throughput: 2.35M req/s
  Bandwidth:  7.87GB/s
  Status codes: 2xx=11752939, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 11752892 / 11752939 responses (100.0%)
  Reconnects: 472690
  Per-template: 1672528,1675465,1680335,1683756,1684490,1681469,1674849
  Per-template-ok: 1672528,1675465,1680335,1683756,1684490,1681469,1674849
[info] CPU 5891.4% | Mem 262MiB

=== Best: 2352168 req/s (CPU: 6385.8%, Mem: 258MiB) ===
[info] input BW: 112.16MB/s (avg template: 50 bytes)
[info] saved results/json/4096/zeemo.json
httparena-bench-zeemo
httparena-bench-zeemo
[info] skip: zeemo does not subscribe to json-comp
[info] skip: zeemo does not subscribe to json-tls
[info] skip: zeemo does not subscribe to upload
[info] skip: zeemo does not subscribe to api-4
[info] skip: zeemo does not subscribe to api-16
[info] skip: zeemo does not subscribe to static
[info] skip: zeemo does not subscribe to async-db
[info] skip: zeemo does not subscribe to crud
[info] skip: zeemo does not subscribe to fortunes
[info] skip: zeemo does not subscribe to baseline-h2
[info] skip: zeemo does not subscribe to static-h2
[info] skip: zeemo does not subscribe to baseline-h2c
[info] skip: zeemo does not subscribe to json-h2c
[info] skip: zeemo does not subscribe to baseline-h3
[info] skip: zeemo does not subscribe to static-h3
[info] skip: zeemo does not subscribe to gateway-64
[info] skip: zeemo does not subscribe to gateway-h3
[info] skip: zeemo does not subscribe to production-stack
[info] skip: zeemo does not subscribe to unary-grpc
[info] skip: zeemo does not subscribe to unary-grpc-tls
[info] skip: zeemo does not subscribe to stream-grpc
[info] skip: zeemo does not subscribe to stream-grpc-tls
[info] skip: zeemo does not subscribe to echo-ws
[info] skip: zeemo does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/baseline-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/pipelined-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
[info] restoring loopback MTU to 65536
[info] restoring CPU governor → powersave

MDA2AV merged commit ba739a4 into MDA2AV:main on May 18, 2026

MDA2AV pushed a commit that referenced this pull request May 18, 2026
Two memory-bonus changes bundled:

1. **Parser internals trimmed.** parser.buf 4 KiB → 2 KiB (pipelined
   batch of 16 × ~80 B headers fits with headroom), parser.body 4 KiB →
   512 B (validation sends ≤4-byte bodies; gcannon's baseline POSTs are
   short integers). Slot drops from ~12 KiB to ~6.6 KiB. No RPS impact
   expected — buffers are still page-aligned, just narrower.

2. **Static [128]Slot array → fd-indexed dynamic `*Slot`.** Each accept
   mmaps a fresh Slot via `std.heap.page_allocator`; close munmaps it,
   returning pages to the kernel. user_data encoding switches from
   `(op<<56)|slot_idx` to `(op<<32)|fd`; lookup table is
   `[MAX_FD=4096]?*Slot` BSS, sparsely touched.

   Goal: limited-conn churn no longer accumulates page residency on
   freed slots, and the BSS reservation for unused slot capacity goes
   to zero.
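
   The two user_data layouts can be compared with a small Python sketch. The bit layouts and `MAX_FD` come from the description above. The op codes, helper names, and the plain list standing in for the `?*Slot` table are assumptions for illustration.

   ```python
   MAX_FD = 4096
   OP_RECV, OP_SEND = 1, 2  # hypothetical op codes

   def encode_old(op: int, slot_idx: int) -> int:
       """Previous layout: op in the top byte, slot index in the low bits."""
       return (op << 56) | slot_idx

   def encode_new(op: int, fd: int) -> int:
       """New layout: op above bit 32, file descriptor in the low 32 bits."""
       assert 0 <= fd < MAX_FD
       return (op << 32) | fd

   def decode_new(user_data: int) -> tuple[int, int]:
       """Recover (op, fd) from a completion's user_data."""
       return user_data >> 32, user_data & 0xFFFFFFFF

   # fd-indexed lookup table; entries stay None until a connection is accepted.
   slots: list[object | None] = [None] * MAX_FD

   fd = 37
   slots[fd] = {"fd": fd}               # stands in for a freshly mapped Slot
   op, got_fd = decode_new(encode_new(OP_SEND, fd))
   print(op, got_fd, slots[got_fd])
   slots[got_fd] = None                 # close: the Slot is released
   ```

   Keying on the fd removes the need for a separate slot allocator, since the kernel already hands out unique descriptors per open connection.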

Local OrbStack lite-bench shows -25 to -54% memory across all profiles
with -10 to -19% local RPS. Past PRs (#727, #729) showed local RPS
gains of +13-17% translating to +0-1% on the real Threadripper bench,
so the local RPS regression here is expected to mostly evaporate on
bare metal. Worth a preview `/benchmark` to confirm before `--save`.

All 20 local validation checks pass.