@dduan dduan commented Feb 7, 2026

Each commit contains a micro optimization.

dduan added 14 commits February 6, 2026 10:41
Avoid triple-quote probing on non-quote tokens in Parser.nextToken.scanString.

The tokenizer now branches on the head code unit before running single-quote or double-quote string logic, keeping existing parse behavior while removing repeated quote checks for numeric/date/bare-token paths.

Also cache range.upperBound in a local for scan loops and trim date/value range checks to use that cached bound.
Reduce parseSelect path churn by storing only selector prefixes in tablePath.
fillTablePath now returns the terminal key/hash/token directly instead of appending
then removeLast. This removes one append/remove pair per selector while
preserving walkTablePath semantics and selector error handling.
Add CodeUnits.isBasicStringBodyChar and use it in nextToken's double-quoted
string scanner. This replaces repeated per-byte backslash/quote/newline
comparisons with one table probe in both the unrolled and scalar loops.
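
To illustrate the table-probe idea from the commit above, here is a minimal sketch. It assumes a 256-entry Boolean table indexed by byte value; the global `isBasicStringBodyChar` and the function `scanBasicStringBody` are hypothetical stand-ins, not the library's actual `CodeUnits` API, and the real table may classify more bytes as special.

```swift
// Hypothetical stand-in for CodeUnits.isBasicStringBodyChar (layout is an assumption).
// true  = plain body byte the scanner can skip over
// false = byte that needs special handling (quote, backslash, newline)
let isBasicStringBodyChar: [Bool] = {
    var table = [Bool](repeating: true, count: 256)
    table[Int(UInt8(ascii: "\""))] = false  // closing quote
    table[Int(UInt8(ascii: "\\"))] = false  // escape introducer
    table[Int(UInt8(ascii: "\n"))] = false  // line feed
    return table
}()

/// Returns the index of the first byte at or after `start` that is not a
/// plain body byte, or `bytes.count` if every remaining byte is plain.
func scanBasicStringBody(_ bytes: [UInt8], from start: Int) -> Int {
    var index = start
    let end = bytes.count  // cached bound, read once
    // One table probe per byte replaces three separate comparisons.
    while index < end, isBasicStringBodyChar[Int(bytes[index])] {
        index += 1
    }
    return index
}

let sampleInput = Array("hello\" rest".utf8)
let stopIndex = scanBasicStringBody(sampleInput, from: 0)  // stops at the quote, index 5
```

The trade-off is one memory load per byte in exchange for fewer branches; whether that wins depends on the input mix and cache behavior.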
Hoist the count read out of the inner loops in matchKeyValue, matchKeyArray, and matchKeyTable.

This keeps lookup behavior unchanged while removing repeated count reads in hot matcher loops.
Replace long-key byte-by-byte FNV hashing with constant-time prefix/suffix sampling in fastKeyHash(bytes:range:) and fastKeyHash(_:). Keep <=8-byte packed hashing unchanged. Preserve exact-key equality checks so hash remains a prefilter only.
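
The prefix/suffix sampling described above can be sketched as follows. This is an illustration under assumptions, not the library's `fastKeyHash`: the names `loadBytes`, `sampledKeyHash`, `KeyEntry`, and `findKey` are hypothetical, and the 8-byte sample size and mixing constant are chosen only for the example.

```swift
/// Packs `length` bytes (at most 8) starting at `offset` into one word.
func loadBytes(_ bytes: [UInt8], at offset: Int, length: Int) -> UInt64 {
    var word: UInt64 = 0
    for i in 0..<length {
        word = (word << 8) | UInt64(bytes[offset + i])
    }
    return word
}

/// Hypothetical sampled hash: constant work regardless of key length.
func sampledKeyHash(_ bytes: [UInt8]) -> UInt64 {
    let count = bytes.count
    let mix: UInt64 = 0x9E37_79B9_7F4A_7C15  // arbitrary odd constant (assumption)
    if count <= 8 {
        // Short keys: every byte contributes to the hash.
        return loadBytes(bytes, at: 0, length: count) ^ (UInt64(count) &* mix)
    }
    // Long keys: sample only the first and last 8 bytes, plus the length.
    let prefix = loadBytes(bytes, at: 0, length: 8)
    let suffix = loadBytes(bytes, at: count - 8, length: 8)
    return (prefix &* mix) ^ suffix ^ UInt64(count)
}

struct KeyEntry {
    let key: [UInt8]
    let hash: UInt64
}

/// The hash is only a prefilter; exact byte equality decides the match.
func findKey(_ key: [UInt8], in entries: [KeyEntry]) -> Int? {
    let hash = sampledKeyHash(key)
    for (index, entry) in entries.enumerated() where entry.hash == hash && entry.key == key {
        return index
    }
    return nil
}

// Two long keys that differ only in the middle collide in the sampled hash,
// but the equality check still tells them apart.
let keyA = Array("prefix00-AAAA-suffix00".utf8)
let keyB = Array("prefix00-BBBB-suffix00".utf8)
let sampleEntries = [
    KeyEntry(key: keyA, hash: sampledKeyHash(keyA)),
    KeyEntry(key: keyB, hash: sampledKeyHash(keyB)),
]
let foundB = findKey(keyB, in: sampleEntries)
```

Sampling trades hash quality for constant-time hashing: keys sharing a prefix, suffix, and length will collide, which is why the exact-key comparison has to stay.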
Change createKeyValue to accept the parsed value token and append KeyValuePair with its final value in one step. This removes temporary Token.empty staging and follow-up indexed mutation in parseKeyValue while preserving lookup and insertion behavior.
Parse the first selector segment directly in parseSelect and branch immediately on closing bracket vs dot. Single-segment selectors now skip tablePath bookkeeping and walkTablePath entirely, while dotted selectors still use the existing path fill/walk logic after seeding the first segment. This removes speculative token rollback from prior attempts and keeps key ownership behavior aligned with existing dotted-path code.
Keep the single-segment selector fast path and add a byte-token lookup path for  selectors when no key transform is active. ParseSelect now hashes bare tokens directly, probes array keys without building a String, and only materializes the key when creating the array entry. Dotted selectors and transformed/quoted keys keep existing logic.
Load character-class tables only in branches that use them, and split dot-special bare-key scanning into a common bare-key fast path plus a plus-sign fallback. This removes per-token setup and branch work in tokenizer scan loops.
Replace the double-quoted string 8-byte classifier loop with unaligned UInt64 chunk probes that detect quote, backslash, and LF bytes with bit masks.

Keep the existing bytewise fallback loop and parsing behavior unchanged, so only the hot scanning path changes.
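
A minimal sketch of this kind of chunk probe, using the well-known "has zero byte" bit trick applied to the chunk XORed with a repeated target byte. The function names are hypothetical and the library's actual masks may differ; `loadUnaligned(fromByteOffset:as:)` assumes Swift 5.7 or later.

```swift
let byteOnes: UInt64 = 0x0101_0101_0101_0101   // 0x01 in every byte lane
let byteHighs: UInt64 = 0x8080_8080_8080_8080  // 0x80 in every byte lane

/// True if any of the 8 bytes in `chunk` equals `value`.
@inline(__always)
func chunkHasByte(_ chunk: UInt64, equalTo value: UInt8) -> Bool {
    // XOR turns matching bytes into zero bytes; the expression below is
    // nonzero exactly when at least one byte of `x` is zero.
    let x = chunk ^ (byteOnes &* UInt64(value))
    return ((x &- byteOnes) & ~x & byteHighs) != 0
}

/// Index of the first quote, backslash, or LF byte, or `bytes.count` if none.
func firstSpecialByte(in bytes: [UInt8]) -> Int {
    var index = 0
    let end = bytes.count
    bytes.withUnsafeBytes { raw in
        // Hot path: skip 8 plain bytes per iteration.
        while index + 8 <= end {
            let chunk = raw.loadUnaligned(fromByteOffset: index, as: UInt64.self)
            if chunkHasByte(chunk, equalTo: 0x22)      // "
                || chunkHasByte(chunk, equalTo: 0x5C)  // backslash
                || chunkHasByte(chunk, equalTo: 0x0A)  // LF
            {
                break
            }
            index += 8
        }
    }
    // Bytewise fallback: pinpoints the byte inside a flagged chunk and
    // handles the tail shorter than 8 bytes.
    while index < end {
        let byte = bytes[index]
        if byte == 0x22 || byte == 0x5C || byte == 0x0A { break }
        index += 1
    }
    return index
}

let quotedTail = firstSpecialByte(in: Array("abcdefghijkl\"mn".utf8))
```

Only the presence test is used here, so byte order within the chunk does not matter; locating the exact byte is left to the scalar loop.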
Record two hash bits per inserted key on each internal table and gate table/array/value lookups with fast negative checks before linear scans.

This keeps behavior unchanged while cutting repeated miss scans during parse-time key insertion and table path traversal.
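
The fast negative check works like a very small Bloom filter. The sketch below assumes a single 64-bit mask per table and an FNV-1a hash; the `KeyTable` type and its members are hypothetical, and the library's actual filter width and hash function are not stated in this PR.

```swift
struct KeyTable {
    private(set) var keys: [String] = []
    private var filter: UInt64 = 0  // assumption: one 64-bit mask per table

    /// Derives two bit positions from one hash value.
    private static func filterBits(for hash: UInt64) -> UInt64 {
        let first = UInt64(1) << (hash & 63)
        let second = UInt64(1) << ((hash >> 6) & 63)
        return first | second
    }

    /// 64-bit FNV-1a, used here only as a simple stand-in hash.
    private static func hash(_ key: String) -> UInt64 {
        var h: UInt64 = 0xcbf2_9ce4_8422_2325
        for byte in key.utf8 {
            h = (h ^ UInt64(byte)) &* 0x0000_0100_0000_01B3
        }
        return h
    }

    mutating func insert(_ key: String) {
        keys.append(key)
        filter |= Self.filterBits(for: Self.hash(key))
    }

    /// false means the key is definitely absent; true means "maybe present".
    func mayContain(_ key: String) -> Bool {
        let bits = Self.filterBits(for: Self.hash(key))
        return (filter & bits) == bits
    }

    func index(of key: String) -> Int? {
        // Fast negative check: skip the linear scan on a definite miss.
        guard mayContain(key) else { return nil }
        return keys.firstIndex(of: key)
    }
}

var sampleTable = KeyTable()
sampleTable.insert("name")
sampleTable.insert("version")
let versionIndex = sampleTable.index(of: "version")
```

False positives only cost a linear scan that would have happened anyway, so lookup results are unchanged; the filter pays off when most probes are misses, as during key insertion.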
Drop optional base-address branches from hot tokenizer/hash paths where parse-time buffers are guaranteed to have storage for non-empty ranges.

- remove the fallback byte-by-byte 8-byte pre-scan branch in double-quoted token scanning
- short-circuit zero-length hash range and use direct base-address hashing path for non-empty ranges
- fast-return empty-range string creation and use direct base-address decoding for non-empty ranges
Advance the first selector segment directly before tablePath fill/walk.

- Reuse the same path-segment advancement logic for first-segment setup and subsequent walk.
- Start walkTablePath from the already-resolved table/keyed state.
- Keep existing two-phase selector shape while dropping one tablePath append for the first segment.

Goal: reduce parseSelect work on table/array selector paths without one-pass ARC regressions.
Teach parseSelect to special-case dotted selectors with exactly two segments.
After walking the first segment, parse the second segment directly; if the selector closes, skip fillTablePath and walkTablePath.
Fallback for longer dotted selectors keeps the existing tablePath+walk behavior unchanged.
@dduan dduan enabled auto-merge (rebase) February 7, 2026 05:27

github-actions bot commented Feb 7, 2026

Comparing results between 'main' and 'pull_request'

Host 'runnervmwffz4' with 4 'x86_64' processors with 15 GB memory, running:
#18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025

TOMLDecoderBenchmarks

Decode toml.io example metrics

Time (wall clock): results within specified thresholds, fold down for details.

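| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 64 | 65 | 65 | 66 | 69 | 87 | 148 | 7357 |
| pull_request | 70 | 71 | 71 | 71 | 75 | 88 | 128 | 7053 |
| Δ | 6 | 6 | 6 | 5 | 6 | 1 | -20 | -304 |
| Improvement % | -9 | -9 | -9 | -8 | -9 | -1 | 14 | -304 |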

Retains: results within specified thresholds, fold down for details.

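| Retains * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 604 | 604 | 605 | 605 | 605 | 605 | 605 | 7357 |
| pull_request | 860 | 860 | 860 | 861 | 861 | 861 | 861 | 7053 |
| Δ | 256 | 256 | 255 | 256 | 256 | 256 | 256 | -304 |
| Improvement % | -42 | -42 | -42 | -42 | -42 | -42 | -42 | -304 |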

Parse toml.io example metrics

Time (wall clock): results within specified thresholds, fold down for details.

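| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 5380 | 5483 | 5523 | 5571 | 5743 | 15535 | 112240 | 10000 |
| pull_request | 4789 | 4891 | 4931 | 4971 | 5031 | 6123 | 20137 | 10000 |
| Δ | -591 | -592 | -592 | -600 | -712 | -9412 | -92103 | 0 |
| Improvement % | 11 | 11 | 11 | 11 | 12 | 61 | 82 | 0 |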

Retains: results within specified thresholds, fold down for details.

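| Retains * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 13 | 14 | 14 | 14 | 14 | 14 | 14 | 10000 |
| pull_request | 11 | 12 | 12 | 12 | 12 | 12 | 12 | 10000 |
| Δ | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 0 |
| Improvement % | 15 | 14 | 14 | 14 | 14 | 14 | 14 | 0 |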

decode canada.toml metrics

Time (wall clock): results within specified thresholds, fold down for details.

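| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 268 | 268 | 268 | 268 | 270 | 270 | 270 | 4 |
| pull_request | 263 | 263 | 263 | 263 | 264 | 264 | 264 | 4 |
| Δ | -5 | -5 | -5 | -5 | -6 | -6 | -6 | 0 |
| Improvement % | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 |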

Retains: results within specified thresholds, fold down for details.

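| Retains (K) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 1674 | 1674 | 1674 | 1674 | 1674 | 1674 | 1674 | 4 |
| pull_request | 1674 | 1674 | 1674 | 1674 | 1674 | 1674 | 1674 | 4 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |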

decode twitter.toml metrics

Time (wall clock): results within specified thresholds, fold down for details.

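| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 8892 | 8913 | 8929 | 8954 | 8987 | 9224 | 9415 | 111 |
| pull_request | 9591 | 9634 | 9650 | 9675 | 9716 | 10232 | 10290 | 103 |
| Δ | 699 | 721 | 721 | 721 | 729 | 1008 | 875 | -8 |
| Improvement % | -8 | -8 | -8 | -8 | -8 | -11 | -9 | -8 |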

Retains: results within specified thresholds, fold down for details.

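| Retains (K) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 68 | 68 | 68 | 68 | 68 | 68 | 68 | 111 |
| pull_request | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 103 |
| Δ | 46 | 46 | 46 | 46 | 46 | 46 | 46 | -8 |
| Improvement % | -68 | -68 | -68 | -68 | -68 | -68 | -68 | -8 |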

parse GitHub events archive metrics

Time (wall clock): results within specified thresholds, fold down for details.

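| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 153 | 153 | 154 | 162 | 164 | 164 | 164 | 7 |
| pull_request | 132 | 133 | 135 | 135 | 144 | 144 | 144 | 8 |
| Δ | -21 | -20 | -19 | -27 | -20 | -20 | -20 | 1 |
| Improvement % | 14 | 13 | 12 | 17 | 12 | 12 | 12 | 1 |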

Retains: results within specified thresholds, fold down for details.

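| Retains (K) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 150 | 150 | 150 | 150 | 150 | 150 | 150 | 7 |
| pull_request | 188 | 188 | 188 | 188 | 188 | 188 | 188 | 8 |
| Δ | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 1 |
| Improvement % | -25 | -25 | -25 | -25 | -25 | -25 | -25 | 1 |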

parse canada.toml metrics

Time (wall clock): results within specified thresholds, fold down for details.

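| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 16 | 16 | 17 | 17 | 18 | 19 | 19 | 60 |
| pull_request | 15 | 16 | 16 | 16 | 16 | 18 | 18 | 63 |
| Δ | -1 | 0 | -1 | -1 | -2 | -1 | -1 | 3 |
| Improvement % | 6 | 0 | 6 | 6 | 11 | 5 | 5 | 3 |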

Retains: results within specified thresholds, fold down for details.

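| Retains * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 133 | 134 | 134 | 134 | 134 | 134 | 134 | 60 |
| pull_request | 131 | 132 | 132 | 132 | 132 | 132 | 132 | 63 |
| Δ | -2 | -2 | -2 | -2 | -2 | -2 | -2 | 3 |
| Improvement % | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 3 |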

parse twitter.toml metrics

Time (wall clock): results within specified thresholds, fold down for details.

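| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 2609 | 2632 | 2638 | 2648 | 2664 | 2947 | 3346 | 371 |
| pull_request | 2219 | 2234 | 2241 | 2247 | 2257 | 2337 | 2966 | 436 |
| Δ | -390 | -398 | -397 | -401 | -407 | -610 | -380 | 65 |
| Improvement % | 15 | 15 | 15 | 15 | 15 | 21 | 11 | 65 |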

Retains: results within specified thresholds, fold down for details.

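| Retains * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 3138 | 3139 | 3139 | 3139 | 3139 | 3139 | 3139 | 371 |
| pull_request | 3757 | 3758 | 3758 | 3758 | 3758 | 3758 | 3758 | 436 |
| Δ | 619 | 619 | 619 | 619 | 619 | 619 | 619 | 65 |
| Improvement % | -20 | -20 | -20 | -20 | -20 | -20 | -20 | 65 |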

@dduan dduan merged commit e23d734 into main Feb 7, 2026
16 checks passed
@dduan dduan deleted the dd/optimize-a-bunch branch February 7, 2026 05:33