Improve Model Deserialization Speed by josephbirkner · Pull Request #136 · Klebert-Engineering/simfil

josephbirkner · 2026-02-24T21:29:03Z

Summary

Move to memcpy-style deserialization for Model columns, which are arrays of primitive elements.
Allow std::vector<uint8_t> as column buffer type to skip segmented vector page allocations.
Introduce compact mode for ArrayArena.
Deserialize from std::vector<uint8_t> instead of std::stringstream, whose Emscripten implementation appears to be slow.

Result: 10-20x speed improvement for model deserialization in erdblick. This greatly improves completion/search performance, and slices roughly half off the tile render time.

Sacrifice: Big endian compatibility.
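
As a rough illustration of the memcpy-style approach (not the PR's actual code), a column of trivially copyable elements can be written to and read from a std::vector<uint8_t> with one bulk copy per column. The helper names writeColumn and readColumn and the 8-byte count prefix are assumptions made for this sketch. Because the bytes are copied in host order, the result is only portable between machines of the same endianness, which is the trade-off listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: append a column to `out` as [count:uint64][raw element bytes].
template <class T>
void writeColumn(std::vector<std::uint8_t>& out, const std::vector<T>& column)
{
    static_assert(std::is_trivially_copyable_v<T>, "bulk copy needs trivially copyable elements");
    const std::uint64_t count = column.size();
    const std::size_t payload = column.size() * sizeof(T);
    const std::size_t start = out.size();
    out.resize(start + sizeof(count) + payload);
    std::memcpy(out.data() + start, &count, sizeof(count));
    if (payload != 0)
        std::memcpy(out.data() + start + sizeof(count), column.data(), payload);
}

// Hypothetical helper: read one column starting at `offset` and advance `offset`
// past it. Returns false (leaving `offset` unchanged) if the buffer is truncated.
template <class T>
bool readColumn(const std::vector<std::uint8_t>& in, std::size_t& offset, std::vector<T>& column)
{
    static_assert(std::is_trivially_copyable_v<T>, "bulk copy needs trivially copyable elements");
    std::uint64_t count = 0;
    if (offset > in.size() || in.size() - offset < sizeof(count))
        return false;
    std::memcpy(&count, in.data() + offset, sizeof(count));
    const std::size_t remaining = in.size() - offset - sizeof(count);
    if (count > remaining / sizeof(T))
        return false;
    const std::size_t n = static_cast<std::size_t>(count);
    column.resize(n);
    if (n != 0)
        std::memcpy(column.data(), in.data() + offset + sizeof(count), n * sizeof(T));
    offset += sizeof(count) + n * sizeof(T);
    return true;
}
```

The bounds check divides the remaining byte count by sizeof(T) instead of multiplying the element count, so a corrupt count cannot overflow the size computation.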

Note

High Risk
Touches core model storage and the binary serialization format/paths (including ArrayArena), so regressions could corrupt persisted data or break compatibility if not carefully validated across versions.

Overview
Migrates core model storage to the new typed ModelColumn abstraction (new include/simfil/model/column.h) and updates ModelPool/node-related columns to serialize/deserialize via bitsery object payloads with schema/record-size checks.

Refactors ArrayArena to use ModelColumn storage and introduces a compact head representation for faster, tighter serialization; updates bitsery::ext::ArrayArenaExt accordingly and adds byte_size() stats plumbing.

Adds optional build-graph validation for MODEL_COLUMN_TYPE structs via new cmake/column_type_validator.py and reusable CMake helpers (target file discovery + linked-target support), wired behind SIMFIL_VALIDATE_MODEL_COLUMNS.

Completes the transition away from stream-based reads by changing ModelPool::read and StringPool::read to accept std::vector<uint8_t> (with offset) and updates serialization tests to use buffer-based inputs.
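
The record-size check mentioned in the overview can be pictured along these lines. This is a minimal sketch, not the library's code: the ColumnHeader layout, the readCheckedColumn name, and the use of std::optional for errors are assumptions, and the real payload is framed by bitsery and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical header stored in front of each column payload.
struct ColumnHeader
{
    std::uint32_t recordSize;   // sizeof(T) at the time the column was written
    std::uint32_t recordCount;  // number of records that follow
};

// Read a column from `buffer` starting at `offset`. On success, `offset` is
// advanced past the column; on any mismatch it is left untouched.
template <class T>
std::optional<std::vector<T>> readCheckedColumn(const std::vector<std::uint8_t>& buffer,
                                                std::size_t& offset)
{
    static_assert(std::is_trivially_copyable_v<T>, "bulk copy needs trivially copyable records");
    ColumnHeader header{};
    if (offset > buffer.size() || buffer.size() - offset < sizeof(header))
        return std::nullopt;  // not enough bytes for the header
    std::memcpy(&header, buffer.data() + offset, sizeof(header));
    if (header.recordSize != sizeof(T))
        return std::nullopt;  // writer and reader disagree on the record layout
    const std::size_t available = buffer.size() - offset - sizeof(header);
    if (header.recordCount > available / sizeof(T))
        return std::nullopt;  // payload is truncated
    std::vector<T> column(header.recordCount);
    const std::size_t bytes = column.size() * sizeof(T);
    if (bytes != 0)
        std::memcpy(column.data(), buffer.data() + offset + sizeof(header), bytes);
    offset += sizeof(header) + bytes;
    return column;
}
```

Rejecting a payload whose stored record size differs from sizeof(T) turns a silent misread after a struct change into an explicit failure.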

Written by Cursor Bugbot for commit d2f3936. This will update automatically on new commits.

sonarqubecloud · 2026-02-24T21:30:27Z

Quality Gate failed

Failed conditions
1 Security Hotspot

github-actions · 2026-02-24T21:38:58Z

Package	Line Rate	Branch Rate	Health
include.simfil	24%	10%	❌
include.simfil.model	76%	50%	➖
src	74%	46%	➖
src.model	82%	46%	✔
Summary	43% (6886 / 15910)	26% (4353 / 16475)	❌

github-actions · 2026-02-24T21:46:02Z

Test Results

1 file ±0, 1 suite ±0, duration 6m 44s (-3s)
88 tests ±0: 88 passed ±0, 0 skipped ±0, 0 failed ±0
93 runs ±0: 93 passed ±0, 0 skipped ±0, 0 failed ±0

Results for commit d2f3936. ± Comparison against base commit 6976c70.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-24T21:46:46Z

include/simfil/model/nodes.h

 struct ObjectField
 {
+    MODEL_COLUMN_TYPE(8);
+


Uninitialized padding bytes in ObjectField wire serialization

Medium Severity

ObjectField has StringId name_ (2 bytes) followed by ModelNodeAddress node_ (4-byte aligned), creating 2 bytes of struct padding at offset 2. The migration from per-field bitsery serialization (which wrote exactly 6 bytes via serialize()) to bulk memcpy via ModelColumn now includes these uninitialized padding bytes in the wire output. Since ObjectField instances are placement-constructed via emplace_back(members_, fieldId, addr), the padding is never zeroed, making the serialized representation non-deterministic for the same logical data.

Additional Locations (1)

include/simfil/model/bitsery-traits.h#L59-L66
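
To make the padding concern concrete, here is a small sketch using stand-in types. FieldLike is hypothetical and only mirrors the layout described in the comment (a 2-byte id followed by a 4-byte-aligned address); it is not simfil's ObjectField. One possible mitigation, shown here as an assumption rather than the fix chosen in the PR, is to turn the gap into an explicit zero-initialized member so that every byte of the record is defined before a bulk copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Stand-in for the described layout: 2-byte id, then a 4-byte-aligned address.
// The compiler inserts 2 padding bytes after `name` on common ABIs.
struct FieldLike
{
    std::uint16_t name;
    std::uint32_t node;
};

// Same size, but the gap is a named member that is always initialized.
struct FieldExplicitPad
{
    std::uint16_t name = 0;
    std::uint16_t pad = 0;
    std::uint32_t node = 0;
};

static_assert(sizeof(FieldLike) == 8, "sketch assumes 4-byte alignment of uint32_t");
static_assert(offsetof(FieldLike, node) == 4, "2 bytes of padding precede node");
static_assert(!std::has_unique_object_representations_v<FieldLike>,
              "padding bytes are not part of the value");
static_assert(std::has_unique_object_representations_v<FieldExplicitPad>,
              "every byte belongs to a member");

// Hypothetical factory: all 8 bytes of the result are defined.
inline FieldExplicitPad makeField(std::uint16_t name, std::uint32_t node)
{
    FieldExplicitPad f;
    f.name = name;
    f.node = node;
    return f;
}

// Byte-wise comparison, i.e. what a memcpy-based serializer effectively sees.
inline bool sameBytes(const FieldExplicitPad& a, const FieldExplicitPad& b)
{
    return std::memcmp(&a, &b, sizeof(FieldExplicitPad)) == 0;
}
```

With the explicit member, two fields holding the same logical data always serialize to identical bytes, which is the determinism the review comment asks about.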

cursor · 2026-02-24T21:46:46Z

include/simfil/model/model.h


 #include <memory>
 #include <string_view>
+#include <vector>


Duplicate vector include in model header

Low Severity

#include <vector> appears on both line 13 (newly added) and line 15 (pre-existing), creating a redundant include.

josephbirkner added 9 commits February 18, 2026 15:41

Migrate storage containers to noserde::Buffer

019f2e2

Point noserde CPM dependency to josephbirkner fork

fecb1ba

Remove stale StringRange bitsery serializer

8989316

Use vector<uint8_t> instead of stringstream.

d667eba

Enable fast serialization for ArrayArena.

5ed51b5

Introduce compactHeads_ for arrays.

3922313

model: add ModelColumn and tagged type validation

ae9f4ea

model: Finish code orga for ModelColumn infrastructure.

d337c2e

test: migrate complex serialization reads to vector input

d2f3936

josephbirkner changed the title ~~ModelColumn migration with automatic column-type validation~~ Improve Model Deserialization Speed Feb 24, 2026

josephbirkner requested a review from johannes-wolf February 24, 2026 21:34

cursor bot reviewed Feb 24, 2026

josephbirkner wants to merge 9 commits into v0.6.3 from noserde
