Skip to content

Improve Model Deserialization Speed#136

Open
josephbirkner wants to merge 9 commits intov0.6.3from
noserde
Open

Improve Model Deserialization Speed#136
josephbirkner wants to merge 9 commits intov0.6.3from
noserde

Conversation

@josephbirkner
Copy link
Collaborator

@josephbirkner josephbirkner commented Feb 24, 2026

Summary

  1. Move to memcpy-style deserialization for Model columns, which are arrays of primitive elements.
  2. Allow std::vector<uint8_t> as column buffer type to skip segmented vector page allocations.
  3. Introduce compact mode for ArrayArena.
  4. Deserialize using vector<uint8_t> instead of stringstream, which has a seemingly slow emscripten impl.

Result: 10-20x speed improvement for model deserialization in erdblick. This greatly improves completion/search performance, and slices roughly half off the tile render time.

Sacrifice: Big endian compatibility.


Note

High Risk
Touches core model storage and the binary serialization format/paths (including ArrayArena), so regressions could corrupt persisted data or break compatibility if not carefully validated across versions.

Overview
Migrates core model storage to the new typed ModelColumn abstraction (new include/simfil/model/column.h) and updates ModelPool/node-related columns to serialize/deserialize via bitsery object payloads with schema/record-size checks.

Refactors ArrayArena to use ModelColumn storage and introduces a compact head representation for faster, tighter serialization; updates bitsery::ext::ArrayArenaExt accordingly and adds byte_size() stats plumbing.

Adds optional build-graph validation for MODEL_COLUMN_TYPE structs via new cmake/column_type_validator.py and reusable CMake helpers (target file discovery + linked-target support), wired behind SIMFIL_VALIDATE_MODEL_COLUMNS.

Completes the transition away from stream-based reads by changing ModelPool::read and StringPool::read to accept std::vector<uint8_t> (with offset) and updates serialization tests to use buffer-based inputs.

Written by Cursor Bugbot for commit d2f3936. This will update automatically on new commits. Configure here.

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@josephbirkner josephbirkner changed the title ModelColumn migration with automatic column-type validation Improve Model Deserialization Speed Feb 24, 2026
@github-actions
Copy link

Package Line Rate Branch Rate Health
include.simfil 24% 10%
include.simfil.model 76% 50%
src 74% 46%
src.model 82% 46%
Summary 43% (6886 / 15910) 26% (4353 / 16475)

@github-actions
Copy link

Test Results

 1 files  ±0   1 suites  ±0   6m 44s ⏱️ -3s
88 tests ±0  88 ✅ ±0  0 💤 ±0  0 ❌ ±0 
93 runs  ±0  93 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit d2f3936. ± Comparison against base commit 6976c70.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on March 22

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

struct ObjectField
{
MODEL_COLUMN_TYPE(8);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uninitialized padding bytes in ObjectField wire serialization

Medium Severity

ObjectField has StringId name_ (2 bytes) followed by ModelNodeAddress node_ (4-byte aligned), creating 2 bytes of struct padding at offset 2. The migration from per-field bitsery serialization (which wrote exactly 6 bytes via serialize()) to bulk memcpy via ModelColumn now includes these uninitialized padding bytes in the wire output. Since ObjectField instances are placement-constructed via emplace_back(members_, fieldId, addr), the padding is never zeroed, making the serialized representation non-deterministic for the same logical data.

Additional Locations (1)

Fix in Cursor Fix in Web


#include <memory>
#include <string_view>
#include <vector>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate vector include in model header

Low Severity

#include <vector> appears on both line 13 (newly added) and line 15 (pre-existing), creating a redundant include.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant