Add ONNX Runtime backend with multiple execution providers #1164
Draft
ChinChangYang wants to merge 18 commits into lightvector:master from
Conversation
Add a generic ONNX backend (-DUSE_BACKEND=ONNX) that loads standard .bin.gz
model files and builds ONNX graphs dynamically via OnnxModelBuilder. The
execution provider is selected at runtime via the onnxProvider config key:
- "cpu" (default) — CPU execution provider, works everywhere
- "coreml" — CoreML execution provider (macOS only, Apple Silicon)
New files:
- neuralnet/onnxbackend.cpp — ONNX Runtime backend implementation
- neuralnet/onnxmodelbuilder.{h,cpp} — Builds ONNX protobuf graph from ModelDesc
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add metadata input handling (input_meta tensor, buffer, copy loop) and build the 7-layer metadata encoder MLP in the ONNX graph so that models with metaEncoderVersion > 0 (e.g. b18c384nbt-humanv0) work correctly. Also fix version-dependent miscvalue parsing and remove unused out_moremiscvalue output. Verified via cross-backend testgpuerror against Eigen reference on 669 positions with max winrate error ~0.0005%. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace Gemm with MatMul in the model builder, eliminating the weight transpose and the dead setAttrFloat helper
- Add a null guard around MiscValue parsing in onnxbackend.cpp
- Hoist numScoreValueChannels above the version-check cascade to remove four redundant declarations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add direct loading of pre-built .onnx files alongside existing .bin.gz support. On load, a temporary ONNX Runtime session introspects input/output tensor shapes to populate ModelDesc metadata, and auto-detects model version from channel counts. The raw ONNX bytes are passed directly to the session (skipping OnnxModelBuilder). Also adds configurable input/output node names via config keys (onnxInputSpatial, onnxOutputPolicy, etc.) for compatibility with models exported by different tools, and relaxes output channel assertions from == to >= so models with extra output channels work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a guard for tellg() returning -1, which would cause a dangerously large allocation when cast to size_t. Replace the string copy of raw ONNX model bytes (~100MB+) with a pointer-based approach that references the existing data directly, avoiding an unnecessary allocation and copy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fragile static thread_local nameCounter in onnxmodelbuilder.cpp with a local int passed by reference through all helper functions. Guard the openclTunerFile config read in setup.cpp with #else so it only compiles for non-ONNX backends, removing dead code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make addNode return NodeProto* and pass it directly to setAttrInt/setAttrInts, removing the fragile "last node in graph" lookup pattern
- Use name-based input introspection for raw .onnx files (matching "spatial"/"global"/"meta"), with a shape-based fallback for non-KataGo models, consistent with how output introspection already works
- Default to a small batch size (2) for the ONNX CPU provider, matching the Eigen backend behavior, since CPU doesn't benefit from large batches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
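The name-based introspection with shape fallback described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's C++ code; the rank heuristics in the fallback are assumptions:

```python
def classify_onnx_input(name, shape):
    """Guess which KataGo input an ONNX input tensor corresponds to.

    Sketch of name-based matching with a shape-based fallback for
    non-KataGo models; the rank thresholds are illustrative assumptions.
    """
    lowered = name.lower()
    # First try name-based matching on "spatial"/"global"/"meta",
    # as the PR does for KataGo-exported models.
    for key in ("spatial", "global", "meta"):
        if key in lowered:
            return key
    # Fallback: a rank-4 NCHW tensor is presumably the spatial board
    # input; a rank-2 tensor is a flat per-position feature vector.
    if len(shape) == 4:
        return "spatial"
    if len(shape) == 2:
        return "global"
    # Erroring out (rather than silently feeding zeros) mirrors the
    # later hardening in this PR against masked model-mismatch bugs.
    raise ValueError(f"unrecognized input tensor: {name} with shape {shape}")

print(classify_onnx_input("input_spatial", [1, 22, 19, 19]))  # spatial
```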
Capture the return value of addBiasNode for v3Bias directly instead of re-fetching it via graph->node(graph->node_size() - 1).output(0), which would silently break if any node were inserted in between. This matches the pattern already used by the adjacent sv3Biased variable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use singleInputMetaElts for meta buffer sizing to match the pattern used by spatial and global buffers. Hoist the onnxProvider config read so it is read once and reused for both backendExtraParam and nnMaxBatchSize logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an OnnxExportWrapper that selects the correct policy channels by model version and combines miscvalue/moremiscvalue into the 6-channel format expected by the C++ ONNX backend. When -export-onnx is passed, torch.onnx.export produces a .onnx file alongside the .bin file with input/output names matching the C++ backend defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace silent zero-element tensor creation for unknown ONNX inputs with an error, preventing masked model-mismatch bugs on CoreML and other providers. Derive dummy input channel counts in the Python ONNX export from the model config instead of hardcoding 22/19/192. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add dynamic spatial axes (height, width) to ONNX export so models support board sizes other than 19x19. Error on missing output node names in the C++ backend instead of silently proceeding with garbage values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
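Dynamic spatial axes in an ONNX export are expressed through torch.onnx.export's dynamic_axes mapping. A minimal sketch of building that mapping, assuming NCHW layout and a tensor name of "input_spatial" (both assumptions, not confirmed by this PR):

```python
def make_dynamic_axes(spatial_name="input_spatial"):
    # Axes 2 and 3 of an NCHW tensor are height and width; naming them
    # makes the exported ONNX model accept board sizes other than 19x19.
    # The tensor name "input_spatial" is a hypothetical default here.
    return {spatial_name: {0: "batch", 2: "height", 3: "width"}}

axes = make_dynamic_axes()
# This dict would then be passed as, e.g.:
#   torch.onnx.export(model, dummy_inputs, path,
#                     dynamic_axes=axes, opset_version=17, ...)
```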
Tighten the ownership channel assertion from >= 1 to == 1 to match all other backends (Eigen, CUDA, OpenCL), since the offset calculation assumes exactly 1 channel. Add runonnxtests.sh with three test levels: runtinynntests, testgpuerror, and runnnevalcanarytests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix 6 missing commas in GENERIC_MODEL_NAMES that caused implicit string concatenation, silently merging pairs of model name entries into single mangled strings. Add CMake IS_DIRECTORY validation for ONNX Runtime paths, and expand documentation comments for model version detection heuristics, policy channel indices, and mask derivation invariants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
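The missing-comma bug is the classic Python implicit string concatenation pitfall; the entries below are hypothetical, not the actual GENERIC_MODEL_NAMES contents:

```python
# Adjacent string literals in Python concatenate implicitly, so a
# missing comma silently merges two list entries into one.
names_buggy = [
    "model-a",
    "model-b"   # <-- missing comma: merges with the next literal
    "model-c",
]
names_fixed = [
    "model-a",
    "model-b",
    "model-c",
]
print(len(names_buggy), len(names_fixed))  # 2 3
```

Because both versions are syntactically valid, the bug produces no error; the list is just one element short with a mangled name inside, which is why it can go unnoticed until something looks entries up by name.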
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Compiling.md: add ONNX Runtime Backend section with build prerequisites, CMake commands, raw .onnx usage, and provider config
- README.md: add ONNX to the backend comparison section with summary and detailed description
- cpp/README.md: add onnxbackend.cpp and onnxmodelbuilder to the source code listing
- gtp_example.cfg: document all ONNX config keys (onnxProvider, node name overrides, onnxModelVersion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ONNX Runtime (MIT), ONNX (MIT), and Protobuf (BSD 3-Clause) attribution to LICENSE for dynamically linked dependencies
- Add a warning for unrecognized input tensor shapes during raw ONNX model introspection
- Add null-pointer assertions on GetTensorData() return values after session->Run()
Enable Windows/Linux users to select CUDA or TensorRT execution providers via onnxProvider config key, in addition to existing CPU and CoreML providers. Update documentation and config comments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
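A sketch of mapping the onnxProvider config value to ONNX Runtime execution providers. The provider identifiers are ONNX Runtime's standard names; the mapping function and fallback chains are illustrative, not the PR's code:

```python
# Map the onnxProvider config key to ONNX Runtime execution provider
# name lists. Listing CPUExecutionProvider last gives each accelerated
# provider a CPU fallback. Illustrative sketch only.
PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "coreml": ["CoreMLExecutionProvider", "CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "tensorrt": ["TensorrtExecutionProvider", "CUDAExecutionProvider",
                 "CPUExecutionProvider"],
}

def providers_for(key: str):
    try:
        return PROVIDERS[key.lower()]
    except KeyError:
        raise ValueError(f"unknown onnxProvider: {key}")

# With the onnxruntime Python package, this would be used as:
#   ort.InferenceSession(model_bytes, providers=providers_for("cuda"))
```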
Summary
- New ONNX Runtime backend (-DUSE_BACKEND=ONNX) that loads standard .bin.gz model files (builds the ONNX graph at runtime) and raw .onnx model files
- -export-onnx flag in export_model_pytorch.py for PyTorch-to-ONNX model export

Changes
Core implementation:
- cpp/neuralnet/onnxbackend.cpp — Backend implementation (session management, input/output processing, batching, provider selection)
- cpp/neuralnet/onnxmodelbuilder.cpp/.h — Builds ONNX graph from KataGo ModelDesc at runtime
- cpp/program/setup.cpp — Wire up ONNX config keys and backend selection
- cpp/CMakeLists.txt — Build config with USE_BACKEND=ONNX, ONNX Runtime linking
- cpp/main.cpp, cpp/dataio/loadmodel.cpp — Minor integration hooks

Model export:
- python/export_model_pytorch.py — Add -export-onnx flag with dynamic axes and opset 17

Documentation & config:
- README.md, Compiling.md, cpp/README.md — ONNX backend docs
- cpp/configs/gtp_example.cfg — Config keys (onnxProvider, node name overrides, onnxModelVersion)
- LICENSE — ONNX Runtime / ONNX / protobuf attribution

Testing:
- cpp/runonnxtests.sh — Integration test script (tiny model, GPU error, eval canary tests)

Design highlights
- .bin.gz files build an in-memory ONNX graph from weights; .onnx files load directly, with auto-detection of input/output nodes and model version
- onnxProvider = cpu | coreml | cuda | tensorrt in config, no recompilation needed (requires an ONNX Runtime build with the corresponding provider support)
- CMake locates ONNX Runtime via -DONNXRUNTIME_ROOT and -DONNXRUNTIME_BUILD_DIR

Test plan
- Built with -DUSE_BACKEND=ONNX on macOS
- Ran ./katago runtests and ./katago testgpuerror with a model
- Ran the cpp/runonnxtests.sh integration tests
- Verified python/export_model_pytorch.py -export-onnx produces valid .onnx files

🤖 Generated with Claude Code