
Add ONNX Runtime backend with multiple execution providers#1164

Draft
ChinChangYang wants to merge 18 commits into lightvector:master from ChinChangYang:onnx-backend

Conversation

@ChinChangYang
Contributor

Summary

  • Add ONNX Runtime as a new neural network backend (-DUSE_BACKEND=ONNX)
  • Support loading both standard .bin.gz model files (builds ONNX graph at runtime) and raw .onnx model files
  • Support multiple execution providers selectable at runtime via config:
    • CPU (default, cross-platform)
    • CoreML (macOS, Apple Silicon hardware acceleration)
    • CUDA (NVIDIA GPUs)
    • TensorRT (NVIDIA GPUs, optimized)
  • Add a -export-onnx flag to export_model_pytorch.py for PyTorch-to-ONNX model export

Changes

Core implementation:

  • cpp/neuralnet/onnxbackend.cpp — Backend implementation (session management, input/output processing, batching, provider selection)
  • cpp/neuralnet/onnxmodelbuilder.cpp / .h — Builds ONNX graph from KataGo ModelDesc at runtime
  • cpp/program/setup.cpp — Wire up ONNX config keys and backend selection
  • cpp/CMakeLists.txt — Build config with USE_BACKEND=ONNX, ONNX Runtime linking
  • cpp/main.cpp, cpp/dataio/loadmodel.cpp — Minor integration hooks

Model export:

  • python/export_model_pytorch.py — Add -export-onnx flag with dynamic axes and opset 17

Documentation & config:

  • README.md, Compiling.md, cpp/README.md — ONNX backend docs
  • cpp/configs/gtp_example.cfg — Config keys (onnxProvider, node name overrides, onnxModelVersion)
  • LICENSE — ONNX Runtime / ONNX / protobuf attribution

Testing:

  • cpp/runonnxtests.sh — Integration test script (tiny model, GPU error, eval canary tests)

Design highlights

  • Dual model loading: .bin.gz files build an in-memory ONNX graph from weights; .onnx files load directly with auto-detection of input/output nodes and model version
  • Runtime provider selection: onnxProvider = cpu | coreml | cuda | tensorrt in config, no recompilation needed (requires ONNX Runtime built with corresponding provider support)
  • Cross-platform GPU support: CoreML on macOS, CUDA/TensorRT on Windows/Linux
  • Follows existing patterns: Single-threaded inference per handle (like Eigen backend), standard symmetry handling, same batching interface
  • Prerequisite: ONNX Runtime must be built from source; build paths configured via -DONNXRUNTIME_ROOT and -DONNXRUNTIME_BUILD_DIR
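The runtime provider selection described above can be sketched as a small config-validation step. This is an illustrative Python sketch, not the actual C++ code in cpp/program/setup.cpp; the function and constant names are assumptions, though the four provider strings come from the PR description.

```python
# Hypothetical sketch of runtime execution-provider selection via the
# onnxProvider config key. The real implementation is C++ in setup.cpp.

VALID_PROVIDERS = {"cpu", "coreml", "cuda", "tensorrt"}

def parse_onnx_provider(config: dict) -> str:
    """Read onnxProvider from a parsed config, defaulting to cpu."""
    provider = config.get("onnxProvider", "cpu").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(
            f"onnxProvider must be one of {sorted(VALID_PROVIDERS)}, "
            f"got {provider!r}"
        )
    return provider
```

Because the provider is a config value rather than a compile-time option, switching between CPU and GPU execution requires no recompilation, only an ONNX Runtime build that includes the requested provider.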

Test plan

  • Build with -DUSE_BACKEND=ONNX on macOS
  • Run ./katago runtests and ./katago testgpuerror with a model
  • Run cpp/runonnxtests.sh integration tests
  • Verify python/export_model_pytorch.py -export-onnx produces valid .onnx files

🤖 Generated with Claude Code

ChinChangYang and others added 18 commits February 28, 2026 20:30
Add a generic ONNX backend (-DUSE_BACKEND=ONNX) that loads standard .bin.gz
model files and builds ONNX graphs dynamically via OnnxModelBuilder. The
execution provider is selected at runtime via the onnxProvider config key:
- "cpu" (default) — CPU execution provider, works everywhere
- "coreml" — CoreML execution provider (macOS only, Apple Silicon)

New files:
- neuralnet/onnxbackend.cpp — ONNX Runtime backend implementation
- neuralnet/onnxmodelbuilder.{h,cpp} — Builds ONNX protobuf graph from ModelDesc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add metadata input handling (input_meta tensor, buffer, copy loop) and
build the 7-layer metadata encoder MLP in the ONNX graph so that models
with metaEncoderVersion > 0 (e.g. b18c384nbt-humanv0) work correctly.
Also fix version-dependent miscvalue parsing and remove unused
out_moremiscvalue output.

Verified via cross-backend testgpuerror against Eigen reference on 669
positions with max winrate error ~0.0005%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace Gemm with MatMul in model builder, eliminating weight
  transpose and dead setAttrFloat helper
- Add null guard around MiscValue parsing in onnxbackend.cpp
- Hoist numScoreValueChannels above version-check cascade to remove
  four redundant declarations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add direct loading of pre-built .onnx files alongside existing .bin.gz
support. On load, a temporary ONNX Runtime session introspects
input/output tensor shapes to populate ModelDesc metadata, and
auto-detects model version from channel counts. The raw ONNX bytes are
passed directly to the session (skipping OnnxModelBuilder).

Also adds configurable input/output node names via config keys
(onnxInputSpatial, onnxOutputPolicy, etc.) for compatibility with
models exported by different tools, and relaxes output channel
assertions from == to >= so models with extra output channels work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
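The configurable node names this commit adds can be sketched as a simple override-with-fallback lookup. The config keys (onnxInputSpatial, onnxOutputPolicy) are from the commit message, but the default node names below are hypothetical placeholders, not the backend's actual defaults.

```python
# Sketch of config-driven ONNX node-name overrides. The default names in
# this table are invented placeholders for illustration only.

HYPOTHETICAL_DEFAULTS = {
    "onnxInputSpatial": "input_spatial",
    "onnxOutputPolicy": "output_policy",
}

def resolve_node_name(config: dict, key: str) -> str:
    """Use the config override if present, else fall back to the default."""
    return config.get(key, HYPOTHETICAL_DEFAULTS[key])
```

This lets models exported by other tools, with differently named graph inputs and outputs, work without re-exporting.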
Add a guard for tellg() returning -1, which would cause a dangerously
large allocation when cast to size_t. Replace the string copy of raw
ONNX model bytes (~100MB+) with a pointer-based approach that references
the existing data directly, avoiding an unnecessary allocation and copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fragile static thread_local nameCounter in onnxmodelbuilder.cpp
with a local int passed by reference through all helper functions. Guard
the openclTunerFile config read in setup.cpp with #else so it only
compiles for non-ONNX backends, removing dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make addNode return NodeProto* and pass it directly to setAttrInt/
  setAttrInts, removing the fragile "last node in graph" lookup pattern
- Use name-based input introspection for raw .onnx files (matching
  "spatial"/"global"/"meta"), with shape-based fallback for non-KataGo
  models, consistent with how output introspection already works
- Default to small batch size (2) for ONNX CPU provider, matching the
  Eigen backend behavior since CPU doesn't benefit from large batches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
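The name-based introspection with shape-based fallback can be sketched as follows. This is an illustrative Python model of the C++ logic; the function name and the exact fallback rules are assumptions, though the "spatial"/"global"/"meta" substring matching is from the commit message.

```python
# Sketch of input-tensor classification for raw .onnx files: match by
# name substring first, then fall back to tensor rank for non-KataGo
# models. Raises instead of silently creating an empty tensor, per the
# later hardening commit.

def classify_input(name: str, shape: list) -> str:
    lowered = name.lower()
    for kind in ("spatial", "global", "meta"):
        if kind in lowered:
            return kind
    # Fallback assumption: 4-D tensors (N, C, H, W) are spatial inputs,
    # 2-D tensors (N, C) are global inputs.
    if len(shape) == 4:
        return "spatial"
    if len(shape) == 2:
        return "global"
    raise ValueError(f"Unrecognized input tensor {name!r} with shape {shape}")
```

Erroring on unrecognized inputs surfaces model-mismatch bugs immediately rather than masking them with zero-filled tensors.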
Capture the return value of addBiasNode for v3Bias directly instead of
re-fetching it via graph->node(graph->node_size() - 1).output(0), which
would silently break if any node were inserted in between. This matches
the pattern already used by the adjacent sv3Biased variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kend

Use singleInputMetaElts for meta buffer sizing to match the pattern used
by spatial and global buffers. Hoist onnxProvider config read so it is
read once and reused for both backendExtraParam and nnMaxBatchSize logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an OnnxExportWrapper that selects the correct policy channels by
model version and combines miscvalue/moremiscvalue into the 6-channel
format expected by the C++ ONNX backend. When -export-onnx is passed,
torch.onnx.export produces a .onnx file alongside the .bin file with
input/output names matching the C++ backend defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kend

Replace silent zero-element tensor creation for unknown ONNX inputs with
an error, preventing masked model-mismatch bugs on CoreML and other
providers. Derive dummy input channel counts in the Python ONNX export
from model config instead of hardcoding 22/19/192.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add dynamic spatial axes (height, width) to ONNX export so models
support board sizes other than 19x19. Error on missing output node
names in the C++ backend instead of silently proceeding with garbage
values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
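The dynamic-axes declaration for torch.onnx.export can be sketched as a plain dict builder, which keeps this example free of a torch dependency. The tensor names and the rule for which tensors get dynamic height/width axes are illustrative assumptions, not the exact export code in python/export_model_pytorch.py.

```python
# Sketch of a dynamic_axes mapping for torch.onnx.export: batch (axis 0)
# is dynamic everywhere; spatial tensors additionally get dynamic height
# and width (axes 2 and 3) so boards other than 19x19 work.

def make_dynamic_axes(input_names: list, output_names: list) -> dict:
    axes = {}
    for name in input_names + output_names:
        axes[name] = {0: "batch"}
        # Assumption: spatially-shaped tensors are identifiable by name.
        if "spatial" in name or "ownership" in name:
            axes[name].update({2: "height", 3: "width"})
    return axes
```

A dict of this shape would then be passed as the dynamic_axes argument of torch.onnx.export alongside the matching input_names and output_names.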
Tighten the ownership channel assertion from >= 1 to == 1 to match
all other backends (Eigen, CUDA, OpenCL), since the offset calculation
assumes exactly 1 channel. Add runonnxtests.sh with three test levels:
runtinynntests, testgpuerror, and runnnevalcanarytests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix 6 missing commas in GENERIC_MODEL_NAMES that caused implicit string
concatenation, silently merging pairs of model name entries into single
mangled strings. Add CMake IS_DIRECTORY validation for ONNX Runtime
paths, and expand documentation comments for model version detection
heuristics, policy channel indices, and mask derivation invariants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
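The missing-comma bug fixed here is worth a concrete demonstration, since Python's implicit string concatenation fails silently. The list name and entries below are illustrative, not KataGo's actual model names.

```python
# Demonstration of the implicit string concatenation bug: a missing comma
# between adjacent string literals silently fuses two entries into one.

buggy = [
    "model-a",
    "model-b"      # <-- missing comma: merges with the next literal
    "model-c",
]
fixed = [
    "model-a",
    "model-b",
    "model-c",
]

assert len(buggy) == 2              # only two entries survive
assert buggy[1] == "model-bmodel-c" # the fused, mangled entry
assert len(fixed) == 3
```

No exception is raised and the list still "works", which is exactly why six of these slipped through until now.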
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Compiling.md: add ONNX Runtime Backend section with build
  prerequisites, CMake commands, raw .onnx usage, and provider config
- README.md: add ONNX to backend comparison section with summary
  and detailed description
- cpp/README.md: add onnxbackend.cpp and onnxmodelbuilder to source
  code listing
- gtp_example.cfg: document all ONNX config keys (onnxProvider,
  node name overrides, onnxModelVersion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ONNX Runtime (MIT), ONNX (MIT), and Protobuf (BSD 3-Clause)
  attribution to LICENSE for dynamically linked dependencies
- Add warning for unrecognized input tensor shapes during raw ONNX
  model introspection
- Add null-pointer assertions on GetTensorData() return values after
  session->Run()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable Windows/Linux users to select CUDA or TensorRT execution
providers via onnxProvider config key, in addition to existing CPU
and CoreML providers. Update documentation and config comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>