
Add ONNX Runtime backend with multiple execution providers#1164

Draft
ChinChangYang wants to merge 18 commits into lightvector:master from ChinChangYang:onnx-backend

Conversation

@ChinChangYang
Contributor

Summary

  • Add ONNX Runtime as a new neural network backend (-DUSE_BACKEND=ONNX)
  • Support loading both standard .bin.gz model files (builds ONNX graph at runtime) and raw .onnx model files
  • Support multiple execution providers selectable at runtime via config:
    • CPU (default, cross-platform)
    • CoreML (macOS, Apple Silicon hardware acceleration)
    • CUDA (NVIDIA GPUs)
    • TensorRT (NVIDIA GPUs, optimized)
  • Add a -export-onnx flag to export_model_pytorch.py for PyTorch-to-ONNX model export

Changes

Core implementation:

  • cpp/neuralnet/onnxbackend.cpp — Backend implementation (session management, input/output processing, batching, provider selection)
  • cpp/neuralnet/onnxmodelbuilder.cpp / .h — Builds ONNX graph from KataGo ModelDesc at runtime
  • cpp/program/setup.cpp — Wire up ONNX config keys and backend selection
  • cpp/CMakeLists.txt — Build config with USE_BACKEND=ONNX, ONNX Runtime linking
  • cpp/main.cpp, cpp/dataio/loadmodel.cpp — Minor integration hooks

Model export:

  • python/export_model_pytorch.py — Add -export-onnx flag with dynamic axes and opset 17

Documentation & config:

  • README.md, Compiling.md, cpp/README.md — ONNX backend docs
  • cpp/configs/gtp_example.cfg — Config keys (onnxProvider, node name overrides, onnxModelVersion)
  • LICENSE — ONNX Runtime / ONNX / protobuf attribution

Testing:

  • cpp/runonnxtests.sh — Integration test script (tiny model, GPU error, eval canary tests)

Design highlights

  • Dual model loading: .bin.gz files build an in-memory ONNX graph from weights; .onnx files load directly with auto-detection of input/output nodes and model version
  • Runtime provider selection: onnxProvider = cpu | coreml | cuda | tensorrt in config, no recompilation needed (requires ONNX Runtime built with corresponding provider support)
  • Cross-platform GPU support: CoreML on macOS, CUDA/TensorRT on Windows/Linux
  • Follows existing patterns: Single-threaded inference per handle (like Eigen backend), standard symmetry handling, same batching interface
  • Prerequisite: ONNX Runtime must be built from source; build paths configured via -DONNXRUNTIME_ROOT and -DONNXRUNTIME_BUILD_DIR
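The runtime provider selection described above can be sketched as a small config-validation step. This is an illustrative Python sketch, not the actual C++ code in cpp/program/setup.cpp; the function and constant names are assumptions, though the four provider strings come from the PR description.

```python
# Hypothetical sketch of runtime execution-provider selection via the
# onnxProvider config key. The real implementation is C++ in setup.cpp.

VALID_PROVIDERS = {"cpu", "coreml", "cuda", "tensorrt"}

def parse_onnx_provider(config: dict) -> str:
    """Read onnxProvider from a parsed config, defaulting to cpu."""
    provider = config.get("onnxProvider", "cpu").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(
            f"onnxProvider must be one of {sorted(VALID_PROVIDERS)}, "
            f"got {provider!r}"
        )
    return provider
```

Because the provider is a config value rather than a compile-time option, switching between CPU and GPU execution requires no recompilation, only an ONNX Runtime build that includes the requested provider.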

Test plan

  • Build with -DUSE_BACKEND=ONNX on macOS
  • Run ./katago runtests and ./katago testgpuerror with a model
  • Run cpp/runonnxtests.sh integration tests
  • Verify python/export_model_pytorch.py -export-onnx produces valid .onnx files

🤖 Generated with Claude Code

ChinChangYang and others added 18 commits February 28, 2026 20:30
Add a generic ONNX backend (-DUSE_BACKEND=ONNX) that loads standard .bin.gz
model files and builds ONNX graphs dynamically via OnnxModelBuilder. The
execution provider is selected at runtime via the onnxProvider config key:
- "cpu" (default) — CPU execution provider, works everywhere
- "coreml" — CoreML execution provider (macOS only, Apple Silicon)

New files:
- neuralnet/onnxbackend.cpp — ONNX Runtime backend implementation
- neuralnet/onnxmodelbuilder.{h,cpp} — Builds ONNX protobuf graph from ModelDesc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add metadata input handling (input_meta tensor, buffer, copy loop) and
build the 7-layer metadata encoder MLP in the ONNX graph so that models
with metaEncoderVersion > 0 (e.g. b18c384nbt-humanv0) work correctly.
Also fix version-dependent miscvalue parsing and remove unused
out_moremiscvalue output.

Verified via cross-backend testgpuerror against Eigen reference on 669
positions with max winrate error ~0.0005%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace Gemm with MatMul in model builder, eliminating weight
  transpose and dead setAttrFloat helper
- Add null guard around MiscValue parsing in onnxbackend.cpp
- Hoist numScoreValueChannels above version-check cascade to remove
  four redundant declarations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add direct loading of pre-built .onnx files alongside existing .bin.gz
support. On load, a temporary ONNX Runtime session introspects
input/output tensor shapes to populate ModelDesc metadata, and
auto-detects model version from channel counts. The raw ONNX bytes are
passed directly to the session (skipping OnnxModelBuilder).

Also adds configurable input/output node names via config keys
(onnxInputSpatial, onnxOutputPolicy, etc.) for compatibility with
models exported by different tools, and relaxes output channel
assertions from == to >= so models with extra output channels work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
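The configurable node names this commit adds can be sketched as a simple override-with-fallback lookup. The config keys (onnxInputSpatial, onnxOutputPolicy) are from the commit message, but the default node names below are hypothetical placeholders, not the backend's actual defaults.

```python
# Sketch of config-driven ONNX node-name overrides. The default names in
# this table are invented placeholders for illustration only.

HYPOTHETICAL_DEFAULTS = {
    "onnxInputSpatial": "input_spatial",
    "onnxOutputPolicy": "output_policy",
}

def resolve_node_name(config: dict, key: str) -> str:
    """Use the config override if present, else fall back to the default."""
    return config.get(key, HYPOTHETICAL_DEFAULTS[key])
```

This lets models exported by other tools, with differently named graph inputs and outputs, work without re-exporting.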
Add a guard for tellg() returning -1, which would cause a dangerously
large allocation when cast to size_t. Replace the string copy of raw
ONNX model bytes (~100MB+) with a pointer-based approach that references
the existing data directly, avoiding an unnecessary allocation and copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fragile static thread_local nameCounter in onnxmodelbuilder.cpp
with a local int passed by reference through all helper functions. Guard
the openclTunerFile config read in setup.cpp with #else so it only
compiles for non-ONNX backends, removing dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make addNode return NodeProto* and pass it directly to setAttrInt/
  setAttrInts, removing the fragile "last node in graph" lookup pattern
- Use name-based input introspection for raw .onnx files (matching
  "spatial"/"global"/"meta"), with shape-based fallback for non-KataGo
  models, consistent with how output introspection already works
- Default to small batch size (2) for ONNX CPU provider, matching the
  Eigen backend behavior since CPU doesn't benefit from large batches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
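The name-based introspection with shape-based fallback can be sketched as follows. This is an illustrative Python model of the C++ logic; the function name and the exact fallback rules are assumptions, though the "spatial"/"global"/"meta" substring matching is from the commit message.

```python
# Sketch of input-tensor classification for raw .onnx files: match by
# name substring first, then fall back to tensor rank for non-KataGo
# models. Raises instead of silently creating an empty tensor, per the
# later hardening commit.

def classify_input(name: str, shape: list) -> str:
    lowered = name.lower()
    for kind in ("spatial", "global", "meta"):
        if kind in lowered:
            return kind
    # Fallback assumption: 4-D tensors (N, C, H, W) are spatial inputs,
    # 2-D tensors (N, C) are global inputs.
    if len(shape) == 4:
        return "spatial"
    if len(shape) == 2:
        return "global"
    raise ValueError(f"Unrecognized input tensor {name!r} with shape {shape}")
```

Erroring on unrecognized inputs surfaces model-mismatch bugs immediately rather than masking them with zero-filled tensors.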
Capture the return value of addBiasNode for v3Bias directly instead of
re-fetching it via graph->node(graph->node_size() - 1).output(0), which
would silently break if any node were inserted in between. This matches
the pattern already used by the adjacent sv3Biased variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kend

Use singleInputMetaElts for meta buffer sizing to match the pattern used
by spatial and global buffers. Hoist onnxProvider config read so it is
read once and reused for both backendExtraParam and nnMaxBatchSize logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an OnnxExportWrapper that selects the correct policy channels by
model version and combines miscvalue/moremiscvalue into the 6-channel
format expected by the C++ ONNX backend. When -export-onnx is passed,
torch.onnx.export produces a .onnx file alongside the .bin file with
input/output names matching the C++ backend defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kend

Replace silent zero-element tensor creation for unknown ONNX inputs with
an error, preventing masked model-mismatch bugs on CoreML and other
providers. Derive dummy input channel counts in the Python ONNX export
from model config instead of hardcoding 22/19/192.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add dynamic spatial axes (height, width) to ONNX export so models
support board sizes other than 19x19. Error on missing output node
names in the C++ backend instead of silently proceeding with garbage
values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
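The dynamic-axes declaration for torch.onnx.export can be sketched as a plain dict builder, which keeps this example free of a torch dependency. The tensor names and the rule for which tensors get dynamic height/width axes are illustrative assumptions, not the exact export code in python/export_model_pytorch.py.

```python
# Sketch of a dynamic_axes mapping for torch.onnx.export: batch (axis 0)
# is dynamic everywhere; spatial tensors additionally get dynamic height
# and width (axes 2 and 3) so boards other than 19x19 work.

def make_dynamic_axes(input_names: list, output_names: list) -> dict:
    axes = {}
    for name in input_names + output_names:
        axes[name] = {0: "batch"}
        # Assumption: spatially-shaped tensors are identifiable by name.
        if "spatial" in name or "ownership" in name:
            axes[name].update({2: "height", 3: "width"})
    return axes
```

A dict of this shape would then be passed as the dynamic_axes argument of torch.onnx.export alongside the matching input_names and output_names.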
Tighten the ownership channel assertion from >= 1 to == 1 to match
all other backends (Eigen, CUDA, OpenCL), since the offset calculation
assumes exactly 1 channel. Add runonnxtests.sh with three test levels:
runtinynntests, testgpuerror, and runnnevalcanarytests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix 6 missing commas in GENERIC_MODEL_NAMES that caused implicit string
concatenation, silently merging pairs of model name entries into single
mangled strings. Add CMake IS_DIRECTORY validation for ONNX Runtime
paths, and expand documentation comments for model version detection
heuristics, policy channel indices, and mask derivation invariants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
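The missing-comma bug fixed here is worth a concrete demonstration, since Python's implicit string concatenation fails silently. The list name and entries below are illustrative, not KataGo's actual model names.

```python
# Demonstration of the implicit string concatenation bug: a missing comma
# between adjacent string literals silently fuses two entries into one.

buggy = [
    "model-a",
    "model-b"      # <-- missing comma: merges with the next literal
    "model-c",
]
fixed = [
    "model-a",
    "model-b",
    "model-c",
]

assert len(buggy) == 2              # only two entries survive
assert buggy[1] == "model-bmodel-c" # the fused, mangled entry
assert len(fixed) == 3
```

No exception is raised and the list still "works", which is exactly why six of these slipped through until now.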
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Compiling.md: add ONNX Runtime Backend section with build
  prerequisites, CMake commands, raw .onnx usage, and provider config
- README.md: add ONNX to backend comparison section with summary
  and detailed description
- cpp/README.md: add onnxbackend.cpp and onnxmodelbuilder to source
  code listing
- gtp_example.cfg: document all ONNX config keys (onnxProvider,
  node name overrides, onnxModelVersion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ONNX Runtime (MIT), ONNX (MIT), and Protobuf (BSD 3-Clause)
  attribution to LICENSE for dynamically linked dependencies
- Add warning for unrecognized input tensor shapes during raw ONNX
  model introspection
- Add null-pointer assertions on GetTensorData() return values after
  session->Run()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable Windows/Linux users to select CUDA or TensorRT execution
providers via onnxProvider config key, in addition to existing CPU
and CoreML providers. Update documentation and config comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>