- Single-image pipeline: Photo → Depth (DistillAnyDepth Small) → Unproject → Gaussian Splats → Stochastic Renderer @ 60 FPS
- Multi-view 2D offset: Feature matching → pixel offset → extended panoramic coverage (near-parallel views)
- Multi-view world-space (GT): Ground truth R/t/K → per-view depth → R^T rotation → world-space fusion (validated with TempleRing)
- Per-view depth scale alignment: Project scene center into each view, sample MDE depth, compute scale to match actual camera-to-center distance
- Studio UI: WebGPU-rendered UI, project management, OPFS persistence
- XR/VR: WebGL fallback path (session starts, rendering not yet verified on headset)
- SfM for real photos: Camera poses from essential matrix + PnP are too noisy for near-parallel views (bathroom photos). Works conceptually but produces garbled fusion.
- Seam blending: Hard cut between reference and extension views creates visible depth discontinuity at boundaries
- Depth scale alignment from SfM: Only 54 sparse points → noisy scale ratios. Per-view center projection works better with GT.
Source: https://github.com/SonnyC56/blunt — Python tool doing similar single-image-to-splat conversion.
- Read focal length from image EXIF data (SpawnDev.BlazorJS can access this)
- BLUNT uses `max(w,h) * 0.7` as fallback; we use `1.2`; test which is better
- Files: `DepthToGaussianKernel.cs` (`RunUnprojectAsync`), `MultiViewGenerationService.cs`
- At depth discontinuities, pixels get unprojected to wrong positions ("floaters")
- BLUNT detects these via depth gradient magnitude and removes them
- We already compute depth gradient for edge sharpness — extend to cull extreme gradients
- Files: `DepthToGaussianKernel.cs` (`UnprojectAndPackKernel`)
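
A minimal reference sketch of gradient-based floater culling, written in plain Python for readability rather than as the ILGPU kernel. The function name, the row-major depth layout, and the `max_rel_grad` threshold are assumptions for illustration; BLUNT's actual criterion may differ.

```python
def flying_pixel_mask(depth, width, height, max_rel_grad=0.1):
    """Return a list of booleans, True where a pixel should be culled.

    depth is a row-major list of floats. max_rel_grad is a hypothetical
    threshold on gradient magnitude relative to the pixel's own depth.
    """
    cull = [False] * (width * height)
    for y in range(height):
        for x in range(width):
            d = depth[y * width + x]
            # Central differences, clamped at the image border.
            dx = (depth[y * width + min(x + 1, width - 1)]
                  - depth[y * width + max(x - 1, 0)])
            dy = (depth[min(y + 1, height - 1) * width + x]
                  - depth[max(y - 1, 0) * width + x])
            grad = (dx * dx + dy * dy) ** 0.5
            cull[y * width + x] = grad > max_rel_grad * max(d, 1e-6)
    return cull
```

Making the threshold relative to depth keeps it from over-culling distant surfaces, where absolute depth differences between neighbors are naturally larger.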
- Remove closest ~5-8% of splats by depth (often noise/artifacts)
- Simple threshold after min/max normalization
- Files: `DepthToGaussianKernel.cs`
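
One way to pick the near-plane threshold is a percentile over the normalized depths. The sketch below is plain Python and assumes smaller values mean closer to the camera; if the buffer is disparity-like (high = near), the comparison flips. `near_cull_threshold` and `fraction` are hypothetical names.

```python
def near_cull_threshold(depths, fraction=0.06):
    """Return the normalized depth below which splats would be culled.

    Assumes smaller depth = closer. fraction is the share of splats
    to remove (the notes suggest roughly 5-8%).
    """
    lo, hi = min(depths), max(depths)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat depth map
    normalized = sorted((d - lo) / span for d in depths)
    k = min(int(len(normalized) * fraction), len(normalized) - 1)
    return normalized[k]


def keep_splat(depth, lo, hi, threshold):
    """True if the splat survives near-plane culling."""
    span = (hi - lo) or 1.0
    return (depth - lo) / span >= threshold
```

On the GPU, a full sort is probably unnecessary; a small histogram over normalized depth would give the same percentile approximately.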
- Splats at depth edges get reduced opacity (currently all 0.9)
- Use the existing depth gradient to modulate opacity: high gradient → lower alpha
- Files: `DepthToGaussianKernel.cs` (line that sets `outPacked[outOff + 9] = 0.9f`)
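
A possible mapping from depth gradient to alpha, sketched in Python. The smoothstep ramp and the `grad_lo`, `grad_hi`, and `min_alpha` values are illustrative assumptions that would need tuning against real scenes; only the 0.9 base alpha comes from the current code.

```python
def edge_aware_opacity(rel_grad, base_alpha=0.9, min_alpha=0.1,
                       grad_lo=0.02, grad_hi=0.2):
    """Map a relative depth gradient to a splat opacity.

    Below grad_lo the splat keeps base_alpha; above grad_hi it drops to
    min_alpha; in between a smoothstep ramp avoids a visible threshold line.
    """
    t = (rel_grad - grad_lo) / (grad_hi - grad_lo)
    t = min(max(t, 0.0), 1.0)
    t = t * t * (3.0 - 2.0 * t)  # smoothstep
    return base_alpha + (min_alpha - base_alpha) * t
```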
- Smooth depth map before unprojection to reduce noise
- ILGPU kernel: 3x3 or 5x5 median filter on the depth buffer
- Files: New kernel in `DepthToGaussianKernel.cs` or `DepthEstimationService.cs`
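
For reference, a 3x3 median filter with clamped borders looks like this in plain Python. In an ILGPU kernel each output pixel would be one thread and the nine samples would live in registers; the function name and row-major layout are assumptions.

```python
def median_filter_3x3(depth, width, height):
    """Return a new row-major depth list smoothed by a 3x3 median."""
    out = [0.0] * (width * height)
    for y in range(height):
        for x in range(width):
            window = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp sample coordinates to the image bounds.
                    sx = min(max(x + dx, 0), width - 1)
                    sy = min(max(y + dy, 0), height - 1)
                    window.append(depth[sy * width + sx])
            window.sort()
            out[y * width + x] = window[4]  # middle of 9 samples
    return out
```

Unlike a box or Gaussian blur, the median removes isolated outliers without smearing depth edges, which matters because the edges drive the gradient-based steps above.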
- FP16 ONNX model converted from onnx-community/depth-anything-v3-small (50.5 MB)
- Patched `/backbone/Resize` from cubic→linear for ORT WebGPU EP compat
- Outputs: `predicted_depth`, `confidence`, `extrinsics`, `intrinsics` (multi-view native)
- DAv3 outputs direct depth (high=far), flipped to disparity-like in `DepthEstimationService.FlipDepthKernel`
- Model selector added to Studio UI, DAv3 is now default
- Status: Inference works, testing depth quality
- `SplatParams` and `SplatWorldParams` structs replace all `float[] p` magic-index arrays
- No GPU buffer allocation for params; ILGPU decomposes to scalar bindings
- Optional `CameraParams? camera` on all public methods for EXIF flow-through
- Pure C# EXIF parser (`ExifReader.cs`) extracts FocalLength + FocalLengthIn35mmFilm from JPEG bytes
- `CameraParams.CreateFromExif()`: FocalLength35mm (exact) → phone estimate (7x crop) → 1.2x heuristic
- Integrated into single-image, multi-view, and SfM paths
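
The fallback chain can be sketched as follows. This is an illustrative Python version, not the actual `CameraParams.CreateFromExif()` code; it assumes the 35 mm equivalent maps the 36 mm film width onto the longer image side, and the function and parameter names are hypothetical.

```python
def focal_length_px(width, height, focal_35mm=None, focal_mm=None,
                    phone_crop=7.0, fallback_factor=1.2):
    """Estimate the focal length in pixels from optional EXIF values."""
    long_side = max(width, height)
    if focal_35mm:
        # EXIF FocalLengthIn35mmFilm is present: use it directly.
        f35 = focal_35mm
    elif focal_mm:
        # Only the physical FocalLength is known: assume a phone sensor
        # with roughly a 7x crop factor.
        f35 = focal_mm * phone_crop
    else:
        # No EXIF at all: heuristic multiple of the longer side.
        return fallback_factor * long_side
    # A 35 mm film frame is 36 mm wide on its long side.
    return f35 / 36.0 * long_side
```

For a 4000x3000 photo, a 36 mm equivalent gives 4000 px, and the no-EXIF fallback gives 4800 px, which corresponds to a 35 mm equivalent of about 43 mm.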
- Compute surface normals from cross-product of neighboring 3D points (ILGPU kernel)
- Cull or reduce opacity of splats where normal is nearly perpendicular to camera ray (grazing angle)
- More general than depth-gradient culling — works for any scene geometry
- Files: New kernel in `DepthToGaussianKernel.cs` or separate `NormalFilterKernel.cs`
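
The grazing-angle test reduces to one cross product and one dot product per pixel. The Python sketch below assumes camera-space points with the camera at the origin, so the view ray to a point is the point itself; `grazing_cos` and the cull threshold are hypothetical.

```python
import math


def grazing_cos(p, p_right, p_down):
    """|cos| of the angle between the surface normal at p and the view ray.

    p, p_right, p_down are (x, y, z) camera-space points for a pixel and
    its right and lower neighbors. Values near 0 mean a grazing angle.
    """
    ux, uy, uz = (p_right[i] - p[i] for i in range(3))
    vx, vy, vz = (p_down[i] - p[i] for i in range(3))
    # Normal = u x v (unnormalized).
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    r_len = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    if n_len == 0.0 or r_len == 0.0:
        return 0.0  # degenerate geometry: treat as grazing
    return abs(nx * p[0] + ny * p[1] + nz * p[2]) / (n_len * r_len)


def should_cull(p, p_right, p_down, min_cos=0.15):
    """Cull when the surface is seen at a steeper grazing angle than allowed."""
    return grazing_cos(p, p_right, p_down) < min_cos
```

The same cosine could drive opacity instead of a hard cull, in the same way as the edge-aware opacity item.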
- TempleRing GT validates the world-space kernel math is correct
- SfM needs: better RANSAC thresholds, cheirality check, more features
- Test with TempleRing (SfM vs GT) to measure pose error
- Files: `SfmReconstructor.cs`, `GpuSfmKernels.cs`
- Current: project scene center into view, sample one MDE depth → one scale ratio
- Better: project multiple SfM sparse points → multiple ratios → robust median/RANSAC
- Or: least-squares fit `D_sfm = s * D_mde + b` per view using multiple correspondences
- Files: `MultiViewGenerationService.cs`
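
Both options fit in a few lines. The Python sketch below shows the robust median of per-point ratios and the closed-form least-squares fit of `D_sfm = s * D_mde + b`; the function names are hypothetical, and the inputs are assumed to be paired depths at the same projected SfM points.

```python
from statistics import median


def median_scale(d_sfm, d_mde):
    """Robust single scale: median of per-correspondence depth ratios."""
    return median(a / b for a, b in zip(d_sfm, d_mde) if b > 0)


def fit_scale_shift(d_sfm, d_mde):
    """Ordinary least squares for D_sfm = s * D_mde + b.

    Needs at least two correspondences with distinct MDE depths.
    """
    n = len(d_mde)
    mean_x = sum(d_mde) / n
    mean_y = sum(d_sfm) / n
    var_x = sum((x - mean_x) ** 2 for x in d_mde)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(d_mde, d_sfm))
    s = cov / var_x
    b = mean_y - s * mean_x
    return s, b
```

The median tolerates a few bad matches but cannot model a bias term; the least-squares fit models the bias but is sensitive to outliers, so it would likely need RANSAC or an inlier pre-filter when the sparse points are noisy.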
- Current: hard cut between reference and extension → visible seam
- Better: overlap band (100-200px) where both views contribute with opacity gradient
- Reference fades from 1.0→0.0, extension fades from 0.0→1.0 across the band
- Use sigmoid/cosine fade based on distance from image center → smooth dissolve between views
- Files: `DepthToGaussianKernel.cs` (exclusion zone → blend zone)
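
A cosine cross-fade over the overlap band could look like the sketch below. It is parameterized by horizontal pixel position across the band, which is one assumed way to express "distance from image center"; `band_start` and `band_width` are hypothetical parameters.

```python
import math


def blend_weights(x, band_start, band_width):
    """Return (reference_weight, extension_weight) at pixel column x.

    Left of the band the reference view has full weight; right of it the
    extension view does. Inside, a cosine ramp gives a smooth dissolve.
    """
    t = (x - band_start) / band_width
    t = min(max(t, 0.0), 1.0)
    w_ext = 0.5 - 0.5 * math.cos(math.pi * t)
    return 1.0 - w_ext, w_ext
```

The two weights always sum to 1, so multiplying each view's splat opacity by its weight keeps the combined coverage roughly constant across the seam.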
- Use sparse SfM points as "metric skeleton" to anchor relative MDE depth
- ILGPU kernel: find global scale `s` and bias `b` that best fits MDE to SfM world coords
- Scene-agnostic: works for any scene without per-view manual tuning
- Alternative to metric depth models (which are too large for browser)
- Files: New `GlobalAlignmentKernel.cs`
- DAv3 accepts `[1, N, 3, H, W]` input; process multiple views in single forward pass
- Outputs extrinsics `[1, N, 3, 4]` and intrinsics; eliminates need for separate SfM
- Could replace SfmReconstructor entirely for supported view counts
- Files: `DepthEstimationService.cs`, `MultiViewGenerationService.cs`
- 2D offset (median dx, dy) assumes pure translation between views
- For rotating camera (panning), correct transformation is a homography
- Compute homography from 4+ matched feature pairs using DLT + RANSAC
- Near objects shift more than far objects (parallax); a homography models this exactly only for planar scenes or pure camera rotation
- Files: New `HomographyEstimator.cs` or extend `MultiViewGenerationService.cs`
- Assign each splat to a spatial chunk using Z-order curves
- ILGPU kernel: quantize position → interleave bits → Morton code
- Radix sort by Morton code (reuse GpuSplatSorter)
- Build chunk table: (mortonCode, startIndex, count)[]
- Files: New `SpatialOrganizer.cs`
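
The quantize, interleave, and chunk-table steps can be sketched as follows, assuming 10 bits per axis (a 30-bit code). This is reference Python, not the ILGPU kernel; `chunk_shift` (how many low bits are dropped to form a chunk key) is a hypothetical parameter.

```python
def quantize(p, lo, hi, bits=10):
    """Map a coordinate in [lo, hi] to an integer cell index."""
    cells = (1 << bits) - 1
    t = (p - lo) / (hi - lo)
    return min(max(int(t * cells), 0), cells)


def part1by2(v):
    """Spread the low 10 bits of v so two zero bits follow each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x, y, z):
    """Interleave three 10-bit cell indices into a 30-bit Morton code."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def build_chunk_table(sorted_codes, chunk_shift):
    """Group sorted Morton codes into (chunkKey, startIndex, count) rows."""
    table = []
    for i, code in enumerate(sorted_codes):
        key = code >> chunk_shift
        if table and table[-1][0] == key:
            table[-1][2] += 1
        else:
            table.append([key, i, 1])
    return [tuple(row) for row in table]
```

Dropping 3 bits per level groups splats into 2x2x2 blocks of cells, so `chunk_shift` should be a multiple of 3 to get cubic chunks.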
- Before rendering, test chunk AABBs against camera frustum
- Build active chunk list per frame
- Files: `GpuGaussianRenderer.cs`
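
A common way to test an AABB against a frustum is the "positive vertex" check per plane, sketched below in Python. It assumes planes are given as `(nx, ny, nz, d)` with the inside half-space satisfying `n·p + d >= 0`; that convention and the function name are assumptions.

```python
def aabb_in_frustum(box_min, box_max, planes):
    """Conservative visibility test for an axis-aligned box.

    Returns False only if the box lies fully outside at least one plane.
    Boxes near frustum corners may be reported visible even when they
    are not, which is acceptable for culling.
    """
    for nx, ny, nz, d in planes:
        # Pick the box corner furthest along the plane normal.
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return False
    return True
```

The active chunk list for a frame is then the chunk table rows whose boxes pass this test.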
- Budget ~14-20M active splats for 60 FPS
- Near chunks: full density, Medium: 1/4, Far: 1/16
- Stochastic Morton LOD: Use LSBs of Morton index as density mask — further splats get masked by progressively more bits, maintaining constant screen-space density regardless of scene size. Pre-compute Morton index during unprojection kernel for O(1) voxel lookup later.
- This enables 100M+ splat scenes without exceeding WASM memory or GPU frame budget
- Files: `GpuGaussianRenderer.cs`, `DepthToGaussianKernel.cs` (Morton pre-computation)
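
One reading of the LSB density mask, sketched in Python: a splat is drawn only if the lowest `k` bits of its index are zero, so `k = 2` keeps 1/4 and `k = 4` keeps 1/16, matching the medium and far tiers. The distance thresholds and function names are hypothetical, and the index is assumed to be the splat's position in Morton-sorted order.

```python
def lod_bits(distance, near=10.0, far=40.0):
    """Number of index bits to mask for a chunk at the given distance."""
    if distance < near:
        return 0  # full density
    if distance < far:
        return 2  # keep 1 in 4
    return 4      # keep 1 in 16


def splat_visible(index, bits):
    """True if the splat survives the density mask for this LOD."""
    return (index & ((1 << bits) - 1)) == 0
```

Because neighbors in Morton order are also neighbors in space, dropping by index bits thins each region evenly instead of removing whole clusters.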
- Header + chunk index + quantized splat data
- Save/load via OPFS
- Files: New `SogFormat.cs`, extend `ProjectService.cs`
- HTMLVideoElement → seek at intervals → OffscreenCanvas → RGBA
- Score frames by sharpness (Laplacian variance) and diversity (feature distance)
- Files: New `VideoFrameExtractor.cs`
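
The sharpness score is the variance of the Laplacian response over the frame. The sketch below is plain Python over a row-major grayscale list and uses the 4-neighbor Laplacian; it assumes frames are at least 3x3 and already converted from RGBA to grayscale.

```python
def laplacian_variance(gray, width, height):
    """Variance of the 4-neighbor Laplacian; higher means sharper.

    gray is a row-major list of intensities. Border pixels are skipped,
    so width and height must both be at least 3.
    """
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            i = y * width + x
            lap = (gray[i - 1] + gray[i + 1]
                   + gray[i - width] + gray[i + width]
                   - 4 * gray[i])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

Scores are only comparable between frames of the same resolution, so candidate frames should be scored after downscaling to a common size.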
- Accept video files in file picker
- Extract keyframes → add as project sources → run multi-view pipeline
- Files: `Studio.Projects.cs`, `Studio.UI.cs`
- Multiple phones stream camera feeds to PC via WebRTC
- PC runs incremental SfM + scene generation in real-time
- VR headset on PC views the scene being built
- See NOTES.md for full architecture notes
| Dataset | Images | Ground Truth | Notes |
|---|---|---|---|
| TempleRing | 16 (every 3rd) | R/t/K in templeR_par.txt | Middlebury benchmark, ring around temple |
| DinoSparseRing | 16 | Possibly (check) | Similar ring capture |
| Skull | 75 | None | Real photos, good coverage |
| Bathroom | 16+ | None | Phone photos, near-parallel, challenging |
| SouthBuilding | 22 | None | Outdoor, wide baseline |
| SmallPlastic | 14 | None | Small object |
- Symptom: When multiple kernels are loaded via `LoadStreamKernel` on the same `WebGPUAccelerator`, the generated WGSL shader has duplicate declarations of `var _uf_group_iter`, causing `CreateShaderModule` to fail silently. Kernels that reference `Grid.IdxX` don't execute (output buffers stay zeroed or unchanged).
- Repro: Load 3+ stream kernels (e.g., 2 MatMul + 1 LayerNorm + 1 Softmax), all using `Grid.IdxX`. The WGSL compiler reports `redeclaration of '_uf_group_iter'` at multiple line offsets.
- Workaround: Use `LoadAutoGroupedStreamKernel` with `Index1D` instead (sequential-per-row approach for reduction kernels). This avoids `Grid.IdxX` in the generated WGSL.
- Fix: The WGSL code generator needs to emit unique variable names per kernel entry point when multiple kernels share a shader module, or compile each `LoadStreamKernel` into its own WGSL module.
- Files: SpawnDev.ILGPU WebGPU backend WGSL code generator
- Priority: Medium — workaround exists, but shared memory reductions would be faster for large C dimensions
- SuperSplat (PlayCanvas): Editor/viewer for pre-made splats, SOG format, walk mode, annotations
- BLUNT: Python single-image-to-splat tool, good quality improvements (flying pixel removal, edge-aware opacity, EXIF focal length)
- DepthSplat (CVPR 2025): Multi-view depth + transformer → high-quality splats
- Splatt3r: Pose-free stereo pairs → splats at 4 FPS
- DUSt3R/MASt3R: Foundation models for dense 3D from 2+ images