
SpawnScene Development Plans

Current State (2026-03-16)

What Works

  • Single-image pipeline: Photo → Depth (DistillAnyDepth Small) → Unproject → Gaussian Splats → Stochastic Renderer @ 60 FPS
  • Multi-view 2D offset: Feature matching → pixel offset → extended panoramic coverage (near-parallel views)
  • Multi-view world-space (GT): Ground truth R/t/K → per-view depth → R^T rotation → world-space fusion (validated with TempleRing)
  • Per-view depth scale alignment: Project scene center into each view, sample MDE depth, compute scale to match actual camera-to-center distance
  • Studio UI: WebGPU-rendered UI, project management, OPFS persistence
  • XR/VR: WebGL fallback path (session starts, rendering not yet verified on headset)

What Doesn't Work Yet

  • SfM for real photos: Camera poses from essential matrix + PnP are too noisy for near-parallel views (bathroom photos). Works conceptually but produces garbled fusion.
  • Seam blending: Hard cut between reference and extension views creates visible depth discontinuity at boundaries
  • Depth scale alignment from SfM: Only 54 sparse points → noisy scale ratios. Per-view center projection works better with GT.

Priority 1: Single-Image Quality (Quick Wins from BLUNT)

Source: https://github.com/SonnyC56/blunt — Python tool doing similar single-image-to-splat conversion.

1.1 EXIF Focal Length (implemented in 1.8 ✅)

  • Read focal length from image EXIF data (SpawnDev.BlazorJS can access this)
  • BLUNT uses max(w,h) * 0.7 as fallback; we use 1.2 — test which is better
  • Files: DepthToGaussianKernel.cs (RunUnprojectAsync), MultiViewGenerationService.cs

1.2 Flying Pixel Removal

  • At depth discontinuities, pixels get unprojected to wrong positions ("floaters")
  • BLUNT detects these via depth gradient magnitude and removes them
  • We already compute depth gradient for edge sharpness — extend to cull extreme gradients
  • Files: DepthToGaussianKernel.cs (UnprojectAndPackKernel)
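A minimal CPU sketch of the gradient-based cull, assuming a normalized depth map and an illustrative threshold (the actual cutoff in BLUNT and in UnprojectAndPackKernel may differ):

```python
import numpy as np

def cull_flying_pixels(depth: np.ndarray, grad_thresh: float = 0.05) -> np.ndarray:
    """Return a boolean keep-mask: False where the local depth gradient is extreme.

    depth: HxW normalized depth map. grad_thresh is the maximum allowed depth
    change per pixel step (a tunable assumption, not BLUNT's exact value).
    """
    # Central-difference gradients; np.gradient uses one-sided diffs at borders.
    dy, dx = np.gradient(depth)
    grad_mag = np.sqrt(dx * dx + dy * dy)
    return grad_mag <= grad_thresh
```

Since the pipeline already computes depth gradients for edge sharpness, the kernel change is just an extra comparison before packing the splat.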

1.3 Near-Camera Culling

  • Remove closest ~5-8% of splats by depth (often noise/artifacts)
  • Simple threshold after min/max normalization
  • Files: DepthToGaussianKernel.cs
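The cull reduces to a quantile threshold on normalized per-splat depth; a sketch with an assumed 6% fraction from the plan's 5-8% range:

```python
import numpy as np

def cull_near_camera(depth_values: np.ndarray, frac: float = 0.06) -> np.ndarray:
    """Keep-mask dropping the nearest `frac` of splats by depth.

    depth_values: per-splat depth after min/max normalization (0 = nearest).
    """
    cutoff = np.quantile(depth_values, frac)
    return depth_values > cutoff
```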

1.4 Edge-Aware Opacity

  • Splats at depth edges get reduced opacity (currently all 0.9)
  • Use the existing depth gradient to modulate opacity: high gradient → lower alpha
  • Files: DepthToGaussianKernel.cs (line that sets outPacked[outOff + 9] = 0.9f)
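One possible mapping from gradient magnitude to alpha, replacing the constant 0.9; the falloff constant is an illustrative assumption to be tuned:

```python
def edge_aware_opacity(grad_mag: float, base_alpha: float = 0.9,
                       falloff: float = 10.0) -> float:
    """Modulate the constant 0.9 alpha by depth-gradient magnitude.

    Flat regions keep base_alpha; high-gradient (depth edge) splats fade out.
    `falloff` controls how aggressively edges are suppressed.
    """
    return base_alpha / (1.0 + falloff * grad_mag)
```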

1.5 Median Filtering on Depth

  • Smooth depth map before unprojection to reduce noise
  • ILGPU kernel: 3x3 or 5x5 median filter on the depth buffer
  • Files: New kernel in DepthToGaussianKernel.cs or DepthEstimationService.cs
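A CPU reference for the planned ILGPU kernel: each GPU thread would gather its 3x3 neighborhood and take the median. The vectorized NumPy form below is equivalent:

```python
import numpy as np

def median_filter_3x3(depth: np.ndarray) -> np.ndarray:
    """3x3 median filter on a depth map with edge-replicated borders."""
    padded = np.pad(depth, 1, mode="edge")
    h, w = depth.shape
    # Stack the 9 shifted views and take the median along the stack axis.
    stack = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)
```

Median filtering removes impulse noise (single bad depth samples) without blurring depth edges the way a box or Gaussian filter would.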

1.6 Depth Anything V3 ✅ (2026-03-17)

  • FP16 ONNX model converted from onnx-community/depth-anything-v3-small (50.5 MB)
  • Patched /backbone/Resize from cubic→linear for ORT WebGPU EP compat
  • Outputs: predicted_depth, confidence, extrinsics, intrinsics — multi-view native
  • DAv3 outputs direct depth (high=far), flipped to disparity-like in DepthEstimationService.FlipDepthKernel
  • Model selector added to Studio UI, DAv3 is now default
  • Status: Inference works, testing depth quality

1.7 Kernel Struct Refactor ✅ (2026-03-17)

  • SplatParams and SplatWorldParams structs replace all float[] p magic-index arrays
  • No GPU buffer allocation for params — ILGPU decomposes to scalar bindings
  • Optional CameraParams? camera on all public methods for EXIF flow-through

1.8 EXIF Focal Length ✅ (2026-03-17)

  • Pure C# EXIF parser (ExifReader.cs) extracts FocalLength + FocalLengthIn35mmFilm from JPEG bytes
  • CameraParams.CreateFromExif(): FocalLength35mm (exact) → phone estimate (7x crop) → 1.2x heuristic
  • Integrated into single-image, multi-view, and SfM paths
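The fallback chain can be sketched as follows; this mirrors the order described above but is not the exact C# of CameraParams.CreateFromExif, and the 36 mm full-frame sensor width and 7x crop factor are the plan's assumptions:

```python
def focal_px_from_exif(width: int, height: int,
                       focal_mm: float = None, focal_35mm: float = None) -> float:
    """Estimate focal length in pixels from EXIF, with graceful fallbacks.

    Chain: 35mm-equivalent focal (exact) -> phone crop-factor estimate ->
    1.2 * max(w, h) heuristic (the current default in the pipeline).
    """
    long_side = max(width, height)
    if focal_35mm:
        # Exact: scale by the 36 mm full-frame sensor width.
        return focal_35mm / 36.0 * long_side
    if focal_mm:
        # Assume a ~7x phone crop factor, per the plan.
        return (focal_mm * 7.0) / 36.0 * long_side
    return 1.2 * long_side
```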

1.9 Normal-Based Flying Pixel Removal

  • Compute surface normals from cross-product of neighboring 3D points (ILGPU kernel)
  • Cull or reduce opacity of splats where normal is nearly perpendicular to camera ray (grazing angle)
  • More general than depth-gradient culling — works for any scene geometry
  • Files: New kernel in DepthToGaussianKernel.cs or separate NormalFilterKernel.cs
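A CPU sketch of the normal test, assuming an HxW grid of camera-space points from unprojection; the cosine cutoff is an illustrative tuning value:

```python
import numpy as np

def grazing_angle_keep(points: np.ndarray, cos_thresh: float = 0.15) -> np.ndarray:
    """Keep-mask for an HxWx3 grid of unprojected camera-space points.

    Normal = cross product of the +x and +y neighbor differences; a splat is
    culled when its normal is nearly perpendicular to the view ray.
    """
    # Forward differences; last row/col extrapolate the previous difference.
    dx = np.diff(points, axis=1, append=points[:, -1:] * 2 - points[:, -2:-1])
    dy = np.diff(points, axis=0, append=points[-1:] * 2 - points[-2:-1])
    normals = np.cross(dx, dy)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-12
    rays = points / (np.linalg.norm(points, axis=-1, keepdims=True) + 1e-12)
    cos = np.abs(np.sum(normals * rays, axis=-1))
    return cos >= cos_thresh
```

A fronto-parallel surface has |cos| near 1 and is kept; a surface seen edge-on (e.g. the stretched wall of a depth discontinuity) has |cos| near 0 and is culled.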

Priority 2: Multi-View Fusion Improvements

2.1 Better SfM for Wide-Baseline Views

  • TempleRing GT validates the world-space kernel math is correct
  • SfM needs: better RANSAC thresholds, cheirality check, more features
  • Test with TempleRing (SfM vs GT) to measure pose error
  • Files: SfmReconstructor.cs, GpuSfmKernels.cs

2.2 Multi-Point Depth Scale Alignment

  • Current: project scene center into view, sample one MDE depth → one scale ratio
  • Better: project multiple SfM sparse points → multiple ratios → robust median/RANSAC
  • Or: least-squares fit D_sfm = s * D_mde + b per view using multiple correspondences
  • Files: MultiViewGenerationService.cs
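Both variants reduce to a few lines; a sketch assuming matched per-point SfM and MDE depth arrays for one view:

```python
import numpy as np

def align_depth_scale(d_sfm: np.ndarray, d_mde: np.ndarray):
    """Fit D_sfm ≈ s * D_mde + b from multiple correspondences.

    Returns (s, b) from least squares, plus the robust median-of-ratios
    scale as the simpler scale-only alternative.
    """
    A = np.stack([d_mde, np.ones_like(d_mde)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, d_sfm, rcond=None)
    median_scale = float(np.median(d_sfm / d_mde))
    return float(s), float(b), median_scale
```

With only ~54 sparse points, the median-of-ratios variant is more outlier-tolerant; the least-squares fit additionally recovers the bias term but is sensitive to gross outliers unless wrapped in RANSAC.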

2.3 Seam Blending for 2D Offset Mode

  • Current: hard cut between reference and extension → visible seam
  • Better: overlap band (100-200px) where both views contribute with opacity gradient
  • Reference fades from 1.0→0.0, extension fades from 0.0→1.0 across the band
  • Use sigmoid/cosine fade based on distance from image center → smooth dissolve between views
  • Files: DepthToGaussianKernel.cs (exclusion zone → blend zone)
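The fade above can be sketched as a cosine ramp over the band; the 150 px width is an assumed value inside the plan's 100-200 px range:

```python
import math

def blend_weights(dist_px: float, band_width: float = 150.0):
    """Cosine fade across the overlap band.

    dist_px: distance into the band from its reference-side edge.
    Returns (w_reference, w_extension); the two always sum to 1, so
    total splat opacity stays constant across the seam.
    """
    t = min(max(dist_px / band_width, 0.0), 1.0)
    w_ext = 0.5 - 0.5 * math.cos(math.pi * t)  # smooth 0 -> 1
    return 1.0 - w_ext, w_ext
```

The resulting weight would multiply each view's splat opacity inside the band, turning the hard exclusion-zone cut into a dissolve.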

2.4 Global Alignment Kernel (Relative Depth + Sparse SfM)

  • Use sparse SfM points as "metric skeleton" to anchor relative MDE depth
  • ILGPU kernel: find global scale s and bias b that best fits MDE to SfM world coords
  • Scene-agnostic: works for any scene without per-view manual tuning
  • Alternative to metric depth models (which are too large for browser)
  • Files: New GlobalAlignmentKernel.cs
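The closed-form fit can be written as five running sums, which is exactly the shape an ILGPU reduction kernel would accumulate; a CPU sketch:

```python
import numpy as np

def global_scale_bias(d_mde: np.ndarray, d_sfm: np.ndarray):
    """Closed-form global (s, b) minimizing sum((s * d_mde + b - d_sfm)^2).

    The five sums below are what a GPU reduction would accumulate across
    all sparse-point correspondences before a single scalar solve.
    """
    n = float(len(d_mde))
    sx, sy = d_mde.sum(), d_sfm.sum()
    sxx, sxy = (d_mde * d_mde).sum(), (d_mde * d_sfm).sum()
    s = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - s * sx) / n
    return float(s), float(b)
```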

2.5 DAv3 Multi-View Native Inference

  • DAv3 accepts [1, N, 3, H, W] input — process multiple views in single forward pass
  • Outputs extrinsics [1, N, 3, 4] and intrinsics — eliminates need for separate SfM
  • Could replace SfmReconstructor entirely for supported view counts
  • Files: DepthEstimationService.cs, MultiViewGenerationService.cs

2.6 Homography Instead of Translation

  • 2D offset (median dx, dy) assumes pure translation between views
  • For rotating camera (panning), correct transformation is a homography
  • Compute homography from 4+ matched feature pairs using DLT + RANSAC
  • Near objects shift more than far objects (parallax) — homography handles this for planar scenes
  • Files: New HomographyEstimator.cs or extend MultiViewGenerationService.cs
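The DLT core is small; a sketch of the minimal inner solver a RANSAC loop would call on 4-point samples (Hartley normalization omitted for brevity, but recommended for real pixel coordinates):

```python
import numpy as np

def homography_dlt(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct Linear Transform: 3x3 H from >= 4 point pairs, dst ~ H @ src.

    Builds two rows of the constraint matrix per correspondence and takes
    the SVD null-space vector as the solution.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

For the degenerate pure-translation case this reduces to the current median-offset result, so switching to a homography is strictly more general for panning captures.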

Priority 3: SOG Format + Streaming LOD

3.1 Morton Code Spatial Organization

  • Assign each splat to a spatial chunk using Z-order curves
  • ILGPU kernel: quantize position → interleave bits → Morton code
  • Radix sort by Morton code (reuse GpuSplatSorter)
  • Build chunk table: (mortonCode, startIndex, count)[]
  • Files: New SpatialOrganizer.cs
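The bit-interleave step is the standard "spread bits" trick, which translates directly to ILGPU integer ops; a sketch for 10 bits per axis (30-bit codes):

```python
def morton3(x: int, y: int, z: int) -> int:
    """Interleave the low 10 bits of quantized x/y/z into a 30-bit Morton code."""
    def spread(v: int) -> int:
        # Classic magic-number bit spread: 10 bits -> every 3rd bit position.
        v &= 0x3FF
        v = (v | (v << 16)) & 0x030000FF
        v = (v | (v << 8)) & 0x0300F00F
        v = (v | (v << 4)) & 0x030C30C3
        v = (v | (v << 2)) & 0x09249249
        return v
    return spread(x) | (spread(y) << 1) | (spread(z) << 2)
```

After quantizing positions to a 1024³ grid and encoding, the existing GpuSplatSorter radix sort over the codes yields the chunked, spatially coherent ordering.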

3.2 Frustum Culling by Chunk

  • Before rendering, test chunk AABBs against camera frustum
  • Build active chunk list per frame
  • Files: GpuGaussianRenderer.cs
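The per-chunk test can use the standard "positive vertex" trick, checking one AABB corner per frustum plane; a sketch assuming inward-facing plane normals:

```python
def aabb_in_frustum(aabb_min, aabb_max, planes) -> bool:
    """Conservative AABB vs. frustum test.

    planes: iterable of (nx, ny, nz, d) with inward-facing normals, i.e.
    n·p + d >= 0 for points inside. The chunk is rejected only when its
    corner farthest along the plane normal is still outside.
    """
    for nx, ny, nz, d in planes:
        # Pick the AABB corner most likely to be inside this plane.
        px = aabb_max[0] if nx >= 0 else aabb_min[0]
        py = aabb_max[1] if ny >= 0 else aabb_min[1]
        pz = aabb_max[2] if nz >= 0 else aabb_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return False  # entirely outside this plane
    return True
```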

3.3 Distance-Based LOD + Stochastic Morton Masking

  • Budget ~14-20M active splats for 60 FPS
  • Near chunks: full density, Medium: 1/4, Far: 1/16
  • Stochastic Morton LOD: Use LSBs of Morton index as density mask — further splats get masked by progressively more bits, maintaining constant screen-space density regardless of scene size. Pre-compute Morton index during unprojection kernel for O(1) voxel lookup later.
  • This enables 100M+ splat scenes without exceeding WASM memory or GPU frame budget
  • Files: GpuGaussianRenderer.cs, DepthToGaussianKernel.cs (Morton pre-computation)
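A sketch of the stochastic mask, assuming masking 2 more LSBs per distance octave so density drops 1/4 per octave (matching the full / 1/4 / 1/16 tiers); the near-distance constant is illustrative:

```python
import math

def lod_keep(splat_index: int, distance: float, near: float = 2.0) -> bool:
    """Stochastic Morton LOD: keep a splat only if its masked low bits are zero.

    Each distance octave beyond `near` masks 2 more bits of the Morton-sorted
    index, quartering density per octave while keeping the surviving splats
    spatially well distributed.
    """
    octaves = max(0, int(math.log2(max(distance, near) / near)))
    mask_bits = 2 * octaves  # 0, 2, 4, ... bits
    return (splat_index & ((1 << mask_bits) - 1)) == 0
```

Because the Morton index is pre-computed at unprojection time, this test is a single AND-and-compare in the render path, cheap enough to run per splat per frame.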

3.4 SOG File Format

  • Header + chunk index + quantized splat data
  • Save/load via OPFS
  • Files: New SogFormat.cs, extend ProjectService.cs

Priority 4: Video Input

4.1 Keyframe Extraction

  • HTMLVideoElement → seek at intervals → OffscreenCanvas → RGBA
  • Score frames by sharpness (Laplacian variance) and diversity (feature distance)
  • Files: New VideoFrameExtractor.cs
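The sharpness score is the variance of the Laplacian, a standard blur metric; a sketch on a grayscale frame:

```python
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the 4-neighbor Laplacian: higher = sharper frame.

    Blurry frames have weak high-frequency content and score near zero;
    pick the highest-scoring frame in each extraction interval.
    """
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())
```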

4.2 Studio Integration

  • Accept video files in file picker
  • Extract keyframes → add as project sources → run multi-view pipeline
  • Files: Studio.Projects.cs, Studio.UI.cs

Future: WebRTC Multi-Device Scanning

  • Multiple phones stream camera feeds to PC via WebRTC
  • PC runs incremental SfM + scene generation in real-time
  • VR headset on PC views the scene being built
  • See NOTES.md for full architecture notes

Test Datasets (wwwroot/datasets/)

Dataset          Images           Ground Truth               Notes
TempleRing       16 (every 3rd)   R/t/K in templeR_par.txt   Middlebury benchmark, ring around temple
DinoSparseRing   16               Possibly (check)           Similar ring capture
Skull            75               None                       Real photos, good coverage
Bathroom         16+              None                       Phone photos, near-parallel, challenging
SouthBuilding    22               None                       Outdoor, wide baseline
SmallPlastic     14               None                       Small object

SpawnDev.ILGPU Bugs to Fix

WGSL _uf_group_iter Redeclaration Bug

  • Symptom: When multiple kernels are loaded via LoadStreamKernel on the same WebGPUAccelerator, the generated WGSL shader has duplicate declarations of var _uf_group_iter, causing CreateShaderModule to fail silently. Kernels that reference Grid.IdxX don't execute (output buffers stay zeroed or unchanged).
  • Repro: Load 3+ stream kernels (e.g., 2 MatMul + 1 LayerNorm + 1 Softmax), all using Grid.IdxX. The WGSL compiler reports redeclaration of '_uf_group_iter' at multiple line offsets.
  • Workaround: Use LoadAutoGroupedStreamKernel with Index1D instead (sequential-per-row approach for reduction kernels). This avoids Grid.IdxX in the generated WGSL.
  • Fix: The WGSL code generator needs to emit unique variable names per kernel entry point when multiple kernels share a shader module, or compile each LoadStreamKernel into its own WGSL module.
  • Files: SpawnDev.ILGPU WebGPU backend WGSL code generator
  • Priority: Medium — workaround exists, but shared memory reductions would be faster for large C dimensions

Reference Projects

  • SuperSplat (PlayCanvas): Editor/viewer for pre-made splats, SOG format, walk mode, annotations
  • BLUNT: Python single-image-to-splat tool, good quality improvements (flying pixel removal, edge-aware opacity, EXIF focal length)
  • DepthSplat (CVPR 2025): Multi-view depth + transformer → high-quality splats
  • Splatt3r: Pose-free stereo pairs → splats at 4 FPS
  • DUSt3R/MASt3R: Foundation models for dense 3D from 2+ images