European Conference on Computer Vision (ECCV) 2026
¹Korea University
²Kakao Mobility Corp.
³Handong Global University
†Corresponding authors
The Orchestrator classifies the spatial query category and selects the top-3 specialist agents, returning their roles along with confidence scores.
Each specialist leverages its assigned inductive bias (depth maps, scene graphs, or visual reasoning traces) to produce an independent answer.
A Reasoning Agent synthesizes all specialist outputs, weighted by their confidence scores, to produce the final answer and update the specialists' confidence scores.
Figure 1. Full pipeline of SpatiO's Reliability-aware Spatial Agents Orchestration. Specialist confidence scores are updated via a Bayesian update followed by a dual exponential moving average (Dual EMA) at every test-time step.
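The routing and synthesis steps of Figure 1 can be illustrated with a minimal sketch. It assumes each specialist has a scalar confidence score per query category and, as a deliberate simplification, replaces the LLM-based Reasoning Agent with a plain confidence-weighted vote over answer options. The names `SCORES`, `select_top_k`, and `weighted_vote`, the agent labels, and all score values are hypothetical and do not come from SpatiO.

```python
from collections import defaultdict

# Hypothetical per-category confidence scores (illustrative values only, not from SpatiO).
SCORES = {
    "distance_depth": {"heuristic": 0.40, "recon3d": 0.75, "scene_graph": 0.65, "trace": 0.30},
}


def select_top_k(category: str, scores: dict, k: int = 3) -> list[tuple[str, float]]:
    """Return the k specialists with the highest confidence for this query category."""
    ranked = sorted(scores[category].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


def weighted_vote(answers: dict[str, str], confidences: dict[str, float]) -> str:
    """Simplified stand-in for the Reasoning Agent: pick the option with the largest summed confidence."""
    totals: dict[str, float] = defaultdict(float)
    for agent, option in answers.items():
        totals[option] += confidences[agent]
    return max(totals, key=totals.get)


# Mirrors the Figure 3 situation: the heuristic specialist disagrees with the other two.
top3 = select_top_k("distance_depth", SCORES)
confidences = dict(top3)
answers = {"heuristic": "(A)", "recon3d": "(B)", "scene_graph": "(B)"}
print(weighted_vote(answers, confidences))  # (B)
```

Here the two agreeing specialists outweigh the single dissenting one (1.40 vs. 0.40), so a lower-confidence specialist's error is overridden, in the spirit of Figures 3 and 4. The actual Reasoning Agent reasons over the specialists' full outputs rather than counting votes.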
Figure 2. Test-Time Orchestration (TTO) confidence score update pipeline. At each query step t, specialist outputs are scored, rewards are computed and scaled, then a Bayesian update followed by Dual EMA produces the updated confidence score s(t+1).
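The update in Figure 2 (Bayesian update, then Dual EMA) can be sketched as follows, under several loudly stated assumptions, since the exact formulas are not given here. The sketch models each specialist's reliability with a Beta posterior that treats the scaled reward in [0, 1] as a fractional Bernoulli observation. It smooths the posterior mean with a fast and a slow EMA, then blends the two EMAs into s(t+1). The prior counts, EMA rates `lam_fast`/`lam_slow`, blend weight `mix`, and the clip-based `scale_reward` are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class SpecialistState:
    alpha: float = 1.0     # assumed Beta prior pseudo-count for "reliable" outcomes
    beta: float = 1.0      # assumed Beta prior pseudo-count for "unreliable" outcomes
    ema_fast: float = 0.5  # short-horizon EMA of the posterior mean
    ema_slow: float = 0.5  # long-horizon EMA of the posterior mean


def scale_reward(raw: float, scale: float = 1.0) -> float:
    """Hypothetical reward scaling: multiply, then clip to [0, 1]."""
    return min(max(raw * scale, 0.0), 1.0)


def update_confidence(state: SpecialistState, reward: float,
                      lam_fast: float = 0.5, lam_slow: float = 0.1,
                      mix: float = 0.5) -> float:
    """One test-time step: Bayesian update, then Dual EMA; returns s(t+1)."""
    # Bayesian step: fractional Bernoulli observation on a Beta posterior.
    state.alpha += reward
    state.beta += 1.0 - reward
    posterior_mean = state.alpha / (state.alpha + state.beta)
    # Dual EMA step: the fast EMA tracks recent behavior, the slow EMA long-run reliability.
    state.ema_fast = (1.0 - lam_fast) * state.ema_fast + lam_fast * posterior_mean
    state.ema_slow = (1.0 - lam_slow) * state.ema_slow + lam_slow * posterior_mean
    # Blend the two smoothed estimates into the new confidence score.
    return mix * state.ema_fast + (1.0 - mix) * state.ema_slow


state = SpecialistState()
s_next = update_confidence(state, scale_reward(1.0))
print(round(s_next, 3))  # 0.55
```

Pairing a fast and a slow EMA is one common way to let a score react to recent mistakes without forgetting long-run reliability. Whether SpatiO's Dual EMA combines its two averages this way is not stated here.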
SpatiO's multi-agent design enables each specialist to leverage complementary spatial signals. Below we show side-by-side outputs from the Head-agent (routing), three specialist agents, and the Reasoning Agent's final synthesis.
Figure 3. SpatiO correctly resolves a distance & depth query. The heuristic specialist (Qwen3-4B) returns an incorrect answer based on 2D appearance, while the 3D reconstruction specialist and the scene-graph specialist both provide correct reasoning. The Reliability-aware Reasoning Agent overrides the heuristic specialist and outputs the correct answer: (B) far away from each other.
Figure 4. SpatiO correctly resolves an orientation query. The Head-agent assigns the Heuristic and 2D Scene-graph specialists. Despite the heuristic agent's initial error, the 3D reconstruction specialist (SpatialReasoner) and the scene-graph specialist (Sa2va) both agree on (D) right, which the Reasoning Agent confirms as the final answer.
| Method | MMSI-Bench | STVQA-7k | CV-Bench | 3DSRBench | Avg. |
|---|---|---|---|---|---|
| LLaVA-4D | 23.2 | 57.2 | 68.3 | 49.7 | 49.6 |
| SpatialRGPT | 17.3 | 67.1 | 61.0 | 39.8 | 46.3 |
| Sa2VA | 8.7 | 65.3 | 70.2 | 48.5 | 48.2 |
| SpatialReasoner | 22.1 | 63.4 | 77.4 | 54.3 | 54.3 |
| Qwen-3.0-VL-4B | 24.1 | 77.9 | 84.4 | 59.1 | 61.4 |
| SpatiO (Ours) | 43.6 | 88.2 | 86.9 | 72.4 | 72.8 |
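The Avg. column is the unweighted mean of the four benchmark scores. The short check below recomputes it from the table values and also gives SpatiO's margin over the strongest baseline average; the dictionary simply transcribes the table.

```python
# Scores transcribed from the table: MMSI-Bench, STVQA-7k, CV-Bench, 3DSRBench.
results = {
    "LLaVA-4D": [23.2, 57.2, 68.3, 49.7],
    "SpatialRGPT": [17.3, 67.1, 61.0, 39.8],
    "Sa2VA": [8.7, 65.3, 70.2, 48.5],
    "SpatialReasoner": [22.1, 63.4, 77.4, 54.3],
    "Qwen-3.0-VL-4B": [24.1, 77.9, 84.4, 59.1],
    "SpatiO (Ours)": [43.6, 88.2, 86.9, 72.4],
}

averages = {m: sum(s) / len(s) for m, s in results.items()}
for method, avg in averages.items():
    print(f"{method}: {avg:.1f}")

# Margin of SpatiO over the best baseline average (Qwen-3.0-VL-4B).
best_baseline = max(v for m, v in averages.items() if m != "SpatiO (Ours)")
print(f"margin: {averages['SpatiO (Ours)'] - best_baseline:.1f}")  # margin: 11.4
```

All six recomputed averages match the reported Avg. column to one decimal place.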