Benchmarking & Diagnostics
Locus is built with a focus on extreme performance. To maintain this, we provide a suite of tools for benchmarking and diagnosing failures, covering both the core Rust engine and the Python bindings.
The 3-Tier Tooling Stack
To tune the Locus codebase for maximum throughput, we enforce strict boundaries between measurement tools to avoid the "Observer Effect".
Tier 1: End-to-End Regression (The Python CLI)
- Tool: uv run tools/cli.py bench real (ICRA 2020 scenarios or Hugging Face Hub datasets via --hub-config).
- Purpose: Measures the true wall-clock time from Python memory ingestion, through the FFI boundary, across the Rust math kernels, and back to Python. Also reports recall and pose error (translation RMSE in metres) when ground truth poses are available.
- Rule: This is the ultimate ground truth for latency. If a micro-optimization doesn't lower this number, it didn't actually work.
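The kind of end-to-end wall-clock measurement described above can be sketched in a few lines. This is a minimal illustration, not the implementation in tools/cli.py: `time_end_to_end` and `placeholder_detect` are hypothetical names, and the placeholder workload stands in for a real detector call that crosses the FFI boundary.

```python
import statistics
import time
from typing import Callable, List


def time_end_to_end(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> dict:
    """Time `fn` with a monotonic clock and summarise the samples in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation; results are discarded

    samples_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def placeholder_detect() -> int:
    # Hypothetical stand-in for a detector call that crosses the FFI boundary.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(time_end_to_end(placeholder_detect))
```

Reporting the median alongside the minimum and maximum makes it easier to tell a genuine latency change from scheduler noise.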
Tier 2: Macro-Profiling (Tracy)
- Tool: tracing-tracy combined with the Tracy GUI profiler (cargo test --features tracy).
- Purpose: Identifies which pipeline stage is the bottleneck (e.g., proving that decode_batch_soa is taking 10ms while extract_quads is taking 2ms).
- Rule: Never run Tracy concurrently with JSON loggers or console formatters. The string allocation overhead will pollute the nanosecond lock-free ring buffers.
Tier 3: Micro-Benchmarking (Divan)
- Tool: cargo bench using the Divan framework in crates/locus-core/benches/.
- Purpose: Measures the Instructions Per Clock (IPC) and L1 cache utilization of isolated, single-threaded mathematical kernels (like SIMD bilinear sampling).
- Rule: Run these strictly single-threaded to prevent the OS scheduler or Rayon from thrashing the L1 cache.
Rust Benchmarking (Core Engine)
The Rust benchmarking suite is the source of truth for core engine performance and regressions.
Regression Suite (ICRA 2020)
The regression suite validates that Locus matches or exceeds ground truth for thousands of images.
- Set Dataset Path:
export LOCUS_ICRA_DATASET_DIR=/path/to/icra2020
- Run Benchmarks:
# Core check (Forward dataset + Fixtures, approx 15s)
cargo test --release --test regression_icra2020 --features bench-internals
# Extended check (Circle, Random, Rotation, approx 1-2 mins)
LOCUS_EXTENDED_REGRESSION=1 cargo test --release --test regression_icra2020 --features bench-internals
# Accurate latency measurement (sequential)
cargo test --release --test regression_icra2020 --features bench-internals -- --test-threads=1
[!IMPORTANT]
--release is mandatory for running regression_icra2020 tests. Running in debug mode is blocked and will panic.
Hub Regression Suite (Hugging Face)
Locus supports running regressions against large-scale datasets hosted on the Hugging Face Hub.
[!IMPORTANT]
--release is mandatory for running Hub regression tests. Running in debug mode is extremely slow and will likely time out in CI or developer environments.
- Synchronize Data: Download all Hub subsets to the local cache (tests/data/hub_cache/). The script auto-discovers every available config by default:
uv run python tools/bench/sync_hub.py --configs all
Or sync a specific subset:
uv run python tools/bench/sync_hub.py --configs \
  locus_v1_tag36h11_640x480 \
  locus_v1_tag36h11_1280x720 \
  locus_v1_tag36h11_1920x1080 \
  locus_v1_tag36h11_3840x2160 \
  charuco_golden_v1_1920x1080 \
  aprilgrid_golden_v1_1920x1080
- Run Hub Tests:
# Tag-level regression (regression_render_tag)
# Covers 4 resolutions × Erf/GWLF/EdLines variants and Fast/Accurate pose modes.
# Requires LOCUS_HUB_DATASET_DIR to locate the cache.
LOCUS_HUB_DATASET_DIR=tests/data/hub_cache \
  cargo test --release --test regression_render_tag --features bench-internals -- --nocapture
# Board-level regression (regression_board_hub)
# Validates ChAruco and AprilGrid golden datasets.
# Uses workspace-relative tests/data/hub_cache/ automatically; no env var needed.
cargo test --release --test regression_board_hub --features bench-internals -- --nocapture
Logic-Specific Benchmarks (Micro-benchmarking)
For fine-grained benchmarking of specific components, we use Divan. These are located in crates/locus-core/benches.
# Run all micro-benchmarks
cargo bench
# Run specific micro-benchmark (e.g., real-world data)
cargo bench --bench real_data_bench
cargo bench --bench decoding_real_bench
Mutually Exclusive Telemetry Matrix
Locus implements a zero-cost, mutually exclusive telemetry architecture for its regression tests to avoid the "Observer Effect". You cannot simultaneously emit structured JSON logs and capture high-fidelity Tracy profiles without the JSON serialization skewing the nanosecond timings.
To resolve this, we decouple the profilers at the CI level using TELEMETRY_MODE.
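The idea of a single switch that selects exactly one telemetry sink can be illustrated as follows. This is a Python sketch of the concept only; the real selection happens in the Rust test harness and may differ. The `TelemetryMode` enum, the `resolve_telemetry_mode` function, and the assumption that an unset variable disables telemetry are all hypothetical.

```python
import os
from enum import Enum
from typing import Mapping


class TelemetryMode(Enum):
    OFF = "off"      # assumption: no telemetry when the variable is unset
    TRACY = "tracy"
    JSON = "json"


def resolve_telemetry_mode(env: Mapping[str, str] = os.environ) -> TelemetryMode:
    """Pick exactly one sink from TELEMETRY_MODE; combined values are rejected."""
    raw = env.get("TELEMETRY_MODE", "").strip().lower()
    if raw == "":
        return TelemetryMode.OFF
    try:
        return TelemetryMode(raw)
    except ValueError:
        raise ValueError(f"unsupported TELEMETRY_MODE: {raw!r}") from None


if __name__ == "__main__":
    print(resolve_telemetry_mode({"TELEMETRY_MODE": "tracy"}))
```

Because the mode is a single enum value rather than a set of flags, a configuration such as `tracy,json` cannot be expressed, which is what keeps the two sinks from running in the same process.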
Human Mode (Tracy)
Captures pristine binary traces for GUI analysis.
# Tracy client is assumed to be running or capturing headlessly
TRACY_NO_INVARIANT_CHECK=1 TELEMETRY_MODE=tracy cargo test --release --test regression_icra2020 --features tracy,bench-internals -- --test-threads=1
Agent/CI Mode (JSON)
Dumps structured pipeline timings to target/profiling/*_events.json for AI analysis and automated regression tracking.
TELEMETRY_MODE=json cargo test --release --test regression_icra2020 --features bench-internals -- --test-threads=1
Performance Reports
Methodology & durable learnings
- Benchmarking Lessons — consolidated timeline + architecture/profile/algorithm tradeoffs that should inform future work
- Micro-Benchmarking Guide — 3-tier validation loop
Recent point-in-time reports
- Quad-extraction truncation fix (2026-04-26)
- Render-tag 2160p recall lift (2026-04-25)
- Render-tag 1080p SOTA pursuit (2026-04-25)
- Hub Regression Performance (2026-04-23)
- Release Performance Report (2026-04-18)
Python Developer CLI
The tools/cli.py tool is the central entry point for high-level evaluations and development tasks.
Data Preparation
Download all required datasets (ICRA 2020 and Hugging Face Hub subsets):
PYTHONPATH=. uv run --group bench tools/cli.py bench prepare
Hub subsets are cached in tests/data/hub_cache/.
Real-World Evaluation (ICRA 2020)
Evaluate performance on the ICRA 2020 dataset scenarios (forward, circle):
# Basic run on Locus
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward
# Compare against OpenCV and AprilTag 3
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --compare
Hub Dataset Evaluation
Evaluate against rendered Hugging Face Hub datasets. These datasets include ground-truth 6-DOF poses, so the CLI reports both recall and pose error (translation RMSE in metres).
[!NOTE] Pose convention: Hub ground truth poses use a center origin (the pose describes the tag center). Locus reports poses at the top-left corner origin. The CLI automatically applies the rigid center-to-top-left shift via Metrics.align_pose before computing the error.
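The center-to-top-left shift and the translation RMSE can be sketched as below. This is not the code behind Metrics.align_pose; the function names are hypothetical, and the tag-frame axis convention (x right, y down, so the top-left corner lies at (-s/2, -s/2, 0) from the center for a tag of side s) is an assumption made for illustration. The rotation is unchanged by the shift; only the translation moves, as t' = t + R · offset.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def center_to_top_left(rotation: Mat3, t_center: Vec3, tag_size_m: float) -> List[float]:
    """Shift a tag-center translation to the top-left corner: t' = t + R @ offset."""
    half = tag_size_m / 2.0
    # Assumed tag-frame axes: x right, y down, so the top-left corner
    # sits at (-half, -half, 0) relative to the tag center.
    offset = (-half, -half, 0.0)
    return [
        t_center[i] + sum(rotation[i][j] * offset[j] for j in range(3))
        for i in range(3)
    ]


def translation_rmse(estimated: Sequence[Vec3], truth: Sequence[Vec3]) -> float:
    """Root-mean-square of the Euclidean translation error, in the input units."""
    if len(estimated) != len(truth) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        sum((e[k] - g[k]) ** 2 for k in range(3))
        for e, g in zip(estimated, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(center_to_top_left(identity, (0.0, 0.0, 1.0), 0.2))
```

Skipping the shift would add a constant bias of up to half the tag diagonal to every translation error, which is why the alignment has to happen before the RMSE is computed.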
# Single-tag evaluation
PYTHONPATH=. uv run --group bench tools/cli.py bench real \
--hub-config locus_v1_tag36h11_1920x1080
# Board-level evaluation (AprilGrid or ChAruco)
# The board topology is inferred automatically from the dataset's rich_truth.json.
PYTHONPATH=. uv run --group bench tools/cli.py bench real \
--hub-config aprilgrid_golden_v1_1920x1080
PYTHONPATH=. uv run --group bench tools/cli.py bench real \
--hub-config charuco_golden_v1_1920x1080
# Limit frames and use a custom cache directory
PYTHONPATH=. uv run --group bench tools/cli.py bench real \
--hub-config aprilgrid_golden_v1_1920x1080 \
--data-dir tests/data/hub_cache \
--limit 50
Hub evaluation is mutually exclusive with ICRA scenarios — passing --hub-config skips the --scenarios loop.
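The precedence rule can be pictured with a small argparse sketch. Only the --scenarios and --hub-config flags come from the CLI described here; the `plan_runs` function, the default scenario, and the returned labels are hypothetical and do not reflect how tools/cli.py is actually structured.

```python
import argparse
from typing import List, Optional


def plan_runs(argv: Optional[List[str]] = None) -> List[str]:
    """Return the evaluations a 'bench real' style invocation would run."""
    parser = argparse.ArgumentParser(prog="bench-real-sketch")
    # Assumed default for illustration; the real CLI default is not documented here.
    parser.add_argument("--scenarios", nargs="+", default=["forward"])
    parser.add_argument("--hub-config", default=None)
    args = parser.parse_args(argv)

    if args.hub_config is not None:
        # Hub evaluation takes precedence: the ICRA scenario loop is skipped.
        return [f"hub:{args.hub_config}"]
    return [f"icra:{name}" for name in args.scenarios]


if __name__ == "__main__":
    print(plan_runs(["--scenarios", "forward", "--hub-config", "aprilgrid_golden_v1_1920x1080"]))
```

In this sketch, passing both flags is not an error; the Hub branch simply wins and the scenarios list is ignored, matching the behaviour described above.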
Regression Tracking (Baselines)
You can save a "Golden Baseline" and compare current performance against it.
# Save a baseline
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --save-baseline docs/engineering/benchmarking/baseline.json
# Compare current run against baseline
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --baseline docs/engineering/benchmarking/baseline.json
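A baseline comparison of this kind boils down to checking each metric against its saved value with some tolerance. The sketch below assumes a flat mapping of metric name to latency in milliseconds; the actual layout of baseline.json is not documented here, and `find_regressions`, the metric names, and the 5% tolerance are all hypothetical.

```python
import json
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return metric names whose latency grew by more than `tolerance` (fractional)."""
    regressions = []
    for name, base_value in sorted(baseline.items()):
        if name not in current:
            continue  # metric not measured in this run; nothing to compare
        if current[name] > base_value * (1.0 + tolerance):
            regressions.append(name)
    return regressions


if __name__ == "__main__":
    # Hypothetical baseline contents, inlined as JSON for the example.
    baseline = json.loads('{"forward_latency_ms": 10.0, "circle_latency_ms": 12.0}')
    current = {"forward_latency_ms": 10.2, "circle_latency_ms": 13.5}
    print(find_regressions(baseline, current))  # prints ['circle_latency_ms']
```

A relative tolerance keeps run-to-run noise from being flagged while still catching a slowdown such as the 12.5% increase in the example.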
Deep Profiling (Tracy)
Locus supports high-fidelity profiling using the Tracy Profiler.
- Rebuild with Tracy support: uv run maturin develop -r -F tracy
- Start the Tracy GUI client.
- Run benchmark with profiling flag:
# Add --profile to any 'bench real' command
PYTHONPATH=. uv run --group bench tools/cli.py bench real --profile --limit 5
Note: On some Linux systems, you may need TRACY_NO_INVARIANT_CHECK=1 if your CPU doesn't support invariant TSC.
Visual Debugging with Rerun
For diagnosing recall issues or tuning parameters, use the specialized visualization tool:
uv run tools/cli.py visualize --scenario forward --limit 5
Locus provides a high-fidelity debugging pipeline integrated with the Rerun SDK.
Features
- Convergence Tracking: Visualize subpixel jitter (yellow arrows) and reprojection errors (scalar plots) for every tag.
- Failure Diagnosis: Differentiate between geometric rejections (Red) and decoding failures (Orange).
- Remote & Edge Ready: Debug edge devices remotely using --rerun-addr to stream to a local Rerun viewer.
For a comprehensive walkthrough, see the How-to Guide: Debugging with Rerun.