Benchmarking & Diagnostics
Locus is built with a focus on extreme performance. To maintain this, we provide a suite of tools for benchmarking and diagnosing failures, covering both the core Rust engine and the Python bindings.
The 3-Tier Tooling Stack
To tune the Locus codebase for maximum throughput, we enforce strict boundaries between measurement tools to avoid the "Observer Effect".
Tier 1: End-to-End Regression (The Python CLI)
- Tool:
uv run tools/cli.py bench real(using the ICRA 2020 dataset). - Purpose: Measures the true wall-clock time from Python memory ingestion, through the FFI boundary, across the Rust math kernels, and back to Python.
- Rule: This is the ultimate ground truth for latency. If a micro-optimization doesn't lower this number, it didn't actually work.
Tier 2: Macro-Profiling (Tracy)
- Tool:
tracing-tracycombined with the Tracy GUI profiler (cargo test --features tracy). - Purpose: Identifies which pipeline stage is the bottleneck (e.g., proving that
decode_batch_soais taking 10ms whileextract_quadsis taking 2ms). - Rule: Never run Tracy concurrently with JSON loggers or console formatters. The string allocation overhead will pollute the nanosecond lock-free ring buffers.
Tier 3: Micro-Benchmarking (Divan)
- Tool:
cargo benchusing the Divan framework incrates/locus-core/benches/. - Purpose: Measures the Instructions Per Clock (IPC) and L1 cache utilization of isolated, single-threaded mathematical kernels (like SIMD bilinear sampling).
- Rule: Run these strictly single-threaded to prevent the OS scheduler or Rayon from thrashing the L1 cache.
Rust Benchmarking (Core Engine)
The Rust benchmarking suite is the source of truth for core engine performance and regressions.
Regression Suite (ICRA 2020)
The regression suite validates that Locus matches or exceeds ground truth for thousands of images.
- Set Dataset Path:
export LOCUS_DATASET_DIR=/path/to/icra2020 - Run Benchmarks:
# Core check (Forward dataset + Fixtures, approx 15s) cargo test --release --test regression_icra2020 --features bench-internals # Extended check (Circle, Random, Rotation, approx 1-2 mins) LOCUS_EXTENDED_REGRESSION=1 cargo test --release --test regression_icra2020 --features bench-internals # Accurate latency measurement (sequential) cargo test --release --test regression_icra2020 --features bench-internals -- --test-threads=1[!IMPORTANT]
--releaseis mandatory for runningregression_icra2020tests. Running in debug mode is blocked and will panic.
Hub Regression Suite (Hugging Face)
Locus supports running regressions against large-scale datasets hosted on the Hugging Face Hub.
[!IMPORTANT]
--releaseis mandatory for running Hub regression tests. Running in debug mode is extremely slow and will likely timeout in CI or developer environments.
-
Synchronize Data: Download the datasets to a local cache. This requires the
benchdependency group.uv run python tools/bench/sync_hub.py --configs single_tag_locus_v1_std41h12_1920x1080 charuco_golden_v1 aprilgrid_golden_v1 -
Run Hub Tests: Point the test runner to the local cache directory:
# Tag-level regression LOCUS_HUB_DATASET_DIR=tests/data/hub_cache cargo test --release --test regression_render_tag --features bench-internals -- --nocapture # Board-level regression (Multi-tag estimation) # Validates ChAruco and AprilGrid golden datasets (150 images each) LOCUS_HUB_DATASET_DIR=tests/data/hub_cache cargo test --release --test regression_board_hub --features bench-internals -- --nocapture
Logic-Specific Benchs (Micro-benchmarking)
For fine-grained benchmarking of specific components, we use Divan. These are located in crates/locus-core/benches.
# Run all micro-benchmarks
cargo bench
# Run specific micro-benchmark (e.g., real-world data)
cargo bench --bench real_data_bench
cargo bench --bench decoding_real_bench
Mutually Exclusive Telemetry Matrix
Locus implements a zero-cost, mutually exclusive telemetry architecture for its regression tests to avoid the "Observer Effect". You cannot simultaneously emit structured JSON logs and capture high-fidelity Tracy profiles without the JSON serialization skewing the nanosecond timings.
To resolve this, we decouple the profilers at the CI level using TELEMETRY_MODE.
Human Mode (Tracy)
Captures pristine binary traces for GUI analysis.
# Tracy client is assumed to be running or capturing headlessly
TRACY_NO_INVARIANT_CHECK=1 TELEMETRY_MODE=tracy cargo test --release --test regression_icra2020 --features tracy,bench-internals -- --test-threads=1
Agent/CI Mode (JSON)
Dumps structured pipeline timings to target/profiling/*_events.json for AI analysis and automated regression tracking.
TELEMETRY_MODE=json cargo test --release --test regression_icra2020 --features bench-internals -- --test-threads=1
Performance Reports
Current
- Release Performance Report (2026-03-22)
- SOTA Presets + GN Covariance (2026-03-21)
- EDLines + Joint Gauss-Newton (2026-03-21)
- SIMD CCL Fusion (2026-03-19)
- Current Micro-Benchmark Baseline (2026-03-19)
- Micro-Benchmarking Guide
Historical
- Performance Evolution (Mar 2–16) — consolidated timeline of superseded reports
Python Developer CLI
The tools/cli.py tool is the central entry point for high-level evaluations and development tasks.
Data Preparation
Before running benchmarks, download all required datasets (AprilTag Mosaic and ICRA 2020):
PYTHONPATH=. uv run --group bench tools/cli.py bench prepare
Real-World Evaluation
Evaluate performance on the ICRA 2020 dataset scenarios (forward, circle):
# Basic run on Locus
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward
# Compare against OpenCV and AprilTag 3
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --compare
Regression Tracking (Baselines)
You can save a "Golden Baseline" and compare current performance against it.
# Save a baseline
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --save-baseline docs/engineering/benchmarking/baseline.json
# Compare current run against baseline
PYTHONPATH=. uv run --group bench tools/cli.py bench real --scenarios forward --baseline docs/engineering/benchmarking/baseline.json
Deep Profiling (Tracy)
Locus supports high-fidelity profiling using the Tracy Profiler.
- Rebuild with Tracy support:
uv run maturin develop -r -F tracy - Start the Tracy GUI client.
- Run benchmark with profiling flag:
Note: On some Linux systems, you may need
# Add --profile to any 'bench real' command PYTHONPATH=. uv run --group bench tools/cli.py bench real --profile --limit 5TRACY_NO_INVARIANT_CHECK=1if your CPU doesn't support invariant TSC.
Visual Debugging with Rerun
For diagnosing recall issues or tuning parameters, use the specialized visualization tool:
uv run tools/cli.py visualize --scenario forward --limit 5
Locus provides a high-fidelity debugging pipeline integrated with the Rerun SDK.
Features
- Convergence Tracking: Visualize subpixel jitter (yellow arrows) and reprojection errors (scalar plots) for every tag.
- Failure Diagnosis: Differentiate between geometric rejections (Red) and decoding failures (Orange).
- Remote & Edge Ready: Debug edge devices remotely using
--rerun-addrto stream to a local Rerun viewer.
For a comprehensive walkthrough, see the How-to Guide: Debugging with Rerun.