SOTA Configurations — Performance Report
Date: 2026-03-21
Build: --release, default thread count
Hardware: AMD EPYC-Milan, 4 cores / 8 threads, L3 32 MiB, Linux 6.8.0-101
1. What Changed
Three additions enable scenario-specific SOTA presets:
| Addition | Description | Commit |
|---|---|---|
| GN covariance propagation | cholesky_inverse_8x8 extracts per-corner 2×2 covariances from the GN Hessian H⁻¹; threaded through extract_quad_edlines → batch.corner_covariances |
31f247a |
| Builder setters | huber_delta_px, tikhonov_alpha_max, sigma_n_sq, structure_tensor_radius tunable via builder |
this session |
| Three SOTA presets | sota_metrology_default, sota_pure_tags_default, sota_checkerboard_default |
this session |
Architecture: The GN→Pose Handover (Metrology preset only)
Before: GN corners → batch.corners → pose: Structure Tensor fallback
H discarded ↗
After: GN corners + σ²·H⁻¹ blocks → batch.corner_covariances → Weighted LM (direct)
Per-corner covariance Σₖ = σ²·H⁻¹[2k:2k+2, 2k:2k+2], σ² = Σr²/(n_obs−8).
If GN diverges, corner_covariances is zeroed and pose falls back to the Structure Tensor.
2. The Three SOTA Presets
| Preset | Target Scenario | Key Differences vs Production |
|---|---|---|
sota_metrology_default() |
Single isolated tag, pose accuracy | EdLines + GN + None; sharpening off; Hard decode |
sota_pure_tags_default() |
Dense multi-tag scenes | Soft decode (+19pp recall); else identical to production |
sota_checkerboard_default() |
Touching tags in grid patterns | 4-connectivity; Soft decode; relaxed contrast/edge gates; sharpening off |
Why three separate presets?
The three target scenarios have mutually incompatible requirements:
- Metrology needs the lowest possible corner RMSE. This requires EdLines + GN corners (never post-processed), but the missing sharpening and None refinement hurt recall on multi-tag images where small/distant tags are marginal.
- Pure tags needs maximum recall without harming precision. Soft decoding is the dominant lever (+19pp vs Hard), but it causes a precision collapse (~10–20%) when combined with EdLines (which produces many more quad candidates from background edges). ContourRdp + Soft is the stable choice.
- Checkerboard has non-negotiable topological constraints (4-connectivity, relaxed contrast/edge thresholds) that actively hurt performance on isolated tags. Soft decoding was untested for this scenario; it proved equally effective (+18.4pp recall).
Preset parameters
sota_metrology_default()
quad_extraction_mode: EdLines
refinement_mode: None ← GN corners are sub-pixel; ERF degrades them
enable_sharpening: false ← pass raw PSF directly to the solver
enable_bilateral: false
quad_max_elongation: 20.0
quad_min_density: 0.15
decode_mode: Hard ← Soft causes precision collapse on EdLines
sota_pure_tags_default()
refinement_mode: Erf ← same as production
enable_sharpening: true ← same as production
quad_max_elongation: 20.0 ← same as production
quad_min_density: 0.15 ← same as production
decode_mode: Soft ← only difference; +19pp recall on ICRA forward
sota_checkerboard_default()
refinement_mode: Erf
enable_sharpening: false ← sharpening creates halos at shared borders
segmentation_connectivity: Four ← separates touching tag borders (non-negotiable)
decoder_min_contrast: 10.0 ← relaxed for low-contrast packed tags
quad_min_edge_score: 2.0 ← relaxed for weaker interior-border edge scores
quad_max_elongation: 20.0
quad_min_density: 0.15
decode_mode: Soft ← extends recall on low-contrast packed tags
3. ICRA 2020 — Pure Tags (multi-tag isolated, forward/pure_tags_images)
50 images, ~8 tags/frame average. Ground-truth convention remapped from UMich CCW to Locus CW.
3.1 Fixtures (CI gold standard, 1 image)
| Config | Recall | RMSE |
|---|---|---|
| Production (ContourRdp + Erf + Hard) | 100% | 0.131 px |
| ContourRdp + Soft | 100% | 0.131 px |
| EdLines + Erf + Hard | 100% | 0.071 px |
| SOTA Metrology (EdLines + None + Hard, no sharp) | 74.0% | 0.713 px |
| SOTA Pure Tags (ContourRdp + Erf + Soft) | 100% | 0.131 px |
SOTA Pure Tags matches production recall exactly on the fixture and inherits the same RMSE — the only change (Soft decode) has no effect on a clean high-contrast image.
3.2 Forward Dataset (50 images)
| Config | Recall | RMSE | Total Latency |
|---|---|---|---|
| Production (ContourRdp + Erf + Hard) | 76.9% | 0.274 px | 164.4 ms |
| GWLF | 65.6% | 0.545 px | 179.3 ms |
| EDLines + Erf + Hard | 71.1% | 0.254 px | 196.8 ms |
| SOTA Metrology (EdLines + None + Hard, no sharp) | 46.3% | 0.754 px | 106.6 ms |
| SOTA Pure Tags (ContourRdp + Erf + Soft) | 96.2% | 0.315 px | 70.8 ms |
+19.3pp recall vs production at a modest +15% RMSE cost, with a −57% latency reduction. Soft decode's MIH search is branch-limited — on this dataset it terminates early more often than Hard decode's full threshold pass, making it faster despite the increased code complexity.
4. ICRA 2020 — Checkerboard (touching tags, forward/checkerboard_corners_images)
50 images, dense tag grids where adjacent tag borders share a pixel boundary.
4.1 Fixtures (same 0037.png — contains both isolated and touching tags)
| Config | Recall | RMSE |
|---|---|---|
| Production (ContourRdp + Erf + Hard) | 100% | 0.131 px |
| Legacy Checkerboard (4-conn + Hard, no sharp) | 100% | 0.131 px |
| SOTA Checkerboard (4-conn + Soft, no sharp) | 100% | 0.144 px |
4.2 Forward Checkerboard Dataset (50 images)
| Config | Recall | RMSE | Total Latency |
|---|---|---|---|
| Production (ContourRdp + Erf + Hard) | — | — | — |
| Legacy Checkerboard (4-conn + Hard, no sharp) | 73.0% | 0.332 px | 153.6 ms |
| SOTA Checkerboard (4-conn + Soft, no sharp) | 91.4% | 0.458 px | 103.2 ms |
+18.4pp recall vs the legacy checkerboard preset (+25pp vs production), with a −33% latency reduction over legacy. Soft decoding proves equally effective on touching tags as on isolated tags, confirming the hypothesis. RMSE increases by +38% — acceptable for detection tasks where tag identity and pose (rather than sub-pixel corner precision) are the primary outputs.
5. Hub Dataset — Single Isolated Tag (AprilTag 36h11, PoseEstimationMode::Accurate)
5.1 Summary
| Resolution | Config | Recall | Precision | Corner RMSE | Repro RMSE | Trans P50 | Rot P50 | Total Latency |
|---|---|---|---|---|---|---|---|---|
| 640×480 | Production | 100% | 100% | 0.994 px | — | 4.1 mm | 1.29° | 74.3 ms |
| GWLF | 97.8% | 100% | 0.718 px | — | 3.5 mm | 0.25° | — | |
| SOTA Metrology | 93.3% | 100% | 0.173 px | 0.440 px | 1.0 mm | 0.32° | 53.9 ms | |
| 720p | Production | 100% | 100% | 0.933 px | — | 5.8 mm | 2.20° | 116.6 ms |
| GWLF | 100% | 100% | 0.751 px | — | 3.9 mm | 0.31° | — | |
| SOTA Metrology | 96.0% | 100% | 0.277 px | 1.906 px | 1.0 mm | 0.35° | 25.5 ms | |
| 1080p | Production | 100% | 100% | 1.146 px | — | 10.7 mm | 2.24° | 106.6 ms |
| GWLF | 100% | 100% | 0.928 px | — | 4.9 mm | 0.31° | — | |
| SOTA Metrology | 95.6% | 97.8% | 0.291 px | 2.142 px | 1.9 mm | 0.34° | 102.6 ms | |
| 4K | Production | 97.8% | 100% | 1.116 px | — | 43.5 mm | 6.68° | 278.4 ms |
| GWLF | 97.8% | 100% | 0.829 px | — | 9.9 mm | 0.97° | — | |
| SOTA Metrology | 88.9% | 100% | 0.157 px | 1.690 px | 5.6 mm | 0.58° | 182.2 ms |
Latency = total test time for 45–50 images including Accurate pose estimation. Hardware: AMD EPYC-Milan, 4 cores / 8 threads.
5.2 SOTA Metrology vs Production (Hub)
| Resolution | Corner RMSE Δ | Rot P50 Δ | Trans P50 Δ | Recall Δ |
|---|---|---|---|---|
| 640×480 | −83% (0.17 vs 0.99 px) | −75% (0.32° vs 1.29°) | −76% (1.0 vs 4.1 mm) | −6.7 pp |
| 720p | −70% (0.28 vs 0.93 px) | −84% (0.35° vs 2.20°) | −83% (1.0 vs 5.8 mm) | −4.0 pp |
| 1080p | −75% (0.29 vs 1.15 px) | −85% (0.34° vs 2.24°) | −82% (1.9 vs 10.7 mm) | −4.4 pp |
| 4K | −86% (0.16 vs 1.12 px) | −91% (0.58° vs 6.68°) | −87% (5.6 vs 43.5 mm) | −8.9 pp |
Note: SOTA Pure Tags and SOTA Checkerboard were not tested on the hub dataset. Soft decoding causes a precision collapse (10–22%) on EdLines due to the larger candidate set from background edges. ContourRdp + Soft on single isolated hub tags would likely restore precision; this can be added as
regression_hub_tag36h11_*_sota_pure_tagsif needed.
6. Latency Overview
All measurements: --release, single-threaded test runner (--test-threads=1), AMD EPYC-Milan 4c/8t.
ICRA 2020 (50 images each)
| Preset | Dataset | Total Latency | Per-Image |
|---|---|---|---|
| Production (ContourRdp + Erf + Hard) | forward/pure_tags | 164.4 ms | 3.3 ms |
| SOTA Pure Tags (ContourRdp + Erf + Soft) | forward/pure_tags | 70.8 ms | 1.4 ms |
| SOTA Metrology (EdLines + None + Hard) | forward/pure_tags | 106.6 ms | 2.1 ms |
| Legacy Checkerboard (4-conn + Hard) | checkerboard | 153.6 ms | 3.1 ms |
| SOTA Checkerboard (4-conn + Soft) | checkerboard | 103.2 ms | 2.1 ms |
Hub Single-Tag (Accurate pose, ~45–50 images each)
| Resolution | Production | SOTA Metrology | Speedup |
|---|---|---|---|
| 640×480 | 74.3 ms (45 img) | 53.9 ms | 1.4× |
| 720p | 116.6 ms (50 img) | 25.5 ms | 4.6× |
| 1080p | 106.6 ms (45 img) | 102.6 ms | 1.04× |
| 4K | 278.4 ms (45 img) | 182.2 ms | 1.5× |
The 720p speedup (4.6×) is the largest because EdLines + None skips ERF subpixel refinement
entirely — GN corners are directly sub-pixel — and the single clean-background tag means
very few quad candidates reach the Accurate pose step. The 1080p result is near-parity because
the full-resolution image has more candidate contours, so ContourRdp's cheaper rejection
offsets the GN advantage.
7. Which Preset to Use
| Scenario | Preset | Key metric |
|---|---|---|
| Single-tag metrology / calibration | sota_metrology_default() |
0.16–0.29px RMSE, 0.32–0.58° P50 rotation |
| Dense multi-tag detection | sota_pure_tags_default() |
96.2% recall (vs 76.9% production) |
| Touching-tag checkerboard grids | sota_checkerboard_default() |
91.4% recall (vs 73.0% legacy) |
| Balanced production | production_default() |
100% recall + precision, fast |
| Low latency | fast_default() |
Lowest decode overhead |
8. Methodology
- Hub data: Hugging Face
single_tag_locus_v1_tag36h11_*(45–50 images each). - ICRA data: ICRA 2020
forward/pure_tags_images(50 images),forward/checkerboard_corners_images(50 images). - Harness:
regression_render_tagandregression_icra2020,--release. - Snapshots:
regression_icra2020__*_sota_pure_tags.snapregression_icra2020__icra_forward_checkerboard_sota.snapregression_render_tag__hub_*_sota.snap- Decode investigation: Soft on EdLines/hub → 10–22% precision (rejected for metrology). Soft on ContourRdp/ICRA → maintains 100% precision on both ICRA scenarios.