SOTA Configurations — Performance Report

Date: 2026-03-21 Build: --release, default thread count Hardware: AMD EPYC-Milan, 4 cores / 8 threads, L3 32 MiB, Linux 6.8.0-101

1. What Changed

Three additions enable scenario-specific SOTA presets:

Addition	Description	Commit
GN covariance propagation	`cholesky_inverse_8x8` extracts per-corner 2×2 covariances from the GN Hessian H⁻¹; threaded through `extract_quad_edlines` → `batch.corner_covariances`	`31f247a`
Builder setters	`huber_delta_px`, `tikhonov_alpha_max`, `sigma_n_sq`, `structure_tensor_radius` tunable via builder	this session
Three SOTA presets	`sota_metrology_default`, `sota_pure_tags_default`, `sota_checkerboard_default`	this session

Architecture: The GN→Pose Handover (Metrology preset only)

Before:  GN corners → batch.corners → pose: Structure Tensor fallback
         H discarded ↗

After:   GN corners + σ²·H⁻¹ blocks → batch.corner_covariances → Weighted LM (direct)

Per-corner covariance Σₖ = σ²·H⁻¹[2k:2k+2, 2k:2k+2], σ² = Σr²/(n_obs−8). If GN diverges, corner_covariances is zeroed and pose falls back to the Structure Tensor.

2. The Three SOTA Presets

Preset	Target Scenario	Key Differences vs Production
`sota_metrology_default()`	Single isolated tag, pose accuracy	EdLines + GN + None; sharpening off; Hard decode
`sota_pure_tags_default()`	Dense multi-tag scenes	Soft decode (+19pp recall); else identical to production
`sota_checkerboard_default()`	Touching tags in grid patterns	4-connectivity; Soft decode; relaxed contrast/edge gates; sharpening off

Why three separate presets?

The three target scenarios have mutually incompatible requirements:

Metrology needs the lowest possible corner RMSE. This requires EdLines + GN corners (never post-processed), but the missing sharpening and None refinement hurt recall on multi-tag images where small/distant tags are marginal.
Pure tags needs maximum recall without harming precision. Soft decoding is the dominant lever (+19pp vs Hard), but it causes a precision collapse (~10–20%) when combined with EdLines (which produces many more quad candidates from background edges). ContourRdp + Soft is the stable choice.
Checkerboard has non-negotiable topological constraints (4-connectivity, relaxed contrast/edge thresholds) that actively hurt performance on isolated tags. Soft decoding was untested for this scenario; it proved equally effective (+18.4pp recall).

Preset parameters

`sota_metrology_default()`

quad_extraction_mode:  EdLines
refinement_mode:       None        ← GN corners are sub-pixel; ERF degrades them
enable_sharpening:     false       ← pass raw PSF directly to the solver
enable_bilateral:      false
quad_max_elongation:   20.0
quad_min_density:      0.15
decode_mode:           Hard        ← Soft causes precision collapse on EdLines

`sota_pure_tags_default()`

refinement_mode:       Erf         ← same as production
enable_sharpening:     true        ← same as production
quad_max_elongation:   20.0        ← same as production
quad_min_density:      0.15        ← same as production
decode_mode:           Soft        ← only difference; +19pp recall on ICRA forward

`sota_checkerboard_default()`

refinement_mode:       Erf
enable_sharpening:     false       ← sharpening creates halos at shared borders
segmentation_connectivity: Four    ← separates touching tag borders (non-negotiable)
decoder_min_contrast:  10.0        ← relaxed for low-contrast packed tags
quad_min_edge_score:   2.0         ← relaxed for weaker interior-border edge scores
quad_max_elongation:   20.0
quad_min_density:      0.15
decode_mode:           Soft        ← extends recall on low-contrast packed tags

3. ICRA 2020 — Pure Tags (multi-tag isolated, `forward/pure_tags_images`)

50 images, ~8 tags/frame average. Ground-truth convention remapped from UMich CCW to Locus CW.

3.1 Fixtures (CI gold standard, 1 image)

Config	Recall	RMSE
Production (ContourRdp + Erf + Hard)	100%	0.131 px
ContourRdp + Soft	100%	0.131 px
EdLines + Erf + Hard	100%	0.071 px
SOTA Metrology (EdLines + None + Hard, no sharp)	74.0%	0.713 px
SOTA Pure Tags (ContourRdp + Erf + Soft)	100%	0.131 px

SOTA Pure Tags matches production recall exactly on the fixture and inherits the same RMSE — the only change (Soft decode) has no effect on a clean high-contrast image.

3.2 Forward Dataset (50 images)

Config	Recall	RMSE	Total Latency
Production (ContourRdp + Erf + Hard)	76.9%	0.274 px	164.4 ms
GWLF	65.6%	0.545 px	179.3 ms
EDLines + Erf + Hard	71.1%	0.254 px	196.8 ms
SOTA Metrology (EdLines + None + Hard, no sharp)	46.3%	0.754 px	106.6 ms
SOTA Pure Tags (ContourRdp + Erf + Soft)	96.2%	0.315 px	70.8 ms

+19.3pp recall vs production at a modest +15% RMSE cost, with a −57% latency reduction. Soft decode's MIH search is branch-limited — on this dataset it terminates early more often than Hard decode's full threshold pass, making it faster despite the increased code complexity.

4. ICRA 2020 — Checkerboard (touching tags, `forward/checkerboard_corners_images`)

50 images, dense tag grids where adjacent tag borders share a pixel boundary.

4.1 Fixtures (same 0037.png — contains both isolated and touching tags)

Config	Recall	RMSE
Production (ContourRdp + Erf + Hard)	100%	0.131 px
Legacy Checkerboard (4-conn + Hard, no sharp)	100%	0.131 px
SOTA Checkerboard (4-conn + Soft, no sharp)	100%	0.144 px

4.2 Forward Checkerboard Dataset (50 images)

Config	Recall	RMSE	Total Latency
Production (ContourRdp + Erf + Hard)	—	—	—
Legacy Checkerboard (4-conn + Hard, no sharp)	73.0%	0.332 px	153.6 ms
SOTA Checkerboard (4-conn + Soft, no sharp)	91.4%	0.458 px	103.2 ms

+18.4pp recall vs the legacy checkerboard preset (+25pp vs production), with a −33% latency reduction over legacy. Soft decoding proves equally effective on touching tags as on isolated tags, confirming the hypothesis. RMSE increases by +38% — acceptable for detection tasks where tag identity and pose (rather than sub-pixel corner precision) are the primary outputs.

5. Hub Dataset — Single Isolated Tag (AprilTag 36h11, `PoseEstimationMode::Accurate`)

5.1 Summary

Resolution	Config	Recall	Precision	Corner RMSE	Repro RMSE	Trans P50	Rot P50	Total Latency
640×480	Production	100%	100%	0.994 px	—	4.1 mm	1.29°	74.3 ms
	GWLF	97.8%	100%	0.718 px	—	3.5 mm	0.25°	—
	SOTA Metrology	93.3%	100%	0.173 px	0.440 px	1.0 mm	0.32°	53.9 ms
720p	Production	100%	100%	0.933 px	—	5.8 mm	2.20°	116.6 ms
	GWLF	100%	100%	0.751 px	—	3.9 mm	0.31°	—
	SOTA Metrology	96.0%	100%	0.277 px	1.906 px	1.0 mm	0.35°	25.5 ms
1080p	Production	100%	100%	1.146 px	—	10.7 mm	2.24°	106.6 ms
	GWLF	100%	100%	0.928 px	—	4.9 mm	0.31°	—
	SOTA Metrology	95.6%	97.8%	0.291 px	2.142 px	1.9 mm	0.34°	102.6 ms
4K	Production	97.8%	100%	1.116 px	—	43.5 mm	6.68°	278.4 ms
	GWLF	97.8%	100%	0.829 px	—	9.9 mm	0.97°	—
	SOTA Metrology	88.9%	100%	0.157 px	1.690 px	5.6 mm	0.58°	182.2 ms

Latency = total test time for 45–50 images including Accurate pose estimation. Hardware: AMD EPYC-Milan, 4 cores / 8 threads.

5.2 SOTA Metrology vs Production (Hub)

Resolution	Corner RMSE Δ	Rot P50 Δ	Trans P50 Δ	Recall Δ
640×480	−83% (0.17 vs 0.99 px)	−75% (0.32° vs 1.29°)	−76% (1.0 vs 4.1 mm)	−6.7 pp
720p	−70% (0.28 vs 0.93 px)	−84% (0.35° vs 2.20°)	−83% (1.0 vs 5.8 mm)	−4.0 pp
1080p	−75% (0.29 vs 1.15 px)	−85% (0.34° vs 2.24°)	−82% (1.9 vs 10.7 mm)	−4.4 pp
4K	−86% (0.16 vs 1.12 px)	−91% (0.58° vs 6.68°)	−87% (5.6 vs 43.5 mm)	−8.9 pp

Note: SOTA Pure Tags and SOTA Checkerboard were not tested on the hub dataset. Soft decoding causes a precision collapse (10–22%) on EdLines due to the larger candidate set from background edges. ContourRdp + Soft on single isolated hub tags would likely restore precision; this can be added as regression_hub_tag36h11_*_sota_pure_tags if needed.

6. Latency Overview

All measurements: --release, single-threaded test runner (--test-threads=1), AMD EPYC-Milan 4c/8t.

ICRA 2020 (50 images each)

Preset	Dataset	Total Latency	Per-Image
Production (ContourRdp + Erf + Hard)	forward/pure_tags	164.4 ms	3.3 ms
SOTA Pure Tags (ContourRdp + Erf + Soft)	forward/pure_tags	70.8 ms	1.4 ms
SOTA Metrology (EdLines + None + Hard)	forward/pure_tags	106.6 ms	2.1 ms
Legacy Checkerboard (4-conn + Hard)	checkerboard	153.6 ms	3.1 ms
SOTA Checkerboard (4-conn + Soft)	checkerboard	103.2 ms	2.1 ms

Hub Single-Tag (Accurate pose, ~45–50 images each)

Resolution	Production	SOTA Metrology	Speedup
640×480	74.3 ms (45 img)	53.9 ms	1.4×
720p	116.6 ms (50 img)	25.5 ms	4.6×
1080p	106.6 ms (45 img)	102.6 ms	1.04×
4K	278.4 ms (45 img)	182.2 ms	1.5×

The 720p speedup (4.6×) is the largest because EdLines + None skips ERF subpixel refinement entirely — GN corners are directly sub-pixel — and the single clean-background tag means very few quad candidates reach the Accurate pose step. The 1080p result is near-parity because the full-resolution image has more candidate contours, so ContourRdp's cheaper rejection offsets the GN advantage.

7. Which Preset to Use

Scenario	Preset	Key metric
Single-tag metrology / calibration	`sota_metrology_default()`	0.16–0.29px RMSE, 0.32–0.58° P50 rotation
Dense multi-tag detection	`sota_pure_tags_default()`	96.2% recall (vs 76.9% production)
Touching-tag checkerboard grids	`sota_checkerboard_default()`	91.4% recall (vs 73.0% legacy)
Balanced production	`production_default()`	100% recall + precision, fast
Low latency	`fast_default()`	Lowest decode overhead

8. Methodology

Hub data: Hugging Face single_tag_locus_v1_tag36h11_* (45–50 images each).
ICRA data: ICRA 2020 forward/pure_tags_images (50 images), forward/checkerboard_corners_images (50 images).
Harness: regression_render_tag and regression_icra2020, --release.
Snapshots:
regression_icra2020__*_sota_pure_tags.snap
regression_icra2020__icra_forward_checkerboard_sota.snap
regression_render_tag__hub_*_sota.snap
Decode investigation: Soft on EdLines/hub → 10–22% precision (rejected for metrology). Soft on ContourRdp/ICRA → maintains 100% precision on both ICRA scenarios.