2160p render_tag_hub recall lift (2026-04-25)
PR #200 (feat-render-tag-sota) shipped the render_tag_hub profile and
landed 100 % recall at 480p / 720p / 1080p on the
locus_v1_tag36h11_<res> Hub render-tag subsets — but at 2160p
recall stalled at 47/50 (94 %) for both high_accuracy and
render_tag_hub. The PR analysis dismissed the gap as
"pre-EdLines (segmentation fragments large tags into multiple
components — out of scope for this profile)."
This branch (feat-roi-rescue) ran the gap to ground. The original
diagnosis was wrong: segmentation was fine. The actual failure
was an extract_quads_soa ordering bug that surfaces only when the
number of geometrically valid quads exceeds the SoA capacity. After
a one-line fix, all three failing scenes decode cleanly and 2160p
joins the lower resolutions at 50/50 (100 %) recall, 1.0
precision.
§1 Verified hardware metadata
$ lscpu
Architecture: x86_64
Vendor ID: AuthenticAMD
Model name: AMD EPYC-Milan Processor
CPU(s): 8 (4 cores × 2 threads)
Hypervisor: KVM
Virtualization type: full
Flags include: avx2, fma, sha_ni, avx512* not present
$ uname -r
6.8.0-107-generic
Build profile: --release with --features bench-internals. Rust
toolchain pinned via rust-toolchain.toml. Default Rayon thread
count (8). All numbers below were captured in the same
session with LOCUS_HUB_DATASET_DIR=tests/data/hub_cache.
§2 Diagnosis
§2.1 The pre-fix theory (incorrect)
The plan in /home/dev/.claude/plans/toasty-booping-lemon.md
hypothesised that 2160p tag interiors fragmented during CCL because
the threshold-tile invalid propagation only travels ~16 px (2
iterations × 8 px tile size), while a 540 px tag interior at 4K
needs O(50) hops to reach the centre. The plan proposed Phase-2
A/B candidates: iterate-until-converged propagation, larger tile
size, opt-in decimation.
§2.2 What the diagnostic test actually showed
crates/locus-core/tests/diagnose_render_tag_2160p.rs runs threshold
+ LSL on each failing scene and reports the largest CCL component
that overlaps the GT bbox. For all three scenes the largest component
covered ≥ 90 % of the GT bbox, had geometric area / aspect / fill
ratios within the production gates, and would have been accepted
downstream. Segmentation was not the bottleneck.
Iterate-until-converged propagation (Phase-2 candidate 1) was
implemented and benchmarked; it lifted scene 0006 fill from 0.25 to
0.41 but did not move the needle on detector recall. Tile-size 16
(candidate 3) lifted scene 0033 fill to 0.70 but again no recall
delta. Opt-in decimation: 2 (Phase 3) rescued scene 0006 alone but
collapsed the rest of the 2160p subset (47/50 → 1/50) because most
other tags shrank below the EdLines viable size.
The Phase 2 sweep was followed by env-gated traces inside
edlines.rs::run_pipeline_with_mode and corresponding edge-score /
funnel checks. Every trace showed EdLines returning Success for
scenes 0006 and 0026 with reasonable corner geometry. The candidate
quads were being produced — they were just disappearing before
decode.
§2.3 Root cause (correct)
The SoA DetectionBatch has a hard ceiling
(pub(crate) const MAX_CANDIDATES: usize = 1024; in
crates/locus-core/src/batch.rs). At 2160p, EdLines accepts
more candidate quads than ContourRdp because its geometric gates
are more permissive (smaller noise components survive). The total
count of valid quads exceeds 1024 on tag-bearing scenes.
The original extract_quads_soa collected components in raster
order (top-to-bottom, left-to-right — LSL's natural label
assignment) and then truncated:
for (i, …) in detections.into_iter().take(n).enumerate() { … }
Where n = detections.len().min(MAX_CANDIDATES). Tags placed in the
mid-to-lower image region had high label indices; when the small
noise components above them filled the first 1024 slots, the tag
candidates were silently dropped before the funnel/decoder ran.
Lower resolutions never tripped this because they produced fewer
candidate quads than the ceiling.
§2.4 Fix
A new helper pixel_count_descending_order(arena, stats) in
crates/locus-core/src/quad.rs returns the top-MAX_CANDIDATES
(pixel_count, label_idx) pairs in descending order. Both
extract_quads_soa (the rectified path) and
extract_quads_soa_with_camera (the non_rectified distortion
path) now drive their parallel extraction loop off this order
rather than off stats.par_iter().enumerate().
Truncation at MAX_CANDIDATES therefore drops the smallest blobs
(noise) rather than the largest (tag candidates).
Three principal-engineer-tier refinements over the naïve fix:
- Bump-allocated buffer. The order array is a
bumpalo::Vecsourced from the per-frameFrameContext.arena. The hot path does not touch the system allocator. The added&Bumpparameter on both extraction entry points threads the existingstate.arena(split-borrow against&mut state.batch). select_nth_unstable_byfor top-K. Whenlen > MAX_CANDIDATESthe helper partitions in O(n) before sorting only the surviving prefix in O(k log k), avoiding the full O(n log n) sort over noise tails at 4K.- Pair packing.
(pixel_count, label_idx)is laid out inline so comparators stay inside one cache-resident slice instead of double-chasing intostats[i].pixel_countper compare.
Sort-descending order on the prefix is load-bearing for downstream snapshot stability — the funnel/decoder dedup is processing-order sensitive.
This change is global — it applies to every profile, not just
render_tag_hub, and to both the rectified and
distortion-aware extraction paths. The cross-dataset sweep (§4)
confirms it does no harm anywhere else.
§3 Recall / precision deltas
§3.1 render_tag_hub (the target profile)
| Resolution | Recall (pre) | Recall (post) | Precision | RMSE | Repro RMSE | r p90 (°) | t p90 (m) |
|---|---|---|---|---|---|---|---|
| 480p | 50/50 | 50/50 | 1.000 | 0.180 | 0.219 | 0.087 | 0.0024 |
| 720p | 50/50 | 50/50 | 1.000 | 0.183 | 0.193 | 0.115 | 0.0036 |
| 1080p | 50/50 | 50/50 | 1.000 | 0.181 | 0.181 | 0.158 | 0.0055 |
| 2160p | 47/50 | 50/50 | 1.000 | 0.180 | 0.162 | 0.416 | 0.0116 |
Lower resolutions are byte-identical (same snapshots, no rebind). 2160p went from 0.94 → 1.00 recall, precision held at 1.0. The slight uptick in r p90 / RMSE is expected: scene 0033 has fill 0.14 (an unusual oblique-grazing render) and contributes more pose noise than the easier scenes already in the population.
§3.2 high_accuracy (default profile, unchanged config)
| Resolution | Recall (pre) | Recall (post) | Precision |
|---|---|---|---|
| 2160p | 47/50 | 50/50 | 1.000 |
The fix lifts the default profile too — same root cause, same remedy. No JSON or config touched.
§4 Cross-dataset no-regression sweep
Running every regression suite the project ships, per
docs/engineering/quality-gates.md § 2 and the plan §Verification
"critical" requirement (the fix is global, not profile-scoped):
| Suite | Tests | Result | Notes |
|---|---|---|---|
regression_render_tag (tag36h11) |
15 | pass | 2 snapshots rebound — both at 2160p, both recall ↑ |
regression_render_tag_robustness |
7 | pass | 2 tag16h5 1080p snapshots rebound — precision 0.276 → 0.311 (+3.5 pp), recall 1.0 unchanged |
regression_distortion_hub (Brown) |
1 | pass | snapshot rebound — recall 0.929 → 0.935 (+0.6 pp), precision 0.994 unchanged |
regression_distortion_hub (Kannala) |
1 | pass | unchanged |
regression_board_hub |
4 | pass | unchanged |
regression_icra2020 |
16 | pass | unchanged |
diagnose_render_tag_2160p |
1 | pass | new — pins the contract |
contract_detection_batch |
7 | pass | unchanged |
test_quad_soa |
1 | pass | unchanged |
Every snapshot delta is strictly an improvement (recall up, precision unchanged or improved). The fix has no observed adverse interaction across datasets.
§5 SOTA comparison at 2160p
External baselines captured 2026-04-25 via tools/cli.py bench
real --hub-config locus_v1_tag36h11_3840x2160 --compare:
| Detector | Recall | Precision | t mean | t p99 | r mean | r p99 | Latency |
|---|---|---|---|---|---|---|---|
OpenCV cv2.aruco |
100 % | 98.04 % | 34 mm | 408 mm | 0.171° | 1.203° | 165 ms |
| AprilTag-C (pupil) | 100 % | 100 % | 16 mm | 101 mm | 2.666° | 65.716° | 143 ms |
Locus high_accuracy |
100 % | 100 % | 17 mm | 121 mm | 7.134° | 108.494° | 73 ms |
Locus render_tag_hub (snapshot) |
100 % | 100 % | — | — | — | — | — |
The CLI runs the default high_accuracy profile; render_tag_hub
numbers come from the rebound regression snapshot
(regression_render_tag__common__hub__hub_locus_v1_tag36h11_3840x2160_render_tag_hub.snap).
Locus is the latency leader at every 2160p configuration tested,
matches AprilTag-C on the recall+precision floor, and trails
OpenCV's tighter rotation distribution at 4K — a known artefact of
EdLines pose ambiguity on extreme grazing-angle synthetic renders
(the same effect that drives r p99 in the 1080p comparison; see
render_tag_sota_20260425.md §1 for the discussion).
§6 What shipped
crates/locus-core/src/quad.rs— pixel-count-descending iteration inextract_quads_soa(the fix).crates/locus-core/tests/diagnose_render_tag_2160p.rs— newbench-internalsregression that asserts the geometric gates pass and the detector returns ≥ 1 detection on each previously failing scene.- Five rebound snapshots:
regression_render_tag__…__3840x2160.snap(accuracy_baseline)regression_render_tag__…__3840x2160_render_tag_hub.snapregression_distortion_hub__…__brown_conrady_v1_1920x1080.snapregression_render_tag_robustness__…__tag16h5_1920x1080.snapregression_render_tag_robustness__…__tag16h5_1920x1080_tuned.snap- This document.
No JSON profile, schema, or Python binding was touched. The fix is ~5 lines of Rust plus a comment.
§7 Exit criteria status
- ✅
render_tag_hub2160p recall = 50/50 (100 %). - ✅ 480p / 720p / 1080p
render_tag_hubrecall stays at 100 %; precision stays at 1.00 across all 4 resolutions. - ✅ No regression on robustness, distortion, board, ICRA suites (one snapshot rebound, recall improved).
- ✅ Diagnostic regression test asserts each previously-failing scene now decodes.
- ✅ Hardware metadata + verification log recorded above.
The non-goal identified in the plan ("scene 0033 — content-intrinsic decoder margin") was incidentally resolved: 0033 was failing for the same MAX_CANDIDATES truncation reason as 0006/0026, not for any decoder-margin reason.