Detection Pipeline
This document details the chronological data flow through the Locus detection pipeline. For memory layout and allocation strategy, see Memory Model. For the mathematical foundations of each solver, see Algorithms.
Pipeline Overview
The pipeline follows a strict sequential-then-parallel execution model. Each stage operates on the shared DetectionBatch (SoA) with well-defined read/write privileges.
flowchart LR
subgraph Preprocessing
Integral["Integral Image"]
Thresh["Adaptive Threshold"]
end
subgraph Segmentation
RLE["SIMD Fused RLE"]
LSL["Light-Speed Labeling"]
end
subgraph Geometry ["Math & Geometry Core"]
Homog["Homography (DLT)"]
ERF["ERF Sub-pixel"]
GWLF["GWLF Sub-pixel"]
IPPE["IPPE-Square"]
LM["LM Solvers"]
end
subgraph QuadExtraction ["Quad Extraction"]
Contour["Contour Tracing"]
DP["Douglas-Peucker"]
Refine["Corner Refinement"]
end
subgraph Decoding
Funnel["Fast-Path Funnel"]
Sample["Homography Sampling"]
EC["Error Correction"]
end
subgraph PoseEst ["Pose Estimation"]
PoseInit["IPPE Decomposition"]
PoseRefine["LM Refinement"]
end
Integral --> Thresh
Thresh --> RLE --> LSL
LSL --> Contour --> DP --> Refine
Refine -.->|ERF or GWLF| Geometry
Refine --> Funnel --> Sample --> EC
Sample -.->|DDA| Geometry
EC --> PoseInit --> PoseRefine
PoseInit -.->|IPPE| Geometry
PoseRefine -.->|LM| Geometry
Stage 0: Pre-allocation & Upscaling
At the start of each detect() call, the bump arena is reset in \(O(1)\) time, freeing all per-frame ephemeral data. If the input image is below the minimum resolution threshold, it is upscaled into a pre-allocated buffer.
Module: detector.rs
Stage 1: Preprocessing
Two linear passes over the image produce the binarized input for segmentation.
- Integral Image — A single-pass \(O(N)\) scan computes the prefix-sum table used by the adaptive thresholder.
- Adaptive Threshold — Each pixel is compared against the local mean computed from the integral image over a configurable tile size. The output is a binary image where foreground pixels indicate potential tag edges.
Module: threshold.rs | Complexity: \(O(N)\) | Typical Latency: ~0.9 ms (720p)
Stage 2: Segmentation (SIMD Fused)
Connected components labeling identifies contiguous foreground regions.
- Run-Length Encoding (RLE) — A SIMD-accelerated pass (
multiversiondispatch to AVX2/NEON) extracts horizontal runs from the binary image. - Light-Speed Labeling (LSL) — A single-pass 1D algorithm resolves equivalences between runs using a union-find structure, producing labeled components with bounding-box statistics.
Components below the minimum area threshold are discarded immediately.
Module: simd_ccl_fusion.rs, segmentation.rs | Complexity: \(O(N)\) | Typical Latency: ~0.5 ms (720p)
Stage 3: Quad Extraction
For each surviving component, the pipeline extracts and refines quadrilateral candidates.
3a. Contour Tracing
A boundary-following algorithm traces the outer contour of each labeled component, producing an ordered chain of pixel coordinates.
3b. Polygon Approximation (Douglas-Peucker)
The contour is simplified to a polygon. Only quadrilaterals (exactly 4 vertices after simplification) survive this gate.
3c. Corner Refinement
Sub-pixel corner localization is applied via one of two strategies (selected by CornerRefinementMode):
| Mode | Algorithm | Module |
|---|---|---|
| ERF | Gauss-Newton fit of a PSF-blurred step function along edge normals. | edge_refinement.rs |
| GWLF | Gradient-weighted orthogonal distance regression (PCA) with algebraic line intersection in homogeneous coordinates. Propagates full \(3 \times 3\) line covariances to \(2 \times 2\) corner covariances. | gwlf.rs |
Both strategies share the centralized erf_approx / erf_approx_v4 SIMD kernels from simd::math.
Module: quad.rs, edge_refinement.rs, gwlf.rs | Complexity: \(O(K \cdot M)\) | Typical Latency: ~1.5 ms (720p, 50 tags)
Stage 4: Homography & Decoding
4a. Homography Pass
For each active quad, a square_to_quad homography is computed mapping the canonical unit square \([(-1,-1), (1,-1), (1,1), (-1,1)]\) to the detected image corners. This is a batch SoA operation parallelized via rayon.
Module: homography.rs
4b. Fast-Path Funnel
Before expensive bit sampling, an \(O(1)\) contrast gate rejects candidates lacking photometric evidence of a tag edge. The gate samples a small number of pixels along each edge using the DDA (Digital Differential Analyzer) and checks for sufficient intensity contrast.
Module: funnel.rs
4c. Bit Sampling & Decoding
For surviving candidates, the homography DDA incrementally generates sample coordinates in image space. Two strategies are available:
| Strategy | Mechanism | Module |
|---|---|---|
| Hard-Decision | Direct intensity thresholding. \(O(1)\) dictionary lookup via precomputed tables. | decoder.rs, strategy.rs |
| Soft-Decision | Log-Likelihood Ratio (LLR) extraction with Maximum Likelihood search via Multi-Index Hashing (MIH). Sub-linear dictionary search. | decoder.rs, strategy.rs |
Sampling uses SIMD-accelerated bilinear interpolation with hardware reciprocal approximation (rcp_nr) and vectorized FMA. The ROI caching system copies small-tag regions to contiguous stack buffers (or arena buffers for large tags) to minimize L1 cache misses.
Module: decoder.rs | Complexity: \(O(Q)\) | Typical Latency: ~10.0 ms (720p, 50 tags)
Stage 5: Pose Estimation (Optional)
When camera intrinsics are provided, a 6-DOF pose is recovered for each valid tag.
5a. IPPE Decomposition
The homography is normalized by the inverse intrinsics matrix and decomposed via the IPPE-Square analytical solution, yielding two candidate poses. The candidate with lower reprojection error is selected.
5b. Levenberg-Marquardt Refinement
The selected pose is refined iteratively. Two modes are available:
| Mode | Objective | Uncertainty | Module |
|---|---|---|---|
| Fast | Huber-robust geometric reprojection error (pixel distance). | None. | pose.rs |
| Accurate | Huber-robust Mahalanobis distance using per-corner \(2 \times 2\) information matrices from the Structure Tensor or GWLF covariance propagation. | Full \(6 \times 6\) pose covariance (Cramer-Rao bound). | pose_weighted.rs |
Both solvers use Nielsen trust-region scheduling with Marquardt diagonal scaling.
Module: pose.rs, pose_weighted.rs | Complexity: \(O(V)\) | Typical Latency: ~0.05-0.2 ms per tag
End-to-End Sequence
sequenceDiagram
participant App as Application
participant Det as Detector
participant Thresh as ThresholdEngine
participant Seg as Segmentation
participant Quad as QuadExtraction
participant Geom as Math & Geometry Core
participant Decode as Decoder
participant Pose as PoseEstimator
App->>Det: detect(image)
activate Det
Note over Det: Arena Reset (O(1))
Det->>Thresh: compute_integral_image()
Det->>Thresh: adaptive_threshold()
Thresh-->>Det: Binarized Image
Det->>Seg: SIMD Fused RLE + LSL
Seg-->>Det: Component Labels & Stats
loop For each component
Det->>Quad: Contour Trace + Douglas-Peucker
Quad->>Geom: Corner Refinement (ERF or GWLF)
Geom-->>Quad: Sub-pixel Corners
end
Quad-->>Det: Quad Candidates
Det->>Geom: Batch Homography Pass (rayon)
Geom-->>Det: Homographies [SoA]
Det->>Decode: Fast-Path Funnel (O(1) contrast gate)
loop For each surviving candidate
Decode->>Geom: DDA Coordinate Generation
Geom-->>Decode: Sample Coordinates
Decode->>Decode: Bit/LLR Extraction + Error Correction
end
opt If Intrinsics Provided
Det->>Pose: Batch Pose Estimation
Pose->>Geom: IPPE Decomposition
Pose->>Geom: LM Refinement
Geom-->>Pose: Refined Poses
end
Det-->>App: DetectionBatch
deactivate Det
Performance Budget
| Stage | Complexity | Latency (50 Tags, 720p) | Notes |
|---|---|---|---|
| Preprocessing | \(O(N)\) | ~0.9 ms | Adaptive thresholding + Integral Image. |
| Segmentation | \(O(N)\) | ~0.5 ms | SIMD Fused RLE + Light-Speed Labeling. |
| Quad Extraction | \(O(K \cdot M)\) | ~1.5 ms | SoA extraction + sub-pixel refinement. |
| Decoding (Hard) | \(O(Q)\) | ~10.0 ms | SoA math pass; SIMD bilinear sampling. |
| Pose Refinement | \(O(V)\) | ~0.2 ms | Partitioned solver (valid tags only). |
Total: ~14.5 ms for 50 tags (720p) on a modern desktop CPU (e.g., Zen 4).