Algorithms & Mathematical Foundations
This document provides rigorous mathematical descriptions of the core solvers in Locus. For the pipeline data flow, see Pipeline. For memory layout, see Memory Model.
1. Homography Estimation (DLT)
Module: homography.rs
A homography \(\mathbf{H} \in \mathbb{R}^{3 \times 3}\) is the projective transformation mapping points from canonical tag space to image pixels:

\[\tilde{\mathbf{p}}_{\text{img}} \sim \mathbf{H} \, \tilde{\mathbf{p}}_{\text{tag}}\]
where \(\tilde{\mathbf{p}} = [x, y, 1]^T\) are homogeneous coordinates.
1.1 General DLT (from_pairs)
Given 4 point correspondences \((\mathbf{s}_i, \mathbf{d}_i)\) with \(\mathbf{s}_i = (x_i, y_i)\) and \(\mathbf{d}_i = (u_i, v_i)\), the Direct Linear Transform constructs an \(8 \times 9\) matrix \(\mathbf{A}\) acting on \(\mathbf{h} = [h_{11}, \ldots, h_{33}]^T\), where each correspondence contributes two rows:

\[\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -u_i x_i & -u_i y_i & -u_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -v_i x_i & -v_i y_i & -v_i \end{bmatrix}\]

Setting \(h_{33} = 1\) (dehomogenization), the remaining 8 parameters are solved via LU decomposition of the reduced \(8 \times 8\) system:

\[\mathbf{A}' \, [h_{11}, h_{12}, \ldots, h_{32}]^T = [u_1, v_1, \ldots, u_4, v_4]^T\]
A post-hoc reprojection check validates \(\|\mathbf{H} \tilde{\mathbf{s}}_i - \tilde{\mathbf{d}}_i\|^2 < 10^{-4}\) for all 4 points, rejecting degenerate configurations.
1.2 Optimized Square-to-Quad (square_to_quad)
When the source points are the canonical unit square \(\{(-1,-1), (1,-1), (1,1), (-1,1)\}\), the coefficient matrix \(\mathbf{M}\) has a fixed sparsity pattern. The implementation hardcodes these coefficients, eliminating the generic source-point loop while producing an identical \(8 \times 8\) LU solve.
1.3 Digital Differential Analyzer (DDA)
For grid-based sampling (decoding), recomputing \(\mathbf{H} \tilde{\mathbf{p}}\) per pixel is wasteful. The DDA exploits the fact that the homogeneous numerators and denominator are affine in \((u, v)\):

\[n_x(u, v) = h_{11} u + h_{12} v + h_{13}, \quad n_y(u, v) = h_{21} u + h_{22} v + h_{23}, \quad d(u, v) = h_{31} u + h_{32} v + h_{33}\]

Stepping by \((\Delta u, \Delta v)\) in tag space therefore requires only 3 additions per pixel, with the increments precomputed once:

\[n_x \mathrel{+}= h_{11} \Delta u + h_{12} \Delta v, \quad n_y \mathrel{+}= h_{21} \Delta u + h_{22} \Delta v, \quad d \mathrel{+}= h_{31} \Delta u + h_{32} \Delta v\]
The perspective divide \(\mathbf{p}_{\text{img}} = (n_x/d, \; n_y/d)\) is computed using a hardware reciprocal approximation (rcp_nr) with one Newton-Raphson refinement step for ~23-bit accuracy.
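The stepping scheme above can be sketched as a minimal scalar helper. The struct and method names here are illustrative, not the homography.rs API, and the hardware `rcp` estimate is stood in for by an ordinary division:

```rust
struct Dda {
    // Homogeneous accumulators at the current sample position.
    nx: f32, ny: f32, d: f32,
    // Per-step increments (here for a step of `du` along the u axis only).
    dnx: f32, dny: f32, dd: f32,
}

impl Dda {
    /// Start a scanline at tag-space (u0, v0), stepping by `du` along u.
    fn new(h: &[[f32; 3]; 3], u0: f32, v0: f32, du: f32) -> Self {
        Dda {
            nx: h[0][0] * u0 + h[0][1] * v0 + h[0][2],
            ny: h[1][0] * u0 + h[1][1] * v0 + h[1][2],
            d: h[2][0] * u0 + h[2][1] * v0 + h[2][2],
            dnx: h[0][0] * du,
            dny: h[1][0] * du,
            dd: h[2][0] * du,
        }
    }

    /// Advance one sample: three additions, no matrix product.
    fn step(&mut self) {
        self.nx += self.dnx;
        self.ny += self.dny;
        self.d += self.dd;
    }

    /// Perspective divide via a reciprocal estimate plus one Newton-Raphson step.
    fn point(&self) -> (f32, f32) {
        let mut r = 1.0 / self.d; // stands in for the hardware rcp estimate
        r = r * (2.0 - self.d * r); // Newton-Raphson refinement: r <- r(2 - d*r)
        (self.nx * r, self.ny * r)
    }
}
```

Stepping twice by \(\Delta u = 0.5\) from the origin reproduces the direct evaluation of \(\mathbf{H} \tilde{\mathbf{p}}\) at \(u = 1\).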
2. ERF Sub-pixel Edge Refinement
Module: edge_refinement.rs
2.1 Intensity Model
The observed intensity along the normal to a tag edge is modeled as a PSF-blurred step function:

\[I(d) = \frac{A + B}{2} + \frac{B - A}{2} \operatorname{erf}\!\left(\frac{d}{\sigma\sqrt{2}}\right)\]
where:
- \(A, B\) are the dark and light intensities on either side of the edge.
- \(d\) is the signed perpendicular distance from the sample point to the edge line.
- \(\sigma\) is the Gaussian blur parameter (PSF width).
- \(\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt\) is the error function.
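A minimal scalar sketch of this model, using the Abramowitz-Stegun 7.1.26 polynomial as a stand-in for the crate's erf_approx (the actual fit in edge_refinement.rs may differ):

```rust
// Abramowitz-Stegun 7.1.26 polynomial approximation of erf (max error ~1.5e-7).
fn erf_approx(x: f32) -> f32 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592 + t * (-0.284496736
        + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// PSF-blurred step: dark level A, light level B, signed distance d, blur sigma.
fn edge_intensity(a: f32, b: f32, d: f32, sigma: f32) -> f32 {
    0.5 * (a + b) + 0.5 * (b - a) * erf_approx(d / (sigma * std::f32::consts::SQRT_2))
}
```

At \(d = 0\) the model returns the midpoint \((A + B)/2\), and it saturates to \(A\) or \(B\) a few \(\sigma\) away from the edge.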
2.2 Edge Parameterization
The edge line is parameterized by its perpendicular offset \(\rho\) from an initial estimate:

\[d_j(\rho) = \mathbf{n}^T (\mathbf{x}_j - \mathbf{p}_0) - \rho\]

where \(\mathbf{n}\) is the unit normal to the edge, \(\mathbf{p}_0\) is a point on the initial edge estimate, and \(d_j\) is the signed distance of sample \(\mathbf{x}_j\) to the shifted line.
2.3 Gauss-Newton Optimization
The scalar parameter \(\rho\) is refined by minimizing the sum of squared residuals:

\[E(\rho) = \sum_j r_j^2, \qquad r_j = I_j - I\big(d_j(\rho)\big)\]

The Jacobian for each sample is:

\[J_j = \frac{\partial r_j}{\partial \rho} = \frac{B - A}{\sigma\sqrt{2\pi}} \, e^{-d_j^2 / (2\sigma^2)}\]

The Gauss-Newton update is:

\[\Delta\rho = -\frac{\sum_j J_j r_j}{\sum_j J_j^2}, \qquad \rho \leftarrow \rho + \Delta\rho\]
with convergence when \(|\Delta\rho| < 10^{-4}\) or after 15 iterations.
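The loop can be sketched end-to-end on synthetic data. This is a self-contained illustration using the intensity model of Section 2.1 (helper names are hypothetical, not the edge_refinement.rs API):

```rust
// Abramowitz-Stegun 7.1.26 polynomial; stands in for the crate's erf_approx.
fn erf_approx(x: f32) -> f32 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592 + t * (-0.284496736
        + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// PSF-blurred step model I(d) from Section 2.1.
fn model(a: f32, b: f32, d: f32, sigma: f32) -> f32 {
    0.5 * (a + b) + 0.5 * (b - a) * erf_approx(d / (sigma * std::f32::consts::SQRT_2))
}

/// 1-D Gauss-Newton refinement of the edge offset `rho` from
/// (signed offset, intensity) samples along the edge normal.
fn refine_rho(samples: &[(f32, f32)], a: f32, b: f32, sigma: f32) -> f32 {
    let mut rho = 0.0_f32;
    for _ in 0..15 {
        let (mut jtj, mut jtr) = (0.0_f32, 0.0_f32);
        for &(x, intensity) in samples {
            let d = x - rho; // signed distance to the shifted edge line
            let r = intensity - model(a, b, d, sigma);
            // dr/drho = (B - A) / (sigma * sqrt(2*pi)) * exp(-d^2 / (2 sigma^2))
            let j = (b - a) / (sigma * (2.0 * std::f32::consts::PI).sqrt())
                * (-d * d / (2.0 * sigma * sigma)).exp();
            jtj += j * j;
            jtr += j * r;
        }
        let delta = -jtr / jtj; // Gauss-Newton step for a scalar parameter
        rho += delta;
        if delta.abs() < 1e-4 { break; }
    }
    rho
}
```

On noise-free samples generated from the model itself, the loop recovers the true offset in a handful of iterations.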
2.4 Configuration Variants
| Variant | A/B Estimation | Min Contrast | Use Case |
|---|---|---|---|
| Quad-style | Once before GN loop | None | Quad corner refinement |
| Decoder-style | Per iteration | Early exit if \(|B - A|\) falls below a minimum-contrast threshold | Decoder bit sampling |
The erf_approx function is shared across both variants and has a SIMD-vectorized 4-wide version (erf_approx_v4) for batch evaluation.
3. Gradient-Weighted Line Fitting (GWLF)
Module: gwlf.rs
3.1 Moment Accumulation
For each of the 4 quad edges, image gradient magnitudes \(w_k = \|\nabla I(\mathbf{x}_k)\|\) serve as weights in a spatial moment accumulation:

\[W = \sum_k w_k, \qquad \bar{\mathbf{x}} = \frac{1}{W} \sum_k w_k \mathbf{x}_k\]

The \(2 \times 2\) gradient-weighted covariance matrix is:

\[\mathbf{C} = \frac{1}{W} \sum_k w_k (\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})^T\]
3.2 Line Fitting via PCA
The edge direction is the eigenvector of \(\mathbf{C}\) corresponding to \(\lambda_{\max}\) (the tangent direction). The eigenvector corresponding to \(\lambda_{\min}\) gives the edge normal \(\mathbf{n}\).
The line is represented in homogeneous form \(\mathbf{l} = [n_x, n_y, d]^T\) where \(d = -(n_x \bar{x} + n_y \bar{y})\).
The eigendecomposition of the \(2 \times 2\) symmetric matrix is solved analytically:

\[\lambda_{\pm} = \frac{c_{xx} + c_{yy}}{2} \pm \sqrt{\left(\frac{c_{xx} - c_{yy}}{2}\right)^2 + c_{xy}^2}\]
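The closed-form solve can be sketched as a standalone helper (a hypothetical function, not the gwlf.rs API):

```rust
/// Closed-form eigendecomposition of a 2x2 symmetric matrix [[cxx, cxy], [cxy, cyy]].
/// Returns (lambda_max, lambda_min, unit eigenvector for lambda_max).
fn eigen_sym_2x2(cxx: f32, cxy: f32, cyy: f32) -> (f32, f32, [f32; 2]) {
    let mean = 0.5 * (cxx + cyy);
    let half_diff = 0.5 * (cxx - cyy);
    let disc = (half_diff * half_diff + cxy * cxy).sqrt();
    let (l_max, l_min) = (mean + disc, mean - disc);
    // Eigenvector for l_max; guard the isotropic case where any direction works.
    let v = if cxy.abs() > 1e-12 {
        [cxy, l_max - cxx]
    } else if cxx >= cyy {
        [1.0, 0.0]
    } else {
        [0.0, 1.0]
    };
    let n = (v[0] * v[0] + v[1] * v[1]).sqrt();
    (l_max, l_min, [v[0] / n, v[1] / n])
}
```

For a gradient-weighted covariance, the eigenvector for \(\lambda_{\max}\) is the edge tangent; rotating it by 90 degrees gives the normal \(\mathbf{n}\).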
3.3 Line Covariance Propagation
The \(3 \times 3\) covariance \(\boldsymbol{\Sigma}_{\mathbf{l}}\) of the homogeneous line parameters is propagated through the PCA fitting procedure, encoding the uncertainty in the edge direction and offset.
3.4 Corner as Line Intersection
The refined corner is the intersection of two adjacent homogeneous lines:

\[\tilde{\mathbf{c}} = \mathbf{l}_1 \times \mathbf{l}_2\]
The covariance of the homogeneous intersection point is propagated via the cross-product Jacobians:

\[\boldsymbol{\Sigma}_{\tilde{\mathbf{c}}} = [\mathbf{l}_2]_\times \boldsymbol{\Sigma}_{\mathbf{l}_1} [\mathbf{l}_2]_\times^T + [\mathbf{l}_1]_\times \boldsymbol{\Sigma}_{\mathbf{l}_2} [\mathbf{l}_1]_\times^T\]

where \([\mathbf{l}]_\times\) is the skew-symmetric cross-product matrix.
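The intersection and the skew-matrix identity behind the Jacobians can be sketched directly (hypothetical helpers, not the gwlf.rs API):

```rust
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[1] * b[2] - a[2] * b[1],
     a[2] * b[0] - a[0] * b[2],
     a[0] * b[1] - a[1] * b[0]]
}

/// Skew-symmetric matrix [l]x such that [l]x * v == cross(l, v).
fn skew(l: [f32; 3]) -> [[f32; 3]; 3] {
    [[0.0, -l[2], l[1]],
     [l[2], 0.0, -l[0]],
     [-l[1], l[0], 0.0]]
}

fn matvec(m: [[f32; 3]; 3], v: [f32; 3]) -> [f32; 3] {
    [m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
     m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
     m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2]]
}

/// Corner as the intersection of two homogeneous lines, dehomogenized.
fn intersect(l1: [f32; 3], l2: [f32; 3]) -> (f32, f32) {
    let c = cross(l1, l2);
    (c[0] / c[2], c[1] / c[2])
}
```

For example, the lines \(x = 1\) (as \([1, 0, -1]\)) and \(y = 2\) (as \([0, 1, -2]\)) intersect at \((1, 2)\).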
3.5 Perspective Division to Affine Coordinates
The Cartesian corner \(\mathbf{c} = (\tilde{c}_x / \tilde{c}_w, \; \tilde{c}_y / \tilde{c}_w)\) and its \(2 \times 2\) covariance are obtained via the Jacobian of perspective division:

\[\mathbf{J}_\pi = \frac{1}{\tilde{c}_w} \begin{bmatrix} 1 & 0 & -\tilde{c}_x / \tilde{c}_w \\ 0 & 1 & -\tilde{c}_y / \tilde{c}_w \end{bmatrix}, \qquad \boldsymbol{\Sigma}_{\mathbf{c}} = \mathbf{J}_\pi \boldsymbol{\Sigma}_{\tilde{\mathbf{c}}} \mathbf{J}_\pi^T\]
This \(2 \times 2\) corner covariance feeds directly into the Accurate mode pose estimator as per-corner uncertainty.
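The propagation step can be sketched with explicit loops (a hypothetical helper, not the gwlf.rs API):

```rust
/// Propagate a 3x3 homogeneous-point covariance through perspective division.
/// For c = (x/w, y/w), the Jacobian is J = (1/w) [[1, 0, -x/w], [0, 1, -y/w]].
fn divide_covariance(p: [f32; 3], sigma: [[f32; 3]; 3]) -> [[f32; 2]; 2] {
    let (x, y, w) = (p[0], p[1], p[2]);
    let j = [[1.0 / w, 0.0, -x / (w * w)],
             [0.0, 1.0 / w, -y / (w * w)]];
    // out = J * Sigma * J^T
    let mut js = [[0.0f32; 3]; 2];
    for r in 0..2 {
        for c in 0..3 {
            for k in 0..3 { js[r][c] += j[r][k] * sigma[k][c]; }
        }
    }
    let mut out = [[0.0f32; 2]; 2];
    for r in 0..2 {
        for c in 0..2 {
            for k in 0..3 { out[r][c] += js[r][k] * j[c][k]; }
        }
    }
    out
}
```

The output is symmetric by construction, which is what the Accurate mode pose estimator expects of a per-corner covariance.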
3.6 Adaptive Transversal Windowing
The perpendicular search band for gradient sampling scales with the edge length \(L\).
This prevents over-smoothing on short edges while capturing sufficient gradient evidence on long edges.
4. IPPE-Square Pose Estimation
Module: pose.rs
4.1 Homography Normalization
The pixel-space homography \(\mathbf{H}_\text{pixel}\) from square_to_quad is normalized by the inverse intrinsics:

\[\mathbf{H} = \mathbf{K}^{-1} \mathbf{H}_\text{pixel}\]

The metric homography additionally accounts for the tag size \(s\) and the modern OpenCV top-left origin convention.
4.2 Jacobian SVD Decomposition
The Jacobian \(\mathbf{J} = [\mathbf{h}_1, \mathbf{h}_2]\) encodes the image-plane stretch. Its \(2 \times 2\) Gram matrix:

\[\mathbf{G} = \mathbf{J}^T \mathbf{J}\]

has eigenvalues \(\sigma_1^2, \sigma_2^2\) (the squared singular values of \(\mathbf{J}\)) computed analytically:

\[\sigma_{1,2}^2 = \frac{\operatorname{tr} \mathbf{G}}{2} \pm \sqrt{\left(\frac{\operatorname{tr} \mathbf{G}}{2}\right)^2 - \det \mathbf{G}}\]
4.3 Dual Pose Solutions
IPPE-Square produces two candidate poses from the SVD, corresponding to the two minima of the planar PnP error surface (the "Necker reversal" ambiguity). The candidate with the lower reprojection error is selected.
Frontal degeneracy: When \(|\sigma_1 - \sigma_2| < 10^{-4} \sigma_1\), the two solutions collapse. In this case, Gram-Schmidt orthonormalization of \([\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_1 \times \mathbf{h}_2]\) produces the rotation, and the translation scale \(\gamma = (\sigma_1 + \sigma_2)/2\).
4.4 Orthogonalization
The raw rotation estimate from the SVD is projected onto \(SO(3)\) via polar decomposition:

\[\tilde{\mathbf{R}} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T \;\Rightarrow\; \mathbf{R} = \mathbf{U} \operatorname{diag}\big(1, 1, \det(\mathbf{U}\mathbf{V}^T)\big) \mathbf{V}^T\]
ensuring \(\det(\mathbf{R}) = +1\).
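The Gram-Schmidt fallback used in the frontal-degenerate case can be sketched as follows (a hypothetical helper, not the pose.rs API). Because the third column is the cross product of the first two orthonormal columns, \(\det(\mathbf{R}) = +1\) holds by construction:

```rust
fn dot(a: [f32; 3], b: [f32; 3]) -> f32 { a[0] * b[0] + a[1] * b[1] + a[2] * b[2] }

fn normalize(v: [f32; 3]) -> [f32; 3] {
    let n = dot(v, v).sqrt();
    [v[0] / n, v[1] / n, v[2] / n]
}

fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[1] * b[2] - a[2] * b[1],
     a[2] * b[0] - a[0] * b[2],
     a[0] * b[1] - a[1] * b[0]]
}

/// Rotation whose columns are the Gram-Schmidt orthonormalization of
/// [h1, h2, h1 x h2].
fn gram_schmidt_rotation(h1: [f32; 3], h2: [f32; 3]) -> [[f32; 3]; 3] {
    let r1 = normalize(h1);
    let p = dot(h2, r1);
    let r2 = normalize([h2[0] - p * r1[0], h2[1] - p * r1[1], h2[2] - p * r1[2]]);
    let r3 = cross(r1, r2); // unit by construction; guarantees det(R) = +1
    [[r1[0], r2[0], r3[0]],
     [r1[1], r2[1], r3[1]],
     [r1[2], r2[2], r3[2]]]
}
```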
5. Levenberg-Marquardt Pose Refinement
5.1 Fast Mode (Geometric Error)
Module: pose.rs | Function: refine_pose_lm
Objective
Minimize the Huber-robust geometric reprojection error:

\[E(\mathbf{R}, \mathbf{t}) = \sum_{i=1}^{4} w_i \, \|\mathbf{r}_i\|^2\]
where \(\mathbf{r}_i = \mathbf{p}_i^\text{obs} - \pi(\mathbf{R} \mathbf{P}_i + \mathbf{t})\) is the 2D reprojection residual and \(\pi\) is the pinhole projection.
Huber Loss
Applied per corner as an IRLS weight: \(w_i = \min(1, \; \delta / \|\mathbf{r}_i\|)\).
Jacobian Structure
The \(2 \times 6\) Jacobian for corner \(i\) with respect to the \(\mathfrak{se}(3)\) twist \(\boldsymbol{\xi} = [\mathbf{t}, \boldsymbol{\omega}]^T\):

\[\mathbf{J}_i = \frac{\partial \mathbf{r}_i}{\partial \boldsymbol{\xi}} = -\mathbf{J}_\pi \left[\, \mathbf{I}_3 \;\; -[\mathbf{q}_i]_\times \,\right]\]

where \(\mathbf{q}_i = \mathbf{R} \mathbf{P}_i + \mathbf{t}\) is the corner in camera coordinates and \(\mathbf{J}_\pi\) is the \(2 \times 3\) Jacobian of the pinhole projection at \(\mathbf{q}_i\):

\[\mathbf{J}_\pi = \frac{1}{q_z} \begin{bmatrix} f_x & 0 & -f_x q_x / q_z \\ 0 & f_y & -f_y q_y / q_z \end{bmatrix}\]
Normal Equations with Marquardt Scaling

\[\big(\mathbf{J}^T \mathbf{W} \mathbf{J} + \lambda \operatorname{diag}(\mathbf{J}^T \mathbf{W} \mathbf{J})\big) \, \boldsymbol{\delta} = -\mathbf{J}^T \mathbf{W} \mathbf{r}\]

Marquardt diagonal scaling damps each DOF proportionally to its own curvature, correcting the scale mismatch between translational and rotational gradient magnitudes. The system is solved via Cholesky decomposition.
Trust-Region (Nielsen Schedule)
- Accept: Gain ratio \(\rho = \text{actual}/\text{predicted} > 0\) triggers \(\lambda \leftarrow \lambda \cdot \max(1/3, \; 1 - (2\rho-1)^3)\), \(\nu \leftarrow 2\).
- Reject: \(\lambda \leftarrow \lambda \cdot \nu\), \(\nu \leftarrow 2\nu\).
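The schedule above can be written as a standalone helper (a hypothetical function, not the pose.rs API):

```rust
/// Nielsen's damping update. Returns the updated (lambda, nu).
fn nielsen_update(lambda: f32, nu: f32, gain_ratio: f32) -> (f32, f32) {
    if gain_ratio > 0.0 {
        // Accept: shrink lambda by at most a factor of 3, reset nu.
        let t = 2.0 * gain_ratio - 1.0;
        (lambda * (1.0f32 / 3.0).max(1.0 - t * t * t), 2.0)
    } else {
        // Reject: grow lambda geometrically, double nu.
        (lambda * nu, 2.0 * nu)
    }
}
```

A perfect step (\(\rho = 1\)) shrinks \(\lambda\) by the full factor of 3; a marginal step (\(\rho = 0.5\)) leaves it unchanged; a rejection multiplies it by \(\nu\).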
Manifold Update
The \(\mathfrak{se}(3)\) increment \(\boldsymbol{\delta} = [\boldsymbol{\delta}_t, \boldsymbol{\delta}_\omega]^T\) is applied via the exponential map:

\[\mathbf{R} \leftarrow \exp\!\big([\boldsymbol{\delta}_\omega]_\times\big) \, \mathbf{R}, \qquad \mathbf{t} \leftarrow \mathbf{t} + \boldsymbol{\delta}_t\]
The rotation update uses UnitQuaternion::from_scaled_axis for numerically stable \(SO(3)\) integration.
Convergence
- Gradient convergence: \(\|\mathbf{J}^T \mathbf{W} \mathbf{r}\|_\infty < 10^{-8}\)
- Step convergence: \(\|\boldsymbol{\delta}\| < 10^{-8}\)
- Maximum 20 iterations (typically exits in 3-6).
5.2 Accurate Mode (Mahalanobis Distance)
Module: pose_weighted.rs | Function: refine_pose_lm_weighted
Objective
Minimize the Huber-robust Mahalanobis distance using per-corner information matrices:

\[E = \sum_{i=1}^{4} w(s_i) \, s_i^2\]

where

\[s_i = \sqrt{\mathbf{r}_i^T \mathbf{W}_i \mathbf{r}_i}\]

is the Mahalanobis distance and \(\mathbf{W}_i = \boldsymbol{\Sigma}_i^{-1}\) is the information matrix (the inverse of the \(2 \times 2\) corner covariance from the Structure Tensor or GWLF).
Huber-on-Mahalanobis IRLS
The IRLS weight in the Mahalanobis metric:

\[w(s_i) = \min\!\left(1, \; \frac{k}{s_i}\right)\]
The augmented information matrix: \(\tilde{\mathbf{W}}_i = w(s_i) \cdot \mathbf{W}_i\)
The threshold \(k = 1.345\) provides 95% asymptotic efficiency under Gaussian noise.
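The weighting can be sketched in two small helpers (hypothetical functions, not the pose_weighted.rs API):

```rust
/// Mahalanobis norm s = sqrt(r^T W r) for a 2-vector residual.
fn mahalanobis(r: [f32; 2], w: [[f32; 2]; 2]) -> f32 {
    (r[0] * (w[0][0] * r[0] + w[0][1] * r[1])
        + r[1] * (w[1][0] * r[0] + w[1][1] * r[1])).sqrt()
}

/// Huber IRLS weight in the Mahalanobis metric: 1 inside the threshold,
/// k/s outside, so large residuals contribute linearly rather than quadratically.
fn huber_weight(s: f32, k: f32) -> f32 {
    if s <= k { 1.0 } else { k / s }
}
```

With an identity information matrix the Mahalanobis norm reduces to the Euclidean norm, and a residual at \(s = 5\) with \(k = 1.345\) is down-weighted to \(1.345 / 5 = 0.269\).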
Normal Equations

\[\left(\sum_i \mathbf{J}_i^T \tilde{\mathbf{W}}_i \mathbf{J}_i + \lambda \mathbf{D}\right) \boldsymbol{\delta} = -\sum_i \mathbf{J}_i^T \tilde{\mathbf{W}}_i \mathbf{r}_i\]

where \(\mathbf{D}\) is the Marquardt diagonal scaling. The Jacobian \(\mathbf{J}_i\) has the same structure as in Fast mode.
Corner Uncertainty Sources
| Source | Method | Module |
|---|---|---|
| Structure Tensor | \(\boldsymbol{\Sigma}_c \approx \sigma_n^2 \mathbf{S}^{-1}\) where \(\mathbf{S}\) is the Sobel-based structure tensor | pose_weighted.rs |
| GWLF Propagation | Formal covariance propagation through PCA line fitting and homogeneous intersection | gwlf.rs |
Gain-Scheduled Tikhonov Regularization
For severely foreshortened tags (grazing angles), the structure tensor becomes ill-conditioned. A gain-scheduled regularizer prevents the information matrix from exploding:

\[\tilde{\boldsymbol{\Sigma}}_i = \boldsymbol{\Sigma}_i + \alpha(R) \, \mathbf{I}\]

where the anisotropy ratio \(R = \lambda_{\min} / \lambda_{\max}\) and:

\[\alpha(R) = \alpha_{\max} (1 - R)^2\]

This quadratic transfer function keeps \(\alpha \approx 0\) for well-conditioned frontal tags (\(R \to 1\)) and smoothly ramps to \(\alpha_{\max}\) for degenerate configurations (\(R \to 0\)).
Covariance Output
At convergence, the \(6 \times 6\) pose covariance (the Cramér-Rao lower bound under the assumed noise model) is extracted from the final normal equations:

\[\boldsymbol{\Sigma}_{\text{pose}} = \left(\sum_i \mathbf{J}_i^T \tilde{\mathbf{W}}_i \mathbf{J}_i\right)^{-1}\]
This encodes the full translational and rotational uncertainty and is returned alongside the refined pose.
6. Decoding Strategies
Module: decoder.rs, strategy.rs
6.1 Hard-Decision
Each bit cell is sampled at its grid center via the homography DDA. The sampled intensity is compared against the local adaptive threshold to produce a binary code. Dictionary lookup is \(O(1)\) via precomputed Hamming distance tables.
6.2 Soft-Decision (LLR + MIH)
Instead of binarizing, each bit's Log-Likelihood Ratio is computed from the sampled intensity \(I_b\):

\[\text{LLR}_b = \log \frac{p(I_b \mid b = 1)}{p(I_b \mid b = 0)}\]
The Maximum Likelihood codeword is found via Multi-Index Hashing (MIH): the code is split into \(k\) substrings, each indexing into a hash table of dictionary entries. Candidates from matching substrings are scored using the full LLR vector. This achieves sub-linear search over the dictionary with early-exit pruning.
The implementation is zero-allocation (stack-allocated SoftCode) and uses SmallVec-style fixed buffers.
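The substring-indexing idea can be illustrated with a toy over 36-bit codes split into \(k = 3\) substrings of 12 bits each. The sizes and names here are illustrative, and unlike the zero-allocation implementation this sketch heap-allocates freely:

```rust
use std::collections::HashMap;

/// Toy Multi-Index Hash over 36-bit codes split into three 12-bit substrings.
struct Mih {
    tables: [HashMap<u16, Vec<usize>>; 3],
}

impl Mih {
    fn build(codes: &[u64]) -> Self {
        let mut tables = [HashMap::new(), HashMap::new(), HashMap::new()];
        for (i, &c) in codes.iter().enumerate() {
            for s in 0..3 {
                let sub = ((c >> (12 * s)) & 0xFFF) as u16;
                tables[s].entry(sub).or_insert_with(Vec::new).push(i);
            }
        }
        Mih { tables }
    }

    /// Indices of dictionary codes sharing at least one substring with `query`.
    /// By pigeonhole, any codeword differing from the query in fewer bits than
    /// there are substrings matches at least one substring exactly; the full
    /// LLR vector then scores these candidates.
    fn candidates(&self, query: u64) -> Vec<usize> {
        let mut out = Vec::new();
        for s in 0..3 {
            let sub = ((query >> (12 * s)) & 0xFFF) as u16;
            if let Some(ids) = self.tables[s].get(&sub) {
                for &id in ids {
                    if !out.contains(&id) { out.push(id); }
                }
            }
        }
        out
    }
}
```

A query one bit away from a dictionary entry still retrieves it, because the flipped bit corrupts only one of the three substrings.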