Vibe-Coding Perception Pipelines Without Losing Control

perception
robotics
agentic-development
vibe-coding
engineering-leadership
Author

Noé Fontana

Published

March 7, 2026

This post details the evaluation infrastructure and architectural guardrails required to safely vibe-code (ship functional code without reading it) a high-performance classical perception pipeline, and the practices that keep development velocity from stalling under review and regression load. locus-tag and render-tag, personal projects built to explore these ideas, serve as the concrete case study.

1 The System Under Test: locus-tag

locus-tag is a high-performance fiducial marker detector written in Rust, with Python bindings for robotics integration. It must simultaneously satisfy three competing constraints: latency, recall, and corner RMSE. The full pipeline is broken down in the sequence diagram in the Architecture Deep Dive section below.

Performance comparison of locus-tag against AprilTag 3 benchmarks.
Detector       Recall    Corner RMSE   Latency (1080p)
Locus (Soft)   93.16%    0.26 px       77.2 ms
Locus (Hard)   74.35%    0.24 px       58.3 ms
AprilTag 3     62.34%    0.22 px       105.9 ms
  • Latency: Time to process a full frame.
  • Recall: Share of tags detected in the scene.
  • Corner RMSE: Sub-pixel accuracy of detected corner positions.
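To make these metrics concrete, here is a minimal pure-Python scoring sketch. The data layout and matching policy are my assumptions, not locus-tag's actual API: detections and ground truth are dicts mapping tag ID to four (x, y) corners, a tag counts as recalled only if every corner lands within a matching radius, and corner RMSE is computed over the matched corners.

```python
import math

# Hypothetical data layout: locus-tag's real Python bindings are not shown here.
# detections / ground_truth: dict mapping tag ID -> list of four (x, y) corners.
def evaluate(detections, ground_truth, max_corner_err_px=5.0):
    """Score one frame: returns (recall, corner RMSE over matched corners)."""
    matched = 0
    sq_err_sum = 0.0
    n_corners = 0
    for tag_id, gt_corners in ground_truth.items():
        det_corners = detections.get(tag_id)
        if det_corners is None:
            continue  # missed tag: hurts recall
        errs = [math.dist(d, g) for d, g in zip(det_corners, gt_corners)]
        if max(errs) > max_corner_err_px:
            continue  # localized too poorly to count as a detection
        matched += 1
        sq_err_sum += sum(e * e for e in errs)
        n_corners += len(errs)
    recall = matched / len(ground_truth) if ground_truth else 0.0
    rmse = math.sqrt(sq_err_sum / n_corners) if n_corners else float("nan")
    return recall, rmse
```

Note the two metrics pull against each other: loosening `max_corner_err_px` raises recall while admitting sloppier corners into the RMSE, which is why both must be gated together.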

2 Why Textual Rules Don’t Scale Velocity

Textual instructions (AGENTS.md, custom skills, and spec-driven development) effectively multiply raw code-generation speed, but this acceleration breaks down rapidly in complex systems.

Context engineering is fundamentally open-loop. Relying on an agent’s context window to enforce architectural constraints creates a hard ceiling:

  • Complexity Growth: As the system grows, it outpaces the agent’s working memory.
  • Context Retention: Limits on retention lead to fragmented implementations.
  • Technical Debt: Regressions compound silently in regions where neither human nor agent retains full context.

To prevent development from stalling, we must transition from writing passive textual rules to building active, programmatic evaluation.

3 CI: The Last Line of Defense

To fully leverage parallel AI agents, we must grant them sweeping permissions: the ability to traverse filesystems, rewrite build systems, execute scripts, and autonomously push changes. This creates a development “Wild West.”

Warning: Soft guardrails are not enough

AGENTS.md and custom skills are merely soft guardrails—suggestions, too easily discarded when an agent optimizes for immediate task completion or hits its context limit.

Continuous Integration must serve as the immutable last line of defense.

The primary failure mode of long-horizon reasoning models is execution drift: over extended coding sessions, agents inevitably shortcut standard practices, bypassing linters, type checkers, and test suites to force a compiling solution.

By enforcing strict, automated quality gates, CI guarantees that the main branch remains stable, preserving high-performance standards despite massive LLM-driven refactors.
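One way to make such a gate concrete is a small script that fails the CI job whenever a benchmark run violates hard thresholds. Everything below is illustrative: the gate values, the metrics-file format, and the script name are my assumptions, loosely based on the benchmark table above, not locus-tag's actual CI configuration.

```python
import json
import sys

# Hypothetical hard gates; locus-tag's real CI thresholds may differ.
GATES = {"recall_min": 0.90, "corner_rmse_px_max": 0.30, "latency_ms_max": 90.0}

def check_gates(metrics):
    """Return a list of human-readable gate failures (empty list = pass)."""
    failures = []
    if metrics["recall"] < GATES["recall_min"]:
        failures.append(f"recall {metrics['recall']:.2%} below {GATES['recall_min']:.0%}")
    if metrics["corner_rmse_px"] > GATES["corner_rmse_px_max"]:
        failures.append(f"corner RMSE {metrics['corner_rmse_px']} px above {GATES['corner_rmse_px_max']} px")
    if metrics["latency_ms"] > GATES["latency_ms_max"]:
        failures.append(f"latency {metrics['latency_ms']} ms above {GATES['latency_ms_max']} ms")
    return failures

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. `python check_gates.py metrics.json`, where metrics.json is
    # produced by the benchmark step earlier in the CI job.
    with open(sys.argv[1]) as f:
        failures = check_gates(json.load(f))
    for failure in failures:
        print("GATE FAILED:", failure)
    sys.exit(1 if failures else 0)
```

The non-zero exit code is the whole point: an agent can ignore a suggestion in AGENTS.md, but it cannot merge past a red pipeline.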

4 The Integration North Star: Validating the Physical Contract

In an agentic workflow, unit tests become a fragile liability. LLMs refactor aggressively and frequently break internal test signatures even when the mathematical output is correct.

To maintain velocity, we shift focus to a “North Star”: a high-level integration suite that validates outcomes independently of implementation details. For locus-tag, the contract is: raw pixels go in, accurate detections come out.

The E2E suite evaluates three pillars:

  1. Functional Robustness: We treat the pipeline as a black box, feeding it external datasets (such as those generated by render-tag, our synthetic oracle).
  2. Key Metric Evaluation: Performance is a first-class feature. E2E tests track latency, recall, and corner RMSE against established snapshots.
  3. Pragmatic CI: Ideally, every PR would trigger full regression testing. In practice, cost constrains this to a tiered strategy: lightweight snapshot tests run on every push; resource-intensive benchmarks run selectively before major releases.
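The snapshot comparison in pillar 2 can be sketched in a few lines. The tolerances here are illustrative assumptions, not locus-tag's committed values; the asymmetry (recall may only drop, RMSE and latency may only grow) is the part that matters.

```python
# Illustrative tolerances; real snapshot thresholds may differ. Latency gets
# a loose bound because shared CI runners are noisy.
TOLERANCES = {"recall": 0.005, "corner_rmse_px": 0.02, "latency_ms": 10.0}

def regressions(current, snapshot):
    """List metrics that got worse than the committed snapshot by more than
    their tolerance. Improvements never fail the check."""
    failures = []
    for key, tol in TOLERANCES.items():
        drift = current[key] - snapshot[key]
        # Higher recall is better; lower RMSE and latency are better.
        worse_by = -drift if key == "recall" else drift
        if worse_by > tol:
            failures.append(f"{key}: {snapshot[key]} -> {current[key]} (tolerance {tol})")
    return failures
```

A passing run with better numbers is also the natural moment to update the snapshot, so the bar ratchets upward over time instead of staying frozen.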

5 The Architect Outcome and Outlook: Reviewing Systems, Not Syntax

Defining an E2E North Star sharpens the focus of the effective engineer's role. Designing systems with observable contracts and verifying that they hold under change has always been a core responsibility of senior engineers. Agentic development simply makes this discipline pay off faster.

For locus-tag, this means enforcing a clean facade where raw pixels go in and precise corners come out, making the physical contract testable as a black box.

5.1 Takeaways

The bottleneck is no longer coding speed; agents handle that. It is the reliability of the infrastructure that validates each iteration without compounding review overhead. By investing in both system design and evaluation infrastructure, we can fearlessly scale complexity without losing control of it.

Architecture Deep Dive: The Physical Contract

The following diagram illustrates the execution flow within locus-tag. This structured pipeline is what allows us to treat the system as a “black box” for evaluation while maintaining extreme performance.

sequenceDiagram
    participant App as Application
    participant Det as Detector
    participant Thresh as ThresholdEngine
    participant Seg as Segmentation
    participant Quad as QuadExtraction
    participant Decode as Decoder
    participant Pose as PoseEstimation

    App->>Det: detect(image)
    activate Det

    Note over Det: 0. Pre-allocation & Upscaling
    Det->>Det: Arena Reset

    Note over Det: 1. Preprocessing
    Det->>Thresh: compute_integral_image()
    Det->>Thresh: adaptive_threshold()
    Thresh-->>Det: Binarized Image

    Note over Det: 2. Segmentation
    Det->>Seg: label_components()
    Note right of Seg: Union-Find (Flat Array)
    Seg-->>Det: Component Labels

    Note over Det: 3. Quad Extraction
    loop For each component
        Det->>Quad: extract_quad()
        Quad->>Quad: Contour Tracing
        Quad->>Quad: Polygon Approx (Douglas-Peucker)
        Quad->>Quad: Sub-pixel Refinement (Gradient)
    end
    Quad-->>Det: Quad Candidates

    Note over Det: 4. Decoding
    loop For each candidate
        Det->>Decode: Homography Sampling
        Note right of Decode: Strategy: Hard (Bit) vs Soft (LLR)
        Decode->>Decode: Bit/LLR Extraction (Bilinear)
        Decode->>Decode: Error Correction (Hamming/Soft-ML)
    end

    Note over Det: 5. Pose Estimation (Optional)
    opt [If Intrinsics Provided]
        Det->>Pose: estimate_tag_pose()
        alt [Mode = Fast]
            Pose->>Pose: IPPE + LM (Geometric Error)
        else [Mode = Accurate]
            Pose->>Pose: Structure Tensor (Corner Uncertainty)
            Pose->>Pose: Weighted LM (Mahalanobis Distance)
        end
    end

    Det-->>App: Final Detections
    deactivate Det
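The preprocessing step (stage 1 in the diagram) is simple enough to sketch end to end. This pure-Python version is illustrative only, not the Rust implementation, which is heavily optimized, but it shows why the integral image makes the local-mean threshold O(1) per pixel regardless of window size.

```python
def integral_image(img):
    """Summed-area table with a zero-padded first row/column, so any window
    sum is four lookups: ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def adaptive_threshold(img, radius=2, offset=5):
    """Binarize: a pixel is foreground (1) if it is darker than its local
    mean minus `offset`, which keeps black tag borders under uneven lighting."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            out[y][x] = 1 if img[y][x] < total / area - offset else 0
    return out
```

Running this on a bright frame containing a small dark square yields a binary mask with only the square set, which is exactly the input the segmentation stage's union-find labeling expects.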

Citation

BibTeX citation:
@online{fontana2026,
  author = {Fontana, Noé},
  title = {Vibe-Coding {Perception} {Pipelines} {Without} {Losing}
    {Control}},
  date = {2026-03-07},
  url = {https://noefontana.github.io/posts/2026-03-07-vibe-coding-pipelines/},
  langid = {en}
}
For attribution, please cite this work as:
Fontana, Noé. 2026. “Vibe-Coding Perception Pipelines Without Losing Control.” March 7, 2026. https://noefontana.github.io/posts/2026-03-07-vibe-coding-pipelines/.