# Architecture

render-tag follows a strictly decoupled architecture to ensure that generation logic is independent of the rendering engine.

## High-Level Overview

The system is divided into Host code (Python >= 3.11) and Backend code (Blender/BlenderProc).
```mermaid
graph TD
    A["CLI / Python API"] --> B["SceneCompiler"]
    B -- "SceneRecipe" --> C["Unified Orchestrator"]
    C -- "ZMQ REQ" --> D["ZMQ Server (Blender)"]
    D --> E["Worker Server"]
    E --> F["BlenderBridge"]
    F --> G["BlenderProc / Blender"]
    G --> H["Rendered Images & Metadata"]
```
## Components

### 1. Core (src/render_tag/core/)

The foundation of the system. Contains the single-source-of-truth Schemas (Pydantic models), configuration logic, and fundamental utilities. Schema evolution (versioned migrations and deprecated-field sunsets) is documented in engineering/schema_migrations.md.
### 2. Generation (src/render_tag/generation/)
Pure Python procedural logic. Samples parameters and builds SceneRecipe objects. It has no dependency on Blender.
### 3. Backend (src/render_tag/backend/)

The 3D rendering engine.

- `bootstrap.py`: Centralized environment stabilization. Handles paths, venv site-packages, and logging.
- `bridge.py`: `BlenderBridge` provides explicit Dependency Injection for Blender/BlenderProc APIs.
- `worker_server.py`: Implementation of the "Hot Loop" ZMQ server.
- `engine.py`: The actual BlenderProc execution logic.
### 4. Orchestration (src/render_tag/orchestration/)
The UnifiedWorkerOrchestrator manages the lifecycle of PersistentWorkerProcess instances. It handles ZMQ communication, VRAM guardrails, and parallel sharding.
### 5. Data I/O (src/render_tag/data_io/)
Handles asset loading, caching, and writing final annotations (COCO, CSV).
### 6. Audit (src/render_tag/audit/)
Post-render verification: dataset integrity, telemetry analysis, and dashboard reports.
### 7. Viz (src/render_tag/viz/)
Visualization tooling (FiftyOne integration). Independent of audit and backend.
## Package Dependency Graph

The dependency rules below are enforced as import-linter contracts in `[tool.importlinter]` of pyproject.toml and verified in CI by the Architecture (Import Linter) job. The `render-tag lint-arch` CLI runs the same check locally.
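Concretely, the layered rules can be expressed as an import-linter layers contract along these lines (an illustrative fragment; the actual contract in pyproject.toml is authoritative, and additional forbidden-import contracts, such as orchestration never importing backend, would sit alongside it):

```toml
[tool.importlinter]
root_package = "render_tag"

[[tool.importlinter.contracts]]
name = "Layered architecture"
type = "layers"
layers = [
    "render_tag.cli",
    "render_tag.orchestration",
    "render_tag.audit | render_tag.viz | render_tag.backend",
    "render_tag.generation",
    "render_tag.data_io",
    "render_tag.core",
]
```

In a layers contract, each layer may import from layers below it, and sibling packages separated by `|` are checked as independent of one another.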
```mermaid
graph TD
    cli["cli"]
    orch["orchestration"]
    audit["audit"]
    viz["viz"]
    backend["backend"]
    gen["generation"]
    data_io["data_io"]
    core["core<br/>(incl. core.geometry)"]
    cli --> orch
    cli --> audit
    cli --> viz
    cli --> gen
    cli --> data_io
    cli --> core
    orch --> audit
    orch --> core
    audit --> gen
    audit --> data_io
    audit --> core
    viz --> data_io
    viz --> core
    backend --> gen
    backend --> data_io
    backend --> core
    gen --> data_io
    gen --> core
    data_io --> core
```
Layers, top to bottom:

| Layer | Package(s) | May import from |
|---|---|---|
| 6 | cli | everything below |
| 5 | orchestration | audit, core (never backend — spawned as a subprocess via ZMQ) |
| 4 | audit, viz, backend (siblings) | generation, data_io, core (never each other) |
| 3 | generation | data_io, core |
| 2 | data_io | core |
| 1 | core | (foundation; no internal deps) |
Additional contracts:

- Host isolation — only `render_tag.backend` may import `bpy`, `blenderproc`, `mathutils`, or `render_tag.backend` itself. Every other package must remain importable in a plain Python environment without a Blender interpreter.
- Determinism — `render_tag.generation` must not import the `random` module. The procedural generator uses seeded `numpy` RNGs only; Python's global `random` state would break bit-reproducibility of `SceneRecipe`s.
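A small sketch of what the determinism contract buys (the sampler name here is hypothetical; render-tag's actual samplers live in render_tag.generation):

```python
import numpy as np

def sample_tag_position(seed: int) -> np.ndarray:
    """Hypothetical sampler: all randomness flows through a seeded,
    local numpy Generator, never through Python's global `random`."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=3)

# The same seed always yields the same bits, regardless of what any
# other code did to global RNG state in the meantime.
a = sample_tag_position(42)
b = sample_tag_position(42)
```

Because the generator is constructed locally from the seed, two processes (or two shards) given the same seed produce bit-identical samples.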
When a proposed import would violate any contract, fix the code first. If
the violation reflects a deliberate architectural decision, change the
contract in pyproject.toml and add a one-line justification next to it.
## The "Hot Loop" (Persistent Workers)
To avoid the significant overhead of starting Blender for every scene, render-tag uses a persistent worker architecture.
- The Orchestrator starts one or more Blender instances in the background.
- Each Blender instance runs a ZMQ server (`zmq_server.py`).
- The Orchestrator sends Scene Recipes over ZMQ.
- The Worker Server receives the recipe, uses the `BlenderBridge` to access the Blender APIs, and renders the scene.
- Blender remains ready for the next recipe without quitting.
This "Hot Loop" can improve rendering throughput by \(2-5\times\) for small scenes.
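Host-side, the loop can be sketched with the transport stubbed out (the class and method names below are illustrative, not render-tag's real API; in production a ZMQ REQ socket sits where `FakeTransport` is):

```python
import json

class FakeTransport:
    """Stand-in for a ZMQ REQ socket; echoes the reply a worker would send."""
    def request(self, payload: bytes) -> bytes:
        recipe = json.loads(payload)
        return json.dumps({"scene_id": recipe["scene_id"],
                           "status": "rendered"}).encode()

class PersistentWorkerClient:
    """One long-lived worker serves many recipes: connect once, loop forever."""
    def __init__(self, transport) -> None:
        self.transport = transport

    def render(self, recipe: dict) -> dict:
        # One REQ/REP round-trip per scene; Blender never restarts in between.
        return json.loads(self.transport.request(json.dumps(recipe).encode()))

client = PersistentWorkerClient(FakeTransport())
results = [client.render({"scene_id": i}) for i in range(3)]
```

The throughput win comes entirely from amortizing Blender startup: the per-scene cost is one message round-trip plus the render itself.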
## Rendering Performance (CV-Safe)
To maximize generation speed while maintaining the high sub-pixel accuracy required for fiducial tag detection, render-tag employs several optimization strategies:
### 1. Adaptive Sampling & Denoising
We use Cycles' Adaptive Sampling with a noise threshold (default \(0.05\)) rather than a fixed sample count. This is combined with Intel OpenImageDenoise (OIDN) guided by Albedo and Normal passes. This "CV-Safe" approach ensures that flat surfaces render nearly instantaneously while high-frequency edges (like tag corners) receive enough samples to remain sharp and accurate.
### 2. Light Path Optimization
Standard path tracing bounces light many times to achieve artistic realism. For computer vision training, we "min-max" these bounces to balance fidelity with throughput:
| Parameter | Value | Rationale |
|---|---|---|
| Total Bounces | 4 | Diminishing returns for CV after 4. |
| Diffuse | 2 | Enough for realistic indirect lighting. |
| Glossy | 4 | Critical for preserving specular highlights (glare). |
| Transmission | 0 | Disabled unless glass/refraction is explicitly needed. |
| Caustics | Off | Computationally expensive and irrelevant for tag detection. |
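In Cycles terms, the two optimizations above amount to a settings block along these lines (a sketch against the bpy API as found in recent Blender releases; it only runs inside the Blender process, and the values render-tag actually applies come from its own configuration):

```python
import bpy  # only available inside Blender's embedded Python

cycles = bpy.context.scene.cycles

# 1. Adaptive sampling + OIDN denoising ("CV-Safe")
cycles.use_adaptive_sampling = True
cycles.adaptive_threshold = 0.05            # noise threshold, not a fixed sample count
cycles.use_denoising = True
cycles.denoiser = 'OPENIMAGEDENOISE'
cycles.denoising_input_passes = 'RGB_ALBEDO_NORMAL'  # Albedo + Normal guided

# 2. Min-maxed light paths
cycles.max_bounces = 4
cycles.diffuse_bounces = 2
cycles.glossy_bounces = 4
cycles.transmission_bounces = 0             # no glass/refraction by default
cycles.caustics_reflective = False
cycles.caustics_refractive = False
```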
## Geometric Data Contract (3D-Anchored Orientation)
To ensure synthetic data maintains 6DoF orientation integrity (\(\text{roll}, \text{pitch}, \text{yaw}\)), render-tag follows a strict local-space geometric contract for all point-based subjects (Tags, Boards).
### The "Logical Corner 0" Rule
All subject keypoint arrays MUST be ordered such that:
- Index 0: Represents the Logical Top-Left of the subject's local payload/texture, located at local coordinates \((-w/2, -h/2, 0)\).
- Indices 1, 2, 3: Follow a strict Clockwise (CW) winding in the subject's local \(XY\) plane:
  - Index 1: Logical Top-Right \((+w/2, -h/2, 0)\)
  - Index 2: Logical Bottom-Right \((+w/2, +h/2, 0)\)
  - Index 3: Logical Bottom-Left \((-w/2, +h/2, 0)\)
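The ordering rule translates directly into code; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def local_corners(w: float, h: float) -> np.ndarray:
    """Logical Corner 0 rule: index 0 is the local top-left,
    then strict clockwise winding in the local XY plane."""
    return np.array([
        [-w / 2, -h / 2, 0.0],  # 0: top-left
        [+w / 2, -h / 2, 0.0],  # 1: top-right
        [+w / 2, +h / 2, 0.0],  # 2: bottom-right
        [-w / 2, +h / 2, 0.0],  # 3: bottom-left
    ])

corners = local_corners(0.2, 0.2)  # a 20 cm square tag
```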
### Architectural Enforcement

- Asset Layer: `keypoints_3d` are assigned explicitly in local coordinates during mesh generation.
- Projection Layer: Performs a pure mathematical transformation from world space \(P_{world}\) to pixel space \(p_{pixel}\):
  \[ p_{pixel} = K [R|t] P_{world} \]
  This ensures zero drift between the 3D asset and its 2D annotations without any visual re-sorting heuristics.
- Annotation Layer: Preserves the original 3D indices in the 2D output (COCO keypoints, CSV corners).
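The projection layer's transform is ordinary pinhole geometry; a self-contained numpy sketch (function name and camera values illustrative):

```python
import numpy as np

def project(K: np.ndarray, R: np.ndarray, t: np.ndarray,
            P_world: np.ndarray) -> np.ndarray:
    """p_pixel = K [R|t] P_world, followed by the perspective divide."""
    P_cam = R @ P_world + t   # world -> camera frame
    p = K @ P_cam             # camera frame -> homogeneous pixels
    return p[:2] / p[2]       # perspective divide

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])  # subject 2 m ahead of the camera

# The subject's local origin projects onto the principal point (320, 240).
p = project(K, R, t, np.array([0.0, 0.0, 0.0]))
```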
## Data Products: rich_truth.json

The `rich_truth.json` file is the canonical "Data Product" for a rendered dataset. It exists in two wire formats — the reader layer (`unwrap_rich_truth`) handles both transparently.

### v1 — Legacy (bare array)
```json
[
  { "image_id": "frame_0001", "tag_id": 0, "corners": [[x,y], ...], ... },
  ...
]
```
Produced by pipelines without an eval_margin_px setting. corners_visibility and keypoints_visibility are absent (null in DetectionRecord).
### v2 — Versioned envelope
```json
{
  "version": "2.0",
  "evaluation_context": {
    "photometric_margin_px": 21,
    "truncation_policy": "ternary_visibility"
  },
  "records": [
    {
      "image_id": "frame_0001",
      "tag_id": 0,
      "corners": [[x,y], ...],
      "corners_visibility": [2, 1, 2, 2],
      "keypoints_visibility": [2, 2, 1, ...],
      ...
    }
  ]
}
```
Produced when camera.eval_margin_px > 0 in the config. The evaluation_context header is informational; the per-record eval_margin_px field is the authoritative source for which margin was applied to each detection.
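A reader that accepts both wire formats reduces to a single type check; an illustrative sketch of what `unwrap_rich_truth` does (the real implementation is authoritative):

```python
def unwrap_records(data):
    """Return (records, evaluation_context) for either wire format."""
    if isinstance(data, list):                # v1: bare array of records
        return data, None
    return data["records"], data.get("evaluation_context")  # v2 envelope

v1 = [{"image_id": "frame_0001", "tag_id": 0}]
v2 = {"version": "2.0",
      "evaluation_context": {"photometric_margin_px": 21,
                             "truncation_policy": "ternary_visibility"},
      "records": [{"image_id": "frame_0001", "tag_id": 0}]}

r1, ctx1 = unwrap_records(v1)
r2, ctx2 = unwrap_records(v2)
```

Downstream code sees the same record list either way; only the optional context differs.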
### KeypointVisibility Convention

Per-keypoint visibility follows the COCO keypoint convention extended with a semantic "Don't Care" state:

| Value | Name | Meaning | COCO v |
|---|---|---|---|
| 0 | OUT_OF_FRAME | Sentinel (-1,-1) — behind camera or not projected | v=0 (zeroed coords) |
| 1 | MARGIN_TRUNCATED | Inside image but within eval_margin_px of any edge — excluded from evaluation | v=1 |
| 2 | VISIBLE | Inside the inner safe region — fully evaluable | v=2 |
eval_margin_px is configured per-camera in camera.eval_margin_px (YAML) / CameraConfig.eval_margin_px (Python). Set to 0 (default) to disable. Recommended value: 5 px (half-radius of a standard 11-px Gaussian kernel, ensuring corner localization doesn't pick up out-of-image signal).
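The three states map to a simple classification per projected point; a hedged sketch (helper name hypothetical, not render-tag's API):

```python
def classify_keypoint(x: float, y: float, width: int, height: int,
                      margin: float) -> int:
    """Ternary visibility per the convention above."""
    if not (0 <= x < width and 0 <= y < height):
        return 0  # OUT_OF_FRAME: behind the camera or outside the image
    if x < margin or y < margin or x >= width - margin or y >= height - margin:
        return 1  # MARGIN_TRUNCATED: inside the image, too close to an edge
    return 2      # VISIBLE: inside the inner safe region

vis = [classify_keypoint(x, y, 640, 480, 5)
       for x, y in [(-1, -1), (3, 100), (320, 240)]]
```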
### eval_complete — Partial Detection Filter
A tag or board where any corner/keypoint falls inside eval_margin_px (or is out-of-frame) sets eval_complete = false on the record. This is the canonical field for downstream consumers to exclude partial detections from metrics — no iteration over per-point arrays needed.
```python
# Example: filter rich_truth records for clean evaluation
usable = [r for r in records if r.get("eval_complete", True)]
```
eval_complete is only meaningful for TAG records, and it is strict: a tag with three corners well inside the safe region and one corner just inside the margin zone will have eval_complete = false, even though the majority of its geometry is evaluable.
BOARD records do not use eval_complete. A board can span dozens of saddle points and may legitimately be partially off-screen; a single boolean rollup is not useful. Use keypoints_visibility directly to filter individual saddle points:
```python
usable_saddles = [
    pt for pt, v in zip(record["keypoints"], record["keypoints_visibility"])
    if v == 2  # KeypointVisibility.VISIBLE
]
```
For datasets generated without eval_margin_px (v1 or eval_margin_px=0), eval_complete defaults to true — backward compatible.
### FiftyOne Views
| Saved view | Filter |
|---|---|
| Evaluation Ready | filter_labels(..., F("visibility") == 2) — per-point; removes individual margin-zone points |
| Strict Geometry | No filter — all projected points including margin-zone ones |
Note: the FiftyOne views filter at the keypoint level; eval_complete filters at the record level. Use the saved views for visual inspection and eval_complete for programmatic metric gating.
## Reproducibility
Correctness in synthetic data requires strict reproducibility. render-tag ensures this through:
- Environment Fingerprinting: We hash the `uv.lock` and record the Blender version. If the environment changes, the system issues a warning or error.
- Config Hashing: Every dataset contains a `job.json` (`JobSpec`) that includes a SHA256 hash of the exact configuration used.
- Asset Hashing: The `JobSpec` includes a SHA256 hash of all external assets (HDRIs, floor textures, tag images) referenced by the job. This prevents "silent" dataset drift when binary assets are updated on disk or on the Hub.
- Deterministic Sharding: Seeds are derived from a master seed and shard index, ensuring that scene #500 is identical regardless of whether it was rendered in a single batch or as part of a specific shard.
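Deterministic sharding hinges on deriving each scene's seed from the master seed and the global scene index, never from the shard-local position; a stdlib sketch (the actual derivation in render-tag may differ):

```python
import hashlib

def scene_seed(master_seed: int, scene_index: int) -> int:
    """Derive a stable 64-bit seed from (master_seed, global scene index)."""
    digest = hashlib.sha256(f"{master_seed}:{scene_index}".encode()).digest()
    return int.from_bytes(digest[:8], "little")

# Scene #500 gets the same seed whether rendered in one big batch
# or as the k-th scene of some shard.
seed_batch = scene_seed(1234, 500)
seed_shard = scene_seed(1234, 500)
```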