ADR-0005 — DenseComposite: the depth-buffered where-composite primitive
| Number | 0005 |
| Title | Unified where-composite primitive for all modalities |
| Status | Accepted |
| Author | @NoeFontana |
| Created | 2026-04-22 |
| Updated | 2026-04-22 |
| Tag | ADR-0005 |
| Relates-to | ADR-0001 Parts (i), (ii), (iv); ADR-0004 |
Context
After W1 (ADR-0003, ADR-0004) the pipeline is fully DenseSample-native on
the instance branch, but the per-modality composite logic in
augmentation/copy_paste.py (481 lines) and processing/placement.py
(311 lines) was written for instance-only copy-paste with DetectionTarget
(now deleted). It hardcodes alpha blending, per-object iteration, a fixed
collision threshold, and post-hoc box recomputation. None of DenseSample's
other active modalities (semantic, panoptic, depth, normals) are usable
without rewriting blend and occlusion logic from scratch each time.
P1 workstreams W2–W4 land the remaining modalities:
- W2 — instance parity + ClassMix (semantic).
- W3 — panoptic-CP.
- W4 — depth-aware paste with metric-depth rescaling.
If each of these workstreams keeps rewriting the same composite arithmetic on a different modality, ADR-0001 Part (ii)'s invariant matrix will diverge across branches of a copy-paste tree that should share one primitive.
Decision
Introduce a single DenseComposite operator (§1) — the §7.2
where-composite — from which every modality-specific augmentation falls
out as a ≤50-line wrapper:
InstancePaste(W2) — keeps the existingCopyPasteAugmentationpublic signature as a shim.ClassMix(W2) — whole-frame semantic-map mix, no placement.PanopticPaste(W3) — panoptic-id composite with stuff/thing split.DepthAwarePaste(W4) — metric-aware z-buffered composite.
The depth clause degenerates to the Ghiasi alpha composite when depth is
None, recovering the existing instance-branch behavior exactly
(see §4, parity gate).
1. DenseComposite primitive
Location. src/segpaste/_internal/composite.py. Kept under the
_internal namespace introduced in ADR-0001 Part (i); not added to
segpaste.__all__. Promotion is deferred until W3/W4 validate that
DenseComposite's interface generalizes across all five modalities.
Base class. torch.nn.Module. Modality-presence flags are resolved at
__init__ (CompositeConfig.modalities: set[Modality]) so that
forward uses only Python-level constants — torch.compile traces cleanly
without branch-on-tensor-value.
Algorithm. Given target, source: DenseSample and a
paste_mask: Tensor [H, W] bool:
- Effective mask.
- If both
target.depthandsource.depthare set:M_eff = paste_mask & ((source.depth < target.depth) | ~target.depth_valid). - Else:
M_eff = paste_mask. (Recovers Ghiasi alpha composite bitwise.) - Per-modality composite.
- Float fields (
image,depth,normals):out = M_eff * source + (1 − M_eff) * target, withM_effbroadcast-cast to the field dtype. - Integer label maps (
semantic_map,panoptic_map):out = where(M_eff, source, target)— no float arithmetic, no interpolation at class boundaries. instance_masks: the target's masks are subtracted byM_eff(occlusion); the source's placed masks are stacked on top. Freshinstance_idsare allocated[max_prev+1, …)so the ADR-0001 Part (ii) "fresh instance ids on paste" invariant holds.depth_valid:where(M_eff, source.depth_valid, target.depth_valid).- Box recomputation. Centralized:
boxes_out = masks_to_boxes(inst_masks_out). Per-mask area< CompositeConfig.min_composited_areais dropped — ADR-0001 Part (ii) invariant.
Config. CompositeConfig (frozen pydantic, extra="forbid"):
min_composited_area: int = 50, occluded_area_threshold: float = 0.99,
modalities: frozenset[Modality]. Does not subsume CopyPasteConfig
— the two are composed by InstancePaste.
2. PlacementSampler
Location. src/segpaste/_internal/placement.py. Replaces the full
contents of src/segpaste/processing/placement.py — the three-class
generator/validator/placer split collapses to a single nn.Module.
Interface. sample(target_size, source_bbox, existing_boxes, padding_mask)
→ (top, left, paste_mask) | None. Emits a resolved paste-mask directly in
target coordinates. For W2 the sampler is translation-only (integer
top, left offsets); scaled/rotated placement via affine + grid_sample
is deferred to W3/W4, where actual scaling is needed and the parity
baseline no longer constrains the arithmetic.
RNG. The W2 implementation preserves the current dual-RNG pattern
— torch.randint on the non-padding path, random.randint on the
padding-aware path — to keep the parity sweep bitwise (see §4).
Unification under a single torch.Generator is scheduled for the same
follow-up that introduces grid_sample.
3. Specializations
InstancePaste (src/segpaste/_internal/instance_paste.py).
Composes CopyPasteConfig + CompositeConfig; iterates source
objects, calls PlacementSampler, delegates blend/occlusion to
DenseComposite. The public CopyPasteAugmentation class in
augmentation/copy_paste.py becomes a thin shim around InstancePaste
— signature unchanged, segpaste.__all__ unchanged.
ClassMix (src/segpaste/_internal/classmix.py). Semantic-modality
specialization. Samples a class-union from the source semantic_map
(default-excludes the 255 ignore label), builds a paste_mask, and
calls DenseComposite. Uses no PlacementSampler — ClassMix is a
whole-frame paste, which makes it a cleaner "does the primitive
generalize?" test case for a second modality than a second placed paste.
4. Parity baseline
The W2 rewrite is gated by a bitwise CPU parity sweep:
tests/test_dense_composite_parity.py runs 200 fixed-seed
(target, sources) pairs through the current CopyPasteAugmentation
output snapshot (tests/fixtures/composite_baseline.pt, generated from
commit 23a2d47 immediately before the rewrite lands) vs. the new
InstancePaste, and asserts torch.equal on image, boxes, labels,
instance_ids, and masks.
The fixture is never regenerated after W2 lands. The generation
script (scripts/gen_composite_baseline.py) is committed alongside
the fixture so the baseline is reproducible if needed, but the
canonical anchor is the serialized file, not the script's re-run output.
CUDA uses a statistical-equivalence gate (area-histogram KS-test)
rather than bitwise parity — grid_sample is nondeterministic on
CUDA and torch.randint generator streams diverge across devices.
CUDA parity is advisory (skipped when no CUDA is available), not
a merge gate.
5. Public surface
W2 makes no change to segpaste.__all__. All new symbols
(DenseComposite, CompositeConfig, PlacementSampler,
InstancePaste, ClassMix, ClassMixConfig) land under
segpaste._internal. Promotion is deferred to a follow-up ADR after
W3/W4 validate the interface. The CopyPasteAugmentation shim
preserves the current signature and is the only public entry point
into the composite primitive during 0.9.x.
Consequences
src/segpaste/processing/placement.pyis deleted (contents fully replaced by_internal/placement.py— not a rename).augmentation/copy_paste.pycollapses from 481 to ~40 lines.tests/test_placement.py+tests/test_placement_fuzz.pyare deleted, superseded bytests/test_placement_sampler.py.tests/test_copy_paste.py+tests/test_copy_paste_fuzz.pykeep their current shape — they exercise the shim end-to-end and provide behavioral regression coverage alongside the bitwise parity sweep.- Benchmarks (
benchmarks/bench_copy_paste.py) are re-baselined post-rewrite;baseline.jsonis updated in the same commit that flips the shim.benchmarks/bench_classmix.pylands alongside. - Promoting any
_internalsymbol to the public surface requires an ADR-0005 amendment and an entry added totests/test_public_surface.py::_EXPECTED_PUBLIC_APIper ADR-0001 Part (i).
Alternatives considered
torchvision.transforms.v2.Transformbase. The v2 transform contract (make_params/transform(inpt, params)) assumes single-input flows;DenseCompositeis inherently binary (target, source, paste_mask). Keeping it a plainnn.Moduleavoids shoehorning a two-input op into a one-input protocol.- Keep per-composite operators separate, share only a config base class. Discarded: ADR-0001 Part (ii)'s invariant matrix is exactly the set of constraints a shared primitive enforces centrally. Duplicating the depth clause across four specializations is the failure mode ADR-0005 exists to prevent.
- Land
grid_samplein W2. Discarded: the parity gate forbids arithmetic drift, and W2's placement is translation-only in the current implementation. Introducinggrid_samplefor no geometric gain would forfeit bitwise parity for nothing. - Promote
DenseCompositetosegpaste.__all__immediately. Discarded: one-modality validation (instance + semantic) is a weak signal that the interface generalizes. The public surface is cheap to add to and expensive to remove from (ADR-0001 Part (i)); wait for W3/W4 pressure-testing before committing.