ADR-0007 — DepthAwarePaste: depth z-buffer paste with intrinsics handling
| Number | 0007 |
| Title | Depth-modality composite with intrinsics rescale and h-flip normals transport |
| Status | Accepted |
| Author | @NoeFontana |
| Created | 2026-04-23 |
| Updated | 2026-04-23 |
| Tag | ADR-0007 |
| Relates-to | ADR-0001 Parts (ii), (iii); ADR-0005; ADR-0006 |
Context
ADR-0001 Part (ii) pins five depth/normals invariants: three for depth
(monotonicity on effective paste pixels, validity join, and metric
intrinsics rescale) and two for normals (unit-norm-on-valid and the
right-down-forward camera-frame convention). The predicates are implemented in tests/invariants/depth.py
and tests/invariants/normals.py, with five
InvariantRow(Modality.DEPTH|NORMALS, …, xfail=True) entries in
tests/test_invariant_matrix.py:201-205 waiting for a composite that
satisfies them.
ADR-0005 landed DenseComposite.forward with image, instance, and
semantic branches; ADR-0006 added the panoptic branch in W3. Depth and
normals remain untouched: forward never emits depth, depth_valid,
or normals, though _effective_mask already performs the z-test
(composite.py:102-113). No depth-specialized wrapper exists.
W4 closes that gap. Unlike W2 (anchored to a pre-rewrite bitwise parity snapshot) but like W3, there is no predecessor implementation — W4 is defined from first principles against the §(ii) invariants, two hand-constructed analytical golden fixtures (monotonicity + intrinsics rescale), and a forward-gate 200-seed snapshot.
Decision
Add _composite_depth and _composite_normals branches to
DenseComposite and land DepthAwarePaste as its specialization under
src/segpaste/_internal/depth_paste.py. DepthAwarePaste reuses the W2
PlacementSampler for translation-only placement, preprocesses the
source with an optional h-flip (with a normals x-sign-flip), performs
the metric intrinsic rescale when active, and delegates the per-pixel
write to DenseComposite. The public surface is unchanged —
segpaste.__all__ stays frozen, DepthAwarePaste stays under
_internal per ADR-0005 §5.
1. metric_depth lives on DenseSample, not on the wrapper config
Per-sample state pairs naturally with the existing camera_intrinsics
field on DenseSample. A dataloader can mix metric and affine samples
without constructing two composites. DenseSample.__post_init__
enforces: if metric_depth=True and depth is set, camera_intrinsics
must also be set — silent fallback to identity intrinsics is explicitly
forbidden.
DepthAwarePaste.transform additionally enforces at runtime that both
operands share the same metric_depth flag; mismatched flags raise
ValueError. Composing a metric source into an affine target (or vice
versa) would mix two incommensurable depth scales.
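The two checks above can be sketched in a few lines. This is a minimal illustration, not the repository's code: DenseSampleSketch and check_same_depth_scale are hypothetical names, the field types are left loose, and only the three fields discussed in this section are modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DenseSampleSketch:
    """Hypothetical stand-in for DenseSample; only the fields discussed here."""

    depth: Optional[Any] = None
    camera_intrinsics: Optional[Any] = None
    metric_depth: bool = False

    def __post_init__(self) -> None:
        # Metric depth without intrinsics cannot be rescaled, so refuse it
        # rather than silently assuming identity intrinsics.
        if (
            self.metric_depth
            and self.depth is not None
            and self.camera_intrinsics is None
        ):
            raise ValueError(
                "metric_depth=True with depth set requires camera_intrinsics"
            )


def check_same_depth_scale(
    target: DenseSampleSketch, source: DenseSampleSketch
) -> None:
    """Reject a paste that would mix metric and affine depth (assumed helper)."""
    if target.metric_depth != source.metric_depth:
        raise ValueError("target and source must share the same metric_depth flag")
```

Keeping the first check in the sample constructor means an inconsistent sample fails where it is built, while the second check can only run once both operands are known, which is why it sits in the wrapper.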
ADR-0001 Part (iii) is amended to add metric_depth: bool = False to
the DenseSample field table and to document the cross-field invariant
in the rationale.
2. Validity semantics: target-dominant outside M_eff, conjunctive inside
The intended validity formula is piecewise: outside M_eff,
V_out = V_tgt (un-touched target pixels retain their validity);
inside M_eff, V_out = V_src ∧ V_tgt (both must be valid for the
pasted pixel to be trusted). The previous ADR-0001 §(ii) wording
("pixelwise AND everywhere") would have let a source's invalid region
invalidate an un-touched target pixel — an unintended regression. ADR-0001
§(ii) is amended to state the piecewise formula, and
tests/invariants/depth.py::assert_depth_validity_join is updated to
take effective_paste_mask and assert the piecewise formula.
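To make the difference between the two wordings concrete, here is a small sketch on four pixels. Plain Python lists stand in for boolean tensors so the example runs without dependencies; validity_join and validity_and_everywhere are hypothetical names, not the invariant helpers in tests/invariants/depth.py.

```python
from typing import List


def validity_join(
    v_src: List[bool], v_tgt: List[bool], m_eff: List[bool]
) -> List[bool]:
    """Piecewise join: V_src AND V_tgt inside M_eff, V_tgt elsewhere."""
    return [(s and t) if m else t for s, t, m in zip(v_src, v_tgt, m_eff)]


def validity_and_everywhere(v_src: List[bool], v_tgt: List[bool]) -> List[bool]:
    """The superseded wording: pixelwise AND regardless of the paste mask."""
    return [s and t for s, t in zip(v_src, v_tgt)]


v_src = [True, False, False, True]
v_tgt = [True, True, True, False]
m_eff = [True, True, False, False]

piecewise = validity_join(v_src, v_tgt, m_eff)
everywhere = validity_and_everywhere(v_src, v_tgt)

# Pixel 2 lies outside M_eff: the piecewise rule keeps the target's validity,
# while AND-everywhere lets the source's invalid region clear it.
differs_at = [i for i, (a, b) in enumerate(zip(piecewise, everywhere)) if a != b]
```

In this example the two rules disagree only at the untouched pixel, which is exactly the regression the amendment removes.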
2a. Generalization to all per-pixel validity signals (ADR-0008 amendment)
The piecewise validity formula is not specific to depth. It applies to
every per-pixel validity signal carried on a DenseSample. Two
instances exist today:
- depth_valid: gates pixel-level depth measurements (this ADR's §2).
- image_valid := ~padding_mask, derived from PaddingMask. Marks which pixels of a sample carry real image content rather than pad introduced by FixedSizeCrop / LSJ. Composites must not pull source-pad zeros into pasted regions, and placements must not be drawn over target pad.
Both gates fold into M_eff symmetrically: M_eff = paste_mask ∧ (z-test) ∧ image_valid_src (ADR-0008 §C5, _internal/composite.py::_effective_mask, _internal/gpu/tile_composite.py::_effective_mask). The placement side consults the equivalent valid_extent reduction (BatchCopyPaste._valid_extent) so translates land inside the target's valid rect, and discards source rows whose bbox extends past the source's valid extent.
Future per-pixel validity signals (e.g. a semantic-confidence mask or an
amodal-occlusion mask) plug into the same machinery: they warp under the
same grid_sample(mode="nearest", padding_mode="zeros") template used
for depth_valid, are ANDed into M_eff at composite time, and optionally
contribute to valid_extent at placement time. No new ADR is required
for additional per-pixel signals — this generalization is the
architectural commitment.
3. Depth composite reuses _effective_mask
The z-test is already correct in _effective_mask (composite.py:102-113):
inside the placement paste mask, source wins iff
d_src < d_tgt ∨ ~V_tgt. Monotonicity
(d_out = min(d_src, d_tgt) inside M_eff) reduces to
torch.where(m_eff.unsqueeze(0), d_src, d_tgt) — no explicit
torch.minimum required, because m_eff already encodes the z-test.
This mirrors _composite_semantic and _composite_panoptic exactly.
_composite_normals is the same shape:
torch.where(m_eff.unsqueeze(0), n_src, n_tgt). Inputs are unit-norm by
precondition; per-pixel selection preserves unit-norm without any
renormalization. No interpolation is introduced by the composite at
pixel granularity.
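The reduction argued above can be checked on a handful of pixels. The sketch below uses plain Python lists in place of tensors so it runs without dependencies; effective_mask and composite_depth are hypothetical helper names, and the mask rule is the one stated in this section rather than a copy of composite.py.

```python
from typing import List


def effective_mask(
    paste_mask: List[bool],
    d_src: List[float],
    d_tgt: List[float],
    v_tgt: List[bool],
) -> List[bool]:
    """Inside the paste mask, source wins iff it is closer or the target is invalid."""
    return [
        p and (ds < dt or not vt)
        for p, ds, dt, vt in zip(paste_mask, d_src, d_tgt, v_tgt)
    ]


def composite_depth(
    m_eff: List[bool], d_src: List[float], d_tgt: List[float]
) -> List[float]:
    """Per-pixel select: the list analogue of torch.where(m_eff, d_src, d_tgt)."""
    return [ds if m else dt for m, ds, dt in zip(m_eff, d_src, d_tgt)]


paste_mask = [True, True, True, False]
d_src = [1.0, 3.0, 1.0, 1.0]
d_tgt = [2.0, 2.0, 2.0, 2.0]
v_tgt = [True, True, True, True]

m_eff = effective_mask(paste_mask, d_src, d_tgt, v_tgt)
d_out = composite_depth(m_eff, d_src, d_tgt)

# With a valid target, the select already equals min(d_src, d_tgt) wherever
# the paste mask is set, so no separate minimum is needed.
matches_min = all(
    out == min(ds, dt)
    for p, out, ds, dt in zip(paste_mask, d_out, d_src, d_tgt)
    if p
)
```

Because the select copies values rather than mixing them, the same function shape applies to normals: each output pixel is either the source vector or the target vector, so unit norm carries over unchanged.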
4. Metric intrinsic rescale: geometric mean of focal lengths
When metric_depth=True on both operands, the wrapper rescales the
source depth before constructing the synthetic source fed into
DenseComposite:
d_src_rescaled = d_src * sqrt(fx_t * fy_t) / sqrt(fx_s * fy_s)
For isotropic pixels (fx == fy), this reduces to the brief's
f_t / f_s ratio. For non-square pixels, the geometric mean handles
both axes symmetrically. This is the Metric3D-v2 canonical-camera trick
productionized per the source report §3c.
The rescale runs once in DepthAwarePaste.transform; the composite
itself is intrinsics-agnostic.
tests/invariants/depth.py::assert_depth_metric_intrinsics_rescale is
updated from its placeholder fx-only ratio
(depth.py:66-67) to match.
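As a worked example, the ratio quoted in the Consequences section, sqrt(fx_t*fy_t) / sqrt(fx_s*fy_s), can be evaluated directly. The function names below are hypothetical and plain floats and lists stand in for tensors; the sketch only illustrates the arithmetic, not the wrapper's actual code path.

```python
import math
from typing import List


def intrinsics_scale(fx_t: float, fy_t: float, fx_s: float, fy_s: float) -> float:
    """Ratio of the geometric-mean focal lengths, target over source."""
    return math.sqrt(fx_t * fy_t) / math.sqrt(fx_s * fy_s)


def rescale_source_depth(d_src: List[float], scale: float) -> List[float]:
    """Apply the scalar ratio to every source depth value."""
    return [d * scale for d in d_src]


# Isotropic case from the analytical golden: fx_t=1000, fx_s=500.
iso_scale = intrinsics_scale(1000.0, 1000.0, 500.0, 500.0)
rescaled = rescale_source_depth([1.0, 1.5], iso_scale)

# Non-square target pixels: the geometric mean treats both axes symmetrically,
# so swapping fx and fy leaves the ratio unchanged.
aniso_scale = intrinsics_scale(800.0, 1250.0, 500.0, 500.0)
swapped_scale = intrinsics_scale(1250.0, 800.0, 500.0, 500.0)
```

For isotropic intrinsics the geometric mean collapses to the single focal length, which is why tests that only carry fx == fy see the same numbers as under the old fx-only ratio.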
5. Validity-join reconciliation with ADR-0001
Because the previous ADR-0001 wording was wrong (§2 above),
assert_depth_validity_join was written against a false predicate. W4
changes both the ADR-0001 §(ii) text and the invariant body in the same
commit so the xfail=False flip is consistent. The only other callers of
assert_depth_validity_join are the fuzz and matrix harnesses, so nothing
passes through silently against the old predicate.
6. Blend-mode restriction enforced at the type level
DepthAwarePasteConfig.blend_mode: Literal["alpha"] = "alpha".
Constructing DepthAwarePasteConfig(blend_mode="gaussian") raises
pydantic.ValidationError — no custom ConfigurationError class is
needed. Matches the existing CopyPasteConfig.blend_mode: Literal["alpha"]
pattern (src/segpaste/config.py:23-24).
The source-report §3d reasoning is specific: pasted depth discontinuities align with real scene depth edges and match gradient-matching losses, whereas Gaussian feathering would create synthetic depth ramps that are neither plausible nor in-distribution for monocular-depth models.
Moot post-v0.3.0.
DepthAwarePasteConfig and the entire CPU wrapper class were deleted in v0.3.0 (ADR-0008 hard-deprecation). The GPU successor BatchCopyPaste exposes harmonization via BatchCopyPasteConfig.harmonize (ADR-0012). The §3d Gaussian-on-depth concern is preserved in spirit: ADR-0012's three harmonization modes operate on the image channel only; depth, normals, and label modalities still composite under the nearest-sample alpha-where in TileCompositor, never under a smoothed blend.
7. H-flip as wrapper-level preprocessing
DepthAwarePaste._maybe_hflip_source(source, rng) flips the source
atomically when config.hflip_probability fires:
image, instance_masks, depth, depth_valid, normals, semantic_map,
panoptic_map, and boxes all get the standard torchvision h-flip; the
normals x-component is sign-flipped (normals[0] = -normals[0]). This
is the single correct camera-frame transformation under h-flip in the
right-down-forward convention.
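The normals part of the flip can be sketched on a tiny map. Nested Python lists indexed as [channel][row][col] stand in for a 3xHxW tensor, with channels ordered (x, y, z); hflip_normals is a hypothetical name and the sketch covers only the normals field, not the other modalities flipped alongside it.

```python
from typing import List

Normals = List[List[List[float]]]  # [channel][row][col], channels = (x, y, z)


def hflip_normals(normals: Normals) -> Normals:
    """Mirror every row left-to-right, then negate the x channel."""
    flipped = [[row[::-1] for row in channel] for channel in normals]
    # Mirroring the image reverses the camera's x axis, so x components change sign.
    flipped[0] = [[-value for value in row] for row in flipped[0]]
    return flipped


# One row, two pixels: a normal tilted toward +x and a normal facing the camera axis.
normals = [
    [[0.6, 0.0]],  # x
    [[0.0, 0.0]],  # y
    [[0.8, 1.0]],  # z
]
flipped = hflip_normals(normals)
restored = hflip_normals(flipped)
```

The operation is its own inverse, and because it only permutes pixels and flips one sign it cannot change any vector's length.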
Ray-rectified normal transport under rotation or translation is
explicitly out of scope for P1 (source report §8 defers it). The wrapper
does not accept a rotation parameter; the only geometric transform
applied to normals is the h-flip sign-flip. PlacementSampler is
untouched, preserving W3's parity gate.
8. Debug validity-join assert
DepthAwarePasteConfig.debug_assert_validity: bool = False. When set,
a post-composite assertion checks that V_out matches the piecewise
formula. The flag is off by default and switched on by the fuzz test
harness. Mirrors PanopticPasteConfig.debug_assert_bijection.
9. NYUv2 AbsRel regression is out of scope
A reference SegFormer-like fine-tune on NYUv2 with depth-CP would demonstrate that the composite does not harm downstream metrics, but requires model weights, a dataset story, and a CI pathway the repository does not currently carry. The composite's determinism is asserted by the analytical goldens (monotonicity + metric rescale) plus the 200-seed forward-gate snapshot; an end-to-end AbsRel regression is future integration-test work.
10. Test strategy summary
- Analytical golden (monotonicity): tests/fixtures/synthetic/depth_overlap.py hand-constructs a (target, source, expected) triple with two planar surfaces at known depths (d_tgt=2.0, d_src=1.0); expected output bitwise equals torch.where(m_eff, d_src, d_tgt).
- Analytical golden (intrinsics rescale): tests/fixtures/synthetic/depth_intrinsics.py with fx_t=1000, fx_s=500 (isotropic); output depth inside the paste region equals 2 * d_src.
- Hypothesis fuzz: dense_sample_strategy({Modality.IMAGE, Modality.INSTANCE, Modality.DEPTH, Modality.NORMALS}) at max_examples=200; all five invariant bodies called post-transform.
- Forward-gate snapshot: tests/fixtures/depth_baseline.pt generated via scripts/gen_depth_baseline.py on CPU at W4 HEAD; never regenerated (same policy as ADR-0005 §4). CUDA is skipped because _effective_mask's torch.where + mask reductions are CPU-deterministic but not strictly guaranteed on CUDA.
- Invariant-matrix flip: five InvariantRow(Modality.DEPTH|NORMALS, …, xfail=True) entries in tests/test_invariant_matrix.py:201-205 flip to passing.
- W2 + W3 parity regression canaries: tests/test_dense_composite_parity.py and tests/test_panoptic_paste_parity.py stay bitwise. Emission of depth / depth_valid / normals is conditional on at least one input carrying them, so instance-only and panoptic-only paths are untouched.
Consequences
- DenseSample gains a metric_depth: bool = False field; ADR-0001 §(iii) table updated. test_public_surface.py remains green because the field has a default and DenseSample is constructed through __init__ with keyword args.
- DenseComposite.forward now emits depth, depth_valid, and normals iff at least one input carries them. Instance-only and panoptic-only paths are unchanged.
- DepthAwarePaste, DepthAwarePasteConfig stay under segpaste._internal. Promotion requires an ADR-0007 amendment and an _EXPECTED_PUBLIC_API entry per ADR-0001 Part (i).
- ADR-0001 §(ii) validity-join wording is amended from AND-everywhere to the piecewise formula.
- assert_depth_validity_join signature changes (adds effective_paste_mask argument).
- assert_depth_metric_intrinsics_rescale signature is unchanged, but the ratio is computed as sqrt(fx_t*fy_t) / sqrt(fx_s*fy_s) instead of fx_t / fx_s. Tests carrying isotropic intrinsics are unaffected.
- benchmarks/_fixture.py grows with_depth, with_normals, and metric_depth parameters; benchmarks/bench_depth_paste.py lands alongside bench_panoptic_paste.py.
- ADR-0005 §5's private-until-validated policy moves one step closer to a follow-up promotion ADR: W4 exercises DenseComposite on two more modalities (depth + normals), with a wrapper-level h-flip preprocessing step that is orthogonal to the composite itself.
Alternatives considered
- metric_depth on DepthAwarePasteConfig. Discarded: per-composite state forces users to construct two wrappers to mix metric and affine samples in a single dataloader. The flag is intrinsically per-sample (it states whether the tensor values are in meters or dimensionless relative depth) and pairs with camera_intrinsics, which is already per-sample. The amendment to ADR-0001 §(iii) is small (one field, default False).
- Computing min(d_src, d_tgt) explicitly inside _composite_depth. Discarded: redundant with _effective_mask, which already encodes the z-test. The torch.where(m_eff, d_src, d_tgt) formula is provably identical on M_src ∩ placement, one pass shorter, and mirrors the other composite branches.
- ADR-0001 validity-join wording kept as-is. Discarded: the AND-everywhere formula is wrong under a paste operation (lets the source's invalid region invalidate an un-touched target pixel). The bug would have surfaced the first time a monocular-depth user pasted a source with a larger invalid region than the target; W4's piecewise semantics is the correct fix.
- Extend PlacementSampler with a flip parameter. Discarded: more principled long-term but touches W3-shipped infra. The flip parameter would have to be threaded through _place_things in PanopticPaste and any future wrapper, and the W3 parity snapshot would need regeneration. Wrapper-level preprocessing isolates the change. If future wrappers need geometric augmentation beyond h-flip, a follow-up ADR can promote the extension.
- Introduce a BlendMode enum. Discarded: Literal["alpha"] via pydantic already enforces the restriction at construction. Adding an enum would duplicate ADR-0001's reserved-names list and require a surface-change amendment. The existing CopyPasteConfig.blend_mode: Literal["alpha"] pattern is reused.
- Ray-rectified normal transport under translation. Discarded per source report §8: the geometry is nontrivial (requires back-projection through K and forward-projection through K), per-pixel, and numerically delicate. P1 ships h-flip only; a follow-up ADR can add translation transport once a concrete downstream use case exists.
- Runtime validity-join check in release. Discarded: the composite construction is correct by design. A post-hoc assertion on every transform burns a full-frame reduction. Debug mode keeps the check as a fuzz-test harness opt-in.