Distributed evaluation across ranks

Per ADR-0031 (instance), ADR-0032 (semantic + panoptic), and ADR-0035 (the public entry class), every rank evaluates its own slice locally and gathers a small bytes payload via the user's transport (torch.distributed.all_gather_object, mpi4py's comm.gather, etc.). The head rank then reconstructs a Summary equivalent to a batch run over the union of the partials. The same idiom works in all three paradigms.

vernier ships a bytes interface and leaves the transport to the user: there is no import torch inside vernier and no torch-version pin to chase. rank_id is likewise the user's responsibility; vernier does not try to discover it.
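
Because the interface is just bytes, the whole idiom can be exercised without any distributed backend. The stubs below are hypothetical stand-ins for vernier's real methods (the names and payload format are invented for illustration), kept only to show the shape of the contract:

```python
import json

# Hypothetical stand-in for evaluate_to_partial: serialize a rank-local
# result to bytes that any transport can move (NCCL, MPI, even shared files).
def evaluate_to_partial_stub(rank_id: int, local_counts: dict) -> bytes:
    return json.dumps({"rank_id": rank_id, "counts": local_counts}).encode()

# Hypothetical stand-in for from_partials: the head rank decodes and folds.
def from_partials_stub(partials: list) -> dict:
    merged: dict = {}
    for raw in partials:
        payload = json.loads(raw.decode())
        for key, val in payload["counts"].items():
            merged[key] = merged.get(key, 0) + val
    return merged

# Two simulated ranks, no torch or MPI required:
p0 = evaluate_to_partial_stub(0, {"tp": 3, "fp": 1})
p1 = evaluate_to_partial_stub(1, {"tp": 2, "fp": 4})
assert from_partials_stub([p0, p1]) == {"tp": 5, "fp": 5}
```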

Instance — bbox / segm / boundary / keypoints

import torch.distributed as dist
from vernier.instance import Evaluator, Bbox

ev = Evaluator(iou=Bbox(), parity_mode="corrected")
partial = ev.evaluate_to_partial(
    gt_bytes, dt_for_this_rank, rank_id=dist.get_rank()
)  # bytes

gathered: list[bytes | None] = [None] * dist.get_world_size()
dist.all_gather_object(gathered, partial)

if dist.get_rank() == 0:
    summary = Evaluator.from_partials(
        gt_bytes, gathered, iou=Bbox(), parity_mode="corrected"
    )
    log_metrics(summary)

If your validation loop submits image-by-image (rather than handing a single per-rank batch to evaluate_to_partial), use BackgroundEvaluator and call finalize_to_partial() instead — the gather + Evaluator.from_partials(...) step on the head rank is the same.
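
The background-accumulation idea behind that variant can be sketched in plain Python. This is a minimal illustration of the pattern, not vernier's BackgroundEvaluator (the class name, queue design, and toy count payloads here are all assumptions): images are submitted one at a time, a worker thread folds them, and finalize drains the queue and returns bytes.

```python
import json
import queue
import threading

class BackgroundAccumulator:
    """Illustrative background fold; submit per-image results, finalize to bytes."""

    _DONE = object()  # sentinel telling the worker to stop

    def __init__(self, rank_id: int):
        self.rank_id = rank_id
        self._q: queue.Queue = queue.Queue()
        self._counts = {"tp": 0, "fp": 0}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, per_image_counts: dict) -> None:
        self._q.put(per_image_counts)  # returns immediately; fold happens off-thread

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is self._DONE:
                return
            for k, v in item.items():
                self._counts[k] += v

    def finalize_to_partial(self) -> bytes:
        self._q.put(self._DONE)
        self._worker.join()  # all submitted images are folded before serializing
        return json.dumps({"rank_id": self.rank_id, "counts": self._counts}).encode()

acc = BackgroundAccumulator(rank_id=0)
acc.submit({"tp": 2, "fp": 1})
acc.submit({"tp": 1, "fp": 0})
partial = acc.finalize_to_partial()
assert json.loads(partial.decode())["counts"] == {"tp": 3, "fp": 1}
```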

Semantic — mIoU / FWIoU / pAcc / mAcc

The headline difference for semantic: confusion-matrix sums are u64-additive, so a strict-mode merge is unconditionally bit-equal to a batch run over the union, with no tiebreak caveat:

import torch.distributed as dist
import vernier.semantic as sem

rank_gt = sem.Dataset.from_arrays(rank_gt_maps, n_classes=19)  # Cityscapes
rank_dt = sem.Predictions.from_arrays(rank_dt_maps)

ev = sem.Evaluator(parity_mode="strict")
partial = ev.evaluate_to_partial(rank_gt, rank_dt, rank_id=dist.get_rank())

gathered: list[bytes | None] = [None] * dist.get_world_size()
dist.all_gather_object(gathered, partial)

if dist.get_rank() == 0:
    summary = sem.Evaluator.from_partials(
        n_classes=19, partials=gathered, parity_mode="strict",
    )
    log_metrics(summary)
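
The additivity property itself is easy to demonstrate with plain NumPy. The confusion helper below illustrates why integer count matrices merge exactly; it is not vernier's internal code:

```python
import numpy as np

def confusion(gt: np.ndarray, dt: np.ndarray, n_classes: int) -> np.ndarray:
    # Flatten each (gt, dt) pair to a single bin index and count. Counts are
    # integers, and integer addition is exact and associative, so per-rank
    # matrices sum to the batch matrix bit-for-bit.
    idx = gt.astype(np.int64) * n_classes + dt.astype(np.int64)
    cm = np.bincount(idx, minlength=n_classes * n_classes)
    return cm.reshape(n_classes, n_classes).astype(np.uint64)

rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=1000)
dt = rng.integers(0, 3, size=1000)

# Split the data as two "ranks" would see it and merge by plain addition.
batch = confusion(gt, dt, 3)
merged = confusion(gt[:400], dt[:400], 3) + confusion(gt[400:], dt[400:], 3)
assert np.array_equal(batch, merged)  # bit-equal; no tiebreak caveat
```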

Panoptic — PQ

Panoptic has one additional knob, retain_per_image_deltas, which is the flagship determinism control. The default False keeps single-rank memory lean (per-category PqStat fold only). Set it to True on every rank when you need strict-mode bit-equality across the merge boundary: the merge accumulator re-sorts per-image deltas by image_id and re-sums in batch order, recovering bit-equality despite f64 non-associativity.
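
Why re-sorting recovers bit-equality can be seen with three toy deltas. This is a self-contained illustration of the sort-then-resum idea, not vernier's merge code; the delta values are contrived so that f64 rounding is visible:

```python
from functools import reduce

# Per-image (image_id, delta) pairs as two ranks might emit them.
rank0 = [(0, 0.1)]
rank1 = [(1, 0.2), (2, 0.3)]

def fold(pairs):
    # Left-to-right f64 accumulation, mimicking a streaming sum.
    return reduce(lambda acc, pair: acc + pair[1], pairs, 0.0)

arrival_a = fold(rank0 + rank1)  # partials arrive rank0 first
arrival_b = fold(rank1 + rank0)  # partials arrive rank1 first
assert arrival_a != arrival_b    # merge order changes the f64 sum

# Canonicalize: sort retained deltas by image_id, then re-sum.
canon_a = fold(sorted(rank0 + rank1))
canon_b = fold(sorted(rank1 + rank0))
assert canon_a == canon_b        # identical bits regardless of arrival order
```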

The panoptic evaluate_to_partial takes per-image tuples directly (PanopticDataset does not yet expose per-image accessors — closing that gap is a follow-up to ADR-0035):

import torch.distributed as dist
import vernier.panoptic as pq

ev = pq.Evaluator(parity_mode="strict")

# Per-image tuples: (image_id, gt_label_map, gt_segments_info,
#                   dt_label_map, dt_segments_info)
images_for_this_rank = [...]

partial = ev.evaluate_to_partial(
    images_for_this_rank,
    categories=categories_json,
    rank_id=dist.get_rank(),
    retain_per_image_deltas=True,    # opt-in for deterministic CI gate
)

gathered: list[bytes | None] = [None] * dist.get_world_size()
dist.all_gather_object(gathered, partial)

if dist.get_rank() == 0:
    summary = pq.Evaluator.from_partials(
        categories_json,
        gathered,
        parity_mode="strict",
        retain_per_image_deltas=True,
    )
    log_metrics(pq=summary.pq, sq=summary.sq, rq=summary.rq)

The wire-size cost of retain_per_image_deltas=True is a few hundred bytes per image per rank: ~100 KB at Cityscapes val (500 images), ~1 MB at COCO panoptic val (5k images). Corrected mode without deltas stays within ADR-0004's 4-ULP envelope at zero memory cost.
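
As a back-of-envelope check (the 200 bytes/image figure below is an assumed midpoint of "a few hundred", not a measured value):

```python
# Assumed per-image retained-delta size.
BYTES_PER_IMAGE = 200

cityscapes_val = 500 * BYTES_PER_IMAGE       # 100_000 bytes, ~100 KB per rank
coco_panoptic_val = 5_000 * BYTES_PER_IMAGE  # 1_000_000 bytes, ~1 MB per rank
assert cityscapes_val == 100_000 and coco_panoptic_val == 1_000_000
```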

Shared Partial* exception family

The five Partial* exception classes (PartialFormatMismatch, PartialDatasetMismatch, PartialParamsMismatch, PartialPartitionOverlap, PartialRankCollision) are paradigm-shared: the identity vernier.instance.PartialDatasetMismatch is vernier.semantic.PartialDatasetMismatch is vernier.panoptic.PartialDatasetMismatch holds, so a single top-level handler catches the same condition across all three paradigms.
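
The re-export pattern behind that identity can be sketched in a few lines. This is an illustration, not vernier's source; the SimpleNamespace objects are stand-ins for the vernier.instance and vernier.semantic modules:

```python
from types import SimpleNamespace

# One exception class defined once and aliased into each paradigm's
# namespace, so `is` identity holds and a single handler covers all three.
class PartialDatasetMismatch(Exception):
    """Partial was built against a different GT dataset."""

instance = SimpleNamespace(PartialDatasetMismatch=PartialDatasetMismatch)
semantic = SimpleNamespace(PartialDatasetMismatch=PartialDatasetMismatch)
assert instance.PartialDatasetMismatch is semantic.PartialDatasetMismatch

try:
    raise semantic.PartialDatasetMismatch("gt hash differs")
except instance.PartialDatasetMismatch as exc:  # caught across namespaces
    handled = str(exc)
assert handled == "gt hash differs"
```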

See also

  • ADR-0031 — instance streaming determinism + merge contract.
  • ADR-0032 — full determinism contract and validation surface across paradigms.
  • ADR-0035 — why the entry class is Evaluator (not StreamingEvaluator).