vernier.panoptic
Panoptic-quality (PQ) evaluation surface (ADR-0025). Evaluator + evaluate(...) mirror
the instance API; Predictions / Dataset carry the panoptic-specific
payload (segment-id maps + segment-info JSON).
Per ADR-0029, the panoptic-segmentation evaluation paradigm lives under
vernier.panoptic, a sibling to :mod:vernier.instance and
:mod:vernier.semantic. The Rust kernel ships in the
vernier-panoptic crate; this module is a thin Python wrapper.
BackgroundEvaluator
BackgroundEvaluator(
categories: bytes,
parity_mode: str,
*,
things_stuff_split: bool = ...,
retain_per_image_deltas: bool = ...,
rank_id: int | None = ...,
queue_capacity: int = ...,
worker_affinity: int | None = ...,
worker_nice: int = ...,
shutdown_timeout_seconds: float = ...,
)
Background panoptic-quality evaluator (ADR-0014 + ADR-0032).
Wraps a single dedicated worker thread that owns the underlying
[StreamingPanopticEvaluator]. submit(image_id, gt_label_map,
gt_segments_info, dt_label_map, dt_segments_info) posts one image;
snapshot / finalize block on a worker reply. Mirrors the
instance and semantic background surfaces.
retain_per_image_deltas=True opts into the strict-mode bit-equality
property at merge time (ADR-0032 §"Determinism") at the cost of
~2× streaming memory, same as the sibling streaming evaluator.
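The submit/finalize lifecycle can be sketched with a pure-Python toy (illustrative only — the real worker owns the Rust-backed StreamingPanopticEvaluator, which is not reimplemented here; `ToyBackgroundEvaluator` and its payloads are hypothetical):

```python
import queue
import threading


class ToyBackgroundEvaluator:
    """Toy stand-in: one worker thread drains a bounded queue of images."""

    _SENTINEL = object()

    def __init__(self, queue_capacity: int = 8):
        self._queue = queue.Queue(maxsize=queue_capacity)
        self._seen: list[int] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            image_id, _gt, _dt = item
            self._seen.append(image_id)  # real worker: update PQ state here

    def submit(self, image_id, gt, dt, *, timeout=None):
        # Bounded queue supplies backpressure when the worker falls behind.
        self._queue.put((image_id, gt, dt), timeout=timeout)

    def finalize(self):
        # Drain the queue, then join the worker before reading its state.
        self._queue.put(self._SENTINEL)
        self._worker.join()
        return sorted(self._seen)


ev = ToyBackgroundEvaluator()
for i in (3, 1, 2):
    ev.submit(i, b"gt", b"dt")
print(ev.finalize())  # [1, 2, 3]
```

The real class returns a PanopticSummary from finalize() rather than a list; the point of the sketch is the single-owner-thread + sentinel-drain shape.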
finalize
method descriptor
finalize() -> PanopticSummary
Drain the queue, finalize the evaluator, and join the worker.
finalize_to_partial
method descriptor
ADR-0032 / ADR-0035: drain, serialize the final state, and shut the worker down.
submit
method descriptor
submit(
image_id: int,
gt_label_map: NDArray[uint32],
gt_segments_info: bytes,
dt_label_map: NDArray[uint32],
dt_segments_info: bytes,
*,
timeout: float | None = None,
) -> None
Submit one image's GT/DT pair to the worker. timeout mirrors
the instance / semantic background surfaces.
submit_png
method descriptor
submit_png(
image_id: int,
gt_png_bytes: bytes,
gt_segments_info: bytes,
dt_png_bytes: bytes,
dt_segments_info: bytes,
*,
timeout: float | None = None,
) -> None
Variant of [Self::submit] that takes panoptic PNG byte blobs
instead of pre-decoded uint32 ndarrays. Fuses libpng decode +
RGB→id + (DT side) S3 area marginals + S1/S11 validation in
one Rust pass; skips the Pillow → numpy → uint32 round-trip
the Python wrapper would otherwise drive on the main thread.
Strictly equivalent to
submit(decode_label_map_png(p), ...) on the result; the diff
is wall-time, not correctness.
Breakdown
Python wrapper around [Breakdown] / [ClassGroupBreakdown].
buckets
property
Range buckets as a list of (label, lo, hi) triples in
construction order.
Raises AttributeError if this Breakdown was built via
from_class_groups. Use class_groups instead.
class_groups
property
Class-id groups as a list of (label, class_ids) pairs in
construction order.
Raises AttributeError if this Breakdown was built via
from_ranges. Use buckets instead.
kind
property
Variant discriminator: "range" for from_ranges-constructed
breakdowns, "class_groups" for from_class_groups-constructed
ones. Use this to dispatch in validators that accept a
Breakdown of a specific shape.
from_class_groups
builtin
from_class_groups(
axis: str, groups: Sequence[tuple[str, Sequence[int]]]
) -> Breakdown
Construct from class-id-keyed groups.
groups is a sequence of (label, class_ids) pairs, one per
group. Group order on input determines the group axis index
(first pair is index 0). Strict partition discipline is enforced
— no class id may appear in two groups.
Raises ValueError on:
- empty groups;
- any group with empty class_ids;
- duplicate group labels;
- the same class id appearing in more than one group.
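The rejection rules above can be sketched in pure Python (a sketch of the documented contract, not the Rust-side validator; `validate_class_groups` is a hypothetical helper):

```python
from collections.abc import Sequence


def validate_class_groups(groups: Sequence[tuple[str, Sequence[int]]]) -> None:
    """Mirror the documented from_class_groups rejections (strict partition)."""
    if not groups:
        raise ValueError("empty groups")
    labels: set[str] = set()
    owner: dict[int, str] = {}  # class id -> group label that claimed it
    for label, class_ids in groups:
        if not class_ids:
            raise ValueError(f"group {label!r} has empty class_ids")
        if label in labels:
            raise ValueError(f"duplicate group label {label!r}")
        labels.add(label)
        for cid in class_ids:
            if cid in owner:
                raise ValueError(
                    f"class id {cid} appears in both {owner[cid]!r} and {label!r}"
                )
            owner[cid] = label


# Group order fixes the axis index: "animals" is index 0, "vehicles" index 1.
validate_class_groups([("animals", [16, 17]), ("vehicles", [2, 3])])  # ok
```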
from_ranges
builtin
from_ranges(
axis: str, buckets: Sequence[tuple[str, float, float]]
) -> Breakdown
Construct from f64-keyed buckets.
buckets is a sequence of (label, lo, hi) triples, one per
bucket. [lo, hi] is closed on both ends per ADR-0016 (quirk
D6); an annotation whose key sits exactly on a boundary lands in
both adjacent buckets.
Raises ValueError on:
- empty buckets;
- NaN or infinite lo/hi;
- lo < 0;
- lo > hi;
- duplicate bucket labels.
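The closed-on-both-ends membership rule can be shown with a small sketch (the bucket labels and ranges below are made-up COCO-style values, not part of the API):

```python
def bucket_indices(key: float, buckets):
    """Closed [lo, hi] membership (quirk D6 as documented): a key sitting
    exactly on a shared boundary matches both adjacent buckets."""
    return [i for i, (_label, lo, hi) in enumerate(buckets) if lo <= key <= hi]


area_buckets = [
    ("small", 0.0, 32**2),
    ("medium", 32**2, 96**2),
    ("large", 96**2, 1e10),
]

print(bucket_indices(32**2, area_buckets))  # boundary key -> [0, 1]
print(bucket_indices(10.0, area_buckets))   # interior key -> [0]
```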
ClassPanopticStats
PartialDatasetMismatch
Bases: builtins.RuntimeError
Distributed-eval partial was computed against a different dataset than the receiving evaluator.
Attributes: expected (bytes), actual (bytes).
PartialFormatMismatch
Bases: builtins.RuntimeError
Distributed-eval partial blob is structurally malformed (magic / version / CRC / kernel kind / parity / retain_iou / grid dimensions / rkyv archive).
Attributes: kind (string discriminator).
PartialParamsMismatch
Bases: builtins.RuntimeError
Distributed-eval partial was computed against different evaluation params than the receiving evaluator.
Attributes: expected (bytes), actual (bytes).
PartialPartitionOverlap
Bases: builtins.RuntimeError
Two distributed-eval partials cover the same image_id (sampler bug).
Attributes: rank_a, rank_b, image_id.
PartialRankCollision
Bases: builtins.RuntimeError
Two strict-mode distributed-eval partials share a rank_id.
Attributes: rank_id.
Dataset
Parsed-once panoptic ground-truth handle. Build via
[PyPanopticDataset::from_arrays]; pass to
[evaluate_panoptic].
from_arrays
staticmethod
from_arrays(
label_maps: dict[int, NDArray[uint32]],
segments_info: bytes,
categories: bytes,
) -> PanopticDataset
Build a dataset from pre-decoded uint32 label maps. label_maps
maps image id (int) to a 2-D numpy.ndarray of dtype uint32
whose pixel values are panoptic segment ids
(id = R + 256*G + 256²*B, post-rgb2id). segments_info and
categories are JSON byte strings.
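Since segments_info and categories are passed as JSON byte strings, building them is a json.dumps(...).encode() away. The field names below follow COCO-panoptic conventions as an illustration — the exact schema the kernel accepts is defined by vernier, not by this sketch:

```python
import json

# Hypothetical COCO-panoptic-style payloads (schema assumed, not confirmed
# by this doc): categories carry id/name/isthing, segments_info carries
# per-image segment records keyed to the label-map segment ids.
categories = [
    {"id": 1, "name": "person", "isthing": 1},
    {"id": 184, "name": "grass", "isthing": 0},
]
segments_info = {
    "annotations": [
        {
            "image_id": 42,
            "segments_info": [
                # Segment id matches the rgb2id-encoded pixels in the label map.
                {"id": 1 + 256 * 2, "category_id": 1, "area": 1500, "iscrowd": 0},
            ],
        },
    ],
}

categories_bytes = json.dumps(categories).encode("utf-8")
segments_info_bytes = json.dumps(segments_info).encode("utf-8")
assert isinstance(categories_bytes, bytes) and isinstance(segments_info_bytes, bytes)
```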
Predictions
Parsed-once panoptic prediction handle. Sibling shape to
[PyPanopticDataset]; predictions never carry a category taxonomy
(quirk S9).
from_arrays
staticmethod
from_arrays(
label_maps: dict[int, NDArray[uint32]],
segments_info: bytes,
) -> PanopticPredictions
Build predictions from pre-decoded uint32 label maps. Mirrors
[PyPanopticDataset::from_arrays] but takes no categories.
The DT-side validation runs the full S1/S11 PNG-vs-segments_info
cross-check, recomputes per-segment areas from PNG marginals
(quirk S3), and rejects duplicate ids
([PanopticError::DuplicateSegmentId]).
Summary
Top-level panoptic evaluation result. Read via field accessors.
per_class
method descriptor
per_class() -> dict[int, ClassPanopticStats]
Per-class rows keyed by category id. Returns a Python dict
(constructed fresh on each call from the underlying BTreeMap).
per_group
method descriptor
Per-group rollup keyed by group label (ADR-0042). Empty when
the evaluator was run without class_grouping. Returns a fresh
dict on each call.
to_dict
method descriptor
Returns a dict[str, dict | None] matching panopticapi's pq_compute
return shape exactly when strict=True (the count fields are
dropped from per_class to match W8). Useful for round-tripping
with downstream tools that expect the upstream dict layout.
CategoryFilterByGrouping
dataclass
Match every class id in the named group of the active
class_grouping breakdown.
Only meaningful when the Evaluator's class_grouping is also
set; the validator at __post_init__ rejects ByGrouping
when no grouping is configured or when label is not a
grouping label.
CategoryFilterByIds
dataclass
Match an explicit set of class / category ids.
CategoryFilterFrequency
dataclass
Match by LVIS frequency tag ("r", "c", "f").
Valid only on instance evaluation against an LVIS-shaped dataset (ADR-0026). Semantic and panoptic Evaluators reject this variant at construction time per ADR-0041 / ADR-0042 — frequency tags are a sum type that doesn't generalize to non-numeric axes; class groupings carry the user's per-group rollup intent on those paradigms.
InvalidEvalParams
Bases: ValueError
Base for paradigm-specific Evaluator construction errors.
Raised at Evaluator.__post_init__ time by every paradigm in
response to invalid parameter values (out of range, wrong shape,
duplicate, conflicting, etc.). Per ADR-0039, validation runs at
construction so misconfiguration surfaces fast — evaluate()
cannot fail on misconfigured params, only on bad data.
Each subclass carries the offending field name, the offending value, and a one-line remediation pointer (typically the relevant ADR or doc page).
InvalidPanopticParams
Bases: InvalidEvalParams
Panoptic-paradigm Evaluator construction error.
EvalResult
dataclass
EvalResult(
summary: PanopticSummary,
_per_class_batch: object | None = None,
)
StuffThingPartition
dataclass
User-supplied override of the GT-derived stuff/thing split (ADR-0042).
Setting this on :class:Evaluator.stuff_thing_partition overrides
the dataset's isthing flag per category for the purpose of the
PQ_St / PQ_Th rollup. stuff and things must be disjoint
and both non-empty (validated at construction). Membership against
the dataset's category set is checked at evaluate() time once
the Dataset is in scope.
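The construction-time half of that validation (disjointness, both sides non-empty) can be sketched as follows; `validate_partition` is a hypothetical helper, and dataset-membership checking is deliberately absent because, as documented, it waits for evaluate():

```python
def validate_partition(stuff: frozenset[int], things: frozenset[int]) -> None:
    """Construction-time checks as documented: non-empty and disjoint.
    Membership against the dataset's category set is deferred to evaluate()."""
    if not stuff or not things:
        raise ValueError("stuff and things must both be non-empty")
    overlap = stuff & things
    if overlap:
        raise ValueError(f"stuff/things overlap on category ids {sorted(overlap)}")


validate_partition(frozenset({184, 193}), frozenset({1, 2, 3}))  # ok
```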
Evaluator
dataclass
Evaluator(
parity_mode: ParityMode = "corrected",
things_stuff_split: bool = True,
boundary: bool = False,
pq_iou_threshold: float | None = None,
category_filter: CategoryFilter | None = None,
class_grouping: Breakdown | None = None,
stuff_thing_partition: StuffThingPartition
| None = None,
)
Panoptic-quality (PQ) evaluator (ADR-0025, ADR-0042).
Sibling to :class:vernier.instance.Evaluator. Per category,
PQ_c is computed directly as
iou_c / (TP_c + 0.5*FP_c + 0.5*FN_c) (panopticapi form, quirk
W1) — algebraically equal to SQ_c * RQ_c but f64
non-associative, so the direct form is what holds bit-equality.
Global PQ / SQ / RQ are unweighted means over the
present-categories subset (W2, W3, W7); things / stuff buckets are
independent unweighted means over their subsets (W4).
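The two algebraically equal forms can be compared in a pure-Python sketch (illustrative only — the kernel computes this in Rust; the sample counts below are made up):

```python
def pq_direct(iou_sum: float, tp: int, fp: int, fn: int) -> float:
    """panopticapi-form per-class PQ (quirk W1): iou / (TP + 0.5*FP + 0.5*FN)."""
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


def pq_via_sq_rq(iou_sum: float, tp: int, fp: int, fn: int) -> float:
    """SQ * RQ factorization: algebraically equal, but the extra f64 rounding
    step means it need not match the direct form bit-for-bit."""
    if tp == 0:
        return 0.0
    sq = iou_sum / tp
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)
    return sq * rq


# Example: 7 matches with total IoU 5.6, plus 2 FPs and 1 FN.
direct = pq_direct(5.6, 7, 2, 1)       # 5.6 / 8.5
factored = pq_via_sq_rq(5.6, 7, 2, 1)  # (5.6/7) * (7/8.5)
assert abs(direct - factored) < 1e-12  # equal to fp tolerance, not guaranteed bitwise
```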
Defaults match panopticapi's pq_compute shape, except for
parity_mode, which defaults to "corrected" (the ADR-0002
recommendation for net-new users); migrating users wanting
bit-exact panopticapi behavior should set parity_mode="strict".
boundary=True raises :class:NotImplementedError. Boundary PQ
is deferred to a follow-up ADR (ADR-0025 §"explicitly does not
decide" Q3 / Z1). The composition rule in the bowenc0221 fork is
not the instance-case min(mask_iou, boundary_iou) and resolving
it requires its own pass.
The four ADR-0042 fields parameterize the evaluation scope and
rollup: pq_iou_threshold overrides the canonical 0.5 PQ match
threshold (single float, not a ladder); category_filter and
class_grouping mirror the semantic surface (ADR-0041);
stuff_thing_partition overrides the dataset-derived
stuff/thing split for the PQ_St / PQ_Th rollup.
PR scope cut: kernel-side plumbing for honoring the four
custom fields (per_group rollups, threshold-aware matching,
partition override) lands alongside the ADR-0039 distributed-eval
phase. Until then, evaluate() raises
:class:NotImplementedError when any custom field is set; the
surface — fields, validation, StuffThingPartition value type —
is in place.
evaluate
evaluate(
gt: PanopticDataset,
dt: PanopticPredictions,
*,
tables: None = None,
) -> PanopticSummary
evaluate(
gt: PanopticDataset,
dt: PanopticPredictions,
*,
tables: Literal["all"] | tuple[TableName, ...],
) -> EvalResult
evaluate(
gt: PanopticDataset,
dt: PanopticPredictions,
*,
tables: Literal["all"]
| tuple[TableName, ...]
| None = None,
) -> PanopticSummary | EvalResult
Run the panoptic-quality evaluation.
gt and dt must have been built via
:meth:Dataset.from_arrays / :meth:Predictions.from_arrays
(file-loading helpers ship in a follow-up).
tables= is the opt-in keyword for result tables (ADR-0038).
Defaults to None, returning :class:Summary (existing
behavior, bit-identical to the pre-tables release). Pass
"all" or a tuple of :data:TableName values to opt into
the wider :class:EvalResult return type.
evaluate_to_partial
evaluate_to_partial(
images: Iterable[
tuple[
int,
NDArray[uint32],
bytes,
NDArray[uint32],
bytes,
]
],
*,
categories: bytes,
rank_id: int,
retain_per_image_deltas: bool = False,
) -> bytes
Run the panoptic evaluation as a per-rank streaming submit and return the serialized partial bytes (ADR-0032, ADR-0035).
images is an iterable of per-image tuples of the form
(image_id, gt_label_map, gt_segments_info, dt_label_map,
dt_segments_info) — the same shape the streaming substrate's
update consumes. The asymmetry with :meth:evaluate
(which takes pre-built :class:Dataset / :class:Predictions)
is intentional: PanopticDataset does not yet expose
per-image accessors, so the streaming path consumes per-image
records directly. A future ADR may close the gap by adding
:class:Dataset accessors.
rank_id identifies this evaluator's rank in a multi-process
eval. retain_per_image_deltas=True is required on every
rank for strict-mode bit-equality across the merge (ADR-0032
§"Determinism") at ~2× streaming memory cost.
The partial bytes can be gathered across ranks and merged on
the head rank with :meth:from_partials to produce a global
:class:Summary.
Per ADR-0042, raises :class:InvalidPanopticParams when any
of pq_iou_threshold / category_filter / class_grouping
/ stuff_thing_partition is set: extending the ADR-0032
wire format to carry the resolved custom axes is a follow-up.
Single-rank custom-params eval works today via :meth:evaluate.
from_partials
classmethod
from_partials(
categories: bytes,
partials: Sequence[bytes],
/,
*,
parity_mode: ParityMode = "corrected",
things_stuff_split: bool = True,
retain_per_image_deltas: bool = False,
) -> PanopticSummary
Merge partials (one per rank) into a global :class:Summary
(ADR-0032, ADR-0035).
categories, parity_mode, things_stuff_split, and
retain_per_image_deltas must match what each rank used to
produce its partial. Mismatches raise the structured
Partial* errors re-exported on this module.
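The per-rank produce / head-rank merge flow can be illustrated with a toy simulation. Everything below is a stand-in: the real partial is an rkyv blob with magic / version / CRC framing (not JSON), and the real merge is the Rust kernel behind from_partials:

```python
import json


def rank_partial(rank_id: int, images) -> bytes:
    """Toy per-rank partial: accumulate per-class PQ counters, serialize."""
    counts: dict[str, dict[str, float]] = {}
    for _image_id, per_class in images:
        for cid, (iou, tp, fp, fn) in per_class.items():
            row = counts.setdefault(str(cid), {"iou": 0.0, "tp": 0, "fp": 0, "fn": 0})
            row["iou"] += iou
            row["tp"] += tp
            row["fp"] += fp
            row["fn"] += fn
    return json.dumps({"rank_id": rank_id, "counts": counts}).encode()


def merge_partials(partials) -> dict[str, float]:
    """Toy head-rank merge: sum counters across ranks, then compute PQ."""
    merged: dict[str, dict[str, float]] = {}
    for blob in partials:
        for cid, row in json.loads(blob)["counts"].items():
            m = merged.setdefault(cid, {"iou": 0.0, "tp": 0, "fp": 0, "fn": 0})
            for k in m:
                m[k] += row[k]
    return {
        cid: m["iou"] / (m["tp"] + 0.5 * m["fp"] + 0.5 * m["fn"])
        for cid, m in merged.items()
    }


p0 = rank_partial(0, [(1, {7: (0.75, 1, 0, 0)})])  # rank 0: image 1
p1 = rank_partial(1, [(2, {7: (0.5, 1, 1, 0)})])   # rank 1: image 2
print(merge_partials([p0, p1]))  # {'7': 0.5}
```

Note the image partition is disjoint across ranks; the real merge raises PartialPartitionOverlap if two partials cover the same image_id.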
background
background(
categories: bytes,
*,
retain_per_image_deltas: bool = False,
rank_id: int | None = None,
queue_capacity: int = 8,
worker_affinity: int | None = None,
worker_nice: int = 5,
shutdown_timeout_seconds: float = 5.0,
) -> BackgroundPanopticEvaluator
Build a :class:BackgroundEvaluator (ADR-0014 + ADR-0032)
that shares this evaluator's parity_mode and
things_stuff_split.
The returned wrapper owns a single dedicated worker thread
running a :class:StreamingEvaluator of the same shape;
:meth:BackgroundEvaluator.submit enqueues per-image
(gt_label_map, gt_segments_info, dt_label_map,
dt_segments_info) tuples and returns immediately. Use this
when the panoptic-quality kernel measurably stalls the
training loop — the per-image attribute pass is the dominant
cost at COCO-panoptic scale.
retain_per_image_deltas=True enables strict-mode bit-equality
across distributed-eval ranks (ADR-0032 §"Determinism") at ~2×
streaming memory cost. The five queueing / scheduling knobs mirror
:meth:vernier.instance.Evaluator.background.
decode_label_map_png
Decode a panoptic RGB PNG into a (H, W) uint32 segment-id
label map via the rgb2id convention (panopticapi/evaluation.py
rgb2id: r + 256*g + 256²*b).
Lazy-imports Pillow; raises a structured :class:ImportError if it
isn't installed. Three channels are required — a non-RGB PNG is
rejected with :class:ValueError. Single-channel class-id label
maps (semantic-segmentation) belong in
:func:vernier.semantic.Dataset.from_files, which has its own
decoder.
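The rgb2id convention itself is a one-liner per pixel; a minimal pure-Python sketch (no Pillow, no numpy — the real decoder vectorizes this over the whole image):

```python
def rgb2id_pixel(r: int, g: int, b: int) -> int:
    """panopticapi rgb2id for one pixel: id = r + 256*g + 256**2*b."""
    return r + 256 * g + 256**2 * b


def id2rgb_pixel(seg_id: int) -> tuple[int, int, int]:
    """Inverse mapping: recover the (r, g, b) channels from a segment id."""
    return (seg_id % 256, (seg_id // 256) % 256, (seg_id // 256**2) % 256)


print(rgb2id_pixel(1, 2, 0))          # 513
assert id2rgb_pixel(513) == (1, 2, 0)  # round-trips
```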