vernier.panoptic

Panoptic-quality (PQ) evaluation. Evaluator + evaluate(...) mirror the instance API; Predictions / Dataset carry the panoptic-specific payload (segment id maps + segment-info JSON).

Panoptic-quality (PQ) evaluation surface (ADR-0025).

Per ADR-0029, the panoptic-segmentation evaluation paradigm lives under vernier.panoptic, sibling to :mod:vernier.instance and :mod:vernier.semantic. The Rust kernel ships in the vernier-panoptic crate; this module is a thin Python wrapper.

BackgroundEvaluator

BackgroundEvaluator(
    categories: bytes,
    parity_mode: str,
    *,
    things_stuff_split: bool = ...,
    retain_per_image_deltas: bool = ...,
    rank_id: int | None = ...,
    queue_capacity: int = ...,
    worker_affinity: int | None = ...,
    worker_nice: int = ...,
    shutdown_timeout_seconds: float = ...,
)

Background panoptic-quality evaluator (ADR-0014 + ADR-0032).

Wraps a single dedicated worker thread that owns the underlying :class:StreamingEvaluator. submit(image_id, gt_label_map, gt_segments_info, dt_label_map, dt_segments_info) posts one image; snapshot / finalize block on a worker reply. Mirrors the instance and semantic background surfaces.

retain_per_image_deltas=True opts into the strict-mode bit-equality property at merge time (ADR-0032 §"Determinism") at the cost of ~2× streaming memory, same as the sibling streaming evaluator.

n_categories property

n_categories: int

Number of categories in the taxonomy.

n_images property

n_images: int

Mirror of the underlying evaluator's n_images. Advisory.

queue_depth property

queue_depth: int

Approximate count of Update messages waiting in the channel.

finalize method descriptor

finalize() -> PanopticSummary

Drain the queue, finalize the evaluator, and join the worker.

finalize_to_partial method descriptor

finalize_to_partial() -> bytes

ADR-0032 / ADR-0035: drain, serialize the final state, and shut the worker down.

submit method descriptor

submit(
    image_id: int,
    gt_label_map: NDArray[uint32],
    gt_segments_info: bytes,
    dt_label_map: NDArray[uint32],
    dt_segments_info: bytes,
    *,
    timeout: float | None = None,
) -> None

Submit one image's GT/DT pair to the worker. timeout mirrors the instance / semantic background surfaces.

submit_png method descriptor

submit_png(
    image_id: int,
    gt_png_bytes: bytes,
    gt_segments_info: bytes,
    dt_png_bytes: bytes,
    dt_segments_info: bytes,
    *,
    timeout: float | None = None,
) -> None

Variant of :meth:submit that takes panoptic PNG byte blobs instead of pre-decoded uint32 ndarrays. Fuses libpng decode + RGB→id + (DT side) S3 area marginals + S1/S11 validation in one Rust pass; skips the Pillow → numpy → uint32 round-trip the Python wrapper would otherwise drive on the main thread. Strictly equivalent to submit(decode_label_map_png(p), ...) in its result; the difference is wall-clock time, not correctness.

Breakdown

Python wrapper around the Rust Breakdown / ClassGroupBreakdown types.

axis property

axis: str

Axis name (e.g., "area", "vehicle_taxonomy").

buckets property

buckets: list[tuple[str, float, float]]

Range buckets as a list of (label, lo, hi) triples in construction order.

Raises AttributeError if this Breakdown was built via from_class_groups. Use class_groups instead.

class_groups property

class_groups: list[tuple[str, list[int]]]

Class-id groups as a list of (label, class_ids) pairs in construction order.

Raises AttributeError if this Breakdown was built via from_ranges. Use buckets instead.

kind property

kind: Literal['range', 'class_groups']

Variant discriminator: "range" for from_ranges-constructed breakdowns, "class_groups" for from_class_groups-constructed ones. Use this to dispatch in validators that accept a Breakdown of a specific shape.

from_class_groups builtin

from_class_groups(
    axis: str, groups: Sequence[tuple[str, Sequence[int]]]
) -> Breakdown

Construct from class-id-keyed groups.

groups is a sequence of (label, class_ids) pairs, one per group. Group order on input determines the group axis index (first pair is index 0). Strict partition discipline is enforced — no class id may appear in two groups.

Raises ValueError on:

  • empty groups;
  • any group with empty class_ids;
  • duplicate group labels;
  • the same class id appearing in more than one group.
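The partition rules above can be pre-checked in plain Python. check_class_groups is a hypothetical helper mirroring the documented ValueError conditions, not part of the API:

```python
def check_class_groups(groups):
    """Raise ValueError on the same conditions from_class_groups rejects."""
    if not groups:
        raise ValueError("empty groups")
    seen_labels, seen_ids = set(), set()
    for label, class_ids in groups:
        if not class_ids:
            raise ValueError(f"group {label!r} has empty class_ids")
        if label in seen_labels:
            raise ValueError(f"duplicate group label {label!r}")
        seen_labels.add(label)
        for cid in class_ids:
            if cid in seen_ids:
                # Strict partition discipline: one group per class id.
                raise ValueError(f"class id {cid} appears in two groups")
            seen_ids.add(cid)


# A well-formed partition passes silently.
check_class_groups([("vehicles", [2, 3]), ("animals", [17, 18])])
```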

from_ranges builtin

from_ranges(
    axis: str, buckets: Sequence[tuple[str, float, float]]
) -> Breakdown

Construct from f64-keyed buckets.

buckets is a sequence of (label, lo, hi) triples, one per bucket. [lo, hi] is closed on both ends per ADR-0016 (quirk D6); an annotation whose key sits exactly on a boundary lands in both adjacent buckets.

Raises ValueError on:

  • empty buckets;
  • NaN or infinite lo / hi;
  • lo < 0;
  • lo > hi;
  • duplicate bucket labels.
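The closed-interval quirk is easy to state in code. buckets_for is a hypothetical helper illustrating D6 membership, not the kernel's implementation:

```python
def buckets_for(key: float, buckets):
    """Return every bucket label whose closed [lo, hi] range contains key."""
    return [label for label, lo, hi in buckets if lo <= key <= hi]


area_buckets = [("small", 0.0, 32.0**2), ("medium", 32.0**2, 96.0**2)]

# A key exactly on a shared boundary lands in BOTH adjacent buckets (D6).
assert buckets_for(32.0**2, area_buckets) == ["small", "medium"]
assert buckets_for(10.0, area_buckets) == ["small"]
```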

ClassPanopticStats

Per-class PQ row exposed to Python (W8 strict-superset shape).

iou_sum property

iou_sum: float

n_fn property

n_fn: int

n_fp property

n_fp: int

n_tp property

n_tp: int

pq property

pq: float

rq property

rq: float

sq property

sq: float

PartialDatasetMismatch

Bases: builtins.RuntimeError

Distributed-eval partial was computed against a different dataset than the receiving evaluator.

Attributes: expected (bytes), actual (bytes).

PartialFormatMismatch

Bases: builtins.RuntimeError

Distributed-eval partial blob is structurally malformed (magic / version / CRC / kernel kind / parity / retain_iou / grid dimensions / rkyv archive).

Attributes: kind (string discriminator).

PartialParamsMismatch

Bases: builtins.RuntimeError

Distributed-eval partial was computed against different evaluation params than the receiving evaluator.

Attributes: expected (bytes), actual (bytes).

PartialPartitionOverlap

Bases: builtins.RuntimeError

Two distributed-eval partials cover the same image_id (sampler bug).

Attributes: rank_a, rank_b, image_id.

PartialRankCollision

Bases: builtins.RuntimeError

Two strict-mode distributed-eval partials share a rank_id.

Attributes: rank_id.

Dataset

Parsed-once panoptic ground-truth handle. Build via :meth:Dataset.from_arrays; pass to :meth:Evaluator.evaluate.

num_categories property

num_categories: int

Number of categories.

num_images property

num_images: int

Number of images in the dataset.

from_arrays staticmethod

from_arrays(
    label_maps: dict[int, NDArray[uint32]],
    segments_info: bytes,
    categories: bytes,
) -> PanopticDataset

Build a dataset from pre-decoded uint32 label maps. label_maps maps image id (int) to a 2-D numpy.ndarray of dtype uint32 whose pixel values are panoptic segment ids (id = R + 256*G + 256²*B, post-rgb2id). segments_info and categories are JSON byte strings.

Predictions

Parsed-once panoptic prediction handle. Sibling shape to :class:Dataset; predictions never carry a category taxonomy (quirk S9).

num_images property

num_images: int

Number of images for which we have predictions.

from_arrays staticmethod

from_arrays(
    label_maps: dict[int, NDArray[uint32]],
    segments_info: bytes,
) -> PanopticPredictions

Build predictions from pre-decoded uint32 label maps. Mirrors :meth:Dataset.from_arrays but takes no categories. The DT-side validation runs the full S1/S11 PNG-vs-segments_info cross-check, recomputes per-segment areas from PNG marginals (quirk S3), and rejects duplicate ids (Rust PanopticError::DuplicateSegmentId).

Summary

Top-level panoptic evaluation result. Read via field accessors.

n property

n: int

n_stuff property

n_stuff: int | None

n_things property

n_things: int | None

pq property

pq: float

pq_stuff property

pq_stuff: float | None

pq_things property

pq_things: float | None

rq property

rq: float

rq_stuff property

rq_stuff: float | None

rq_things property

rq_things: float | None

sq property

sq: float

sq_stuff property

sq_stuff: float | None

sq_things property

sq_things: float | None

per_class method descriptor

per_class() -> dict[int, ClassPanopticStats]

Per-class rows keyed by category id. Returns a Python dict (constructed fresh on each call from the underlying BTreeMap).

per_group method descriptor

per_group() -> dict[str, GroupPanopticStats]

Per-group rollup keyed by group label (ADR-0042). Empty when the evaluator was run without class_grouping. Returns a fresh dict on each call.

to_dict method descriptor

to_dict(*, strict: bool = False) -> dict[str, Any]

Returns a dict[str, dict | None] matching panopticapi's pq_compute return shape; with strict=True the match is exact (the count fields are dropped from per_class per W8). Useful for round-tripping with downstream tools that expect the upstream dict layout.

CategoryFilterAll dataclass

CategoryFilterAll()

Match every category. The COCO default.

CategoryFilterByGrouping dataclass

CategoryFilterByGrouping(label: str)

Match every class id in the named group of the active class_grouping breakdown.

Only meaningful when the Evaluator's class_grouping is also set; the validator at __post_init__ rejects ByGrouping when no grouping is configured or when label is not a grouping label.

CategoryFilterByIds dataclass

CategoryFilterByIds(ids: frozenset[int])

Match an explicit set of class / category ids.

CategoryFilterFrequency dataclass

CategoryFilterFrequency(tag: Literal['r', 'c', 'f'])

Match by LVIS frequency tag ("r", "c", "f").

Valid only on instance evaluation against an LVIS-shaped dataset (ADR-0026). Semantic and panoptic Evaluators reject this variant at construction time per ADR-0041 / ADR-0042 — frequency tags are a sum type that doesn't generalize to non-numeric axes; class groupings carry the user's per-group rollup intent on those paradigms.

InvalidEvalParams

InvalidEvalParams(
    *, field: str, value: object, remediation: str
)

Bases: ValueError

Base for paradigm-specific Evaluator construction errors.

Raised at Evaluator.__post_init__ time by every paradigm in response to invalid parameter values (out of range, wrong shape, duplicate, conflicting, etc.). Per ADR-0039, validation runs at construction so misconfiguration surfaces fast — evaluate() cannot fail on misconfigured params, only on bad data.

Each subclass carries the offending field name, the offending value, and a one-line remediation pointer (typically the relevant ADR or doc page).

InvalidPanopticParams

InvalidPanopticParams(
    *, field: str, value: object, remediation: str
)

Bases: InvalidEvalParams

Invalid vernier.panoptic.Evaluator parameter (ADR-0042).

EvalResult dataclass

EvalResult(
    summary: PanopticSummary,
    _per_class_batch: object | None = None,
)

Opt-in result of :meth:Evaluator.evaluate when tables= is passed. Carries :class:Summary plus a polars DataFrame view of the per-class panoptic-quality breakdown.

per_class cached property

per_class: DataFrame

One row per category. Columns: category_id, pq, sq, rq, n_tp, n_fp, n_fn, iou_sum.

StuffThingPartition dataclass

StuffThingPartition(
    stuff: frozenset[int], things: frozenset[int]
)

User-supplied override of the GT-derived stuff/thing split (ADR-0042).

Setting this as :attr:Evaluator.stuff_thing_partition overrides the dataset's isthing flag per category for the purpose of the PQ_St / PQ_Th rollup. stuff and things must be disjoint and both non-empty (validated at construction). Membership against the dataset's category set is checked at evaluate() time once the Dataset is in scope.

Evaluator dataclass

Evaluator(
    parity_mode: ParityMode = "corrected",
    things_stuff_split: bool = True,
    boundary: bool = False,
    pq_iou_threshold: float | None = None,
    category_filter: CategoryFilter | None = None,
    class_grouping: Breakdown | None = None,
    stuff_thing_partition: StuffThingPartition
    | None = None,
)

Panoptic-quality (PQ) evaluator (ADR-0025, ADR-0042).

Sibling to :class:vernier.instance.Evaluator. Per category, PQ_c is computed directly as iou_c / (TP_c + 0.5*FP_c + 0.5*FN_c) (panopticapi form, quirk W1) — algebraically equal to SQ_c * RQ_c but f64 non-associative, so the direct form is what holds bit-equality. Global PQ / SQ / RQ are unweighted means over the present-categories subset (W2, W3, W7); things / stuff buckets are independent unweighted means over their subsets (W4).
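A quick pure-Python check of the direct-form identity (not the kernel; the counts are made-up illustration values):

```python
import math


def pq_direct(iou_sum: float, tp: int, fp: int, fn: int) -> float:
    """panopticapi direct form (quirk W1): iou / (TP + 0.5*FP + 0.5*FN)."""
    return iou_sum / (tp + 0.5 * fp + 0.5 * fn)


def pq_factored(iou_sum: float, tp: int, fp: int, fn: int) -> float:
    """SQ * RQ factoring; algebraically equal, but f64 rounding can differ."""
    sq = iou_sum / tp                    # segmentation quality
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)  # recognition quality
    return sq * rq


# Equal up to floating-point rounding; only the direct form is the
# bit-equality reference.
assert math.isclose(pq_direct(7.3, 10, 2, 4), pq_factored(7.3, 10, 2, 4))
```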

Defaults match panopticapi's pq_compute shape, except for parity_mode, which defaults to "corrected" (the ADR-0002 recommendation for net-new users); migrating users wanting bit-exact panopticapi behavior should set parity_mode="strict".

boundary=True raises :class:NotImplementedError. Boundary PQ is deferred to a follow-up ADR (ADR-0025 §"explicitly does not decide" Q3 / Z1). The composition rule in the bowenc0221 fork is not the instance-case min(mask_iou, boundary_iou) and resolving it requires its own pass.

The four ADR-0042 fields parameterize the evaluation scope and rollup: pq_iou_threshold overrides the canonical 0.5 PQ match threshold (single float, not a ladder); category_filter and class_grouping mirror the semantic surface (ADR-0041); stuff_thing_partition overrides the dataset-derived stuff/thing split for the PQ_St / PQ_Th rollup.

PR scope cut: kernel-side plumbing for honoring the four custom fields (per_group rollups, threshold-aware matching, partition override) lands alongside the ADR-0039 distributed-eval phase. Until then, evaluate() raises :class:NotImplementedError when any custom field is set; the surface — fields, validation, StuffThingPartition value type — is in place.

evaluate

evaluate(
    gt: PanopticDataset,
    dt: PanopticPredictions,
    *,
    tables: None = None,
) -> PanopticSummary
evaluate(
    gt: PanopticDataset,
    dt: PanopticPredictions,
    *,
    tables: Literal["all"] | tuple[TableName, ...],
) -> EvalResult
evaluate(
    gt: PanopticDataset,
    dt: PanopticPredictions,
    *,
    tables: Literal["all"]
    | tuple[TableName, ...]
    | None = None,
) -> PanopticSummary | EvalResult

Run the panoptic-quality evaluation.

gt and dt must have been built via :meth:Dataset.from_arrays / :meth:Predictions.from_arrays (file-loading helpers ship in a follow-up).

tables= is the opt-in keyword for result tables (ADR-0038). Defaults to None, returning :class:Summary (existing behavior, bit-identical to the pre-tables release). Pass "all" or a tuple of :data:TableName values to opt into the wider :class:EvalResult return type.

evaluate_to_partial

evaluate_to_partial(
    images: Iterable[
        tuple[
            int,
            NDArray[uint32],
            bytes,
            NDArray[uint32],
            bytes,
        ]
    ],
    *,
    categories: bytes,
    rank_id: int,
    retain_per_image_deltas: bool = False,
) -> bytes

Run the panoptic evaluation as a per-rank streaming submit and return the serialized partial bytes (ADR-0032, ADR-0035).

images is an iterable of per-image tuples of the form (image_id, gt_label_map, gt_segments_info, dt_label_map, dt_segments_info) — the same shape the streaming substrate's update consumes. The asymmetry with :meth:evaluate (which takes pre-built :class:Dataset / :class:Predictions) is intentional: PanopticDataset does not yet expose per-image accessors, so the streaming path consumes per-image records directly. A future ADR may close the gap by adding :class:Dataset accessors.

rank_id identifies this evaluator's rank in a multi-process eval. retain_per_image_deltas=True is required on every rank for strict-mode bit-equality across the merge (ADR-0032 §"Determinism") at ~2× streaming memory cost.

The partial bytes can be gathered across ranks and merged on the head rank with :meth:from_partials to produce a global :class:Summary.

Per ADR-0042, raises :class:InvalidPanopticParams when any of pq_iou_threshold / category_filter / class_grouping / stuff_thing_partition is set: extending the ADR-0032 wire format to carry the resolved custom axes is a follow-up. Single-rank custom-params eval works today via :meth:evaluate.

from_partials classmethod

from_partials(
    categories: bytes,
    partials: Sequence[bytes],
    /,
    *,
    parity_mode: ParityMode = "corrected",
    things_stuff_split: bool = True,
    retain_per_image_deltas: bool = False,
) -> PanopticSummary

Merge partials (one per rank) into a global :class:Summary (ADR-0032, ADR-0035).

categories, parity_mode, things_stuff_split, and retain_per_image_deltas must match what each rank used to produce its partial. Mismatches raise the structured Partial* errors re-exported on this module.
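The merge semantics can be sketched on plain dicts. This toy merge_partials illustrates count accumulation and the PartialPartitionOverlap-style image-overlap check; it is not the actual rkyv wire format or the Rust merge:

```python
def merge_partials(partials):
    """Merge per-rank {image_ids, counts} partials; reject image overlap."""
    seen_images = set()
    totals = {"tp": 0, "fp": 0, "fn": 0, "iou_sum": 0.0}
    for rank, part in enumerate(partials):
        overlap = seen_images & part["image_ids"]
        if overlap:
            # Two ranks covering the same image is a sampler bug
            # (compare PartialPartitionOverlap).
            raise RuntimeError(f"rank {rank} re-covers images {sorted(overlap)}")
        seen_images |= part["image_ids"]
        for k in totals:
            totals[k] += part["counts"][k]
    return totals


ranks = [
    {"image_ids": {1, 2}, "counts": {"tp": 3, "fp": 1, "fn": 0, "iou_sum": 2.4}},
    {"image_ids": {3},    "counts": {"tp": 1, "fp": 0, "fn": 2, "iou_sum": 0.9}},
]
merged = merge_partials(ranks)
assert merged["tp"] == 4 and abs(merged["iou_sum"] - 3.3) < 1e-12
```

The real merge accumulates the same per-class TP/FP/FN/iou sums, then derives PQ/SQ/RQ from the merged counts on the head rank.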

background

background(
    categories: bytes,
    *,
    retain_per_image_deltas: bool = False,
    rank_id: int | None = None,
    queue_capacity: int = 8,
    worker_affinity: int | None = None,
    worker_nice: int = 5,
    shutdown_timeout_seconds: float = 5.0,
) -> BackgroundPanopticEvaluator

Build a :class:BackgroundEvaluator (ADR-0014 + ADR-0032) that shares this evaluator's parity_mode and things_stuff_split.

The returned wrapper owns a single dedicated worker thread running a :class:StreamingEvaluator of the same shape; :meth:BackgroundEvaluator.submit enqueues per-image (gt_label_map, gt_segments_info, dt_label_map, dt_segments_info) tuples and returns immediately. Use this when the panoptic-quality kernel measurably stalls the training loop — the per-image attribute pass is the dominant cost at COCO-panoptic scale.

retain_per_image_deltas=True enables strict-mode bit-equality across distributed-eval ranks (ADR-0032 §"Determinism") at ~2× streaming memory cost. The five queueing / scheduling knobs mirror :meth:vernier.instance.Evaluator.background.

decode_label_map_png

decode_label_map_png(path: str | Path) -> NDArray[uint32]

Decode a panoptic RGB PNG into a (H, W) uint32 segment-id label map via the rgb2id convention (panopticapi/evaluation.py rgb2id: r + 256*g + 256²*b).

Lazy-imports Pillow; raises a structured :class:ImportError if it isn't installed. Three channels are required — a non-RGB PNG is rejected with :class:ValueError. Single-channel class-id label maps (semantic-segmentation) belong in :func:vernier.semantic.Dataset.from_files, which has its own decoder.
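The rgb2id arithmetic itself is a one-liner in numpy. This equivalent conversion (not the Rust decoder, which fuses decode and validation) shows the convention:

```python
import numpy as np


def rgb2id(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB -> (H, W) uint32 ids: R + 256*G + 256**2*B."""
    rgb = rgb.astype(np.uint32)  # widen before the shifts to avoid overflow
    return rgb[..., 0] + 256 * rgb[..., 1] + 256**2 * rgb[..., 2]


px = np.array([[[1, 2, 3]]], dtype=np.uint8)  # one pixel: R=1, G=2, B=3
assert int(rgb2id(px)[0, 0]) == 1 + 256 * 2 + 256**2 * 3  # 197121
```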

Type aliases