`vernier.semantic`

Semantic-segmentation evaluation (mIoU + per-class confusion matrix). Evaluator + evaluate(...) mirror the instance API; the *_IGNORE_LABEL / *_N_CLASSES constants are convenience defaults for common public datasets.

Semantic-segmentation evaluation surface (ADR-0028).

Per ADR-0029, the semantic-segmentation evaluation paradigm lives under vernier.semantic, as a sibling to :mod:vernier.instance (the AP fold) and :mod:vernier.panoptic (panoptic-quality). The Rust kernel ships in the vernier-semantic crate; this module is a thin Python wrapper.

Surface:

:class:Dataset / :class:Predictions — frozen dataclasses carrying the per-image label maps + dataset-level config (n_classes / ignore_label).
:class:Evaluator — frozen dataclass holding parity_mode and optional label_remap. Evaluator.evaluate(gt, dt) returns a :class:Summary (the FFI pyclass).
:class:Summary / :class:ClassSemanticStats / :class:ConfusionMatrix — re-exported FFI pyclasses (under their unprefixed names per ADR-0029).
Per-dataset presets — :meth:Dataset.cityscapes, :meth:Dataset.ade20k, :meth:Dataset.pascal_voc — bake the canonical ignore-label and class-count conventions; the user only passes the PNG paths.

BackgroundEvaluator

BackgroundEvaluator(
    n_classes: int,
    parity_mode: str,
    *,
    ignore_label: int | None = ...,
    rank_id: int | None = ...,
    queue_capacity: int = ...,
    worker_affinity: int | None = ...,
    worker_nice: int = ...,
    shutdown_timeout_seconds: float = ...,
)

Background semantic-segmentation evaluator (ADR-0014 + ADR-0032).

Wraps a single dedicated worker thread that owns the underlying [StreamingSemanticEvaluator]. submit(image_id, gt, dt) posts one image; snapshot() / finalize() block on a worker reply.

Mirrors the panoptic and instance background surfaces: same constructor knobs (queue_capacity, worker_affinity, worker_nice, shutdown_timeout_seconds), same context-manager lifecycle, same to_partial / finalize_to_partial for distributed-eval gather (ADR-0032).
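The single-worker design described above can be illustrated with a minimal, stdlib-only sketch. It is not vernier's implementation: `BackgroundAccumulator`, its fields, and the sentinel-based shutdown are hypothetical stand-ins chosen to show one thread owning the running state while callers only post messages to a bounded queue.

```python
import queue
import threading


class BackgroundAccumulator:
    """Hypothetical stand-in: one worker thread owns the running state."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, queue_capacity=8):
        self._queue = queue.Queue(maxsize=queue_capacity)
        self._n_images = 0
        self._n_correct = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Only this thread mutates the accumulated counters.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            _image_id, gt, dt = item
            self._n_correct += sum(
                g == d
                for gt_row, dt_row in zip(gt, dt)
                for g, d in zip(gt_row, dt_row)
            )
            self._n_images += 1

    def submit(self, image_id, gt, dt):
        # Blocks while the bounded queue is full (back-pressure on the caller).
        self._queue.put((image_id, gt, dt))

    def finalize(self):
        # FIFO order: the sentinel is seen only after every pending update.
        self._queue.put(self._STOP)
        self._worker.join()
        return {"n_images": self._n_images, "n_correct": self._n_correct}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Shut the worker down if the caller never finalized.
        if self._worker.is_alive():
            self._queue.put(self._STOP)
            self._worker.join()


with BackgroundAccumulator(queue_capacity=2) as bg:
    for image_id in range(5):
        label_map = [[0, 1], [1, 1]]
        bg.submit(image_id, label_map, label_map)
    result = bg.finalize()
```

Putting the shutdown sentinel on the same queue as the updates is what makes "drain, then join" fall out of FIFO ordering; the real evaluator may coordinate shutdown differently.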

n_classes `property`

n_classes: int

Number of evaluation classes (constant for the lifetime of the evaluator).

n_images `property`

n_images: int

Mirror of the underlying evaluator's n_images. Advisory — updated by the worker after each successful submit.

queue_depth `property`

queue_depth: int

Approximate count of Update messages waiting in the channel.

finalize `method descriptor`

finalize() -> SemanticSummary

Drain the queue, finalize the evaluator, and join the worker.

finalize_to_partial `method descriptor`

finalize_to_partial() -> bytes

ADR-0032 / ADR-0035: drain, serialize the final state, and shut the worker down.

submit `method descriptor`

submit(
    image_id: int,
    gt: NDArray[unsignedinteger[Any]],
    dt: NDArray[unsignedinteger[Any]],
    *,
    timeout: float | None = None,
) -> None

Submit one image's (gt, dt) label-map pair to the worker. Accepts uint8 / uint16 / uint32 2-D ndarrays (ADR-0037); the worker walks at native dtype without an upcast. timeout mirrors the instance background:

None (default) → block until a slot is free
0.0 → single non-blocking attempt; raise QueueFullError if the queue is full
t > 0.0 → wait up to t seconds; raise QueueFullError on timeout
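The three timeout cases above map closely onto the blocking modes of Python's `queue.Queue.put`. The following is a sketch of that mapping only: `submit_with_timeout` is a hypothetical helper, and the `QueueFullError` class defined here is a local stand-in for the error named in the list, not vernier's own type.

```python
import queue
from typing import Optional


class QueueFullError(Exception):
    """Local stand-in for the error named in the documentation."""


def submit_with_timeout(q, item, timeout: Optional[float] = None):
    try:
        if timeout is None:
            q.put(item)                   # block until a slot is free
        elif timeout == 0.0:
            q.put_nowait(item)            # single non-blocking attempt
        else:
            q.put(item, timeout=timeout)  # wait up to `timeout` seconds
    except queue.Full as exc:
        raise QueueFullError("queue is full") from exc


q = queue.Queue(maxsize=1)
submit_with_timeout(q, "image-0")  # a slot is free, so this returns at once

outcomes = []
for t in (0.0, 0.05):
    try:
        submit_with_timeout(q, "image-1", timeout=t)
        outcomes.append("ok")
    except QueueFullError:
        outcomes.append("full")
```

Since nothing drains the queue in this snippet, both the non-blocking attempt and the 50 ms wait end in `QueueFullError`.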

submit_png `method descriptor`

submit_png(
    image_id: int,
    gt_png_bytes: bytes,
    dt_png_bytes: bytes,
    *,
    timeout: float | None = None,
) -> None

Submit one image's (gt_png_bytes, dt_png_bytes) 8-bit grayscale PNG pair to the worker (ADR-0037). Decodes synchronously on the FFI thread (under py.detach) and sends the native-width u8 label maps across the channel; the worker folds at native width without a 4× upcast.

Strictly equivalent to submit(image_id, decode_label_map_png(gt_path).astype(uint32), decode_label_map_png(dt_path).astype(uint32), ...); the difference is wall time, not correctness.

Format contract: 8-bit grayscale only. timeout mirrors submit.

Breakdown

Python wrapper around [Breakdown] / [ClassGroupBreakdown].

axis `property`

axis: str

Axis name (e.g., "area", "vehicle_taxonomy").

buckets `property`

buckets: list[tuple[str, float, float]]

Range buckets as a list of (label, lo, hi) triples in construction order.

Raises AttributeError if this Breakdown was built via from_class_groups. Use class_groups instead.

class_groups `property`

class_groups: list[tuple[str, list[int]]]

Class-id groups as a list of (label, class_ids) pairs in construction order.

Raises AttributeError if this Breakdown was built via from_ranges. Use buckets instead.

kind `property`

kind: Literal['range', 'class_groups']

Variant discriminator: "range" for from_ranges-constructed breakdowns, "class_groups" for from_class_groups-constructed ones. Use this to dispatch in validators that accept a Breakdown of a specific shape.

from_class_groups `builtin`

from_class_groups(
    axis: str, groups: Sequence[tuple[str, Sequence[int]]]
) -> Breakdown

Construct from class-id-keyed groups.

groups is a sequence of (label, class_ids) pairs, one per group. Group order on input determines the group axis index (first pair is index 0). Strict partition discipline is enforced — no class id may appear in two groups.

Raises ValueError on:

empty groups;
any group with empty class_ids;
duplicate group labels;
the same class id appearing in more than one group.
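The four rejection rules above can be read as the following pure-Python sketch. `validate_class_groups` is a hypothetical helper written for illustration, not part of the vernier API, and its error messages are invented. The source does not say how a class id repeated within a single group is treated, so the sketch leaves that case alone.

```python
def validate_class_groups(groups):
    """Return {class_id: group_index}; raise ValueError on the documented cases."""
    if not groups:
        raise ValueError("groups must not be empty")
    seen_labels = set()
    owner = {}
    for index, (label, class_ids) in enumerate(groups):
        if not class_ids:
            raise ValueError(f"group {label!r} has no class ids")
        if label in seen_labels:
            raise ValueError(f"duplicate group label {label!r}")
        seen_labels.add(label)
        for class_id in class_ids:
            # Strict partition: a class id may belong to one group only.
            if class_id in owner and owner[class_id] != index:
                raise ValueError(f"class id {class_id} appears in more than one group")
            owner[class_id] = index
    return owner


# Input order fixes the group index: "vehicle" is 0, "human" is 1.
owner = validate_class_groups([("vehicle", [13, 14, 15]), ("human", [11, 12])])

try:
    validate_class_groups([("a", [1, 2]), ("b", [2, 3])])
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
```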

from_ranges `builtin`

from_ranges(
    axis: str, buckets: Sequence[tuple[str, float, float]]
) -> Breakdown

Construct from f64-keyed buckets.

buckets is a sequence of (label, lo, hi) triples, one per bucket. [lo, hi] is closed on both ends per ADR-0016 (quirk D6); an annotation whose key sits exactly on a boundary lands in both adjacent buckets.

Raises ValueError on:

empty buckets;
NaN or infinite lo / hi;
lo < 0;
lo > hi;
duplicate bucket labels.
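To make the closed-interval rule concrete, here is a small stdlib sketch of the validation conditions and of bucket membership. `validate_buckets` and `buckets_containing` are hypothetical helpers, and the bucket values are illustrative; neither is taken from vernier.

```python
import math


def validate_buckets(buckets):
    """Raise ValueError on the conditions listed above."""
    if not buckets:
        raise ValueError("buckets must not be empty")
    seen_labels = set()
    for label, lo, hi in buckets:
        if not (math.isfinite(lo) and math.isfinite(hi)):
            raise ValueError(f"bucket {label!r}: lo and hi must be finite")
        if lo < 0:
            raise ValueError(f"bucket {label!r}: lo must be >= 0")
        if lo > hi:
            raise ValueError(f"bucket {label!r}: lo must not exceed hi")
        if label in seen_labels:
            raise ValueError(f"duplicate bucket label {label!r}")
        seen_labels.add(label)


def buckets_containing(key, buckets):
    """Labels of every bucket whose closed interval [lo, hi] contains key."""
    return [label for label, lo, hi in buckets if lo <= key <= hi]


buckets = [
    ("small", 0.0, 1024.0),
    ("medium", 1024.0, 9216.0),
    ("large", 9216.0, 1e10),
]
validate_buckets(buckets)

interior = buckets_containing(500.0, buckets)   # strictly inside one bucket
boundary = buckets_containing(1024.0, buckets)  # sits exactly on a shared edge
```

A key of 1024.0 satisfies both `0.0 <= key <= 1024.0` and `1024.0 <= key <= 9216.0`, which is why a boundary annotation is counted in both adjacent buckets.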

ClassSemanticStats

Per-class semantic-segmentation row exposed to Python.

Mirrors [vernier_semantic::ClassSemanticStats] one-to-one. The NaN-vs-0.0 disposition for zero-support classes (quirk AL2) is already baked in by the time the row reaches Python; the Python side reads the values as-is.
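Because a per-class value may arrive as NaN, downstream aggregation has to choose how to treat it. The sketch below uses hypothetical `(class_id, iou)` pairs in place of real rows and makes no claim about which disposition quirk AL2 prescribes; it only shows that a naive mean is poisoned by a single NaN while a NaN-skipping mean is not.

```python
import math

# Hypothetical (class_id, iou) pairs; class 2 stands for a zero-support class.
rows = [(0, 0.75), (1, 0.5), (2, float("nan"))]

# Naive mean: any NaN propagates into the result.
naive_mean = sum(iou for _, iou in rows) / len(rows)

# NaN-skipping mean: average only the classes that carry a defined value.
valid = [iou for _, iou in rows if not math.isnan(iou)]
mean_skipping_nan = sum(valid) / len(valid) if valid else float("nan")
```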

accuracy `property`

accuracy: float

class_id `property`

class_id: int

iou `property`

iou: float

n_dt_pixels `property`

n_dt_pixels: int

n_gt_pixels `property`

n_gt_pixels: int

precision `property`

precision: float

ConfusionMatrix

(N, N) confusion-matrix view exposed to Python.

The flat Vec<u64> storage on the Rust side is exposed as a 2-D numpy.ndarray via [PyConfusionMatrix::counts]. ADR-0028 §F1 promotes the matrix to a first-class output: downstream calibration / error-decomposition / model-diff tools consume it directly.
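As an example of downstream consumption, per-class IoU can be recovered from the raw counts with the standard TP / (TP + FP + FN) formula. This is a sketch on plain nested lists rather than on the array returned by the library. `per_class_iou` is a hypothetical helper, rows are taken to be ground-truth classes and columns predicted classes, and returning NaN for a class with an empty union is this sketch's assumption, not a statement about the library's behaviour.

```python
def per_class_iou(counts):
    """IoU per class from an (N, N) matrix: rows = gt class, cols = predicted."""
    n = len(counts)
    ious = []
    for c in range(n):
        tp = counts[c][c]
        fn = sum(counts[c]) - tp                       # gt is c, predicted otherwise
        fp = sum(counts[r][c] for r in range(n)) - tp  # predicted c, gt otherwise
        union = tp + fp + fn
        # Assumption: NaN when the class has no gt and no predicted pixels.
        ious.append(tp / union if union else float("nan"))
    return ious


counts = [
    [1, 1, 0],
    [0, 2, 0],
    [0, 0, 1],
]
ious = per_class_iou(counts)
mean_iou = sum(ious) / len(ious)  # every class is defined in this example
```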

n_classes `property`

n_classes: int

Number of evaluation classes; the matrix is (n_classes, n_classes).

total `property`

total: int

Total pixel count across all cells. Equals sum(counts) and is useful for sanity checks.

trace `property`

trace: int

Trace (number of correct-class pixels). Equals sum(diag(counts)) and is the numerator of pixel accuracy.
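The relationship between total, trace, and pixel accuracy can be checked on a toy example. The sketch below accumulates a confusion matrix from two small label maps in pure Python. `confusion_counts` is a hypothetical helper, and skipping pixels whose ground-truth value equals `ignore_label` is an assumption about how the ignore label is applied.

```python
def confusion_counts(gt, dt, n_classes, ignore_label=None):
    """Accumulate counts[gt_class][pred_class] over paired label maps."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for gt_row, dt_row in zip(gt, dt):
        for g, d in zip(gt_row, dt_row):
            if g == ignore_label:
                continue  # assumption: ignored gt pixels are not counted
            counts[g][d] += 1
    return counts


gt = [[0, 0, 1], [1, 2, 255]]
dt = [[0, 1, 1], [1, 2, 2]]
counts = confusion_counts(gt, dt, n_classes=3, ignore_label=255)

total = sum(sum(row) for row in counts)       # all counted pixels
trace = sum(counts[i][i] for i in range(3))   # correctly classified pixels
pixel_accuracy = trace / total
```

Five of the six pixels are counted (one is ignored) and four of them lie on the diagonal, giving a pixel accuracy of 0.8.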

counts `method descriptor`

counts() -> NDArray[uint64]

Returns the confusion matrix as a fresh (N, N) numpy.uint64 array rather than a view. The buffer is materialized on each call (cheap for typical N ≤ 150) so the FFI boundary doesn't have to manage a long-lived borrow into the Rust-side Vec<u64>.

get `method descriptor`

get(g: int, d: int) -> int

counts[g, d] lookup. Returns the integer pixel count for the given (gt_class, pred_class) cell. Bounds-checked.
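A bounds-checked lookup over flat storage might look like the sketch below. Row-major layout (index `g * n_classes + d`) and raising `IndexError` on out-of-range ids are assumptions made for illustration; the source states only that the storage is flat and that `get` is bounds-checked. `flat_get` and `to_rows` are hypothetical names.

```python
def flat_get(flat, n_classes, g, d):
    """Look up the (gt_class, pred_class) cell in flat storage."""
    if not (0 <= g < n_classes and 0 <= d < n_classes):
        raise IndexError(f"cell ({g}, {d}) is outside a {n_classes}x{n_classes} matrix")
    return flat[g * n_classes + d]  # assumption: row-major layout


def to_rows(flat, n_classes):
    """Materialize a fresh nested-list copy of the flat storage."""
    return [flat[r * n_classes:(r + 1) * n_classes] for r in range(n_classes)]


flat = [1, 1, 0,
        0, 2, 0,
        0, 0, 1]

cell = flat_get(flat, 3, 0, 1)  # gt class 0 predicted as class 1
rows = to_rows(flat, 3)

try:
    flat_get(flat, 3, 3, 0)
    out_of_range_rejected = False
except IndexError:
    out_of_range_rejected = True
```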