# vernier
A parity-preserving COCO-style evaluator for instance segmentation, panoptic
segmentation, boundary IoU, OKS keypoints, semantic segmentation, and LVIS
federated evaluation. Bit-exact against pycocotools==2.0.11, panopticapi,
and lvis-api in strict parity mode; semantic mIoU is calibrated against
a vendored mmsegmentation IoUMetric. See the README §Status & validation for
the full per-paradigm matrix, plus a documented quirks survey of every place
the reference implementations disagree with themselves.
## Why vernier
pycocotools==2.0.11 is the reference implementation for COCO evaluation, but it is slow,
unmaintained, and full of edge-case quirks that downstream tools either
silently fix or silently inherit. Faster reimplementations exist, but each
chooses its own quirk dispositions, leaving users to discover the divergences
empirically. vernier takes a third path:
- Auditable parity. Every divergence from `pycocotools` is filed in the quirks survey under ADR-0002 as either `strict` (bit-equal output, even when vernier's implementation is structurally different) or `corrected` (opt-in opinionated fix). The default is strict; corrected fixes are itemized, so you know exactly when your numbers diverge from a reference run.
- A unified evaluation toolkit. bbox / segm / keypoints AP, boundary IoU, panoptic PQ, semantic mIoU, and LVIS federated evaluation all live in one package, behind one Python API and one CLI. No more wrestling with fragmented `pycocotools`, `boundary-iou-api`, `panopticapi`, `lvis-api`, and `mmsegmentation` installs (each has a per-paradigm migration guide).
- Drop-in shim. `vernier.patch_pycocotools()` swaps the `COCOeval` symbol in place, so existing pycocotools-based scripts switch with one line (ADR-0007); see the sketch after this list.
- Rust core. The matching kernel is Rust with runtime SIMD dispatch via `pulp`; the FFI layer is data conversion only. The CLI ships as a static binary, so CI pipelines can call vernier without provisioning a Python interpreter.
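A minimal sketch of the shim path, assuming `patch_pycocotools()` rebinds the `COCOeval` symbol before the rest of the script imports it (per the ADR-0007 description above); everything after the patch call is ordinary pycocotools usage:

```python
import vernier

# Swap pycocotools' COCOeval for vernier's strict-parity implementation
# before downstream modules import it.
vernier.patch_pycocotools()

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Standard pycocotools evaluation loop, now backed by vernier's kernel.
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("detections.json")
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()
```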
## Install
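A minimal sketch for the Python package, assuming it is published on PyPI under the project name `vernier`:

```bash
# Assumed distribution name; adjust if the published package differs.
pip install vernier
```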
For the CLI binary on its own:
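One plausible route, assuming the CLI is published as a crate of the same name (prebuilt release binaries may be the intended channel instead):

```bash
# Assumed crate name; check the project's release notes for the actual channel.
cargo install vernier
```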
## 60-second example
```python
from pathlib import Path

from vernier.instance import Bbox, CocoDataset, Evaluator

# COCO-format ground truth and detection results, passed as raw JSON bytes.
gt_bytes = Path("instances_val2017.json").read_bytes()
dt_bytes = Path("detections.json").read_bytes()

# Build the dataset once, then run a bbox-AP evaluation against it.
dataset = CocoDataset.from_json(gt_bytes)
summary = Evaluator(iou=Bbox()).evaluate(dataset, dt_bytes)

for line in summary.pretty_lines():
    print(line)
```
## Three evaluation paradigms
vernier ships three sibling submodules — pick the one whose input shape matches your model's output:
```python
import vernier

# Detections (bbox / segm / boundary / keypoints) with scores → AP fold
vernier.instance.Evaluator()

# RGB-encoded panoptic PNGs + segments_info JSON → PQ
vernier.panoptic.Evaluator()

# Single-channel class-id label maps → mIoU / FWIoU / pAcc / mAcc
vernier.semantic.Evaluator()
```
The submodules are mutually exclusive (different data models, different matching rules, different parity oracles). See Three paradigms for when to use which.
## Where to go next
- New to vernier? Start with Tutorials.
- Migrating from pycocotools, faster-coco-eval, panopticapi, lvis-api, or mmsegmentation? See Migrate.
- Comparing alternatives? How vernier compares is a per-library decision aid (when to pick vernier, when to keep what you have).
- Curious about speed? Benchmarks carries the per-cell medians and methodology.
- Looking for a specific recipe? See How-to.
- Need API details? See Reference.
- Want to understand the design? See Explanation or browse the ADRs.