How vernier compares
vernier sits in a small ecosystem of COCO-style evaluation libraries. This page is a decision aid — when does vernier fit your case, when would you still pick a specific alternative, and what does each alternative actually provide. For mechanical "rewrite my imports" instructions, see the migration guides.
At a glance
| Library | Paradigms | Parity contract | Performance vs vernier | When you'd still pick it |
|---|---|---|---|---|
| pycocotools | instance (bbox / segm / keypoints) | The reference | ~7–18× slower | You need the literal pycocotools printed table for an external system that scrapes it |
| faster-coco-eval | instance (bbox / segm / keypoints / boundary) | "Faster, mostly compatible" — quirks chosen silently | ~4–13× slower | You're already running it in production and don't need vernier's auditable parity surface |
| panopticapi | panoptic | The reference | ~1.07× slower | You explicitly need the pq_compute_* script outputs unchanged |
| lvis-api | LVIS federated | The reference | Vernier reuses orchestration only | Your tooling depends on the LVISEval instance attributes |
| boundary-iou-api | boundary IoU only | The reference | ~20× slower on val2017 perfect-DT | You're running an external evaluation script that loads boundary_iou.coco_instance_api.COCOeval by name |
| mmsegmentation | semantic | One of three references vernier targets | Vernier-only baseline today; cross-impl bench externally blocked | You need the full mmseg.evaluation registry surface (it's a training framework, not just an evaluator) |
Numbers above reference the benchmarks page and the engineering benchmarking notes.
pycocotools
pycocotools
is the reference COCO evaluation library — the source of truth that every
other library (including vernier) calibrates against. It ships COCOeval
for bbox / segm / keypoints AP, the Mask module for RLE codec and polygon
rasterization, and the COCO GT loader.
vernier reproduces pycocotools==2.0.11 semantics bit-for-bit in strict
parity mode (the default). Every quirk — float casting in IoU computation,
the setDetParams() defaults, the < vs <= comparison in score-tied
matches — is filed in
ADR-0002
as either strict (bit-equal output) or corrected (opt-in opinionated
fix). The full table lives in
docs/engineering/pycocotools-quirks.md.
vernier is ~7–18× faster on val2017 across bbox / segm / keypoints (see the
benchmarks). The drop-in shim
(ADR-0007)
keeps `from pycocotools.cocoeval import COCOeval` working in existing
scripts via vernier.patch_pycocotools().
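A minimal sketch of that pattern, assuming only what this page states (patch_pycocotools() redirects the pycocotools import); the annotation and detection file names are placeholders:

```python
import vernier

# Redirect pycocotools.cocoeval.COCOeval to vernier's strict-parity engine
# (the ADR-0007 shim). Call this before the pycocotools imports below.
vernier.patch_pycocotools()

# Existing pycocotools code then runs unchanged.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")           # placeholder paths
coco_dt = coco_gt.loadRes("detections_bbox.json")

ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # strict mode reproduces the reference table byte-for-byte
```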
Pick pycocotools instead when an external system parses
COCOeval.summarize()'s text output and you need that exact text. (Even
then, vernier's strict-mode --emit text matches it byte-for-byte —
ADR-0015
pins the behavior — but if your dependency reaches into private attributes
of COCOeval, the shim is the safer path.)
faster-coco-eval
faster-coco-eval is the
most prominent fast reimplementation of pycocotools' COCOeval. It exposes
a COCOeval_faster class with the same API and an init_as_pycocotools()
helper that monkey-patches pycocotools.cocoeval.COCOeval.
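For comparison, its documented drop-in usage looks roughly like this (a sketch based on the entry points named above; file names are placeholders):

```python
# Direct use of the drop-in class, which mirrors pycocotools' COCOeval API;
# alternatively, faster_coco_eval.init_as_pycocotools() monkey-patches
# pycocotools.cocoeval.COCOeval in place.
from faster_coco_eval import COCO, COCOeval_faster

coco_gt = COCO("instances_val2017.json")           # placeholder paths
ev = COCOeval_faster(coco_gt, coco_gt.loadRes("detections_bbox.json"), iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()
```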
The contract is "faster, mostly compatible." Each pycocotools quirk gets fixed or kept based on the maintainer's judgment, and the project doesn't publish a quirks table. In practice this means a faster-coco-eval run will sometimes diverge from the reference and you have to read source to know why and where.
vernier targets the same drop-in pattern but with auditable parity. Every quirk has a row in the disposition table; strict mode reproduces pycocotools bit-for-bit; corrected fixes are listed and opt-in. The performance gap is real on val2017 — vernier is ~4–13× faster on bbox / segm / keypoints / boundary — but the headline benefit is "you can prove what your numbers mean".
Pick faster-coco-eval instead when you have an existing CI pipeline
running it stably and the auditable-parity property doesn't justify a
migration. The migration cost is small (one-line shim) but real.
panopticapi
panopticapi is the reference
panoptic-quality (PQ) evaluator. It ships pq_compute_single_core and
pq_compute_multi_core over the COCO panoptic GT format (RGB-encoded PNGs
with id = R + G*256 + B*256²).
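A minimal sketch of that encoding, decoding one panoptic PNG into a segment-id map (the file name is a placeholder; panopticapi ships an equivalent helper, rgb2id, in its utils module):

```python
import numpy as np
from PIL import Image

# COCO panoptic GT: each pixel's RGB triple encodes a segment id
# as id = R + G*256 + B*256**2.
rgb = np.asarray(Image.open("000000000139.png").convert("RGB"), dtype=np.uint32)
segment_ids = rgb[..., 0] + rgb[..., 1] * 256 + rgb[..., 2] * 256 ** 2
# segment_ids[y, x] now matches the "id" field of segments_info in the JSON.
```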
vernier-panoptic is a sibling crate to vernier-core; both depend on vernier-mask, neither depends on the other. PQ and AP have different matching rules and different data models — the architectural firewall keeps the two folds from drifting toward each other (see ADR-0025). Strict-mode parity is bit-equal at per-class TP/FP/FN counts.
vernier-panoptic is ~1.1× faster than panopticapi on val2017 perfect-DT
(after the round-2 streaming refactor + FxHash optimization, PR #188). The
gap is not the headline — panopticapi is already efficient. The headline
is the unified surface: vernier.panoptic.Evaluator lives next to
vernier.instance.Evaluator and vernier.semantic.Evaluator, and the CLI
covers all three.
Pick panopticapi instead when an external system parses the
pq_compute_* script's stdout output and you need that exact text. (As
above, vernier reproduces it in strict mode.)
lvis-api
lvis-api extends pycocotools to
the LVIS dataset, with federated evaluation: per-image not_exhaustive and
neg_category_ids fields scope which categories each image actually
contributes to in the AP fold. The library ships LVISEval and friends.
vernier's LVIS support
(ADR-0026)
implements federated evaluation in vernier-core's existing AP fold — no
fork of matching.rs or accumulate.rs, just a different category set per
image. The semantics match lvis-api's outputs.
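A simplified sketch of the federated scoping idea — hypothetical helper code, not vernier's or lvis-api's API, and it glosses over the not-exhaustive handling:

```python
# For category c, an image contributes to c's AP only if c is positively
# annotated there, or c is listed in the image's neg_category_ids (a verified
# negative). Detections of c on all other images are simply ignored.
def images_in_scope(category_id, images, anns_by_image):
    scoped = []
    for img in images:
        has_gt = any(ann["category_id"] == category_id
                     for ann in anns_by_image.get(img["id"], []))
        verified_negative = category_id in img.get("neg_category_ids", [])
        if has_gt or verified_negative:
            scoped.append(img["id"])
    return scoped
```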
Pick lvis-api instead when your code depends on the LVISEval
instance's attribute layout. There's no equivalent bit-for-bit text
output to scrape (LVIS doesn't print a fixed-format summary), so the
choice is mostly about migration cost.
The original ADR-0026 §"Known follow-up" called out a >22 GB structural
peak from the dense Vec<Option<PerImageEval>> orchestrator grid (95M
slots × 232 B). PR #179 collapsed the slot type via the Box-niche
trick; the structural floor is now under 1 GB on full LVIS val. The
small per-cell precision-tensor drift that was briefly tracked here as a
follow-up was root-caused to the oracle's area > 0 GT filter (quirk
AG6) and is now mirrored in strict mode; the full-val bbox cell passes
bit-equal.
boundary-iou-api
boundary-iou-api is the
reference for boundary IoU — the metric that scores a segmentation by how
well the prediction's boundary aligns with the ground truth's, rather than
rewarding the easy-to-match interior. It ships a COCOeval subclass that swaps the
IoU kernel; everything else inherits from pycocotools.
vernier-core implements boundary IoU as an isolated subsystem
(ADR-0010)
with its own oracle and quirks file. The dilation ratio default (0.02)
matches bowenc0221's reference value.
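A conceptual sketch of the metric, assuming OpenCV and ignoring the reference implementation's image-border padding: a mask's boundary band is the mask minus its erosion by d = dilation_ratio × image diagonal, and boundary IoU is the IoU of the two bands.

```python
import numpy as np
import cv2

def boundary_band(mask: np.ndarray, dilation_ratio: float = 0.02) -> np.ndarray:
    """Pixels of `mask` within distance d of its contour (mask minus erosion)."""
    h, w = mask.shape
    d = max(1, int(round(dilation_ratio * np.sqrt(h * h + w * w))))
    eroded = cv2.erode(mask.astype(np.uint8), np.ones((3, 3), np.uint8), iterations=d)
    return mask.astype(bool) & ~eroded.astype(bool)

def boundary_iou(gt: np.ndarray, dt: np.ndarray, dilation_ratio: float = 0.02) -> float:
    gt_b = boundary_band(gt, dilation_ratio)
    dt_b = boundary_band(dt, dilation_ratio)
    union = np.logical_or(gt_b, dt_b).sum()
    return float(np.logical_and(gt_b, dt_b).sum() / union) if union else 0.0
```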
vernier is significantly faster on boundary IoU — the 2026-05 perf push (PRs #181/#182/#184/#185/#186) brought val2017 perfect-DT from ~21 s to ~3 s via a bbox-cropped erode and a bbox-cropped XOR scan. See the boundary cell of the benchmarks for the current margins.
Pick boundary-iou-api instead when an evaluation script imports
boundary_iou.coco_instance_api.COCOeval by name and you can't or
don't want to redirect the import.
mmsegmentation
mmsegmentation is a full
training framework for semantic segmentation, not just an evaluator. Its
mmseg.evaluation.IoUMetric is one of three references vernier-semantic
calibrates against (the other two are
mcordts/cityscapesScripts
and the Pascal VOC / ADE20K reference scripts).
vernier-semantic
ships per-class IoU, mIoU, FWIoU, pAcc, and mAcc. The vendored oracle
harness lands under
ADR-0036
(still proposed); per-paradigm parity status lives in
README §Status & validation.
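For reference, the standard definitions of those metrics from a confusion matrix; a minimal sketch, not vernier-semantic's API, and it averages over all classes rather than masking out absent ones:

```python
import numpy as np

def semantic_metrics(conf: np.ndarray) -> dict:
    """conf[gt, pred] holds pixel counts in a num_classes x num_classes matrix."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp           # predicted as c but labelled otherwise
    fn = conf.sum(axis=1) - tp           # labelled c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)
    acc = tp / np.maximum(tp + fn, 1)    # per-class pixel accuracy
    freq = conf.sum(axis=1) / conf.sum() # class pixel frequency
    return {
        "IoU": iou,                      # per-class IoU
        "mIoU": float(iou.mean()),
        "FWIoU": float((freq * iou).sum()),
        "pAcc": float(tp.sum() / conf.sum()),
        "mAcc": float(acc.mean()),
    }
```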
Pick mmsegmentation instead when you need the broader training
framework: model registry, training loop, dataset pipelines, config
system. vernier-semantic is the evaluation half only; the training half
is out of scope.
What vernier doesn't do (yet)
A short, honest list:
- Visualization tooling. vernier produces numbers and tables; it does not draw bounding boxes on images. Tools like supervision and fiftyone cover that ground.
- Training-loop integration beyond two supported entry points. Evaluator.evaluate() at end-of-epoch is the default; the BackgroundEvaluator surface (ADR-0014) is the secondary one for in-loop submission without GIL stalls. Multi-rank rank-local + gather lives in the distributed-eval how-to. Full callbacks-and-loggers integration is downstream-framework territory.
- Pretty HTML reports. The CLI emits text and JSON; HTML report generation is a follow-up tool that consumes the JSON output.
- A model zoo / pretrained predictor. vernier evaluates predictions you already have. Generating predictions is a different product (mmsegmentation, mmdetection, detectron2).