# Benchmarks

Comparison of vernier against the third-party libraries it targets for parity, on a single machine at a single git revision. The numbers below are the median total-stage wall time over the non-warmup reps recorded by the local bench harness (ADR-0017, extended cross-paradigm in ADR-0033). The IQR column reports the spread (Q3 - Q1) across the 10 measurement reps, and the same value as a percentage of the median; release mode gates each cell at 5% relative IQR.
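As a concrete sketch of that gate (a hypothetical helper, not the harness's actual code), the per-cell statistic and the 5% rule look like:

```python
# Hypothetical sketch of the per-cell gate: median total-stage wall time
# over the measurement reps, IQR = Q3 - Q1, and the release-mode rule
# that the IQR must stay within 5% of the median.
from statistics import median, quantiles

def gate_cell(rep_totals_ns, rel_iqr_limit=0.05):
    med = median(rep_totals_ns)
    q1, _, q3 = quantiles(rep_totals_ns, n=4)  # quartiles
    iqr = q3 - q1
    return med, iqr, (iqr / med) <= rel_iqr_limit
```

A cell that fails the rule keeps its median in the table but is flagged, which is what the * marker in the tables below indicates.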
Provenance — git SHA 1fd5720bf56c · machine fingerprint 1655eb18a194 · CPU AMD EPYC-Milan Processor (x86_64) · harness mode release · build profile = cargo release defaults (opt-level=3, lto=thin, codegen-units=1, no target-cpu). The release wheel on PyPI is built with the same profile — no benchmarking-only flags.
Baselines pinned for these numbers — `faster-coco-eval==1.7.2` · `pycocotools==2.0.11` · `boundary-iou-api @ 37d2558` · `panopticapi @ 7bb4655` · `mmsegmentation @ c685fe6` · `lvis-api @ 031ac21` (PyPI `lvis==0.5.3`). Each baseline is locked in its own uv-managed venv per ADR-0017.
The LVIS section below was measured at HEAD e9d9c4d71303 after the
bench paradigm landed; every other section is at 1fd5720bf56c. The
next full bench refresh will collapse the LVIS section into the same
SHA as the others.
For the full per-cell deep-dive (per-stage breakdown, RSS evolution,
parity gating, narrative on what moved each round), see
docs/engineering/benchmarking/.
This page is regenerated from the harness result tree by `tools/render_benchmarks.py`. To refresh after a new bench run, see the release runbook §0.
## Instance — bbox / segm / boundary / keypoints (AP)

**Workload:** `coco_val2017_jittered_seed0`

### bbox
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 360.0 ms | 3.8 ms (1.07%) | 236 MiB | 1.00× |
| faster-coco-eval | 2.127 s | 21.8 ms (1.03%) | 661 MiB | 5.91× |
| pycocotools | 5.820 s | 65.8 ms (1.13%) | 576 MiB | 16.17× |
### segm
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 967.7 ms | 13.4 ms (1.38%) | 236 MiB | 1.00× |
| faster-coco-eval | 3.605 s | 63.7 ms (1.77%) | 721 MiB | 3.73× |
| pycocotools | 6.853 s | 71.8 ms (1.05%) | 569 MiB | 7.08× |
### boundary
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 3.130 s | 21.7 ms (0.69%) | 238 MiB | 1.00× |
| faster-coco-eval | 17.837 s | 48.8 ms (0.27%) | 794 MiB | 5.70× |
| boundary-iou-api | 62.233 s | 228.1 ms (0.37%) | 666 MiB | 19.88× |
**Workload:** `coco_val2017_keypoints_jittered_seed0`

### keypoints
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 135.7 ms | 2.4 ms (1.76%) | 102 MiB | 1.00× |
| faster-coco-eval | 1.700 s | 20.1 ms (1.18%) | 154 MiB | 12.53× |
| pycocotools | 2.317 s | 13.3 ms (0.57%) | 163 MiB | 17.07× |
## Panoptic — PQ

**Workload:** `coco_panoptic_val2017_perfect`

### pq
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 11.615 s | 605.4 ms (5.21%) * | 118 MiB | 1.00× |
| panopticapi | 35.327 s | 344.5 ms (0.98%) | 145 MiB | 3.04× |
## Semantic — mIoU

**Workload:** `coco_val2017_semantic_perfect`

### miou
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 5.070 s | 25.5 ms (0.50%) | 92 MiB | 1.00× |
| mmsegmentation | 21.377 s | 237.4 ms (1.11%) | 648 MiB | 4.22× |
**Workload:** `synthetic_semantic_n200_c19_s0`

### miou
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 63.1 ms | 618.8 μs (0.98%) | 88 MiB | 1.00× |
| mmsegmentation | 437.5 ms | 46.5 ms (10.64%) * | 631 MiB | 6.93× |
Cells with * next to their IQR exceeded the release-mode 5% relative-IQR gate. The median is still reported; treat the gap to the next implementation as the load-bearing signal rather than the precise ratio.
## Instance — LVIS federated AP

**Workload:** `lvis_v1_val_perfect`

LVIS v1 val (19809 images, 1203 categories), GT-as-DT (perfect bbox-shape DT). Federated semantics layer over the shared AP-fold core per ADR-0026.

### bbox
| impl | median | IQR | RSS (max) | vs vernier |
|---|---|---|---|---|
| vernier | 3.691 s | 46.7 ms (1.26%) | 1.49 GiB | 1.00× |
| lvis-api | 210.086 s | 9.72 s (4.63%) | 15.01 GiB | 56.92× |
vernier reports AP=0.9983; lvis-api reports the same headline. Strict bit-equality on the (T, R, K, A) precision tensor passes on every one of the 4.86M cells. The headline 56.9× speedup and 10× lower peak RSS are unaffected: the timing and memory measurements stand on their own.
The 22 GB peak ADR-0026 called out at acceptance was the structural upper bound on the dense `Vec<Option<PerImageEval>>` orchestrator grid (95M slots × 232 B). PR #179 collapsed the slot type via the `Box`-niche trick; the measured peak above is 1.49 GiB.
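For orientation, the arithmetic behind those two bounds (the 232 B slot size is from ADR-0026 as quoted above; the 8 B boxed-slot size is an assumption about pointer-sized `Option<Box<_>>` slots on a 64-bit target):

```python
# 95M orchestrator-grid slots at 232 B each gives the ~22 GB structural
# upper bound ADR-0026 accepted; pointer-sized boxed slots shrink the
# grid itself to well under a gigabyte (per-image payloads then only
# exist for slots that are actually populated).
SLOTS = 95_000_000
INLINE_SLOT_B = 232   # inline Option<PerImageEval> slot (from the text)
BOXED_SLOT_B = 8      # assumed Option<Box<_>> slot on a 64-bit target

dense_gb = SLOTS * INLINE_SLOT_B / 1e9
boxed_gb = SLOTS * BOXED_SLOT_B / 1e9
```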
## Methodology in one paragraph
Every cell runs in its own subprocess with its own uv-managed venv (one per impl), so a single Python process never has competing pycocotools-flavored packages on its `sys.path`. The harness records (load, evaluate, accumulate, summarize, total) `wall_ns` per stage, discards the warmup reps, and reports the median total plus the inter-quartile range (IQR = Q3 - Q1, with the relative spread shown as a percentage of the median). Release mode (N=10 + 2 warmup) gates each impl on relative IQR ≤ 5%; cells where the gate failed are marked with * next to their IQR value — the median is still the best estimator, just with a wider confidence band than the gate accepts. Parity is a side effect of every timing run — strict-tier (vs pycocotools) and aligned-tier (vs faster-coco-eval) where applicable; failed parity fails the cell. Memory is `getrusage(RUSAGE_CHILDREN).ru_maxrss`, high-water-marked across the rep set.
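The memory side of that can be sketched as follows (hypothetical wrapper name; the real harness also records per-stage `wall_ns`): launch the rep as a child process and read the child high-water RSS from `getrusage`.

```python
# Sketch (assumed names): run one bench rep as a subprocess, then read
# the accumulated child high-water RSS. RUSAGE_CHILDREN's ru_maxrss is
# itself a high-water mark over all waited-for children, so one read
# after the rep set yields the max across reps.
import resource
import subprocess
import sys

def child_peak_rss_bytes(cmd):
    subprocess.run(cmd, check=True)
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is KiB on Linux (bytes on macOS); assume Linux here.
    return ru.ru_maxrss * 1024
```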