Benchmarks

Comparison of vernier against the third-party libraries it targets for parity, on a single machine at a single git revision. The numbers below are the median total-stage wall time over the non-warmup reps recorded by the local bench harness (ADR-0017, extended cross-paradigm in ADR-0033). The IQR column reports the spread (Q3 - Q1) across the 10 measurement reps, plus the same value as a percentage of the median; release mode gates each cell at 5% relative IQR.
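The median/relative-IQR statistic and the 5% gate can be sketched as follows; `summarize`, its argument names, and the toy rep list are illustrative, not the harness's actual API:

```python
import statistics

def summarize(reps_ms, warmup=2, gate_pct=5.0):
    """Median + relative IQR over non-warmup reps, with a release-mode gate.

    `reps_ms`, `warmup`, and `gate_pct` are hypothetical names; the real
    harness (ADR-0017) records per-stage wall_ns, not this toy input.
    """
    measured = sorted(reps_ms[warmup:])       # discard the warmup reps
    median = statistics.median(measured)
    q1, _, q3 = statistics.quantiles(measured, n=4)
    iqr = q3 - q1                             # spread across measurement reps
    rel_iqr_pct = 100.0 * iqr / median        # IQR as % of the median
    passes_gate = rel_iqr_pct <= gate_pct     # release mode: 5% relative IQR
    return median, iqr, rel_iqr_pct, passes_gate
```

For example, 2 warmup reps followed by 10 measurement reps clustered near 360 ms yields a relative IQR well under the 5% gate.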

Provenance — git SHA 1fd5720bf56c · machine fingerprint 1655eb18a194 · CPU AMD EPYC-Milan Processor (x86_64) · harness mode release · build profile = cargo release defaults (opt-level=3, lto=thin, codegen-units=1, no target-cpu). The release wheel on PyPI is built with the same profile — no benchmarking-only flags.

Baselines pinned for these numbers: faster-coco-eval==1.7.2 · pycocotools==2.0.11 · boundary-iou-api @ 37d2558 · panopticapi @ 7bb4655 · mmsegmentation @ c685fe6 · lvis-api @ 031ac21 (PyPI lvis==0.5.3). Each baseline is locked in its own uv-managed venv per ADR-0017.

The LVIS section below was measured at HEAD e9d9c4d71303 after the bench paradigm landed; every other section is at 1fd5720bf56c. The next full bench refresh will collapse the LVIS section into the same SHA as the others.

For the full per-cell deep-dive (per-stage breakdown, RSS evolution, parity gating, narrative on what moved each round), see docs/engineering/benchmarking/.

This page is regenerated from the harness result tree by tools/render_benchmarks.py. To refresh after a new bench run, see the release runbook §0.

Instance — bbox / segm / boundary / keypoints (AP)

Workload: coco_val2017_jittered_seed0

bbox

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 360.0 ms | 3.8 ms (1.07%) | 236 MiB | 1.00× |
| faster-coco-eval | 2.127 s | 21.8 ms (1.03%) | 661 MiB | 5.91× |
| pycocotools | 5.820 s | 65.8 ms (1.13%) | 576 MiB | 16.17× |

segm

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 967.7 ms | 13.4 ms (1.38%) | 236 MiB | 1.00× |
| faster-coco-eval | 3.605 s | 63.7 ms (1.77%) | 721 MiB | 3.73× |
| pycocotools | 6.853 s | 71.8 ms (1.05%) | 569 MiB | 7.08× |

boundary

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 3.130 s | 21.7 ms (0.69%) | 238 MiB | 1.00× |
| faster-coco-eval | 17.837 s | 48.8 ms (0.27%) | 794 MiB | 5.70× |
| boundary-iou-api | 62.233 s | 228.1 ms (0.37%) | 666 MiB | 19.88× |

Workload: coco_val2017_keypoints_jittered_seed0

keypoints

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 135.7 ms | 2.4 ms (1.76%) | 102 MiB | 1.00× |
| faster-coco-eval | 1.700 s | 20.1 ms (1.18%) | 154 MiB | 12.53× |
| pycocotools | 2.317 s | 13.3 ms (0.57%) | 163 MiB | 17.07× |

Panoptic — PQ

Workload: coco_panoptic_val2017_perfect

pq

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 11.615 s | 605.4 ms (5.21%) * | 118 MiB | 1.00× |
| panopticapi | 35.327 s | 344.5 ms (0.98%) | 145 MiB | 3.04× |

Semantic — mIoU

Workload: coco_val2017_semantic_perfect

miou

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 5.070 s | 25.5 ms (0.50%) | 92 MiB | 1.00× |
| mmsegmentation | 21.377 s | 237.4 ms (1.11%) | 648 MiB | 4.22× |

Workload: synthetic_semantic_n200_c19_s0

miou

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 63.1 ms | 618.8 μs (0.98%) | 88 MiB | 1.00× |
| mmsegmentation | 437.5 ms | 46.5 ms (10.64%) * | 631 MiB | 6.93× |

Cells marked with * next to their IQR exceeded the release-mode 5% relative-IQR gate. The median is still reported; treat the gap to the next impl as the load-bearing signal rather than the precise ratio.

Instance — LVIS federated AP

Workload: lvis_v1_val_perfect

LVIS v1 val (19809 images, 1203 categories), GT-as-DT (perfect bbox-shape DT). Federated semantics layer over the shared AP-fold core per ADR-0026.

bbox

| impl | median | IQR | RSS (max) | vs vernier |
| --- | --- | --- | --- | --- |
| vernier | 3.691 s | 46.7 ms (1.26%) | 1.49 GiB | 1.00× |
| lvis-api | 210.086 s | 9.72 s (4.63%) | 15.01 GiB | 56.92× |

vernier reports AP=0.9983; lvis-api reports the same headline. Strict bit-equality on the (T, R, K, A) precision tensor passes on every one of the 4.86M cells. The headline 56.9× speedup and 10× lower peak RSS are unaffected — the timing + memory measurements stand on their own.
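A "strict bit-equality" check of the kind described above can be sketched in NumPy; the function name is hypothetical and the arrays below merely stand in for the (T, R, K, A) precision tensors:

```python
import numpy as np

def strict_bit_equal(a: np.ndarray, b: np.ndarray) -> bool:
    """True iff the two float64 tensors match bit-for-bit in every cell.

    Viewing the float64 buffers as uint64 makes the comparison exact:
    -0.0 vs 0.0 and differing NaN payloads all count as mismatches,
    unlike a plain float == comparison.
    """
    if a.shape != b.shape or a.dtype != np.float64 or b.dtype != np.float64:
        return False
    av = np.ascontiguousarray(a).view(np.uint64)
    bv = np.ascontiguousarray(b).view(np.uint64)
    return bool(np.array_equal(av, bv))
```

The uint64 reinterpretation is the design point: it turns "same number" into "same bits", which is the only notion of equality that holds across 4.86M cells without tolerance arguments.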

The 22 GB peak that ADR-0026 called out at acceptance was the structural upper bound on the dense Vec&lt;Option&lt;PerImageEval&gt;&gt; orchestrator grid (95M slots × 232 B). PR #179 collapsed the slot type via the Box-niche trick; the measured peak above is 1.49 GiB.

Methodology in one paragraph

Every cell runs in its own subprocess with its own uv-managed venv (one per impl), so a single Python process never has competing pycocotools-flavored packages on its sys.path. The harness records (load, evaluate, accumulate, summarize, total) wall_ns per stage, discards the warmup reps, and reports the median total plus the inter-quartile range (IQR = Q3 - Q1, with the relative spread shown as a percentage of the median). Release mode (N=10 + 2 warmup) gates each impl on relative IQR ≤ 5%; cells where the gate failed are marked with * next to their IQR value — the median is still the best estimator, just with a wider confidence band than the gate accepts. Parity is a side effect of every timing run — strict-tier (vs pycocotools) and aligned-tier (vs faster-coco-eval) where applicable; failed parity fails the cell. Memory is getrusage(RUSAGE_CHILDREN).ru_maxrss, high-water-marked across the rep set.
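The memory side of the methodology can be sketched as follows; `child_peak_rss_kib` and its usage are hypothetical, but `getrusage(RUSAGE_CHILDREN).ru_maxrss` is the actual counter named above (Unix-only; KiB on Linux, bytes on macOS):

```python
import resource
import subprocess
import sys

def child_peak_rss_kib(cmd: list[str]) -> int:
    """Run `cmd` in a subprocess, then read the children high-water RSS.

    ru_maxrss under RUSAGE_CHILDREN is a cumulative maximum over all
    reaped children of this process, so running several reps and reading
    it once yields the high-water mark across the rep set.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Hypothetical usage: one rep of a cell in its own interpreter.
peak = child_peak_rss_kib([sys.executable, "-c", "x = bytearray(10_000_000)"])
```

Because the counter is a monotone maximum across children, per-impl isolation has to come from the subprocess-per-cell structure, not from the counter itself.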