# How to evaluate on a background thread
`BackgroundEvaluator` runs the evaluation kernel on a worker thread; `submit()` enqueues a batch and returns immediately, so the calling thread (typically a training loop) does not stall waiting for the matching kernel to finish.
## Submit and finalize
```python
from pathlib import Path
import json

from vernier.instance import Bbox, CocoDataset, Evaluator

gt = CocoDataset.from_json(Path("instances_val2017.json").read_bytes())
evaluator = Evaluator(iou=Bbox())

with evaluator.background(gt) as bg:
    for images, _ in val_loader:
        detections = model(images)
        bg.submit(json.dumps(detections).encode())
    summary = bg.finalize()

print("final AP:", summary.stats[0])
```
`Evaluator.background(gt)` carries the evaluator's `iou` / `parity_mode` / `max_dets` / `use_cats` / `cast_inputs` settings onto the worker thread; passing a `CocoDataset` reuses the parsed-once GT and its per-kernel derivation cache (ADR-0020) — meaningful on segm, load-bearing on boundary IoU. If your harness already holds GT JSON bytes and does not reuse the dataset, you can construct one directly: `BackgroundEvaluator(gt_bytes, iou_type="bbox")`.
The context-manager form drains the worker queue and joins the thread on exit. Without it, call `finalize()` on the `BackgroundEvaluator` directly (which also drains and joins).
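The drain-and-join mechanics can be illustrated with the standard library. This is a hypothetical sketch, not vernier's implementation: a bounded `queue.Queue` feeds a single worker thread, `submit` mirrors the immediate-return / block-until-timeout behaviour described above, and `finalize` sends a sentinel, lets the worker drain everything already enqueued, and joins the thread.

```python
import queue
import threading

_SENTINEL = object()  # tells the worker to stop after the queue drains

class BackgroundWorker:
    """Illustrative stand-in for a background evaluator's threading shape."""

    def __init__(self, process, capacity=4):
        self._queue = queue.Queue(maxsize=capacity)
        self._process = process          # per-batch callable (stands in for the kernel)
        self._results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _SENTINEL:
                break                    # everything before the sentinel was processed
            self._results.append(self._process(item))

    def submit(self, batch, timeout=None):
        # No timeout: raise queue.Full immediately if at capacity.
        # With a timeout: block up to that long, then raise queue.Full.
        self._queue.put(batch, block=timeout is not None, timeout=timeout)

    def finalize(self):
        self._queue.put(_SENTINEL)       # drain: worker exits only after the sentinel
        self._thread.join()              # join: all submitted batches are done
        return self._results
```

Because there is a single worker, results come back in submission order, which is what makes a canonical final summary possible.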
- `submit(detections, *, timeout=None)` enqueues a batch (either loadRes-shaped JSON bytes or a `Detections` dict / sequence of dicts — see array ingest); returns immediately on success or raises `QueueFullError` if the queue is at capacity (default queue size is set at construction via `queue_capacity=`).
- `finalize()` drains the queue, finishes evaluation, and shuts the worker down. The returned `Summary` is canonical. Subsequent calls raise.
- `finalize_with_tables(...)` is the tables-aware variant; same drain-and-join semantics.
- `finalize_to_partial()` drains, serializes the worker's final state as a partial blob, and shuts down. Combine with `Evaluator.from_partials(...)` on the head rank for distributed evaluation — see distributed-eval.md.
## When to use it vs `Evaluator.evaluate`
| Scenario | Pick |
|---|---|
| End-of-epoch evaluation only. | `Evaluator.evaluate(gt, dt)`. Simplest path; no thread to manage. |
| Each `evaluate(...)` call adds visible latency to the training loop. | `BackgroundEvaluator`. Frees the calling thread; same kernel. |
| Multi-rank distributed eval. | `Evaluator.evaluate_to_partial` per rank + `Evaluator.from_partials` on the head, or the `BackgroundEvaluator` variant if eval is in-loop. See distributed-eval.md. |
In a typical PyTorch training loop on a single GPU, the GPU is the bottleneck and a plain `Evaluator.evaluate` call at end-of-epoch is fine. `BackgroundEvaluator` is the right choice when the validation batch size is large enough that JSON-encoding and matching show up in the profiler.
## Queue capacity and back-pressure
The queue is bounded. If `submit` is called faster than the worker can drain, the queue fills and `submit` raises `QueueFullError` (or blocks until the timeout expires when `timeout=` is set):
```python
try:
    bg.submit(detections, timeout=0.5)
except QueueFullError as e:
    log_metrics(step, dropped_batch=True, queue_capacity=e.queue_capacity)
```
Sizing: the queue exists to absorb bursts where `submit` momentarily outpaces the worker. A capacity of 2-4 is a safe default; raise it only if the profiler shows `submit` blocking on a full queue under your actual batch cadence.
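The back-pressure behaviour described above can be reproduced with a bounded stdlib queue. This is an illustrative sketch under assumed semantics (not vernier code): with no consumer draining the queue, a `put` on a full queue blocks for the given timeout and then raises `queue.Full`, the stdlib analogue of `QueueFullError`.

```python
import queue

q = queue.Queue(maxsize=2)   # small capacity, as recommended above
q.put_nowait("batch-0")
q.put_nowait("batch-1")      # queue is now at capacity

dropped = False
try:
    q.put("batch-2", timeout=0.1)   # blocks up to 0.1 s, then raises
except queue.Full:
    dropped = True                  # caller decides: drop, log, or retry
```

The caller, not the worker, sees the failure, which is what lets a training loop log a dropped batch and keep going.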
## Memory budget
`BackgroundEvaluator` honors the `memory_budget_bytes=` knob. Exceeding the budget surfaces as an `OutOfBudgetError` raised on the calling thread at the next `submit`, not silently from the worker.
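The "error surfaces on the next submit" pattern can be sketched with the standard library. Everything here is hypothetical scaffolding, not vernier's implementation: the worker records its failure instead of letting it die on the worker thread, and `submit` re-raises it on the calling thread the next time it is invoked.

```python
import queue
import threading

class Worker:
    """Illustrative worker that propagates its failure to the caller."""

    def __init__(self):
        self._queue = queue.Queue(maxsize=4)
        self._error = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item < 0:
                    # Stand-in for an out-of-budget condition on the worker.
                    raise ValueError("budget exceeded")
            except Exception as exc:
                self._error = exc        # remember; the caller sees it next submit
            finally:
                self._queue.task_done()

    def submit(self, batch):
        if self._error is not None:
            raise self._error            # error crosses to the calling thread here
        self._queue.put(batch)
        self._queue.join()               # demo only: wait so the failure is deterministic
```

Checking for a recorded error at the top of `submit` is what makes the failure visible in the training loop rather than lost inside the worker thread.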
## See also
- ADR-0014 — the worker-thread resource discipline (single worker, bounded queue, GIL drop).
- ADR-0035 — why the public surface is `submit` / `finalize` / `finalize_to_partial` with no snapshot path.
- tutorials/first-evaluation.md — the in-loop walkthrough (Path B), end-to-end on COCO val2017.
- distributed-eval.md — the multi-rank pattern.