How to evaluate on a background thread

BackgroundEvaluator runs the evaluation kernel on a worker thread; submit() enqueues a batch and returns immediately, so the calling thread (typically a training loop) does not stall waiting for the matching kernel to finish.

Submit and finalize

from pathlib import Path
import json
from vernier.instance import Bbox, CocoDataset, Evaluator

gt = CocoDataset.from_json(Path("instances_val2017.json").read_bytes())
evaluator = Evaluator(iou=Bbox())

with evaluator.background(gt) as bg:
    for images, _ in val_loader:
        detections = model(images)
        bg.submit(json.dumps(detections).encode())
    summary = bg.finalize()
print("final AP:", summary.stats[0])

Evaluator.background(gt) carries the evaluator's iou / parity_mode / max_dets / use_cats / cast_inputs settings onto the worker thread. Passing a CocoDataset reuses the parsed-once GT and its per-kernel derivation cache (ADR-0020), which is meaningful on segm and load-bearing on boundary IoU. If your harness already holds the GT JSON bytes and does not reuse the dataset, you can also construct directly: BackgroundEvaluator(gt_bytes, iou_type="bbox").

The context-manager form drains the worker queue and joins the thread on exit. Without it, call the background evaluator's finalize() directly (which also drains and joins).
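The non-context-manager form can be sketched as a plain function. This is illustrative only: run_background_eval is a hypothetical name, and bg stands in for any BackgroundEvaluator-like object.

```python
def run_background_eval(bg, encoded_batches):
    """Sketch of the non-context-manager form described above.

    bg is a BackgroundEvaluator-like object; encoded_batches yields
    already-encoded detection batches. finalize() both drains the
    queue and joins the worker, so on the happy path no separate
    close step is needed.
    """
    for dt in encoded_batches:
        bg.submit(dt)
    return bg.finalize()  # drains, joins, returns the final summary
```

In real code, prefer the with-form: it handles the exit path (the sketch above does not join the worker if the loop raises).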

  • submit(detections, *, timeout=None) enqueues a batch (either loadRes-shaped JSON bytes or a Detections dict / sequence of dicts — see array ingest). It returns immediately on success and raises QueueFullError if the queue is at capacity (the capacity is set at construction via queue_capacity=).
  • finalize() drains the queue, finishes evaluation, and shuts the worker down. The returned Summary is canonical. Subsequent calls raise.
  • finalize_with_tables(...) is the tables-aware variant; same drain-and-join semantics.
  • finalize_to_partial() drains, serializes the worker's final state as a partial blob, and shuts down. Combine with Evaluator.from_partials(...) on the head rank for distributed evaluation — see distributed-eval.md.
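The finalize_to_partial / from_partials pairing can be sketched with the head-rank reduction factored into a plain function. reduce_partials is a hypothetical name, and the constructor is passed in as a callable so the sketch has no library dependency; the real gather step is covered in distributed-eval.md.

```python
def reduce_partials(from_partials, rank_blobs):
    """Head-rank reduction sketch for the distributed pattern above.

    from_partials: Evaluator.from_partials (or anything of that shape);
    rank_blobs: one bytes blob per rank, each produced by a worker's
    finalize_to_partial().
    """
    if not rank_blobs:
        raise ValueError("no partial blobs gathered from ranks")
    # Merge all per-rank state into a single canonical evaluation.
    return from_partials(rank_blobs)
```

Per rank, the worker loop is unchanged: submit batches as usual, then ship the bytes from finalize_to_partial() to the head.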

When to use it vs Evaluator.evaluate

| Scenario | Pick |
| --- | --- |
| End-of-epoch evaluation only | Evaluator.evaluate(gt, dt). Simplest path; no thread to manage. |
| Each evaluate(...) call adds visible latency to the training loop | BackgroundEvaluator. Frees the calling thread; same kernel. |
| Multi-rank distributed eval | Evaluator.evaluate_to_partial per rank + Evaluator.from_partials on the head, or the BackgroundEvaluator variant if eval is in-loop. See distributed-eval.md. |

In a typical PyTorch training loop on a single GPU, the GPU is the bottleneck and a plain Evaluator.evaluate call at end-of-epoch is fine. BackgroundEvaluator is the right choice when the validation batch size is large enough that JSON-encoding and matching show up in the profiler.

Queue capacity and back-pressure

The queue is bounded. If submit is called faster than the worker can drain, the queue fills and submit raises QueueFullError (immediately by default; when timeout= is set, it blocks for up to that long before raising):

try:
    bg.submit(detections, timeout=0.5)  # block up to 0.5 s for queue space
except QueueFullError as e:
    # Queue still full after the timeout: record the drop and move on.
    log_metrics(step, dropped_batch=True, queue_capacity=e.queue_capacity)

Sizing: the queue only needs to absorb bursts in which submissions momentarily outpace the worker. A capacity of 2–4 is a safe default; raise it only if the profiler shows submit blocking on a full queue under your actual batch cadence.
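If dropping a batch is unacceptable, the timeout form can be wrapped in a retry loop so the training thread applies back-pressure instead. A hypothetical helper; the queue-full exception type is injected as a parameter so the sketch stays self-contained.

```python
import time

def submit_with_backpressure(submit, payload, *, full_error,
                             retries=8, wait_s=0.05):
    """Retry a rejected submit() instead of dropping the batch.

    submit: a BackgroundEvaluator-style submit callable accepting
    timeout=; full_error: the queue-full exception type (QueueFullError
    in the text above), passed in so this sketch is import-free.
    """
    for attempt in range(retries):
        try:
            submit(payload, timeout=wait_s)
            return True
        except full_error:
            time.sleep(wait_s * attempt)  # brief linear backoff
    return False  # still full after all retries: caller decides
```

Returning False rather than raising keeps the policy decision (drop, raise, or log) in the training loop.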

Memory budget

BackgroundEvaluator honors the memory_budget_bytes= knob. Exceeding the budget surfaces as an OutOfBudgetError raised on the calling thread at the next submit, not silently from the worker.
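The call-site shape is the same as for queue-full handling. A hypothetical wrapper, with the error type and the response policy both injected so the sketch carries no library dependency.

```python
def submit_within_budget(submit, payload, *, budget_error, on_exceeded):
    """Show where a budget overrun surfaces: at the caller's submit.

    budget_error: the OutOfBudgetError type (injected, per the text
    above); on_exceeded: a callback, e.g. log the overrun and stop
    submitting for the rest of the epoch.
    """
    try:
        submit(payload)
        return True
    except budget_error as e:
        on_exceeded(e)  # raised on the calling thread, not the worker
        return False
```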

See also

  • ADR-0014 — the worker-thread resource discipline (single worker, bounded queue, GIL drop).
  • ADR-0035 — why the public surface is submit / finalize / finalize_to_partial with no snapshot path.
  • tutorials/first-evaluation.md — the in-loop walkthrough (Path B), end-to-end on COCO val2017.
  • distributed-eval.md — multi-rank pattern.