Storage And Runs¶

This document describes durable execution state in batchor.

If you read only one implementation-oriented document after the getting-started guides, make it this one. The central idea in the repo is that a Run is durable, rehydratable state, not just an in-memory helper object.

Read ARCHITECTURE.md first if you want the canonical runtime diagrams and module-boundary view. This document picks up where that one stops: durable state, resume rules, operator controls, and artifact lifecycle.

Run model¶

BatchRunner.start(job) returns a durable Run handle immediately.

The public handle exposes:

run_id
cached status
cached control_state
cached control_reason
is_finished
refresh()
wait()
summary()
snapshot()
pause()
resume()
cancel()
results()
read_terminal_results()
export_terminal_results()
export_artifacts()
prune_artifacts()

Important semantics:

status is cached from the last summary read or refresh
refresh() performs one poll-and-submit pass
wait() repeatedly advances through the internal executor until the run is terminal, and skips the poll-interval sleep when the structured cycle outcome reports durable progress
results() and artifact lifecycle operations are terminal-only
terminal currently means either completed or completed_with_failures
read_terminal_results() and export_terminal_results() are incremental APIs for already-terminal items and are safe to call before the whole run finishes
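The difference between refresh() and wait() can be pictured as a loop around a single pass. The following is a minimal sketch, not batchor's implementation: CycleOutcome, made_durable_progress, run_one_cycle, and wait_until_terminal are hypothetical names introduced only for illustration, and the loop ignores pause handling and timeouts.

```python
import time
from dataclasses import dataclass
from typing import Callable

TERMINAL_STATUSES = {"completed", "completed_with_failures"}


@dataclass
class CycleOutcome:
    """Hypothetical result of one poll-and-submit pass."""

    status: str
    made_durable_progress: bool


def wait_until_terminal(
    run_one_cycle: Callable[[], CycleOutcome],
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Advance until terminal; sleep only after a pass with no durable progress."""
    while True:
        outcome = run_one_cycle()
        if outcome.status in TERMINAL_STATUSES:
            return outcome.status
        if not outcome.made_durable_progress:
            sleep(poll_interval)


# Scripted passes: one with progress, one idle, then a terminal pass.
_script = iter(
    [
        CycleOutcome("running", True),
        CycleOutcome("running", False),
        CycleOutcome("completed", True),
    ]
)
sleeps: list[float] = []
final_status = wait_until_terminal(lambda: next(_script), 5.0, sleep=sleeps.append)
```

In this scripted run only the idle pass triggers a sleep, which is the behavior the wait() bullet describes.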

Control plane and payload plane¶

batchor persists durable state in two layers.

The control plane is the state store:

run config
run control state
run control reason
item rows and attempts
active batch rows
parsed outputs and failure records
terminal result sequence metadata
artifact pointers
ingest checkpoints

The payload plane is the artifact store:

request JSONL files
raw output JSONL
raw error JSONL

This split is intentional. Large artifacts remain files, while the durable store keeps the indexed state needed for orchestration and rehydration.

Current durable backends¶

SQLite is the default durable backend.

Postgres is also implemented for shared control-plane state when callers explicitly construct PostgresStorage(...).

In-memory storage exists for tests and short-lived local runs.

The SQLite/OpenAI path is covered by the default smoke test. Gemini provider wiring is covered with fake-client integration tests, and an opt-in live Vertex/GCS smoke test covers the real control and payload planes. Postgres storage compatibility is validated in a dedicated storage-contract CI job and requires BATCHOR_TEST_POSTGRES_DSN for equivalent local coverage.

Storage responsibilities¶

Current storage responsibilities include:

persisting public run config
persisting run control state
persisting run control reason
persisting deterministic-source ingest checkpoints when available
persisting item state and attempts
persisting terminal result sequence metadata for incremental reads
persisting submitted batch metadata
persisting request-artifact pointers for replayable submissions
persisting batch output/error artifact pointers for raw provider payload retention
persisting provider outputs/errors needed for rehydration
reconstructing structured results on reload
persisting storage metadata such as schema version

Durable artifacts now flow through the ArtifactStore contract. The built-in implementation is LocalArtifactStore, which stores replayable request JSONL and optional raw output/error files on local disk or a shared mounted volume. New local directories/files are created with owner-only permissions where the platform supports them.

For provider configs that contain secrets, durable storage persists only the public provider config. Secret material such as the API key must come from in-memory config or the environment when a rehydrated run needs to talk to the provider again.

SQLite behavior¶

SQLite is the default because the package is optimized first for local durable runs.

Current SQLite characteristics:

WAL mode is enabled by default
hot-path indexes exist for pending-item, active-batch, and artifact-pointer queries
schema version is exposed through SQLiteStorage.schema_version
startup migration behavior is additive and documented separately in STORAGE_MIGRATIONS.md
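WAL mode is a property of the database file, so it can be checked with the standard sqlite3 module. The sketch below uses a throwaway database and enables WAL itself; it does not open a batchor database, and the file and table names are arbitrary.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "example.sqlite3"

    first = sqlite3.connect(db_path)
    try:
        first.execute("CREATE TABLE example (value TEXT)")
        first.commit()
        # The pragma returns the journal mode that is actually in effect.
        mode_after_enable = first.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        first.close()

    # WAL is persistent: a later connection reports the same mode.
    second = sqlite3.connect(db_path)
    try:
        mode_on_reopen = second.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        second.close()
```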

For SQLite-backed runs, the default artifact root is a sibling *_artifacts/ directory beside the database.

Postgres behavior¶

Postgres exists as an opt-in control-plane backend for cases where SQLite is not enough, such as shared state across processes or hosts. PostgresStorage uses psycopg v3. Plain postgresql:// and postgres:// DSNs are normalized to postgresql+psycopg:// before engine creation; explicit SQLAlchemy driver schemes are preserved.
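The DSN normalization rule can be sketched as a small pure function. This is an illustration of the rule as stated above, not the library's code: normalize_postgres_dsn is a hypothetical helper name, and the sketch assumes lowercase URL-style schemes.

```python
def normalize_postgres_dsn(dsn: str) -> str:
    """Rewrite plain Postgres schemes to the SQLAlchemy psycopg (v3) scheme."""
    scheme, separator, rest = dsn.partition("://")
    if not separator:
        # Not a URL-style DSN; leave it untouched in this sketch.
        return dsn
    if scheme in ("postgresql", "postgres"):
        return "postgresql+psycopg://" + rest
    # Explicit driver schemes such as postgresql+psycopg2:// are preserved.
    return dsn
```

For example, postgres://user@host/db becomes postgresql+psycopg://user@host/db, while a DSN that already names a driver passes through unchanged.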

Important operational rule:

if you use Postgres across machines or fresh processes, provide a shared artifact root explicitly

Postgres stores the control plane, not the large request/output files themselves.

Attempt accounting¶

Item attempt_count records consumed provider attempts.

Successful submitted completions increment the counter once.
Counted item-level provider or parse failures increment the counter before retry/permanent-failure decisions.
Local pre-submission rejections, control-plane quota pauses, batch-level reset-to-pending paths, and missing output rows marked with count_attempt=False do not increment it.

This keeps terminal exports aligned with audit expectations: a first-attempt success reports attempt_count == 1, and a retryable counted failure followed by success reports attempt_count == 2.
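The counting rules can be restated as a small model. The event names below are hypothetical labels for the cases listed above, not identifiers from batchor; the model exists only to make the two audit examples checkable.

```python
# Hypothetical event labels for the cases described in this section.
COUNTED_EVENTS = {"success", "item_provider_failure", "parse_failure"}
UNCOUNTED_EVENTS = {
    "local_rejection",            # rejected before submission
    "quota_pause",                # control-plane quota pause
    "batch_reset_to_pending",     # batch-level reset path
    "missing_output_uncounted",   # missing output row with count_attempt=False
}


def consumes_attempt(event: str) -> bool:
    """Return True when the event consumes one provider attempt."""
    if event in COUNTED_EVENTS:
        return True
    if event in UNCOUNTED_EVENTS:
        return False
    raise ValueError(f"unknown event: {event}")


def attempt_count(events: list[str]) -> int:
    """Total consumed attempts for one item's event history."""
    return sum(1 for event in events if consumes_attempt(event))


first_try_success = attempt_count(["success"])
retry_then_success = attempt_count(["item_provider_failure", "success"])
quota_then_success = attempt_count(["quota_pause", "batch_reset_to_pending", "success"])
```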

File-source checkpoints¶

For deterministic built-in sources, storage persists a source checkpoint with:

source kind
source path/reference
source fingerprint
next durable item index
opaque checkpoint payload
ingestion completion flag

The opaque checkpoint payload is source-owned. batchor persists it but does not attempt to interpret arbitrary custom-source resume tokens. CompositeItemSource keeps this storage contract unchanged by presenting one logical source to the runner:

source_kind is composite
source_ref is canonical JSON of the ordered child source identities
source_fingerprint is the hash of that ordered child identity list
checkpoint_payload tracks the active child source plus that child's opaque checkpoint

There is still one ingest-checkpoint row per run, not one row per child source.
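The composite identity can be sketched as canonical JSON plus a hash. The exact canonical form and hash algorithm are not specified here, so the sketch assumes compact JSON with sorted keys and SHA-256; composite_identity and the child dictionary keys are hypothetical.

```python
import hashlib
import json


def composite_identity(children: list[dict[str, str]]) -> tuple[str, str]:
    """Return (source_ref, source_fingerprint) for ordered child identities."""
    # Keys are sorted inside each child; the order of the list itself is kept,
    # because child order is part of the logical source identity.
    source_ref = json.dumps(children, sort_keys=True, separators=(",", ":"))
    # SHA-256 is an assumption of this sketch.
    source_fingerprint = hashlib.sha256(source_ref.encode("utf-8")).hexdigest()
    return source_ref, source_fingerprint


child_a = {"kind": "csv", "ref": "a.csv", "fingerprint": "fp-a"}
child_b = {"kind": "jsonl", "ref": "b.jsonl", "fingerprint": "fp-b"}

in_order = composite_identity([child_a, child_b])
reordered = composite_identity([child_b, child_a])
```

Swapping the two children produces a different source_ref and fingerprint, which is why a reordered composite no longer matches the stored checkpoint.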

This enables start(job, run_id=...) to resume ingestion rather than rematerializing already-ingested rows, as long as the source still matches the stored identity.

Every run, including one fed by an arbitrary iterable, receives a generic incomplete-ingestion row atomically with run creation. This prevents the empty-run terminal calculation from winning while the first slice is still being rendered. Non-checkpointable iterables still do not support fresh-process mid-ingest recovery; manual pause retains their iterator only in the execution owner process and resumes it exactly once.

A schema upgrade backfills every run that lacks a generic marker: it marks the run ingestion-incomplete and records its current materialized item count. Storage cannot distinguish a completed arbitrary iterable from one paused between chunks, even when some items already exist, so assuming completion would preserve the false-terminal race. An empty source can be safely reattached from index zero; a partially materialized non-checkpointable legacy source must be treated as an abandoned tail and explicitly cancelled after any active batches drain.

Rehydration rules¶

runner.get_run(run_id) must work from a fresh runner when it points at the same durable storage.

Resume happens at multiple layers:

runner.get_run(run_id) rehydrates an existing durable run
start(job, run_id=...) resumes the same run instead of creating a new one
fresh-process recovery requeues local-but-not-submitted items back to pending
deterministic built-in sources can resume ingestion from a checkpoint when the source fingerprint still matches

Successful rehydration depends on:

the durable storage still containing the run rows
the configured artifact store still containing any artifacts needed for retry/resume
structured output model classes still being importable
credentials being available when a refresh needs to talk to the provider

Fresh-process resume also requeues any queued_local items back to pending before submission resumes. This recovery happens when an execution owner advances the run, not when a caller merely reads a handle with get_run() or changes control state. When start(job, run_id=...) resumes a run with active provider batches, it first performs a poll-only reconciliation pass. Terminal provider batches are consumed, failed batches update retry backoff, and any active backoff prevents new source materialization or submission until the backoff expires.

While ingestion is active, the normal fast path persists 1,000 items at once. A separate work clock flushes a smaller atomic slice when slow source or prompt work crosses the cooperative time budget. At each durable boundary, active provider batches are polled when the provider cadence is due even if submission retry backoff is active. Poll-induced pause, cancellation, or retry backoff blocks further checkpointed materialization/submission and leaves the source incomplete at its last durable position.

Provider batch creation uses a durable intent written before the remote call. Successful creation is finalized in one local transaction that registers the batch and links every submitted item. A crash or ambiguous connection failure between the remote call and finalization leaves an indeterminate intent; advancement raises RunSubmissionIndeterminateError until an operator either attaches the confirmed remote batch or verifies that no batch exists. This is the explicit boundary where a generic provider cannot supply exactly-once creation or discovery.
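The intent-then-finalize sequence can be sketched with sqlite3. The table layout, state names, and helper functions are assumptions made for illustration, and SubmissionIndeterminate is a stand-in for RunSubmissionIndeterminateError rather than the real class.

```python
import sqlite3


class SubmissionIndeterminate(Exception):
    """Stand-in for RunSubmissionIndeterminateError in this sketch."""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intents (intent_id TEXT PRIMARY KEY, state TEXT NOT NULL, batch_id TEXT)"
    )
    conn.execute("CREATE TABLE items (item_id TEXT PRIMARY KEY, batch_id TEXT)")
    return conn


def submit(conn, intent_id, item_ids, create_remote_batch) -> str:
    # 1. Commit the durable intent before touching the provider.
    with conn:
        conn.execute(
            "INSERT INTO intents (intent_id, state) VALUES (?, 'pending')", (intent_id,)
        )
    # 2. Remote call. A crash or ambiguous failure here leaves the intent pending.
    batch_id = create_remote_batch(item_ids)
    # 3. Finalize in one transaction: register the batch and link every item.
    with conn:
        conn.execute(
            "UPDATE intents SET state = 'finalized', batch_id = ? WHERE intent_id = ?",
            (batch_id, intent_id),
        )
        conn.executemany(
            "UPDATE items SET batch_id = ? WHERE item_id = ?",
            [(batch_id, item_id) for item_id in item_ids],
        )
    return batch_id


def ensure_determinate(conn) -> None:
    """Refuse to advance while any intent has no confirmed outcome."""
    row = conn.execute("SELECT intent_id FROM intents WHERE state = 'pending'").fetchone()
    if row is not None:
        raise SubmissionIndeterminate(f"intent {row[0]} has no confirmed outcome")


def ambiguous_failure(item_ids):
    raise ConnectionError("connection dropped after the request was sent")


conn = open_store()
with conn:
    conn.executemany("INSERT INTO items (item_id) VALUES (?)", [("a",), ("b",)])

ok_batch = submit(conn, "intent-1", ["a"], lambda ids: "batch-1")
try:
    submit(conn, "intent-2", ["b"], ambiguous_failure)
except ConnectionError:
    pass

try:
    ensure_determinate(conn)
    indeterminate = False
except SubmissionIndeterminate:
    indeterminate = True
linked = dict(conn.execute("SELECT item_id, batch_id FROM items ORDER BY item_id"))
```

The second submission leaves a pending intent with no linked items, so the sketch cannot tell whether a remote batch exists. That is the situation an operator has to resolve.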

Finalized intents also reserve OpenAI enqueue capacity in an atomic counter keyed by provider, model, and configured quota scope. The reservation spans runs and is released idempotently only after terminal batch consumption/reset succeeds. Provider-terminal batches with submitted items or unreleased capacity remain locally reconcilable, so download, parsing, partial item writes, and post-consumption crashes replay safely.
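The reservation behavior can be modeled in memory. This sketch is a simplification under stated assumptions: CapacityLedger is a hypothetical class, one limit applies to every scope, the capacity unit is left abstract, and a Python dictionary stands in for the durable atomic counter, so it is not safe across processes.

```python
from dataclasses import dataclass, field

ScopeKey = tuple[str, str, str]  # (provider, model, configured quota scope)


@dataclass
class CapacityLedger:
    """In-memory model of per-scope enqueue reservations keyed by batch."""

    limit: int
    _reservations: dict[ScopeKey, dict[str, int]] = field(default_factory=dict)

    def reserved(self, scope: ScopeKey) -> int:
        return sum(self._reservations.get(scope, {}).values())

    def reserve(self, scope: ScopeKey, batch_id: str, amount: int) -> bool:
        held = self._reservations.setdefault(scope, {})
        if batch_id in held:
            return True  # replayed finalization: already reserved
        if self.reserved(scope) + amount > self.limit:
            return False
        held[batch_id] = amount
        return True

    def release(self, scope: ScopeKey, batch_id: str) -> None:
        # Idempotent: releasing an unknown or already-released batch is a no-op.
        self._reservations.get(scope, {}).pop(batch_id, None)


ledger = CapacityLedger(limit=100)
scope = ("openai", "example-model", "default")  # placeholder values

first_ok = ledger.reserve(scope, "batch-a", 60)
second_ok = ledger.reserve(scope, "batch-b", 60)  # would exceed the limit
ledger.release(scope, "batch-a")
ledger.release(scope, "batch-a")  # replay after a crash is harmless
after_release = ledger.reserved(scope)
retry_ok = ledger.reserve(scope, "batch-b", 60)
```

Keying reservations by batch is what makes release idempotent: a replayed consumption step cannot return the same capacity twice.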

Active submitted rows created before submission intents existed are included dynamically as legacy reservations when a scope admits new work. This prevents an upgrade from granting the full configured budget on top of already-enqueued provider batches.

Resume compatibility intentionally ignores non-persisted secret fields such as provider API keys. Rehydrated OpenAI runs need OPENAI_API_KEY when no explicit in-memory config is supplied. Gemini Developer API runs need GEMINI_API_KEY; Vertex runs need Application Default Credentials plus the persisted or ambient project, location, and GCS prefix configuration.

For deterministic-source resume, the caller must also reuse the same run_id and provide the same source identity/fingerprint. An incomplete ingest checkpoint is part of terminal-state calculation. In the same process, Run.resume() continues ingestion when the original job remains attached. A rehydrated run without that source stays non-terminal and raises RunIngestionSourceRequiredError from refresh()/wait(), directing the caller to start(job, run_id=...). For composite sources, that includes the same ordered child identities; changing the child order or swapping one file changes the logical source identity.

Built-in deterministic sources currently include:

CompositeItemSource
CsvItemSource
JsonlItemSource
ParquetItemSource

For checkpointed ingestion, durable stores append materialized item chunks and advance the ingest checkpoint in one storage operation. Parquet and composite sources can also identify an end-of-source checkpoint from cheap metadata, so a resume after all rows were materialized but before the final completion flag was written can mark ingestion complete without re-reading the source data or re-rendering prompts.
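Appending a chunk and advancing the checkpoint in one storage operation can be sketched as a single SQLite transaction. The table and column names below are hypothetical and much simpler than a real schema; the point is only that both writes commit together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (run_id TEXT, item_index INTEGER, payload TEXT, "
    "PRIMARY KEY (run_id, item_index))"
)
conn.execute(
    "CREATE TABLE ingest_checkpoints (run_id TEXT PRIMARY KEY, "
    "next_item_index INTEGER NOT NULL, completed INTEGER NOT NULL)"
)
with conn:
    conn.execute("INSERT INTO ingest_checkpoints VALUES ('run-1', 0, 0)")


def append_chunk(conn, run_id, start_index, payloads, completed=False):
    """Persist one materialized chunk and advance the checkpoint atomically."""
    rows = [(run_id, start_index + offset, p) for offset, p in enumerate(payloads)]
    with conn:  # commits on success, rolls back if either statement raises
        # Statement order inside the transaction does not affect atomicity.
        conn.execute(
            "UPDATE ingest_checkpoints SET next_item_index = ?, completed = ? "
            "WHERE run_id = ?",
            (start_index + len(payloads), int(completed), run_id),
        )
        conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)


append_chunk(conn, "run-1", 0, ["a", "b"])
try:
    # A stale writer replays from index 1; the duplicate key aborts the
    # transaction, so the checkpoint update is rolled back as well.
    append_chunk(conn, "run-1", 1, ["stale", "rows"])
except sqlite3.IntegrityError:
    pass

next_index = conn.execute(
    "SELECT next_item_index FROM ingest_checkpoints WHERE run_id = 'run-1'"
).fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```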

Once an item has a durable request artifact pointer, batchor prunes large inline request-building fields from the control-plane store and relies on the artifact for later retries. Provider-specific request rows are preserved in those artifacts. During replay, OpenAI updates custom_id; Gemini updates either the Developer API key or the Vertex correlation label. All map back to the same internal durable attempt identifier.

Artifact lifecycle¶

Once an item has a durable request artifact pointer, batchor may rely on that artifact for later retries instead of rebuilding the prompt from source data.

Once the whole run is terminal:

users may call Run.prune_artifacts() or BatchRunner.prune_artifacts(run_id) to remove replayable request files and clear their pointers
this is explicit lifecycle management, not automatic garbage collection

Raw output/error artifacts follow a stricter rule:

when raw retention is enabled, users must call Run.export_artifacts(...) first
only then may they call Run.prune_artifacts(include_raw_output_artifacts=True)

That rule keeps destructive cleanup of raw provider evidence explicit.

After a terminal run, the usual flow is:

Call export_artifacts(...) if you want a portable bundle containing raw files plus results.jsonl.
Call prune_artifacts() to remove replayable request artifacts once replay durability is no longer needed.
Call prune_artifacts(include_raw_output_artifacts=True) only after export if you also want to remove raw provider payloads.
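The export-before-prune rule can be modeled with a small guard. ArtifactLifecycle and RawArtifactsNotExported are hypothetical names, the file names are placeholders, and the sketch does not show which error batchor actually raises when the rule is violated.

```python
class RawArtifactsNotExported(RuntimeError):
    """Hypothetical error type for this sketch."""


class ArtifactLifecycle:
    """Toy model of a terminal run's artifact cleanup rules."""

    def __init__(self, raw_retention_enabled: bool) -> None:
        self.exported = False
        self.request_files = ["requests-0001.jsonl"]
        self.raw_files = ["output-0001.jsonl"] if raw_retention_enabled else []

    def export_artifacts(self) -> None:
        self.exported = True

    def prune_artifacts(self, include_raw_output_artifacts: bool = False) -> list[str]:
        # Raw provider evidence may only be removed after an explicit export.
        if include_raw_output_artifacts and self.raw_files and not self.exported:
            raise RawArtifactsNotExported("export raw artifacts before pruning them")
        removed = self.request_files
        self.request_files = []
        if include_raw_output_artifacts:
            removed = removed + self.raw_files
            self.raw_files = []
        return removed


lifecycle = ArtifactLifecycle(raw_retention_enabled=True)
pruned_requests = lifecycle.prune_artifacts()
try:
    lifecycle.prune_artifacts(include_raw_output_artifacts=True)
    blocked = False
except RawArtifactsNotExported:
    blocked = True
lifecycle.export_artifacts()
pruned_raw = lifecycle.prune_artifacts(include_raw_output_artifacts=True)
```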

Run control semantics¶

Run lifecycle status and control state are separate:

lifecycle status answers whether the run is still active or terminal
control state answers whether local execution should keep ingesting/submitting/polling

Current control states are:

running
paused
cancel_requested

Semantics:

pause stops new ingestion, new item claiming/submission, and provider polling
resume restarts those local activities from persisted state
cancel stops new ingestion/submission, keeps polling active provider batches, and then permanently fails remaining local non-terminal items with error_class="run_cancelled"

If checkpointed ingestion is incomplete when cancellation is requested, the unmaterialized source tail is intentionally abandoned. After active batches drain and remaining materialized items are cancelled, the stored ingest checkpoint is finalized so the run can become terminal without reopening the source.

Manual pauses record control_reason="manual". Insufficient-quota pauses raised by OpenAI upload, create, or polling calls, or reported at batch level, record control_reason="openai_insufficient_quota", clear batch-control-plane backoff, and preserve affected items for later retry without consuming attempts.

Row-level insufficient-quota records inside completed batch outputs do not pause the run. Those items are stored as failed_retryable with error_class="openai_insufficient_quota", do not consume attempts, and rely on retry backoff before they are submitted again.

Auto-pause must not overwrite cancel_requested. Cancellation is non-reversible, so quota errors observed while draining already-submitted batches leave the run in cancel flow.
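The precedence rule can be written as a pure transition function. auto_pause is a hypothetical helper; it models only the rule stated above and takes the current (control_state, control_reason) pair as plain values.

```python
from typing import Optional

CONTROL_STATES = ("running", "paused", "cancel_requested")


def auto_pause(
    control_state: str, control_reason: Optional[str], pause_reason: str
) -> tuple[str, Optional[str]]:
    """Apply an automatic pause without ever leaving the cancel flow."""
    if control_state not in CONTROL_STATES:
        raise ValueError(f"unknown control state: {control_state}")
    if control_state == "cancel_requested":
        # Cancellation is non-reversible: keep draining in cancel flow.
        return control_state, control_reason
    return "paused", pause_reason


from_running = auto_pause("running", None, "openai_insufficient_quota")
from_cancel = auto_pause("cancel_requested", None, "openai_insufficient_quota")
```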

For non-checkpointed finite iterables, ingestion continues materializing remaining items after an auto-pause so resume() cannot silently lose input rows that were not yet persisted. Checkpointed sources may stop on pause because their stored checkpoint can resume materialization later.

Manual pause stops both checkpointed and arbitrary ingestion at a cooperative boundary. Arbitrary iterators are retained only in-process; automatic quota pause still materializes their finite tail to avoid losing rows that cannot be reconstructed.

wait() fails fast on paused runs instead of sleeping indefinitely. The raised RunPausedError carries the same control_reason as the summary. The optional timeout of wait() is propagated through ingestion and polling, caps sleeps, and prevents another library-controlled operation from starting after expiry. It cannot safely preempt an arbitrary callback or provider SDK call already executing in the current thread; those operations retain their own timeout contract. Provider-side remote cancellation is still TBD.

Python API versus CLI¶

The Python API is the main product surface. It supports:

in-memory, SQLite, and Postgres control-plane storage
custom item iterables and built-in deterministic file sources
custom artifact roots
direct access to Run
run control and incremental terminal-result APIs

The CLI is intentionally narrower. It is designed for operator workflows around:

one or more explicit CSV and JSONL files
SQLite durability
JSON summaries and result export

The CLI does not yet expose pause/resume/cancel or incremental terminal-result commands.

Incremental terminal results¶

Storage assigns a monotonic terminal_result_sequence the first time an item reaches a terminal item state. Incremental readers/exporters page on that sequence rather than item_index, so late completion of lower-index items does not break downstream consumption.

The public incremental APIs are:

Run.read_terminal_results(after_sequence=..., limit=...)
BatchRunner.read_terminal_results(run_id, after_sequence=..., limit=...)
Run.export_terminal_results(destination, after_sequence=..., append=True)

The returned/exported cursor is next_after_sequence. This API is library-first today; the CLI does not expose it yet.
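Paging on the sequence rather than item_index can be illustrated with a toy reader. This is a sketch under assumptions: TerminalRow and the free function below are hypothetical stand-ins that only borrow the parameter names above, and sequences are assumed to start at 1 so that a cursor of 0 means "from the beginning".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalRow:
    sequence: int    # terminal_result_sequence, assigned when the item turns terminal
    item_index: int  # original position in the source
    status: str


def read_terminal_results(rows, after_sequence=0, limit=100):
    """Return (page, next_after_sequence) for rows past the cursor."""
    pending = sorted(
        (row for row in rows if row.sequence > after_sequence),
        key=lambda row: row.sequence,
    )
    page = pending[:limit]
    next_after_sequence = page[-1].sequence if page else after_sequence
    return page, next_after_sequence


# Item 5 finishes first; item 0 finishes after the first read.
store = [TerminalRow(1, 5, "completed"), TerminalRow(2, 3, "failed_permanent")]
page_one, cursor = read_terminal_results(store, after_sequence=0, limit=10)

store.append(TerminalRow(3, 0, "completed"))  # late completion of a low index
page_two, cursor = read_terminal_results(store, after_sequence=cursor, limit=10)

seen_indexes = [row.item_index for row in page_one + page_two]
```

A reader paging on item_index would have skipped item 0 after already consuming index 5; the sequence cursor picks it up on the next read.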

Lineage conventions¶

BatchItem.metadata remains user-owned, but batchor reserves metadata["batchor_lineage"] as an object when built-in sources or callers want to persist lightweight join metadata. Recommended keys are:

source_ref
partition_id
source_item_index
source_primary_key
source_namespace

Built-in deterministic sources populate what they know without replacing user metadata. When CompositeItemSource is used, the durable public item_id becomes the auto-namespaced run identifier, and the original child-source row ID is preserved in source_primary_key.

Summary recomputation¶

Storage mutations do not force a full run-summary recomputation after every write. Summary aggregation happens on explicit summary reads and refresh boundaries instead. That reduces write amplification for larger runs.

Current gaps¶

deterministic-source ingestion is synchronous during start(), and only a small set of built-in source adapters implements crash-safe mid-ingest resume today
the only built-in artifact backend today is LocalArtifactStore
arbitrary non-checkpointable iterables do not support mid-ingest crash recovery
there is no partial-result API for non-terminal provider state
provider-side remote cancellation is not implemented
the CLI does not expose run control or incremental terminal-result APIs yet

TBD¶

SQLite/Postgres migration story
automated retention windows and archive/export workflow beyond explicit manual export + prune
partial-result read APIs for non-terminal provider state
resumable ingestion for additional deterministic non-file/custom sources beyond the built-in set