
Operational Table Maintenance Contract

This page defines the current maintenance contract for operational StateStore tables: what bounds them, which job prunes them, and what operators can rely on during recovery.

Use this page alongside Data lifecycle and retention, Backplane (outbox contract), and Presence.

Contract

| Table | Bounding mechanism | Prune trigger | Safety invariant | Observability |
| --- | --- | --- | --- | --- |
| presence_entries | TTL via expires_at_ms; websocket heartbeats also enforce presence.maxEntries opportunistically. | StateStoreLifecycleScheduler background sweep every 5 minutes; the websocket heartbeat loop also prunes expired rows inline. | Delete only rows whose TTL has expired; presence is derived inventory, not durable truth. | lifecycle_prune_rows_total{scheduler="statestore",table="presence_entries"} and statestore.lifecycle_pruned. |
| connections | TTL via expires_at_ms. | StateStoreLifecycleScheduler background sweep every 5 minutes; the edge heartbeat loop also clears expired ownership. | Delete only expired directory rows; active connections must keep refreshing TTL. | lifecycle_prune_rows_total{scheduler="statestore",table="connections"} and statestore.lifecycle_pruned. |
| channel_inbound_dedupe | TTL via expires_at_ms. | Best-effort inline cleanup during enqueue, plus StateStoreLifecycleScheduler background sweep. | Delete only expired dedupe keys; replay safety remains bounded to the configured dedupe window. | lifecycle_prune_rows_total{scheduler="statestore",table="channel_inbound_dedupe"} and statestore.lifecycle_pruned. |
| channel_inbox | Terminal retention window via deployment config lifecycle.channels.terminalRetentionDays (default 7). | StateStoreLifecycleScheduler background sweep. | Failed rows age out after the terminal window; completed rows are deleted only after dependent channel_outbox rows are gone. | lifecycle_prune_rows_total{scheduler="statestore",table="channel_inbox.failed"} and lifecycle_prune_rows_total{scheduler="statestore",table="channel_inbox.completed"}, plus statestore.lifecycle_pruned. |
| channel_outbox | Successful rows are deleted inline after send; failed rows use the same terminal retention window as channel_inbox. | Successful sends prune immediately in the delivery path; failed rows are pruned by StateStoreLifecycleScheduler. | Retention must never be required for canonical transcript recovery; failed rows remain available through the terminal window for debugging/retry analysis. | lifecycle_prune_rows_total{scheduler="statestore",table="channel_outbox.failed"} and statestore.lifecycle_pruned. |
| lane_leases | Lease TTL via lease_expires_at_ms. | StateStoreLifecycleScheduler background sweep. | Delete only expired leases; workers must reacquire before doing serialized lane work. | lifecycle_prune_rows_total{scheduler="statestore",table="lane_leases"} and statestore.lifecycle_pruned. |
| workspace_leases | Lease TTL via lease_expires_at_ms. | StateStoreLifecycleScheduler background sweep. | Delete only expired leases; active owners must keep renewing. | lifecycle_prune_rows_total{scheduler="statestore",table="workspace_leases"} and statestore.lifecycle_pruned. |
| oauth_pending | Request expiry via expires_at. | StateStoreLifecycleScheduler background sweep; callback consumption also deletes the row inline. | Delete only expired or consumed authorization requests; live handshakes must still complete inside the advertised expiry window. | lifecycle_prune_rows_total{scheduler="statestore",table="oauth_pending"} and statestore.lifecycle_pruned. |
| oauth_refresh_leases | Lease TTL via lease_expires_at_ms. | StateStoreLifecycleScheduler background sweep. | Delete only expired refresh leases; active refresh owners must renew/reacquire. | lifecycle_prune_rows_total{scheduler="statestore",table="oauth_refresh_leases"} and statestore.lifecycle_pruned. |
| models_dev_refresh_leases | Lease TTL via lease_expires_at_ms. | StateStoreLifecycleScheduler background sweep. | Delete only expired catalog-refresh leases; the cache row is separate and remains bounded independently. | lifecycle_prune_rows_total{scheduler="statestore",table="models_dev_refresh_leases"} and statestore.lifecycle_pruned. |
| outbox | Time retention via OutboxLifecycleScheduler (default 24h). | OutboxLifecycleScheduler background sweep every 5 minutes. | Retention bounds replay history, but recovery must remain possible from durable StateStore truth after rows age out. | lifecycle_prune_rows_total{scheduler="outbox",table="outbox"} and outbox.lifecycle_pruned. |
| outbox_consumers | Time retention via updated_at and the same outbox retention window. | OutboxLifecycleScheduler background sweep every 5 minutes. | Delete only stale consumer cursors that have not advanced inside the retention window. | lifecycle_prune_rows_total{scheduler="outbox",table="outbox_consumers"} and outbox.lifecycle_pruned. |
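The TTL-bounded rows above all follow the same shape: a sweep deletes only rows whose expiry timestamp has already passed. A minimal sketch, assuming a SQLite-backed store; the table and column names (presence_entries, expires_at_ms) come from the contract, while the function name and setup are illustrative, not the real scheduler API:

```python
import sqlite3
import time

def prune_expired(conn: sqlite3.Connection, table: str, now_ms: int) -> int:
    """Delete only rows whose TTL has already expired; return the count."""
    cur = conn.execute(f"DELETE FROM {table} WHERE expires_at_ms <= ?", (now_ms,))
    conn.commit()
    return cur.rowcount

# Minimal demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE presence_entries (id TEXT, expires_at_ms INTEGER)")
now_ms = int(time.time() * 1000)
conn.execute("INSERT INTO presence_entries VALUES ('stale', ?)", (now_ms - 1,))
conn.execute("INSERT INTO presence_entries VALUES ('live', ?)", (now_ms + 60_000,))
pruned = prune_expired(conn, "presence_entries", now_ms)
survivors = [r[0] for r in conn.execute("SELECT id FROM presence_entries")]
```

The `<=` comparison against the caller-supplied clock is what keeps the safety invariant: a row that is still refreshing its TTL can never match.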

Explicit non-goals

These cases are operational but do not need separate prune jobs today:

  • models_dev_cache is bounded by a singleton primary key (id = 1), so it cannot grow without limit.
  • Successful channel_outbox rows are deleted inline after delivery; a second background job would duplicate that lifecycle.
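The inline channel_outbox lifecycle can be sketched as follows, assuming a SQLite-backed store; send_fn is a hypothetical stand-in for the real delivery call, not an actual API:

```python
import sqlite3

def deliver(conn: sqlite3.Connection, row_id: str, send_fn) -> bool:
    """Send one outbox row; delete it inline only on success."""
    try:
        send_fn(row_id)
    except Exception:
        # Keep the failed row: the background sweep ages it out after the
        # terminal retention window, leaving it available for debugging.
        return False
    conn.execute("DELETE FROM channel_outbox WHERE id = ?", (row_id,))
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE channel_outbox (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO channel_outbox VALUES (?)", [("ok",), ("bad",)])

def flaky_send(row_id):
    if row_id == "bad":
        raise RuntimeError("delivery failed")

deliver(conn, "ok", flaky_send)   # success: row deleted inline
deliver(conn, "bad", flaky_send)  # failure: row retained for the sweep
left = [r[0] for r in conn.execute("SELECT id FROM channel_outbox")]
```

Because success deletes in the delivery path itself, a second background job for successful rows would only race with this code, which is why it is listed as a non-goal.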

Failure handling

  • Background lifecycle jobs run under a single-writer DB lock/lease in clustered deployments so pruning stays correct under replica races.
  • Tick failures are logged as statestore.lifecycle_tick_failed or outbox.lifecycle_tick_failed.
  • Prometheus exposes lifecycle_tick_errors_total{scheduler=...} for alerting on repeated maintenance failures.
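The single-writer lock/lease can be illustrated with a compare-and-swap on a lease row, assuming a SQLite-backed store; the lifecycle_lease table name and TTL values are illustrative, not the real schema:

```python
import sqlite3

def try_acquire(conn: sqlite3.Connection, owner: str, now_ms: int, ttl_ms: int) -> bool:
    """Take the sweep lease iff it is absent or expired; return True on success."""
    conn.execute(
        """
        INSERT INTO lifecycle_lease (id, owner, lease_expires_at_ms)
        VALUES (1, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
          owner = excluded.owner,
          lease_expires_at_ms = excluded.lease_expires_at_ms
        WHERE lifecycle_lease.lease_expires_at_ms <= ?
        """,
        (owner, now_ms + ttl_ms, now_ms),
    )
    conn.commit()
    row = conn.execute("SELECT owner FROM lifecycle_lease WHERE id = 1").fetchone()
    return row[0] == owner

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lifecycle_lease"
    " (id INTEGER PRIMARY KEY, owner TEXT, lease_expires_at_ms INTEGER)"
)
got_a = try_acquire(conn, "replica-a", now_ms=1_000, ttl_ms=5_000)        # no lease yet
got_b = try_acquire(conn, "replica-b", now_ms=2_000, ttl_ms=5_000)        # lease unexpired
got_b_later = try_acquire(conn, "replica-b", now_ms=7_000, ttl_ms=5_000)  # lease expired
```

Only the replica whose acquire succeeds runs that tick's sweep; the losing replica simply skips the tick, which keeps pruning correct under replica races without coordination beyond the database itself.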