What resumability guarantees
A task that is in flight when its process dies is picked up and continued by another process (or by the same one after a restart), and produces the same final result.
The requirements
- State persisted at safe checkpoints.
- State loadable by any worker in the fleet.
- All tool calls idempotent OR guarded by an outbox.
- Wake-up mechanism to schedule resumption.
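The first three requirements can be illustrated with a minimal, in-memory sketch. Everything in it is hypothetical and is not the runtime's API: `InMemoryTaskStore` stands in for a shared store that any worker can read, `IdempotentTool` stands in for an external tool that deduplicates calls by key, and the crash is simulated with an exception. The wake-up mechanism is not modeled; the second `run` call plays the role of the worker that resumes the task.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types come from the runtime described here.
final class ResumabilitySketch {

    /** Checkpointed state: index of the next step to run plus the running total. */
    static final class Checkpoint {
        final int nextStep;
        final int total;
        Checkpoint(int nextStep, int total) { this.nextStep = nextStep; this.total = total; }
    }

    /** Stand-in for a shared task store (for example a database table). */
    static final class InMemoryTaskStore {
        private final Map<String, Checkpoint> checkpoints = new HashMap<>();
        void save(String taskId, Checkpoint cp) { checkpoints.put(taskId, cp); }
        Checkpoint load(String taskId) {
            return checkpoints.getOrDefault(taskId, new Checkpoint(0, 0));
        }
    }

    /** Stand-in for an external tool that deduplicates calls by idempotency key. */
    static final class IdempotentTool {
        private final Map<String, Integer> resultsByKey = new HashMap<>();
        int sideEffects = 0; // counts how often the "real" work was performed
        int call(String idempotencyKey, int input) {
            return resultsByKey.computeIfAbsent(idempotencyKey, k -> {
                sideEffects++;
                return input * 2;
            });
        }
    }

    /**
     * Runs the task from its last checkpoint. If crashAtStep >= 0, throws after
     * that step's tool call but before its checkpoint, simulating a dying process.
     */
    static int run(String taskId, int[] inputs, InMemoryTaskStore store,
                   IdempotentTool tool, int crashAtStep) {
        Checkpoint cp = store.load(taskId);
        int total = cp.total;
        for (int step = cp.nextStep; step < inputs.length; step++) {
            // The key depends only on task and step, so a replayed step
            // reuses the earlier result instead of repeating the side effect.
            total += tool.call(taskId + ":" + step, inputs[step]);
            if (step == crashAtStep) {
                throw new IllegalStateException("simulated crash");
            }
            store.save(taskId, new Checkpoint(step + 1, total));
        }
        return total;
    }

    public static void main(String[] args) {
        InMemoryTaskStore store = new InMemoryTaskStore();
        IdempotentTool tool = new IdempotentTool();
        int[] inputs = {1, 2, 3};
        try {
            run("task-1", inputs, store, tool, 1); // "worker A" dies during step 1
        } catch (IllegalStateException e) {
            // A real system would rely on its wake-up mechanism to reschedule here.
        }
        int result = run("task-1", inputs, store, tool, -1); // "worker B" resumes
        System.out.println(result + " with " + tool.sideEffects + " side effects");
    }
}
```

In this sketch the resumed run replays step 1, because the crash happened before that step's checkpoint was written. The idempotency key keeps the replay from repeating the side effect, so the program prints `12 with 3 side effects`, the same result an uninterrupted run would give.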
Enable via config
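    import java.time.Duration;

    // postgresTaskStore is a task store instance created elsewhere.
    RuntimeConfig cfg = RuntimeConfig.builder()
        .resumability(ResumabilityConfig.builder()
            .taskStore(postgresTaskStore)           // store used for task state
            .checkpointOnToolCall(true)             // checkpoint on tool calls
            .maxTaskLifetime(Duration.ofHours(24))  // maximum task lifetime: 24 hours
            .build())
        .build();
Cost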
Every checkpoint is a database write, which adds roughly 5-20 ms of latency per checkpointed step. Enable resumability only for tasks that actually need it.
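As a rough way to size that overhead, the per-step figure can be multiplied by the number of checkpointed steps. The sketch below assumes exactly one database write per checkpointed step and a hypothetical task with 40 such steps; the class and method names are illustrative only.

```java
import java.time.Duration;

// Hypothetical helper for back-of-the-envelope estimates; not part of any runtime API.
final class CheckpointCostEstimate {

    /** Total added latency, assuming one DB write of the given latency per checkpointed step. */
    static Duration overhead(int checkpointedSteps, Duration perCheckpoint) {
        return perCheckpoint.multipliedBy(checkpointedSteps);
    }

    public static void main(String[] args) {
        int steps = 40; // assumed number of checkpointed steps in one task
        System.out.println("low:  " + overhead(steps, Duration.ofMillis(5)).toMillis() + " ms");
        System.out.println("high: " + overhead(steps, Duration.ofMillis(20)).toMillis() + " ms");
    }
}
```

Under these assumptions the task picks up between 200 ms and 800 ms of extra latency. That is negligible for a job that runs for hours but significant for an interactive request.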