What resumability guarantees

A task that is in flight when the process dies is picked up and continued by another process (or by the same one after a restart), and produces the same final result as an uninterrupted run.
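
To make the guarantee concrete, here is a minimal sketch of a task that checkpoints after every step and is resumed after a simulated crash. Everything in it is an assumption made for illustration: the class name, the in-memory map standing in for a durable task store, and the `crashAfter` parameter are hypothetical and are not part of the runtime's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch, not the runtime's real API.
class ResumableTaskSketch {
    // Stand-in for a durable task store shared by all workers.
    // Maps taskId -> {index of next step, current value}.
    static final Map<String, int[]> STORE = new HashMap<>();

    // Runs the steps starting from the last checkpoint. "crashAfter" simulates
    // the process dying once that many steps have run in this attempt (-1 = never).
    static int run(String taskId, List<IntUnaryOperator> steps, int crashAfter) {
        int[] state = STORE.getOrDefault(taskId, new int[] {0, 0});
        int next = state[0];
        int value = state[1];
        int executed = 0;
        while (next < steps.size()) {
            if (executed == crashAfter) {
                throw new IllegalStateException("simulated crash");
            }
            value = steps.get(next).applyAsInt(value);
            next++;
            executed++;
            // Checkpoint after each completed step.
            STORE.put(taskId, new int[] {next, value});
        }
        return value;
    }

    public static void main(String[] args) {
        List<IntUnaryOperator> steps = List.of(v -> v + 1, v -> v * 10, v -> v - 3);
        int uninterrupted = run("a", steps, -1);
        try {
            run("b", steps, 2); // dies after two steps
        } catch (IllegalStateException e) {
            // In a real fleet, another worker would pick the task up here.
        }
        int resumed = run("b", steps, -1); // continues from the checkpoint
        System.out.println(uninterrupted + " " + resumed);
    }
}
```

The sketch only crashes between a checkpoint and the next step. A crash between a step's side effect and its checkpoint is the harder case, which is why tool calls need to be idempotent or guarded.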

The requirements

  • State persisted at safe checkpoints.
  • State loadable by any worker in the fleet.
  • All tool calls idempotent OR guarded by an outbox.
  • Wake-up mechanism to schedule resumption.

Enable via config

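import java.time.Duration;

// postgresTaskStore is assumed to be constructed elsewhere.
RuntimeConfig cfg = RuntimeConfig.builder()
    .resumability(ResumabilityConfig.builder()
        .taskStore(postgresTaskStore)           // where task state is persisted
        .checkpointOnToolCall(true)             // checkpoint at each tool call
        .maxTaskLifetime(Duration.ofHours(24))  // cap on a task's lifetime
        .build())
    .build();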

Cost

Every checkpoint is a DB write, which adds roughly 5-20 ms of latency per checkpointed step. Enable resumability only for tasks that actually need it.
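
As a rough way to budget for this, the added latency can be estimated by multiplying the per-checkpoint cost by the number of checkpointed steps. The sketch below assumes checkpoints are written synchronously, one after another; the class name and the figure of 50 steps are hypothetical and chosen only for illustration.

```java
import java.time.Duration;

// Hypothetical sketch for back-of-the-envelope estimates.
class CheckpointCostSketch {
    // Total added latency if every checkpointed step pays one synchronous write.
    static Duration overhead(int checkpointedSteps, Duration perCheckpoint) {
        return perCheckpoint.multipliedBy(checkpointedSteps);
    }

    public static void main(String[] args) {
        int steps = 50; // assumed number of checkpointed steps in one task
        Duration low = overhead(steps, Duration.ofMillis(5));
        Duration high = overhead(steps, Duration.ofMillis(20));
        System.out.println(low.toMillis() + " ms to " + high.toMillis() + " ms");
    }
}
```

Under these assumptions a 50-step task would pay between 250 ms and 1 s in total, which is negligible for a long-running job and significant for an interactive one.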