Conversation state across turns lives somewhere. Where you put it shapes latency, cost, scale, and capability. Three layers — in-memory, fast KV, persistent — each fit different requirements. Getting this wrong creates lag, lost context, or runaway memory.
In-memory per process
Simplest. Conversation lives in agent runtime memory. Works for single-server, small-scale. Breaks on restart (lost state) and horizontal scaling (server affinity needed). Fine for hackathons; not production.
Fast KV (Redis, DynamoDB)
Conversation state serialized + stored. Read/write per turn (5-20ms added latency). Survives restarts. Scales horizontally. TTL for expiry. The production default for chat agents.
Append-only event log
Each turn is an event written to a log (Kafka, EventStore). Current state = fold of events. Auditable, replayable, debuggable. Higher write cost; lower for queries. Right for compliance-sensitive domains.
What to store
Full transcript: simple, expensive (long convos blow up). Summarized recent + verbatim last N: balanced. Structured state (extracted facts, current intent) + transcript reference: clean separation but more code. Pick by recall needs.
Cleanup policy
Conversations end. Define end-of-session (idle > 30 min, explicit end, user logout). Archive ended conversations to cold storage if you need long-term retention. Don't pile active session data forever; costs and discoverability both suffer.