The protocol

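Phase 1 (prepare): the coordinator asks every participant 'can you commit?'. Each participant writes a tentative entry to its log and replies yes or no. Phase 2 (commit/abort): if every participant votes yes, the coordinator tells them all to commit; each applies the change and acks. If any participant votes no, or fails to reply before the timeout, the coordinator tells them all to abort.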
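The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the names `Participant`, `Vote`, `two_phase_commit`, and the `can_commit` switch are hypothetical, the log is a plain list rather than durable storage, and a timeout is modelled as a `TimeoutError` raised by `prepare`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical switch used to simulate a "no" vote
    log: List[Tuple[str, str]] = field(default_factory=list)
    state: str = "idle"

    def prepare(self, txid: str) -> Vote:
        if not self.can_commit:
            self.state = "aborted"
            return Vote.NO
        # Tentative log entry; a real participant would write this durably.
        self.log.append(("prepared", txid))
        self.state = "prepared"
        return Vote.YES

    def commit(self, txid: str) -> bool:
        self.log.append(("committed", txid))
        self.state = "committed"
        return True  # ack

    def abort(self, txid: str) -> bool:
        self.log.append(("aborted", txid))
        self.state = "aborted"
        return True  # ack


def two_phase_commit(txid: str, participants: List[Participant]) -> str:
    # Phase 1: collect votes; a timeout counts as a "no".
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(txid))
        except TimeoutError:
            votes.append(Vote.NO)

    # Phase 2: commit only if every vote is yes, otherwise abort everyone.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit(txid)
        return "committed"
    for p in participants:
        p.abort(txid)
    return "aborted"
```

Calling `two_phase_commit("t1", [Participant("a"), Participant("b")])` commits both participants; if either is constructed with `can_commit=False`, both end up aborted.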

The blocking problem

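If the coordinator crashes after phase 1 but before phase 2, every participant that voted yes is stuck: it has promised to commit if told to, so it can neither commit nor abort on its own and must keep holding its locks until the coordinator recovers. Other transactions that need those locks queue up behind it, so with many participants and slow coordinators the stall can cascade.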
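A small state machine makes the asymmetry concrete. The sketch below assumes a hypothetical `InDoubtParticipant` class with a made-up `on_coordinator_timeout` hook; it only illustrates which states allow a unilateral decision, not how any particular database handles recovery.

```python
from enum import Enum, auto


class State(Enum):
    WORKING = auto()    # has not voted yet
    PREPARED = auto()   # voted yes, waiting for the decision
    COMMITTED = auto()
    ABORTED = auto()


class InDoubtParticipant:
    def __init__(self) -> None:
        self.state = State.WORKING
        self.locks_held = True

    def vote_yes(self) -> None:
        self.state = State.PREPARED

    def on_coordinator_timeout(self) -> State:
        if self.state is State.WORKING:
            # No promise made yet, so aborting unilaterally is safe.
            self.state = State.ABORTED
            self.locks_held = False
        # In PREPARED the coordinator may have decided either way, so the
        # participant must stay put and keep its locks: this is the block.
        return self.state

    def on_decision(self, commit: bool) -> None:
        # Only the coordinator's decision releases a prepared participant.
        self.state = State.COMMITTED if commit else State.ABORTED
        self.locks_held = False
```

A participant that times out before voting aborts and frees its locks; one that times out after `vote_yes()` stays in `PREPARED` with `locks_held` still true until `on_decision` arrives.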

Where 2PC still fits

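2PC still works well for cross-shard transactions inside a single trust domain (e.g., CockroachDB, Spanner), and for short critical-section transactions (<100ms) with a small, bounded set of participants (2-5). In such systems the coordinator's state is typically replicated through a consensus group (Raft, or Paxos in Spanner's case), so the coordinator stays highly available and a single node crash does not leave participants blocked.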

Saga as the alternative

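A saga suits long transactions that span services. It runs as a sequence of local steps with retries, and each step has an inverse (a compensating action) that is run if a later step fails. Checkout flows and payment processing commonly use it. Nothing blocks, but isolation is weaker because intermediate states are visible. It is the right shape when 2PC's locks would cause too much contention.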
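The step-plus-inverse structure can be sketched as follows. This is an illustrative outline under simplifying assumptions: `run_saga` and the step tuple layout are hypothetical names, steps run in-process rather than across services, retries are omitted, and compensations are assumed not to fail.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): the compensation is the step's inverse.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    completed: List[Tuple[str, Callable[[], None]]] = []
    trace: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            trace.append(f"failed:{name}")
            # Undo the steps that already took effect, newest first.
            for prev_name, undo in reversed(completed):
                undo()
                trace.append(f"undid:{prev_name}")
            return False, trace
        trace.append(f"did:{name}")
        completed.append((name, compensate))
    return True, trace


# Usage: a toy checkout where the payment step fails.
inventory = {"stock": 5}


def reserve_stock() -> None:
    inventory["stock"] -= 1


def release_stock() -> None:
    inventory["stock"] += 1


def charge_card() -> None:
    raise RuntimeError("card declined")


def refund_card() -> None:
    pass  # nothing to refund in this toy example


ok, trace = run_saga([
    ("reserve", reserve_stock, release_stock),
    ("charge", charge_card, refund_card),
])
```

Between `reserve_stock` and `release_stock`, any other reader would see the stock at 4. That window is the weaker isolation the text describes.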

Per-message idempotency

Whatever protocol you pick, make participants idempotent. Coordinators crash, retries happen, and the network duplicates messages. An operation that's safe to repeat survives all of this; an operation that isn't requires careful 'have I done this?' tracking.
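One common way to do the 'have I done this?' tracking is to key each request by a unique ID and remember the result. The sketch below is a minimal illustration: `Account`, `credit`, and `request_id` are hypothetical names, and the dedup table lives in memory, whereas a real participant would need to persist it atomically with the state change.

```python
from typing import Dict


class Account:
    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        # request_id -> balance returned the first time the request ran.
        self._applied: Dict[str, int] = {}

    def credit(self, request_id: str, amount: int) -> int:
        # A duplicate or retried message returns the remembered result
        # instead of applying the credit a second time.
        if request_id in self._applied:
            return self._applied[request_id]
        self.balance += amount
        self._applied[request_id] = self.balance
        return self.balance
```

With `acct = Account(100)`, calling `acct.credit("req-1", 50)` twice leaves the balance at 150, while a new ID such as `"req-2"` is applied normally.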