Byzantine Fault Tolerance — When It Earns Its Cost

The Byzantine fault model

Crash failures: node stops. Byzantine: node sends arbitrary messages — could be malicious, could be corrupted memory, could be buggy. To tolerate f Byzantine nodes you need 3f+1 total (vs 2f+1 for crash-only). The extra node-count is the floor cost.

Advertisement

PBFT and its descendants

Practical Byzantine Fault Tolerance (Castro & Liskov, 1999) was the first viable production BFT. Three-phase commit (pre-prepare, prepare, commit). O(n²) messages per decision. Newer variants (HotStuff) reduce this to O(n).

Advertisement

Tendermint and HotStuff

Tendermint (Cosmos): leader-based BFT with timeouts. HotStuff (Libra/Diem): linear message complexity, pipelined. These are the modern blockchain-targeted BFT protocols. Production-quality, well-audited.

Where BFT earns its cost

Public blockchains (untrusted participants). Cross-org consortium systems (members don't fully trust each other). Critical infrastructure where adversarial node compromise is in scope (defense, financial settlement). Rarely needed in a single-org internal system.

Where to skip

Inside one trust domain (your company's services): use Raft, not BFT. The cost-benefit doesn't work. If you're seeing 'BFT' in an internal architecture proposal, push back — usually it's cargo culting from the blockchain world.