Single-node Prometheus is great until it's not. A 15-day default retention (and a practical ceiling of a few weeks) plus one node's worth of memory cap what a single server can naturally do. Mimir, Cortex, and Thanos solve the scaling problem in three different shapes, and the choice among them sets operational cost for years.
Single-node Prometheus limits
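Memory: roughly 10M active series per 64GB node, a rule of thumb that shifts with label cardinality and series churn. Retention: practically 2-4 weeks before disk usage balloons and long-range queries get slow. HA: two replicas scraping the same targets, which is scrape duplication, not data replication. Past these limits, you're scaling out.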
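To make the limits above concrete, here is a rough sizing sketch. It assumes memory scales linearly from the 10M-series-per-64GB figure, and it uses the common disk estimate of retention seconds times ingested samples per second times bytes per sample. The 2 bytes per sample is an assumed upper-end planning number, and the 28-day cutoff is just the top of the 2-4 week range, so treat the output as a first guess rather than a measurement.

```python
# Rules of thumb taken from the text; rough planning numbers, not guarantees.
SERIES_PER_NODE = 10_000_000   # ~10M active series ...
NODE_MEMORY_GB = 64            # ... per 64 GB node
BYTES_PER_SAMPLE = 2           # assumption: upper end of the usual 1-2 bytes/sample estimate
MAX_PRACTICAL_RETENTION_DAYS = 28  # assumption: top of the "2-4 weeks" range


def memory_gb_for(active_series: int) -> float:
    """Scale the 10M-series-per-64GB ratio linearly (an assumption)."""
    return active_series * NODE_MEMORY_GB / SERIES_PER_NODE


def disk_gb_for(active_series: int, scrape_interval_s: float, retention_days: float) -> float:
    """Estimate disk as retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return samples_per_second * retention_seconds * BYTES_PER_SAMPLE / 1e9


def fits_single_node(active_series: int, retention_days: float) -> bool:
    """True while both the series count and the retention stay under the rules of thumb."""
    return (active_series <= SERIES_PER_NODE
            and retention_days <= MAX_PRACTICAL_RETENTION_DAYS)


if __name__ == "__main__":
    series = 2_500_000
    print(f"memory: {memory_gb_for(series):.1f} GB")             # memory: 16.0 GB
    print(f"disk:   {disk_gb_for(series, 15, 28):.1f} GB")       # disk:   806.4 GB
    print(f"single node ok: {fits_single_node(series, 28)}")     # single node ok: True
```

Under these assumptions, 2.5M series scraped every 15 seconds and kept for 28 days still fit on one node. Pushing retention to 90 days, or the series count past 10M, makes the check fail.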
Remote write — the shared on-ramp
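Prometheus pushes samples via remote_write to a long-term-storage backend. Mimir and Cortex ingest this protocol natively, and Thanos accepts it through its Receive component, although the classic Thanos sidecar skips remote write and ships blocks instead. The push side is solved; the storage side is where they differ.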
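As an illustration of how small the push side is, the sketch below renders a remote_write stanza for prometheus.yml. The hostname, tenant name, and max_shards value are hypothetical placeholders. The `/api/v1/push` path and the `X-Scope-OrgID` tenant header follow Mimir and Cortex conventions; a Thanos Receive endpoint would typically use `/api/v1/receive` instead. Check your backend's documentation before copying any of it.

```python
from typing import Optional


def render_remote_write(url: str, tenant: Optional[str] = None, max_shards: int = 50) -> str:
    """Render a minimal remote_write block as YAML text.

    Only url, headers, and queue_config.max_shards are emitted; every other
    setting is left at whatever Prometheus uses by default.
    """
    lines = [
        "remote_write:",
        f"  - url: {url}",
    ]
    if tenant is not None:
        # Mimir and Cortex identify the tenant through this HTTP header.
        lines += ["    headers:", f"      X-Scope-OrgID: {tenant}"]
    # max_shards caps the parallel senders; the value here is illustrative.
    lines += ["    queue_config:", f"      max_shards: {max_shards}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical endpoint and tenant, for illustration only.
    print(render_remote_write("http://mimir.example.internal/api/v1/push", tenant="team-a"))
```

Generating the stanza from code is only a convenience for fleets with many Prometheus servers. For a single server, writing the same few lines into prometheus.yml by hand works just as well.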
Mimir vs Cortex vs Thanos
Mimir (Grafana Labs): multi-tenant from the start, blocks stored in object storage such as S3, horizontally scalable ingesters. The most operationally polished option in 2026. Cortex: the project Mimir was forked from, with a similar architecture and less active development. Thanos: sidecar pattern, where a sidecar next to each Prometheus uploads its TSDB blocks to object storage directly. Simpler to bolt on; harder to scale to many tenants.
Query federation
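All three put a query layer in front of many backends: a querier fans a PromQL query out to recent data (ingesters or sidecars) and to older blocks in object storage, then merges the results. Push-down predicates matter for performance, because sending label matchers and time ranges to each store means only matching series travel back over the network. PromQL compatibility is high but not 100%, so test critical queries against the new backend before committing to it.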
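One way to run that compatibility test is to send the same instant query to the old and new backends through the Prometheus HTTP API (`/api/v1/query`) and diff the results. The sketch below uses only the standard library. The base URLs are hypothetical, and it assumes both backends expose the standard API under the base URL you pass in (some add a path prefix) and return an instant vector.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Optional


def instant_query(base_url: str, promql: str, at: Optional[float] = None, timeout: float = 10.0) -> dict:
    """Run an instant query; pass the same `at` timestamp to both backends."""
    params = {"query": promql}
    if at is not None:
        params["time"] = str(at)
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_series_map(response: dict) -> dict:
    """Index an instant-vector response by label set so result order is irrelevant."""
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    return {
        frozenset(sample["metric"].items()): float(sample["value"][1])
        for sample in data["result"]
    }


def diff_results(old: dict, new: dict, rel_tol: float = 1e-6) -> list:
    """Return human-readable differences; an empty list means the results agree."""
    a, b = to_series_map(old), to_series_map(new)
    problems = []
    for key in a.keys() - b.keys():
        problems.append(f"missing in new backend: {dict(sorted(key))}")
    for key in b.keys() - a.keys():
        problems.append(f"only in new backend: {dict(sorted(key))}")
    for key in a.keys() & b.keys():
        x, y = a[key], b[key]
        both_nan = math.isnan(x) and math.isnan(y)
        if not both_nan and not math.isclose(x, y, rel_tol=rel_tol):
            problems.append(f"value differs for {dict(sorted(key))}: {x} vs {y}")
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical endpoints; replace with your own before running.
    query, ts = "sum by (job) (up)", 1_700_000_000
    old = instant_query("http://prometheus.example.internal:9090", query, at=ts)
    new = instant_query("http://new-backend.example.internal", query, at=ts)
    for line in diff_results(old, new) or ["results match"]:
        print(line)
```

Pinning both queries to the same timestamp avoids false alarms from evaluation-time skew. Label-set differences, such as external labels added by the new backend, will show up as missing or extra series and are worth reviewing by hand.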
Choosing in 2026
New deployment, multi-tenant or large: Mimir. Existing Prometheus and a need for long retention: Thanos (smallest lift). Existing Cortex deployments are largely migrating to Mimir.
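The rules of thumb above can be written down as a small decision function. This is only a restatement of the guidance in this section, not a sizing tool. The parameter names are made up for illustration, and any combination the text does not cover returns "undecided" rather than a guess.

```python
from typing import Optional


def recommend_backend(existing: Optional[str],
                      multi_tenant_or_large: bool,
                      wants_long_retention: bool) -> str:
    """Map the article's guidance to a recommendation.

    existing: None for a new deployment, or "prometheus" / "cortex".
    """
    if existing == "cortex":
        # Existing Cortex deployments are largely migrating to Mimir.
        return "mimir"
    if existing is None and multi_tenant_or_large:
        return "mimir"
    if existing == "prometheus" and wants_long_retention:
        # Smallest lift. Assumption: this wins even if the setup is also large,
        # a case the guidance does not spell out.
        return "thanos"
    return "undecided"


if __name__ == "__main__":
    print(recommend_backend(None, True, True))            # mimir
    print(recommend_backend("prometheus", False, True))   # thanos
    print(recommend_backend("cortex", False, False))      # mimir
```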