Jaeger is an open-source distributed tracing system originally developed at Uber. It collects spans from your services, stores them, and lets you visualize request flows across systems. For microservice architectures, traces are often the most actionable observability signal.

Trace anatomy

A trace is a tree of spans. The root span is the entry point (for example, an incoming HTTP request); child spans are internal operations such as a DB query or a downstream API call. Each span records a start and end time, a status, and attributes (HTTP method, URL, error message). The trace ID propagates between services via HTTP headers, following the W3C Trace Context standard (the traceparent header).
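To make the propagation step concrete, here is a minimal sketch of parsing a W3C traceparent header in Python. The names TraceContext and parse_traceparent are hypothetical helpers for illustration, not part of Jaeger or any SDK, and the sketch only handles the fixed four-field layout (version, trace ID, parent span ID, flags).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Four lowercase-hex fields separated by dashes:
# version (2) - trace-id (32) - parent-id (16) - trace-flags (2)
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


@dataclass(frozen=True)
class TraceContext:  # hypothetical container, for illustration only
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


ctx = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

In practice an instrumentation library does this for you; the sketch only shows what travels in the header.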

Sampling strategy

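Sampling 100% of traces usually produces too much data. Head-based sampling decides at trace start (e.g., sample 1% of GET, 10% of POST): it is simple and gives predictable storage. Tail-based sampling collects all spans and decides after the trace completes: keep errors at 100%, slow requests at 100%, and everything else at 1%. It is more useful but more complex, because spans must be buffered until the decision is made.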
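The two strategies can be sketched as plain decision functions. This is an illustration only, not how Jaeger or the OTel Collector implements sampling: the names head_sample, keep_trace, and CompletedSpan are hypothetical, and the 1000 ms slow threshold is an assumed value. The rates mirror the ones mentioned above.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Head-based rates from the text: 1% of GET, 10% of POST.
HEAD_RATES = {"GET": 0.01, "POST": 0.10}


def head_sample(method: str, rng: Callable[[], float] = random.random) -> bool:
    """Decide at trace start, knowing only the request method."""
    return rng() < HEAD_RATES.get(method.upper(), 0.01)


@dataclass
class CompletedSpan:  # hypothetical minimal span record
    duration_ms: float
    is_error: bool


def keep_trace(
    spans: List[CompletedSpan],
    slow_threshold_ms: float = 1000.0,  # assumed threshold
    baseline_rate: float = 0.01,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide after the whole trace has been collected (tail-based)."""
    if any(s.is_error for s in spans):
        return True  # keep every trace that contains an error
    if any(s.duration_ms >= slow_threshold_ms for s in spans):
        return True  # keep every slow trace
    return rng() < baseline_rate  # keep a small share of the rest
```

The rng parameter is injectable so the decision can be tested deterministically; the tail-based function needs the full span list, which is exactly why it requires buffering.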

Storage

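Jaeger supports Cassandra, Elasticsearch, and ClickHouse storage backends. Retention is typically 7-30 days; older traces are archived or dropped. At 1B spans/day generated and stored in ES, expect on the order of ~50GB/day at default sampling. The real figure depends on your sampling rate and average span size, so measure both and plan accordingly.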
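A back-of-the-envelope estimate makes the sizing explicit. The sketch below is plain arithmetic; the 10% sampling rate and 500-byte average span size are assumptions chosen for illustration, not Jaeger defaults or measured values, and index overhead is ignored.

```python
def daily_storage_gb(
    spans_per_day: float,
    sample_rate: float,
    avg_span_bytes: float,
    replication: int = 1,
) -> float:
    """Stored volume per day in GB (1 GB = 1e9 bytes), before index overhead."""
    return spans_per_day * sample_rate * avg_span_bytes * replication / 1e9


def retained_storage_gb(daily_gb: float, retention_days: int) -> float:
    """Steady-state volume once the retention window is full."""
    return daily_gb * retention_days


# Assumed inputs: 1B spans/day generated, 10% kept, ~500 bytes per stored span.
per_day = daily_storage_gb(1_000_000_000, sample_rate=0.10, avg_span_bytes=500)
at_30_days = retained_storage_gb(per_day, retention_days=30)
# Under these assumptions: roughly 50 GB/day and roughly 1.5 TB at 30 days.
```

Swap in your own measured span size and sampling rate; replication and indexing can multiply the result several times over.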

Root-cause workflow

User reports slow request →
  Find their trace by user_id attribute →
  Click into trace timeline →
  Identify span with longest self-time →
  Read span attributes for parameters →
  Reproduce locally
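The "longest self-time" step is worth spelling out: self-time is a span's own duration minus the time covered by its direct children. A minimal sketch, assuming a hypothetical flat list of span records (the Span fields below are illustrative, not Jaeger's data model):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:  # hypothetical minimal span record
    span_id: str
    parent_id: Optional[str]
    start_ms: float
    end_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Map span_id -> duration not covered by any direct child."""
    children: Dict[str, List[Span]] = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)

    result: Dict[str, float] = {}
    for s in spans:
        covered = 0.0
        cursor = s.start_ms  # everything before cursor is already counted
        for c in sorted(children.get(s.span_id, []), key=lambda c: c.start_ms):
            # Clip the child to the parent and skip overlap with siblings.
            start = max(c.start_ms, cursor)
            end = min(c.end_ms, s.end_ms)
            if end > start:
                covered += end - start
                cursor = end
        result[s.span_id] = (s.end_ms - s.start_ms) - covered
    return result


trace = [
    Span("root", None, 0.0, 100.0),
    Span("db", "root", 10.0, 70.0),
    Span("api", "root", 60.0, 90.0),  # overlaps the db span by 10 ms
]
times = self_times(trace)
slowest = max(times, key=times.get)
```

Here the root span lasts longest overall, but most of that time is spent waiting on children; the db span has the largest self-time and is the one to inspect first.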

Integration with OpenTelemetry

A modern setup instruments services with the OTel SDK, exports to the OTel Collector, and has the Collector forward to Jaeger. This decouples instrumentation from storage: you can switch from Jaeger to Tempo or Honeycomb later without re-instrumenting.

In short: OTel instrumentation, a Collector, and Jaeger storage; tail-based sampling for actionable traces; and the trace ID in every log line so logs and traces can be joined.
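Putting the trace ID in every log line can be done with a standard-library logging filter. This is a sketch under assumptions: current_trace_id is a hypothetical context variable standing in for whatever your tracing SDK exposes as the active trace ID, and the logger name and format string are illustrative.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID; in a real service this value
# would come from the tracing SDK's current span context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("traced-app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = build_logger(buf)
current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
log.info("payment failed")
```

With the ID in each line, a slow or failed request found in the logs can be looked up directly in the Jaeger UI, and vice versa.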