Trace Sampling: The Cost Lever That Matters
Sampling decides which traces you keep, and it is the single biggest lever on tracing cost. At 1,000 RPS with 10 spans per request and 100% sampling, you produce ~26 billion spans per month (1,000 × 10 × 86,400 seconds per day × 30 days). At Datadog's pricing, that's $33,000+/month for tracing alone. Drop to 1% and the bill is about $330. The trick is keeping the interesting traces.
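The arithmetic above can be checked with a short sketch. The per-million-span rate used here is an assumption back-derived from the figures in this section ($33,000 over roughly 26 billion spans), not a published price, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 86_400 * 30  # assumes a 30-day month


def monthly_spans(rps: float, spans_per_request: float, sample_rate: float = 1.0) -> float:
    """Spans kept per month at a given request rate and sampling rate."""
    return rps * spans_per_request * SECONDS_PER_MONTH * sample_rate


def monthly_cost(spans: float, usd_per_million_spans: float) -> float:
    """Monthly bill for a span volume at a flat per-million-span rate."""
    return spans / 1_000_000 * usd_per_million_spans


# ASSUMPTION: rate implied by this section's numbers, not a vendor quote.
ASSUMED_USD_PER_MILLION = 1.27

if __name__ == "__main__":
    for rate in (1.0, 0.01):
        spans = monthly_spans(1_000, 10, rate)
        cost = monthly_cost(spans, ASSUMED_USD_PER_MILLION)
        print(f"sample rate {rate:>5.0%}: {spans:,.0f} spans/month, about ${cost:,.0f}")
```

Because cost scales linearly with the sampling rate under this flat-rate assumption, cutting the rate by 100x cuts the bill by 100x.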
Sampling Strategies
- Head-based: decide at trace start. Simple, but may drop the slow or errored traces you actually want to see.
- Tail-based: buffer all spans in the Collector and decide once the trace has ended. Keep 100% of errors and slow requests, drop 99% of fast successful ones. Best signal-to-cost ratio.
- Probabilistic + tail: keep 1% of all traces (statistical baseline) + 100% of errors + 100% of traces above a latency threshold (interesting traces). All three rules have to be evaluated at the tail stage, because a trace dropped at the head never reaches the Collector and its errors cannot be recovered.
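The combined policy in the last bullet can be sketched as three rules evaluated once a trace is complete. This is a minimal illustration, not a Collector implementation: `TraceSummary`, `keep_trace`, and the 1,000 ms default threshold are hypothetical names and values chosen for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Hypothetical summary of a finished trace."""
    trace_id: str
    duration_ms: float
    has_error: bool


def in_baseline(trace_id: str, rate: float) -> bool:
    """Deterministic probabilistic sample: the same trace ID always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform over 64 bits
    return bucket < int(rate * 2**64)


def keep_trace(
    trace: TraceSummary,
    baseline_rate: float = 0.01,           # 1% statistical baseline
    latency_threshold_ms: float = 1000.0,  # ASSUMPTION: example threshold
) -> bool:
    """Keep all errors, all slow traces, and a small share of everything else."""
    if trace.has_error:
        return True
    if trace.duration_ms >= latency_threshold_ms:
        return True
    return in_baseline(trace.trace_id, baseline_rate)


if __name__ == "__main__":
    print(keep_trace(TraceSummary("trace-1", 12.0, has_error=True)))     # error: kept
    print(keep_trace(TraceSummary("trace-2", 2400.0, has_error=False)))  # slow: kept
```

Hashing the trace ID instead of calling a random number generator keeps the baseline decision consistent if the same trace is evaluated more than once.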
Sampling Rate by Traffic
| Traffic | Sampling |
|---|---|
| < 100 RPS | 100% — don't sample |
| 100-1,000 RPS | 10-25% head-based, OR 100% + tail-based |
| 1,000-10,000 RPS | 1-5% head-based, OR aggressive tail |
| > 10,000 RPS | 0.1-1% head-based + tail for errors |
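For teams that want the table as a default in tooling, it can be encoded as a simple lookup. The function name is hypothetical, and because the table's ranges share their endpoints, placing exactly 1,000 and 10,000 RPS in the lower band is an assumption of this sketch.

```python
def recommended_sampling(rps: float) -> str:
    """Return the table's sampling guidance for a given request rate."""
    if rps < 0:
        raise ValueError("rps must be non-negative")
    if rps < 100:
        return "100% (no sampling)"
    if rps <= 1_000:   # ASSUMPTION: boundary value goes to the lower band
        return "10-25% head-based, or 100% + tail-based"
    if rps <= 10_000:  # ASSUMPTION: boundary value goes to the lower band
        return "1-5% head-based, or aggressive tail"
    return "0.1-1% head-based + tail for errors"


if __name__ == "__main__":
    for rps in (50, 500, 5_000, 50_000):
        print(f"{rps:>6} RPS -> {recommended_sampling(rps)}")
```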
See the Distributed Tracing Guide for context on OpenTelemetry and the full sampling decision tree.