“We guarantee 99.9% uptime.” You’ve seen this in SLAs. But what does it actually mean? How much downtime is that? And why do fractions of a percent matter so much?
Uptime, Defined
Uptime is the percentage of time a system is operational and accessible to users. If your website was up for 29 days and 12 hours in a 30-day month, and down for 12 hours, your uptime was:
(29.5 / 30) x 100 = 98.33%
That might sound high, but 12 hours of downtime per month is catastrophic for most businesses. This is why small percentage differences matter enormously.
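As a quick check of that arithmetic, here is a minimal Python sketch. The function name `uptime_percent` is made up for this example, and it assumes you already know the total window and the downtime in hours.

```python
def uptime_percent(total_hours: float, downtime_hours: float) -> float:
    """Return uptime as a percentage of the measurement window."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return (total_hours - downtime_hours) / total_hours * 100


# A 30-day month is 720 hours; 12 of them were downtime.
print(round(uptime_percent(30 * 24, 12), 2))  # 98.33
```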
Use the uptime calculator to convert any percentage to actual downtime.
The Nines System
The industry uses “nines” as shorthand for availability levels:
| Nines | Percentage | Downtime/Year | Downtime/Month |
|---|---|---|---|
| Two nines | 99% | 3.65 days | 7.31 hours |
| Three nines | 99.9% | 8.76 hours | 43.8 minutes |
| Four nines | 99.99% | 52.6 minutes | 4.38 minutes |
| Five nines | 99.999% | 5.26 minutes | 26.3 seconds |
Each additional nine cuts allowed downtime by 10x, and the engineering effort and cost needed to achieve it rise steeply, often by a comparable factor.
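The table values can be reproduced with a few lines of Python. This is a sketch under two assumptions: a year is 365 days (no leap years), and a "month" is one twelfth of that, or 730 hours. The helper name `allowed_downtime` is hypothetical.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24              # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours in an average month


def allowed_downtime(availability_percent: float, window_hours: float) -> timedelta:
    """Return the downtime an availability target permits over a window."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return timedelta(hours=window_hours * (100 - availability_percent) / 100)


# Print the yearly and monthly allowance for each level in the table.
for target in (99, 99.9, 99.99, 99.999):
    per_year = allowed_downtime(target, HOURS_PER_YEAR)
    per_month = allowed_downtime(target, HOURS_PER_MONTH)
    print(f"{target}%: {per_year} per year, {per_month} per month")
```

If your SLA defines the month differently (for example, a calendar month or a rolling 30-day window), pass that window's length in hours instead.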
SLA, SLO, and SLI
These three terms form a hierarchy:
SLI (Service Level Indicator)
What you measure. The raw metric. Examples:
- Request success rate (% of requests returning 2xx)
- Response time (P99 latency under 500ms)
- Throughput (requests per second)
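To make the first two indicators concrete, here is a rough sketch of computing them from request records. The `Request` type and the sample data are invented for illustration, and the percentile uses the simple nearest-rank method. Real monitoring systems often approximate percentiles from histograms instead.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to serve the request


def success_rate(requests: list[Request]) -> float:
    """Percentage of requests that returned a 2xx status."""
    if not requests:
        raise ValueError("no requests to measure")
    ok = sum(1 for r in requests if 200 <= r.status < 300)
    return ok * 100 / len(requests)


def p99_latency_ms(requests: list[Request]) -> float:
    """99th-percentile latency using the nearest-rank method."""
    if not requests:
        raise ValueError("no requests to measure")
    latencies = sorted(r.latency_ms for r in requests)
    rank = math.ceil(99 * len(latencies) / 100)  # 1-based rank
    return latencies[rank - 1]


# Made-up sample: 98 successful requests and 2 server errors.
sample = [Request(200, 120.0)] * 97 + [
    Request(200, 450.0),
    Request(503, 900.0),
    Request(500, 30.0),
]
print(success_rate(sample))    # 98.0
print(p99_latency_ms(sample))  # 450.0
```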
SLO (Service Level Objective)
What you target internally. A goal set by the engineering team. Example: “99.95% of requests succeed within 500ms over any 30-day window.”
SLOs are internal. You can change them. They drive your error budget: the amount of unreliability (100% minus the SLO) you can spend before missing the target.
SLA (Service Level Agreement)
What you promise customers. A contractual commitment with consequences (usually service credits) if breached. Example: “99.9% monthly availability. If breached, customer receives 10% credit.”
Set your SLA lower (less strict) than your SLO to provide a buffer. If your SLO is 99.95% and your SLA is 99.9%, you have room to miss your internal target without breaching the customer contract.
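Here is a small sketch of how large that buffer is in minutes. It assumes an average month of 730 hours (43,800 minutes) and treats both targets as time-based availability. The names `budget_minutes` and `status` are hypothetical.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # 43,800 minutes in an average month


def budget_minutes(target_percent: float,
                   window_minutes: float = MINUTES_PER_MONTH) -> float:
    """Downtime an availability target allows over the window."""
    return window_minutes * (100 - target_percent) / 100


def status(downtime_minutes: float,
           slo_percent: float = 99.95,
           sla_percent: float = 99.9) -> str:
    """Classify a month's downtime against the internal and contractual targets."""
    if downtime_minutes <= budget_minutes(slo_percent):
        return "within SLO"
    if downtime_minutes <= budget_minutes(sla_percent):
        return "SLO missed, SLA intact"
    return "SLA breached"


slo_budget = budget_minutes(99.95)  # about 21.9 minutes
sla_budget = budget_minutes(99.9)   # about 43.8 minutes
print(f"Buffer between SLO and SLA: {sla_budget - slo_budget:.1f} minutes")

for downtime in (10, 30, 60):
    print(downtime, "minutes ->", status(downtime))
```

With these targets, roughly 22 minutes of downtime in a month misses the internal goal, but it takes about 44 minutes before the customer contract is breached.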
What Each Level Means in Practice
99% (Two Nines)
7.3 hours of downtime per month. Acceptable for internal tools, dev environments, and non-critical services. Most startups without dedicated ops teams operate here.
99.9% (Three Nines)
43.8 minutes of downtime per month. The standard for most SaaS products. Achievable with good fundamentals: health checks, auto-restart, basic redundancy. Most cloud providers guarantee this or better for their core services.
99.95%
21.9 minutes of downtime per month. A good target for production SaaS serving paying customers. Requires load balancing, automated failover, and proactive monitoring.
99.99% (Four Nines)
4.38 minutes of downtime per month. Requires multi-region deployments, automated failover, rigorous change management, and comprehensive uptime monitoring. Most teams underestimate the investment needed.
99.999% (Five Nines)
26 seconds of downtime per month. Reserved for critical infrastructure: financial systems, emergency services, core telecom. Requires massive redundancy, active-active multi-region, and near-zero deployment risk.
The Hidden Cost of Higher Availability
The relationship between nines and cost is exponential, not linear:
- 99% → 99.9%: Add health checks, auto-scaling, basic redundancy. ~2x infrastructure cost
- 99.9% → 99.99%: Multi-region, automated failover, comprehensive monitoring. ~5-10x cost
- 99.99% → 99.999%: Active-active multi-region, chaos engineering, 24/7 SRE team. ~10-50x cost
Before committing to a higher SLA, use the downtime cost calculator to check whether the business impact of downtime justifies the engineering investment.
Uptime vs Availability
These terms are often used interchangeably, but there’s a subtle difference:
- Uptime = The system is running (powered on, process active)
- Availability = The system is accessible and functioning correctly for users
A server can have 100% uptime (never rebooted) while the application on it has 95% availability (crashes frequently, returns errors). SLAs should measure availability, not just uptime.
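The difference shows up as soon as you record both signals side by side. In this sketch, the `Probe` record and its sample data are invented to mirror the example above: each probe notes whether the process was running and whether a real user request succeeded.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Probe:
    process_running: bool  # host-level signal: is the process alive?
    request_ok: bool       # user-level signal: did a real request succeed?


def percent_true(flags: Iterable[bool]) -> float:
    """Percentage of probes for which the flag was true."""
    values = list(flags)
    if not values:
        raise ValueError("no probes recorded")
    return sum(values) * 100 / len(values)


# Made-up data: the process never died, but 5 of 100 requests failed.
probes = [Probe(True, True)] * 95 + [Probe(True, False)] * 5

uptime = percent_true(p.process_running for p in probes)
availability = percent_true(p.request_ok for p in probes)
print(f"uptime={uptime}%, availability={availability}%")  # uptime=100.0%, availability=95.0%
```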
How to Monitor Uptime
You need three things:
- External monitoring — Check your service from outside your infrastructure, from multiple regions
- Appropriate check frequency — Match your SLA. At 99.99%, you need 10-30 second checks
- Reliable alerting — Get notified immediately when something fails
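To see why check frequency has to match the SLA, compare the check interval with the monthly downtime budget. The sketch below makes a simplifying assumption: an outage starts just after a successful check, and you are alerted on the very next failed check, so the worst-case detection delay is one full interval. The function names are hypothetical.

```python
SECONDS_PER_MONTH = 365 * 24 * 3600 / 12  # 2,628,000 seconds in an average month


def monthly_budget_seconds(sla_percent: float) -> float:
    """Downtime the SLA allows per average month, in seconds."""
    return SECONDS_PER_MONTH * (100 - sla_percent) / 100


def budget_used_before_detection(sla_percent: float,
                                 check_interval_seconds: float) -> float:
    """Worst-case fraction of the monthly budget spent before the first failed check."""
    return check_interval_seconds / monthly_budget_seconds(sla_percent)


for interval in (30, 60, 300):
    share = budget_used_before_detection(99.99, interval)
    print(f"99.99% SLA, {interval}s checks: {share:.0%} of the monthly budget")
```

Under these assumptions, a 5-minute check interval at 99.99% can burn more than the whole monthly budget before anyone is notified, while 30-second checks spend about 11% of it.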
See our complete guide to uptime monitoring and tool comparison for practical setup guidance.
Related tools:
- Uptime Calculator — Convert any SLA to allowed downtime
- Error Budget Calculator — Calculate your reliability budget
- Downtime Cost Calculator — Quantify the business impact