What is Uptime? SLAs, Nines, and What They Actually Mean

Plain-language explanation of uptime, the nines system, SLAs, SLOs, and SLIs. What each level of availability actually means for your business.

· Project Helena · 4 min read ·

“We guarantee 99.9% uptime.” You’ve seen this in SLAs. But what does it actually mean? How much downtime is that? And why do fractions of a percent matter so much?

Uptime, Defined

Uptime is the percentage of time a system is operational and accessible to users. If your website was up for 29 days and 12 hours in a 30-day month, and down for 12 hours, your uptime was:

(29.5 / 30) x 100 = 98.33%

That might sound high, but 12 hours of downtime per month is catastrophic for most businesses. This is why small percentage differences matter enormously.
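The calculation above can be sketched in a few lines of Python. The function name `uptime_percent` is an illustrative choice, not part of any particular tool.

```python
def uptime_percent(total_hours: float, downtime_hours: float) -> float:
    """Return uptime as a percentage of the measurement window."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return 100 * (total_hours - downtime_hours) / total_hours


# The example from the text: a 30-day month with 12 hours of downtime.
print(round(uptime_percent(30 * 24, 12), 2))  # 98.33
```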

Use the uptime calculator to convert any percentage to actual downtime.

The Nines System

The industry uses “nines” as shorthand for availability levels:

Nines         Percentage   Downtime/Year   Downtime/Month
Two nines     99%          3.65 days       7.31 hours
Three nines   99.9%        8.76 hours      43.8 minutes
Four nines    99.99%       52.6 minutes    4.38 minutes
Five nines    99.999%      5.26 minutes    26.3 seconds

Each additional nine cuts the allowed downtime by a factor of 10, and the engineering effort and cost needed to achieve it rise steeply with every step.
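The figures in the table follow from one formula: allowed downtime is the unavailable fraction multiplied by the length of the period. Here is a minimal sketch, assuming a 365-day year and a month of one-twelfth of a year (730 hours). Results can differ from the table in the last digit depending on the year length and rounding used.

```python
HOURS_PER_YEAR = 365 * 24               # assumption: non-leap year
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # assumption: average month, 730 hours


def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted at a given availability over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100 - availability_percent) / 100 * period_hours * 60


for level in (99, 99.9, 99.99, 99.999):
    per_year = allowed_downtime_minutes(level, HOURS_PER_YEAR)
    per_month = allowed_downtime_minutes(level, HOURS_PER_MONTH)
    print(f"{level}%: {per_year:.2f} min/year, {per_month:.2f} min/month")
```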

SLA, SLO, and SLI

These three terms form a hierarchy:

SLI (Service Level Indicator)

What you measure. The raw metric. Examples:

  • Request success rate (% of requests returning 2xx)
  • Response time (for example, P99 latency)
  • Throughput (requests per second)
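As a rough illustration, the first two indicators can be computed from a batch of request records. This sketch assumes you already have status codes and latencies as plain lists. The function names are hypothetical, and the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
import math


def success_rate(status_codes: list[int]) -> float:
    """Percentage of requests that returned a 2xx status."""
    if not status_codes:
        raise ValueError("no requests to measure")
    ok = sum(1 for code in status_codes if 200 <= code < 300)
    return 100 * ok / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples to measure")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


print(success_rate([200, 200, 500, 204]))  # 75.0
print(percentile([120, 80, 450, 95], 50))  # 95
```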

SLO (Service Level Objective)

What you target internally. A goal set by the engineering team. Example: “99.95% of requests succeed within 500ms over any 30-day window.”

SLOs are internal. You can change them. They drive your error budget.
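The error budget is simply the share of requests (or time) the SLO allows to fail. A minimal sketch for a request-based SLO like the example above follows. The traffic numbers are made up for illustration and the function names are hypothetical.

```python
def error_budget(slo_percent: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window without missing the SLO."""
    return total_requests * (100 - slo_percent) / 100


def budget_remaining(slo_percent: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is missed)."""
    budget = error_budget(slo_percent, total_requests)
    if budget <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    return (budget - failed_requests) / budget


# Hypothetical window: 1,000,000 requests against a 99.95% SLO, 125 failures so far.
print(round(error_budget(99.95, 1_000_000)))               # 500
print(round(budget_remaining(99.95, 1_000_000, 125), 2))   # 0.75
```

A team might slow down risky releases as the remaining fraction approaches zero, which is the sense in which the SLO drives the budget.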

SLA (Service Level Agreement)

What you promise customers. A contractual commitment with consequences (usually service credits) if breached. Example: “99.9% monthly availability. If breached, customer receives 10% credit.”

Your SLA should be less strict than your SLO so that you have a buffer. If your SLO is 99.95% and your SLA is 99.9%, you can miss your internal target without breaching the customer contract.
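The buffer can be made concrete with a small check. This sketch reuses the example figures from this section (99.95% SLO, 99.9% SLA, 10% credit) as defaults. Real contracts often have tiered credits, so treat the single flat credit as a simplifying assumption.

```python
def evaluate_month(
    availability_percent: float,
    slo_percent: float = 99.95,   # internal target from the example
    sla_percent: float = 99.9,    # contractual promise from the example
    credit_percent: float = 10,   # flat service credit from the example
) -> dict:
    """Report whether a month met the SLO and the SLA, and any credit owed."""
    sla_met = availability_percent >= sla_percent
    return {
        "slo_met": availability_percent >= slo_percent,
        "sla_met": sla_met,
        "credit_percent": 0 if sla_met else credit_percent,
    }


# 99.92% misses the internal SLO but stays inside the SLA buffer.
print(evaluate_month(99.92))
# 99.85% breaches the SLA, so the credit applies.
print(evaluate_month(99.85))
```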

What Each Level Means in Practice

99% (Two Nines)

7.3 hours of downtime per month. Acceptable for internal tools, dev environments, and non-critical services. Most startups without dedicated ops teams operate here.

99.9% (Three Nines)

43.8 minutes of downtime per month. The standard for most SaaS products. Achievable with good fundamentals: health checks, auto-restart, basic redundancy. Most cloud providers guarantee this or better for their core services.

99.95%

21.9 minutes of downtime per month. A good target for production SaaS serving paying customers. Requires load balancing, automated failover, and proactive monitoring.

99.99% (Four Nines)

4.38 minutes of downtime per month. Requires multi-region deployments, automated failover, rigorous change management, and comprehensive uptime monitoring. Most teams underestimate the investment needed.

99.999% (Five Nines)

26 seconds of downtime per month. Reserved for critical infrastructure: financial systems, emergency services, core telecom. Requires massive redundancy, active-active multi-region, and near-zero deployment risk.

The Hidden Cost of Higher Availability

The relationship between nines and cost is exponential, not linear:

  • 99% → 99.9%: Add health checks, auto-scaling, basic redundancy. ~2x infrastructure cost
  • 99.9% → 99.99%: Multi-region, automated failover, comprehensive monitoring. ~5-10x cost
  • 99.99% → 99.999%: Active-active multi-region, chaos engineering, 24/7 SRE team. ~10-50x cost

Before committing to a higher SLA, use the downtime cost calculator to check whether the business impact of downtime justifies the engineering investment.
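A back-of-the-envelope version of that comparison is sketched below. It assumes the full downtime allowance is used every month, a 730-hour month, and a flat cost per minute of downtime. The cost figures in the usage lines are hypothetical inputs, not benchmarks.

```python
MINUTES_PER_MONTH = 730 * 60  # assumption: average month of 730 hours


def downtime_minutes(availability_percent: float) -> float:
    """Monthly downtime allowed at a given availability level."""
    return (100 - availability_percent) / 100 * MINUTES_PER_MONTH


def monthly_savings(current: float, target: float, cost_per_minute: float) -> float:
    """Downtime cost avoided per month by moving from current to target availability."""
    return (downtime_minutes(current) - downtime_minutes(target)) * cost_per_minute


def is_justified(current: float, target: float,
                 cost_per_minute: float, extra_monthly_cost: float) -> bool:
    """True if the avoided downtime cost exceeds the added monthly spend."""
    return monthly_savings(current, target, cost_per_minute) > extra_monthly_cost


# Hypothetical: moving from 99.9% to 99.99% costs an extra 5,000 per month.
print(is_justified(99.9, 99.99, cost_per_minute=100, extra_monthly_cost=5000))
print(is_justified(99.9, 99.99, cost_per_minute=1000, extra_monthly_cost=5000))
```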

Uptime vs Availability

These terms are often used interchangeably, including elsewhere in this article, but strictly speaking there’s a difference:

  • Uptime = The system is running (powered on, process active)
  • Availability = The system is accessible and functioning correctly for users

A server can have 100% uptime (never rebooted) while the application on it has 95% availability (crashes frequently, returns errors). SLAs should measure availability, not just uptime.
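The gap between the two numbers is easy to see when both signals are recorded side by side. In this sketch each sample pairs a host-level signal with a user-facing probe result. The `Sample` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    process_running: bool     # host-level signal: is the process alive?
    request_succeeded: bool   # user-level signal: did a real request work?


def uptime_and_availability(samples: list[Sample]) -> tuple[float, float]:
    """Return (uptime %, availability %) over a series of samples."""
    if not samples:
        raise ValueError("no samples to measure")
    n = len(samples)
    up = sum(1 for s in samples if s.process_running)
    available = sum(1 for s in samples if s.process_running and s.request_succeeded)
    return 100 * up / n, 100 * available / n


# The process never stops, but 1 in 20 requests fails.
samples = [Sample(True, True)] * 19 + [Sample(True, False)]
print(uptime_and_availability(samples))  # (100.0, 95.0)
```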

How to Monitor Uptime

You need three things:

  1. External monitoring: Check your service from outside your infrastructure, from multiple regions
  2. Appropriate check frequency: Match your SLA. A 99.99% target allows only about 4.4 minutes of downtime per month, so you need checks every 10-30 seconds to detect an outage before much of that allowance is gone
  3. Reliable alerting: Get notified immediately when something fails
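To see why check frequency has to match the target, the sketch below pairs a basic HTTP probe with a calculation of how much of the monthly downtime allowance can pass before an outage is even detected. It uses only the Python standard library. The `confirmations` parameter (consecutive failures required before alerting) and the 730-hour month are assumptions, and a real setup would run the probe from several regions.

```python
import urllib.request

SECONDS_PER_MONTH = 730 * 3600  # assumption: average month of 730 hours


def probe(url: str, timeout_seconds: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
        return False


def detection_share_of_budget(availability_percent: float,
                              interval_seconds: float,
                              confirmations: int = 1) -> float:
    """Worst-case fraction of the monthly downtime allowance spent before detection."""
    budget_seconds = (100 - availability_percent) / 100 * SECONDS_PER_MONTH
    if budget_seconds <= 0:
        raise ValueError("a 100% target leaves no downtime allowance")
    return interval_seconds * confirmations / budget_seconds


# At 99.99%, a 30-second check can burn roughly 11% of the month's allowance
# before the first failed probe; a 5-minute check would exceed it entirely.
print(f"{detection_share_of_budget(99.99, 30):.0%}")
print(f"{detection_share_of_budget(99.99, 300):.0%}")
```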

See our complete guide to uptime monitoring and tool comparison for practical setup guidance.

