Website Downtime Tracking: How to Measure and Report Outages

Knowing your website went down is step one. Tracking when, how long, and how often is what turns incidents into actionable data. Here’s how to build a downtime tracking practice that improves reliability over time.

What is Downtime Tracking?

Downtime tracking is the systematic recording of every period when your service is unavailable or degraded. It goes beyond uptime monitoring (which detects incidents in real time) by keeping the historical data you need for SLA reporting, trend analysis, and capacity planning.

A complete downtime record includes:

Start time — When the outage began (detected by monitoring)
End time — When the service recovered
Duration — Total downtime in minutes/seconds
Impact — Which services/users were affected
Root cause — What caused the outage
Category — Infrastructure, deployment, dependency, etc.
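One way to keep these fields consistent is to store each outage as a small typed record. The sketch below is only an illustration: the class name, field names, and sample values are assumptions, not a required schema. Duration is derived from the two timestamps rather than stored separately, and a planned flag is included so scheduled maintenance can be recorded without being mixed in with unplanned outages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DowntimeRecord:
    """One outage. Field names are illustrative, not a standard schema."""
    start: datetime        # when monitoring first detected the outage
    end: datetime          # when the service recovered
    impact: str            # which services/users were affected
    root_cause: str        # what caused the outage
    category: str          # e.g. "infrastructure", "deployment", "dependency"
    planned: bool = False  # True for scheduled maintenance

    @property
    def duration(self) -> timedelta:
        # Derived from the timestamps so it can never disagree with them.
        return self.end - self.start


# Sample values, made up for illustration.
record = DowntimeRecord(
    start=datetime(2024, 3, 5, 14, 2, 0),
    end=datetime(2024, 3, 5, 14, 6, 20),
    impact="checkout API, all users",
    root_cause="bad config pushed during deploy",
    category="deployment",
)
print(record.duration.total_seconds())  # 260.0
```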

How to Measure Downtime

Method 1: External Monitoring (Recommended)

External uptime monitoring tools check your service from outside your network and record every failure. This is the most accurate method because it measures what users experience.

The measurement formula:

Downtime = Number of failed checks x Check interval

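At 1-minute check intervals, if 5 consecutive checks fail, you had approximately 5 minutes of downtime. The true figure lies somewhere between 4 and 6 minutes, because the outage can begin or end at any point between two checks. Higher-frequency checks (every 10 seconds) narrow that uncertainty and give more precise measurements.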
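As a rough sketch of the formula, the function below groups consecutive failed checks into outages and multiplies each run by the check interval. The function name and the input shape (a time-ordered list of pass/fail booleans) are assumptions; real monitoring tools export results in their own formats.

```python
from itertools import groupby


def estimate_downtime(results: list[bool], interval_seconds: int) -> list[int]:
    """Return the estimated duration in seconds of each outage.

    `results` holds one entry per check in time order (True = check passed).
    Each run of consecutive failures counts as one outage lasting
    (number of failed checks x check interval).
    """
    outages = []
    for passed, run in groupby(results):
        if not passed:
            outages.append(len(list(run)) * interval_seconds)
    return outages


# Sample data: five failures in a row, then a single isolated failure.
checks = [True, True, False, False, False, False, False, True, False, True]
outages = estimate_downtime(checks, interval_seconds=60)
print(outages)       # [300, 60]
print(sum(outages))  # 360
```

Keeping the outages separate, rather than only summing failed checks, preserves the incident count as well as the total duration.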

Method 2: Log Analysis

Parse your server access logs and error logs to identify periods with elevated error rates or zero traffic. This catches issues that external monitoring might miss, but it requires log infrastructure.
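A minimal sketch of the error-rate half of this idea follows. It assumes the log lines have already been parsed into (timestamp, status code) pairs, and the 50% threshold is an arbitrary example value to tune for your own traffic. Minutes with no requests at all do not appear in the result, so zero-traffic gaps would need a separate check.

```python
from collections import defaultdict
from datetime import datetime


def degraded_minutes(entries, threshold=0.5):
    """Return the minutes whose share of 5xx responses is >= threshold.

    `entries` is an iterable of (timestamp, status_code) pairs that have
    already been parsed out of the access log.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ts, status in entries:
        minute = ts.replace(second=0, microsecond=0)  # bucket by minute
        totals[minute] += 1
        if 500 <= status <= 599:
            errors[minute] += 1
    return sorted(m for m in totals if errors[m] / totals[m] >= threshold)


# Sample entries, made up for illustration.
entries = [
    (datetime(2024, 3, 5, 14, 1, 10), 200),
    (datetime(2024, 3, 5, 14, 1, 40), 200),
    (datetime(2024, 3, 5, 14, 2, 5), 502),
    (datetime(2024, 3, 5, 14, 2, 30), 503),
    (datetime(2024, 3, 5, 14, 3, 15), 500),
    (datetime(2024, 3, 5, 14, 3, 45), 200),
]
print([m.strftime("%H:%M") for m in degraded_minutes(entries)])
# ['14:02', '14:03']
```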

Method 3: Real User Monitoring (RUM)

Collect availability data from actual user sessions. This shows real user impact but can’t detect outages during low-traffic periods (middle of the night, holidays).

Combining Methods

Best practice is to use external monitoring as your primary source of truth, supplemented by log data and RUM for context. The monitoring tool defines when downtime starts and ends; logs and RUM explain the impact.

Calculating Availability

The standard formula:

Availability % = ((Total minutes - Downtime minutes) / Total minutes) x 100

For a 30-day month (43,200 minutes) with 45 minutes of downtime:

((43,200 - 45) / 43,200) x 100 = 99.896%
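The same calculation can be written as a small helper, shown here with its inverse (how much downtime a given availability target allows). The function names are illustrative; the numbers reproduce the 30-day example above.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Availability % = ((total - downtime) / total) x 100."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes * 100


def allowed_downtime_minutes(total_minutes: float, target_percent: float) -> float:
    """Inverse: how many minutes of downtime a target availability permits."""
    return total_minutes * (1 - target_percent / 100)


minutes_in_30_days = 30 * 24 * 60  # 43,200
print(round(availability_percent(minutes_in_30_days, 45), 3))        # 99.896
print(round(allowed_downtime_minutes(minutes_in_30_days, 99.9), 1))  # 43.2
```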

Use the uptime calculator to convert between downtime duration and availability percentage.

Creating Downtime Reports

Monthly SLA Report

Include:

Overall availability — The headline number (e.g., 99.95%)
Incident summary — Each outage with duration, impact, and root cause
Trend chart — Monthly availability over the last 12 months
Error budget status — How much of your error budget was consumed
Action items — What you’re doing to prevent recurrence
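For the error budget line, a common convention is to treat the budget as the downtime your availability target permits in the period, and to report the share of it that was used. The sketch below follows that convention; the 99.9% target is an assumed example, and the function name is illustrative.

```python
def error_budget_status(total_minutes, downtime_minutes, target_percent):
    """Return (budget_minutes, consumed_fraction) for one reporting period."""
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    budget = total_minutes * (1 - target_percent / 100)
    consumed = downtime_minutes / budget
    return budget, consumed


budget, consumed = error_budget_status(
    total_minutes=30 * 24 * 60,  # 30-day month
    downtime_minutes=45,
    target_percent=99.9,         # assumed SLA target
)
print(f"budget: {budget:.1f} min, consumed: {consumed:.0%}")
# budget: 43.2 min, consumed: 104%
```

A consumed value above 100% means the month missed its target, which is what 45 minutes of downtime does against a 99.9% target.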

Incident Report (Per Outage)

For each significant outage, create a postmortem:

Timeline of events
Root cause analysis
Impact assessment (users affected, financial cost)
Corrective actions with owners and deadlines

Tracking Downtime Over Time

The most useful metric isn’t a single month’s availability; it’s the trend. Track:

Incidents per month — Is frequency increasing or decreasing?
Mean Time Between Failures (MTBF) — Average time between incidents. Higher is better
Mean Time To Detect (MTTD) — How fast you find problems. Largely determined by your check interval and any alerting delay
Mean Time To Resolve (MTTR) — How fast you fix problems. Improved by runbooks and automation
Availability trend — Month-over-month and quarter-over-quarter
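The three mean-time metrics can be computed from an incident log that records when each outage began, when it was detected, and when it was resolved. The sketch below makes two assumptions that teams define differently: MTTR is measured from detection to resolution, and MTBF is measured from the end of one incident to the start of the next. The incident data is made up for illustration.

```python
from datetime import datetime
from statistics import mean

# (began, detected, resolved) per incident, in chronological order.
incidents = [
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 9, 2), datetime(2024, 3, 2, 9, 30)),
    (datetime(2024, 3, 12, 9, 30), datetime(2024, 3, 12, 9, 31), datetime(2024, 3, 12, 9, 50)),
    (datetime(2024, 3, 22, 9, 50), datetime(2024, 3, 22, 9, 53), datetime(2024, 3, 22, 10, 6)),
]


def minutes(delta):
    return delta.total_seconds() / 60


# Mean Time To Detect: outage start -> detection.
mttd = mean(minutes(detected - began) for began, detected, _ in incidents)

# Mean Time To Resolve: detection -> recovery (assumed convention).
mttr = mean(minutes(resolved - detected) for _, detected, resolved in incidents)

# Mean Time Between Failures: end of one incident -> start of the next.
mtbf_hours = mean(
    minutes(nxt[0] - prev[2]) for prev, nxt in zip(incidents, incidents[1:])
) / 60

print(mttd, mttr, mtbf_hours)  # 2.0 20.0 240.0
```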

A service with 99.9% availability and improving MTTR can be in better shape than a service with 99.99% availability and worsening incident frequency, because the trend shows where reliability is heading.

Tools for Downtime Tracking

Uptime monitoring tools — Warden, Uptime Robot, Better Uptime. These are your primary data source
Status page platforms — Public-facing incident history for customers
Incident management tools — PagerDuty, Opsgenie. Track incident lifecycle
Spreadsheets — Surprisingly effective for small teams. Log every incident manually
Custom dashboards — Grafana, Datadog. Visualize trends from monitoring data

Common Mistakes

Not tracking scheduled maintenance — Even planned downtime should be recorded (but flagged as planned)
Rounding generously — 4 minutes and 20 seconds of downtime is not “about 4 minutes.” Precision matters for SLA calculations
Ignoring partial outages — If 30% of users can’t access your API, that’s partial downtime worth tracking
No root cause categorization — Without categories, you can’t identify systemic issues (e.g., “50% of outages are deployment-related”)
Tracking without action — Data without follow-up actions is just noise. Every report should include improvement actions
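To illustrate the categorization point, counting outages per category takes only a few lines once every incident has a category. The category names and sample data below are assumptions.

```python
from collections import Counter

# Category of each logged outage; sample values only.
categories = [
    "deployment", "infrastructure", "deployment",
    "dependency", "deployment", "infrastructure",
]

counts = Counter(categories)
total = sum(counts.values())

# Share of outages per category, most frequent first.
shares = {cat: n / total for cat, n in counts.most_common()}
for cat, share in shares.items():
    print(f"{cat}: {share:.0%}")
# deployment: 50%
# infrastructure: 33%
# dependency: 17%
```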

Start tracking today. Even a simple spreadsheet with date, duration, and cause will reveal patterns within a few months.

Related tools:

Uptime Calculator — Convert downtime to availability percentage
Downtime Cost Calculator — Quantify the cost of each outage
Postmortem Template — Document incidents systematically