Every website goes down eventually. The question is whether you understand why it happens, how to prevent it, and how to detect it when it does. According to industry data, the average website experiences between 3 and 5 hours of downtime per month, and most of that downtime traces back to a short list of common causes.
This guide explains the 12 most common causes of website downtime, how each one works, and what you can do to prevent it. Whether you are a developer, a business owner, or an IT manager, understanding these causes is the first step toward keeping your site online.
The 12 Causes at a Glance
| Cause | How Common | Typical Duration | Preventable? |
|---|---|---|---|
| Hosting/server failures | Very common | Minutes to hours | Partially |
| Traffic overload | Common | Minutes to hours | Yes |
| DNS failures | Occasional | Minutes to days | Mostly |
| Code deployments | Common | Minutes | Yes |
| Database failures | Common | Minutes to hours | Mostly |
| Expired SSL certificate | Common | Hours to days | Yes |
| Expired domain name | Occasional | Hours to days | Yes |
| DDoS attacks | Occasional | Hours to days | Partially |
| Third-party service failures | Common | Minutes to hours | Partially |
| CDN outages | Occasional | Minutes to hours | Partially |
| Plugin/theme conflicts | Very common (CMS) | Minutes to hours | Yes |
| Server misconfiguration | Common | Minutes to hours | Yes |
1. Hosting and Server Failures
The most fundamental cause of downtime: the physical or virtual server that hosts your website stops working. This includes hardware failures (disk crashes, memory failures, power supply issues), hypervisor crashes on virtual servers, and hosting provider outages that affect entire data centers.
Why It Happens
- Hardware degradation: Hard drives fail, memory modules develop errors, network cards malfunction. Physical hardware has a finite lifespan.
- Shared hosting resource limits: On shared hosting, your site shares a server with hundreds of others. If another tenant uses excessive resources, or if you exceed your allocation, your site gets throttled or suspended.
- Provider-level outages: Even major providers like AWS, Google Cloud, and DigitalOcean experience outages. A single availability zone going down can take thousands of sites offline simultaneously.
- Kernel panics and OS crashes: The server's operating system itself can crash due to driver bugs, memory corruption, or kernel-level errors.
How to Prevent It
- Choose a hosting provider with a strong uptime track record and an SLA (Service Level Agreement)
- Use managed hosting or cloud platforms with automatic failover
- If uptime is critical, deploy across multiple availability zones or regions
- Upgrade from shared hosting to a VPS or dedicated server when you outgrow shared resource limits
Detection: External uptime monitoring catches hosting failures immediately, since your site becomes unreachable from outside the data center. If you only check your site manually, a hosting outage at 3am could go unnoticed for hours.
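The essence of an external check is small enough to sketch. The snippet below is a minimal, hypothetical illustration using only Python's standard library (a real monitoring service adds scheduling, multiple locations, and alerting on top of this); `check_url` and `classify` are names invented for this example.

```python
import urllib.request
import urllib.error

def classify(status):
    """Map an HTTP status code (or None for 'no response') to up/down."""
    if status is None:
        return "down"
    return "up" if 200 <= status < 400 else "down"

def check_url(url, timeout=10.0):
    """One external check: returns ('up'|'down', status_code_or_None)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status), resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (e.g. 500, 503)
        return classify(e.code), e.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, or timeout: nothing answered
        return "down", None
```

The key point is the last `except` branch: a hosting failure usually looks like no response at all, which only an external checker can see.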
2. Traffic Overload
Your server has a finite capacity. When incoming traffic exceeds what your infrastructure can handle, the server either slows to a crawl or stops responding entirely. This is different from a DDoS attack (covered in #8) because the traffic is legitimate.
Common Triggers
- Viral content: A Reddit post, Hacker News submission, or tweet links to your site and sends thousands of visitors in minutes
- Marketing campaigns: A successful email blast, product launch, or ad campaign drives more traffic than anticipated
- Seasonal spikes: E-commerce sites during Black Friday, tax software in April, or event ticketing sites during on-sale dates
- News coverage: A news outlet writes about your product or company and links to your site
How to Prevent It
- Use a CDN: Cloudflare, AWS CloudFront, or Fastly can absorb traffic spikes by serving cached content from edge servers globally
- Enable page caching: Serve static HTML instead of generating pages dynamically for every request
- Auto-scaling: Cloud platforms like AWS and Google Cloud can automatically provision additional servers when traffic increases
- Load testing: Use tools like k6, Locust, or Apache JMeter to stress-test your infrastructure before anticipated traffic spikes
The error you will typically see during traffic overload is a 503 Service Unavailable or a 502 Bad Gateway.
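The idea behind load testing can be sketched with the standard library alone. This is a toy harness, not a substitute for k6 or Locust; `run_load_test` and the injected `request_fn` are hypothetical names, and injecting the request function keeps the sketch testable without a live server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(request_fn, total_requests=100, concurrency=10):
    """Fire total_requests calls to request_fn across `concurrency` workers.
    request_fn() must return an HTTP status code, or raise on failure."""
    def one(_):
        try:
            status = request_fn()
            return "ok" if 200 <= status < 400 else "error"
        except Exception:
            return "error"

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() tallies outcomes in the main thread, avoiding shared state
        outcomes = list(pool.map(one, range(total_requests)))
    return {
        "ok": outcomes.count("ok"),
        "error": outcomes.count("error"),
        "elapsed_s": round(time.monotonic() - start, 3),
    }
```

Watching the error count climb as you raise `concurrency` tells you roughly where your capacity ceiling sits before a real traffic spike finds it for you.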
3. DNS Failures
DNS (Domain Name System) is the phone book of the internet. It translates your domain name (example.com) into the IP address of your server. If DNS fails, browsers cannot find your server, and your site becomes unreachable even if the server itself is running perfectly.
DNS failures are particularly insidious because they are invisible from the server side. Your server logs show no errors. Your hosting dashboard says everything is healthy. But visitors cannot reach you.
Common DNS Problems
- DNS provider outage: Your DNS provider (Cloudflare, Route 53, GoDaddy) experiences a service disruption
- Incorrect records: An A record, CNAME, or nameserver gets accidentally changed or deleted
- Propagation delays: After changing DNS records, the old records can remain cached globally for hours due to TTL (Time to Live) settings
- Nameserver misconfiguration: The domain's nameservers point to the wrong provider or to servers that no longer exist
How to Prevent It
- Use a reliable DNS provider with high availability (Cloudflare and AWS Route 53 both advertise 100% uptime SLAs)
- Lower your TTL before making DNS changes so that changes, and any needed corrections, propagate quickly
- Document your DNS records and review them periodically
- Consider setting up secondary DNS for critical domains
Detection: Multi-location uptime monitoring is critical for catching DNS failures. DNS issues often affect some geographic regions but not others. A monitor checking from only one location may miss a regional DNS outage.
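"Document your DNS records and review them periodically" can be partly automated. The sketch below, assuming the hypothetical helper names `resolve_a_records` and `verify_records`, resolves the current A records through the system resolver and diffs them against what you documented:

```python
import socket

def resolve_a_records(hostname):
    """Return the current IPv4 A records via the system resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})

def verify_records(expected, actual):
    """Pure comparison: report addresses missing from or unexpected in DNS."""
    expected, actual = set(expected), set(actual)
    return {
        "missing": sorted(expected - actual),      # documented but not live
        "unexpected": sorted(actual - expected),   # live but not documented
        "ok": expected == actual,
    }
```

Running `verify_records(documented_ips, resolve_a_records("example.com"))` on a schedule flags an accidentally changed or deleted record before your visitors do.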
4. Code Deployments Gone Wrong
Deploying new code to production is one of the most common causes of self-inflicted downtime. A syntax error, a missing dependency, a broken database migration, or an incompatible configuration change can all take a site offline instantly.
What Goes Wrong
- Syntax or runtime errors: A typo in a config file or a coding error that only manifests in the production environment
- Missing environment variables: The production server is missing a required API key, database URL, or secret that was set on staging but not on production
- Failed database migrations: A migration that worked on staging fails on production due to data differences or timeout limits
- Dependency conflicts: A library version that exists locally or on staging is not installed on the production server
- Resource exhaustion: The new code uses significantly more CPU or memory, pushing the server past its limits
How to Prevent It
- Use a staging environment: Test every deployment on a production-like staging server first
- Automate rollbacks: Have a one-command or automated rollback process that can revert to the previous version in seconds
- Blue-green or canary deployments: Route a small percentage of traffic to the new version before switching everyone over
- Health checks in CI/CD: After deployment, automatically verify the site returns a 200 status code before marking the deploy as successful
- Deploy during low-traffic windows: If something breaks, fewer users are affected
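The "health checks in CI/CD" step can be sketched as a small polling gate that fails the deploy if the site never comes back healthy. The function names here (`wait_until_healthy`, `http_ok`) are invented for illustration; the check function is injected so the gate itself is testable without a live server.

```python
import time
import urllib.request
import urllib.error

def wait_until_healthy(check_fn, attempts=10, delay_s=3.0):
    """Poll check_fn() up to `attempts` times; True means the deploy passed."""
    for _ in range(attempts):
        if check_fn():
            return True
        time.sleep(delay_s)
    return False  # never became healthy: the pipeline should roll back

def http_ok(url, timeout=5.0):
    """One concrete check: does the URL answer with a 2xx status?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```

A CI step like `wait_until_healthy(lambda: http_ok("https://example.com/health"))` turns "did the deploy work?" from a manual glance into an automatic gate.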
The typical error from a bad deployment is a 500 Internal Server Error.
5. Database Failures
For any dynamic website (WordPress, SaaS applications, e-commerce stores), the database is a critical dependency. If the database server crashes, runs out of connections, or becomes too slow to respond, your website fails even though the web server is running.
Common Database Problems
- Connection exhaustion: Every visitor creates database connections. When the `max_connections` limit is reached, new requests fail with "too many connections" errors.
- Slow queries: An unoptimized query scanning millions of rows blocks other queries and creates a cascade of failures
- Disk space exhaustion: The database grows over time. Transaction logs, temporary tables, and data accumulate until the disk is full, causing the database to crash.
- Table corruption: Power outages, unexpected shutdowns, or bugs in the database engine can corrupt tables
- Replication lag: If you use read replicas, lag between the primary and replicas can cause data inconsistencies that break application logic
How to Prevent It
- Use connection pooling (PgBouncer for PostgreSQL, ProxySQL for MySQL) to manage database connections efficiently
- Enable slow query logging and optimize the worst offenders with proper indexes
- Monitor disk space and set alerts at 80% usage
- Use a caching layer (Redis or Memcached) to reduce database load
- Schedule regular database maintenance (VACUUM for PostgreSQL, OPTIMIZE TABLE for MySQL)
- Automate backups and test restoration regularly
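To make the pooling idea concrete, here is a toy pool in pure Python. In production you would use PgBouncer or ProxySQL as listed above; this sketch (the class name and `factory` parameter are invented, and it ignores thread-safety of the creation counter) just shows the mechanism of capping and reusing connections:

```python
import queue

class ConnectionPool:
    """Toy pool: hand out at most max_size connections, reuse returned ones.
    `factory` creates a new connection object (in a real app, e.g. a
    database driver's connect function)."""
    def __init__(self, factory, max_size=5):
        self._factory = factory
        self._pool = queue.Queue(maxsize=max_size)
        self._created = 0
        self._max = max_size

    def acquire(self, timeout=5.0):
        try:
            return self._pool.get_nowait()          # reuse an idle connection
        except queue.Empty:
            if self._created < self._max:
                self._created += 1
                return self._factory()              # grow, but only to the cap
            return self._pool.get(timeout=timeout)  # otherwise wait for one

    def release(self, conn):
        self._pool.put(conn)
```

The cap is the point: instead of every visitor opening a fresh connection until `max_connections` is hit, excess requests queue briefly for a recycled one.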
6. Expired SSL Certificate
SSL/TLS certificates have expiration dates. When a certificate expires, browsers block access to your site with a full-page security warning. Most visitors will not click through the warning. They will leave and may not come back.
This is one of the most preventable causes of downtime, yet it continues to happen even to large organizations. Auto-renewal failures, expired payment methods, server migrations that break certificate renewal, and organizational turnover (the person managing certificates leaves) are all common triggers.
How to Prevent It
- Use Let's Encrypt: Free certificates that auto-renew every 90 days. Most hosting platforms support them natively.
- Use Cloudflare SSL: Cloudflare issues and renews certificates automatically with zero manual intervention
- Monitor certificate expiration: SSL monitoring tools warn you 7, 14, or 30 days before expiration
- Verify auto-renewal works: Do not assume it is working. Check the next renewal date after every certificate change.
- Update payment methods: If using a paid certificate, ensure the credit card on the account will not expire before the next renewal date
Note: the CA/Browser Forum has approved shorter certificate lifespans, moving to 47-day certificates by 2029. This makes automated renewal and monitoring even more important. See our SSL certificate monitoring guide for details.
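Checking expiration yourself takes only a few lines. This sketch performs a real TLS handshake to read the certificate's `notAfter` date (the helper names `cert_not_after` and `days_left` are invented for the example); the date math is split out so it can be tested without a network:

```python
import ssl
import socket
from datetime import datetime, timezone

def cert_not_after(hostname, port=443, timeout=10.0):
    """Fetch the certificate expiry timestamp via a TLS handshake."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    # cert_time_to_seconds parses the 'Jun  1 12:00:00 2025 GMT' format
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def days_left(not_after, now=None):
    """Whole days until expiry; negative means already expired."""
    now = now or datetime.now(timezone.utc)
    return (not_after - now).days
```

Wiring `days_left(cert_not_after("example.com"))` into a daily job that alerts below 14 or 30 days covers the "monitor certificate expiration" bullet above.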
7. Expired Domain Name
When your domain registration expires, the registrar either parks it (showing a "this domain is for sale" page), redirects it, or takes it offline entirely. Your entire website, email, and any services using that domain stop working.
This happens more often than anyone admits. Google, Microsoft, and Foursquare have all accidentally let domains expire. Common causes: the credit card on the registrar account expired, renewal emails went to a former employee's email, or auto-renewal was accidentally disabled.
How to Prevent It
- Enable auto-renewal on every domain you own
- Register for multiple years (5 to 10 years for critical domains). The cost is minimal.
- Keep payment methods current on your registrar account
- Use domain lock to prevent unauthorized transfers
- Monitor expiration dates: Use a domain expiration monitoring service or Notifier's free domain expiry checker to verify dates
- Use a reputable registrar like Cloudflare Registrar, Namecheap, or Porkbun
8. DDoS Attacks
A Distributed Denial of Service (DDoS) attack floods your server with malicious traffic from thousands of sources simultaneously. The goal is to overwhelm your server so it cannot serve legitimate visitors. Unlike traffic overload (#2), this traffic is intentional and malicious.
Types of DDoS Attacks
- Volumetric attacks: Flood your network bandwidth with massive amounts of traffic (UDP floods, DNS amplification)
- Protocol attacks: Exploit weaknesses in network protocols (SYN floods, ping of death)
- Application layer attacks: Target specific pages or endpoints with requests that look legitimate but are designed to be expensive (slow POST attacks, complex search queries)
How to Prevent It
- Use Cloudflare or similar: CDNs with built-in DDoS protection absorb attack traffic before it reaches your server. Cloudflare's free tier includes basic DDoS mitigation.
- Rate limiting: Limit requests per IP per minute at the web server or CDN level
- Web Application Firewall (WAF): Block known attack patterns and malicious payloads
- Anycast routing: Distribute traffic across multiple data centers so no single server bears the full attack
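The rate-limiting bullet above is usually implemented at the CDN or web server (for example, nginx's request-limiting module uses this same idea), but the underlying token-bucket algorithm fits in a few lines. The class below is an illustrative sketch; the injectable `clock` parameter exists purely so the behavior can be tested deterministically:

```python
import time

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`.
    Track one bucket per client IP to rate-limit abusive sources."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request allowed
        return False      # bucket empty: reject (e.g. with HTTP 429)
```

Legitimate visitors rarely exceed a few requests per second, so a modest `rate` throttles flood sources while leaving normal traffic untouched.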
Detection: Uptime monitoring detects the downtime immediately. Response time monitoring is even more useful here, because DDoS attacks typically cause a gradual increase in response times (200ms to 1,000ms to 5,000ms to timeout) before the site goes completely offline. Seeing that pattern in your monitoring data helps distinguish a DDoS from other causes.
9. Third-Party Service Failures
Modern websites depend on external services: payment processors (Stripe, PayPal), authentication providers (Auth0, Firebase Auth), email services (SendGrid, Mailgun), analytics (Google Analytics), and dozens of API integrations. If your code makes synchronous calls to these services and one of them goes down, your site can go down with it.
How This Causes Downtime
- Synchronous API calls: Your page waits for a third-party API response before rendering. If the API times out after 30 seconds, your page takes 30 seconds to load (or fails entirely).
- Payment gateway outages: Checkout pages that depend on Stripe or PayPal fail when those services have incidents
- Authentication provider downtime: If login depends on an external auth service and it goes down, nobody can log in
- Cascading failures: One slow service creates a backlog of connections, exhausting your server's resources and bringing down unrelated pages
How to Prevent It
- Set aggressive timeouts: Never let a third-party API call block your page for more than 2 to 3 seconds. Fail gracefully instead of hanging.
- Use circuit breakers: After a few consecutive failures to a third-party service, stop calling it for a cooldown period and serve a fallback response
- Make calls asynchronous: Move non-critical API calls (analytics, logging, email) to background queues
- Monitor your dependencies: Set up uptime monitors for the critical third-party services you depend on, not just your own site
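The circuit-breaker pattern mentioned above can be sketched as a small wrapper: after a run of consecutive failures it stops calling the dependency for a cooldown period and serves a fallback instead. This is an illustrative implementation (class and parameter names are invented; the injectable `clock` exists only to make it testable):

```python
import time

class CircuitBreaker:
    """Skip calls to a failing dependency for cooldown_s seconds after
    `threshold` consecutive failures, returning `fallback` instead."""
    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means closed (calls allowed)

    def call(self, fn, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                return fallback               # open: don't even try the call
            self.opened_at = None             # cooldown over: try again
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock() # trip the breaker
            return fallback
        self.failures = 0
        return result
```

The payoff is the cascade prevention described above: while the breaker is open, your server stops tying up workers waiting on a dead API and returns the fallback instantly.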
10. CDN Outages
If your traffic flows through a CDN (Content Delivery Network) or reverse proxy, you depend on that layer being healthy. When the CDN goes down, your visitors see errors even though your origin server is running fine. Major CDN outages have taken large portions of the internet offline. Cloudflare, Fastly, and Akamai have all had incidents that caused widespread downtime.
Why CDN Outages Happen
- Configuration errors: A bad configuration push to the CDN's edge servers (this caused the Fastly outage in June 2021 that took down Amazon, Reddit, and the BBC)
- Regional failures: A specific CDN point of presence (PoP) goes down, affecting users in that region
- Origin connectivity issues: The CDN cannot reach your origin server due to network routing problems, causing 502 errors
- SSL/TLS issues: Certificate problems between the CDN and your origin server
How to Prevent It
- Have a bypass plan: Know how to update your DNS to point directly to your origin server if the CDN has a prolonged outage
- Use proper health checks: Configure the CDN to detect when your origin is slow and adjust timeouts accordingly
- Monitor from multiple locations: CDN outages are often regional. Multi-location monitoring catches regional failures that single-location checks miss.
11. Plugin and Theme Conflicts
This is the leading cause of downtime for WordPress sites, and it affects other CMS platforms (Joomla, Drupal, Shopify) as well. A plugin or theme update introduces a bug, two plugins conflict with each other, or a core platform update breaks an older plugin.
The average WordPress site runs 20 to 30 plugins. Each plugin is developed by a different team on a different schedule. There is no guarantee that Plugin A version 3.2 will work with Plugin B version 5.1. When they don't, the result is usually a 500 Internal Server Error or a white screen of death.
How to Prevent It
- Test updates on staging first: Never update plugins directly on your production site
- Update one plugin at a time: If something breaks, you immediately know which update caused it
- Keep a minimal plugin set: Fewer plugins means fewer potential conflicts. Remove anything you are not actively using.
- Use well-maintained plugins: Check the last update date, active installs, and support forum before installing
- Disable auto-updates for critical plugins: Manual updates give you control over timing and testing
If your website is already down from a plugin conflict, see our detailed guide on why your website keeps going down for step-by-step recovery instructions.
12. Server Misconfiguration
A surprisingly common cause of downtime is simply configuring the server incorrectly. This includes web server configuration (Nginx, Apache), firewall rules, file permissions, PHP settings, and environment variables. One wrong character in a config file can take an entire site offline.
Common Misconfiguration Mistakes
- Nginx/Apache syntax errors: A typo in `nginx.conf` or `.htaccess` that prevents the web server from starting
- File permission errors: Setting permissions to 000 on critical files, or running the web server as the wrong user
- Firewall rules blocking traffic: A new firewall rule accidentally blocks port 80 or 443
- PHP memory limits: Setting
memory_limittoo low causes scripts to crash - Redirect loops: Misconfigured redirects where Page A redirects to Page B, which redirects back to Page A
- Incorrect virtual host configuration: Pointing the wrong domain to the wrong directory
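Of the mistakes above, redirect loops are the easiest to catch programmatically, because they are pure configuration. As a sketch (with the invented helper name `find_redirect_loop`), model your redirect rules as a source-to-target map and walk it:

```python
def find_redirect_loop(redirects, start, max_hops=10):
    """Walk a redirect map ({source: target}) from `start`.
    Returns the looping path if one exists, or None if the chain ends."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return None          # chain ends at a real page: no loop
        path.append(target)
        if target in seen:
            return path          # revisited a URL: loop detected
        seen.add(target)
        current = target
    return path                  # too many hops: treat as a loop
```

Browsers give up after a similar hop limit and show "too many redirects," so catching the loop in a pre-deploy check beats hearing about it from users.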
How to Prevent It
- Test before applying: Use `nginx -t` or `apachectl configtest` to validate configuration syntax before reloading
- Use configuration management: Tools like Ansible, Chef, or Terraform ensure server configurations are consistent and version-controlled
- Keep backups of working configs: Before changing any configuration file, copy the current working version
- Document your setup: Write down what is configured where, especially for complex setups with reverse proxies, load balancers, and multiple servers
How to Prevent and Detect Downtime
You cannot prevent every outage. Hardware fails, providers have incidents, and humans make mistakes. But you can dramatically reduce both the frequency and impact of downtime with three strategies.
1. Set Up Uptime Monitoring
The single most important thing you can do is set up external monitoring that checks your site continuously and alerts you immediately when it goes down. Without monitoring, you find out about outages from customer complaints (or worse, you don't find out at all).
Notifier checks your website every 1 to 5 minutes (depending on your plan) from multiple locations and sends alerts via email, SMS, or phone call when something goes wrong. The free plan includes 10 monitors, 5 status pages, and alerts via email, SMS, and phone. See our setup guide to get started in 5 minutes.
2. Build Redundancy
Single points of failure cause the longest outages. Reduce risk by building redundancy at each layer:
- Hosting: Use multiple availability zones or a hosting provider with automatic failover
- DNS: Use a provider with built-in redundancy (Cloudflare, Route 53) or set up secondary DNS
- Database: Run replicas so a single database server failure does not take your site offline
- CDN: Know how to bypass your CDN and serve directly from origin if needed
3. Create a Response Plan
When downtime occurs, the speed of your response determines the impact. Have a documented plan that covers:
- Who gets notified and how (email, SMS, phone, Slack)
- How to quickly diagnose the cause (check the error code, check recent changes, check server resources)
- How to communicate with users (update your status page)
- How to roll back recent changes if they are the suspected cause
- A post-incident review to prevent the same issue from recurring
For more on building a response process, see our guide on website outage alerts and response playbooks.
The bottom line:
Most website downtime comes from a predictable set of causes. Understanding these causes, building prevention measures, and setting up monitoring to catch what you cannot prevent is the formula for keeping your site online. Start with monitoring. It is the foundation everything else builds on.