Well-Architected Framework
Deploy applications with zero downtime
Deploying updates directly to production causes downtime and service disruptions. Traditional deployment approaches take applications offline during updates, causing revenue loss and a poor user experience. Deploy applications using zero-downtime strategies such as blue/green, canary, rolling, or a combination of these to maintain availability during updates, reduce deployment risk, and enable safe rollback.
Your deployment strategy depends on your infrastructure, such as virtual machines or containers, and orchestration tools, like Nomad or Kubernetes. Use load balancers and orchestrators to gradually shift traffic, test changes with production load, and roll back instantly when issues occur.
Why deploy applications with zero downtime
Deploying applications with zero-downtime strategies addresses the following operational challenges:
Reduce service disruptions and revenue loss: Application downtime during deployments causes lost revenue, frustrated users, and a damaged reputation. Zero-downtime deployments maintain service availability throughout updates, so users experience no interruptions.
Reduce deployment risk with gradual rollouts: Deploying application changes to all users simultaneously creates high risk. If issues occur, all users are affected. Canary and rolling deployments gradually shift traffic, limiting the blast radius and allowing you to catch issues before full rollout.
Enable instant rollback capabilities: When application updates cause bugs or performance issues, traditional deployments require time-consuming rollback procedures. Blue/green deployments maintain the previous version running, allowing traffic switching back to the working version.
Test changes with production traffic: Canary deployments let you test changes with real production traffic on a small user subset, validating performance and functionality before full deployment.
Choose a deployment strategy
Select your deployment strategy based on application requirements, infrastructure constraints, and risk tolerance.
Use the following criteria to choose a deployment strategy:
Use blue/green deployments when you need:
- Instant rollback capability for critical applications
- Complete validation before switching traffic
- Ability to maintain two full environments simultaneously
- Predictable cutover timing
Use canary deployments when you need:
- Risk reduction for high-impact changes
- Gradual validation with real production traffic
- Early detection of issues before full rollout
- Ability to test with a subset of users first
Use rolling deployments when you need:
- Resource efficiency with minimal overhead
- Gradual replacement without double infrastructure costs
- Continuous availability during updates
- Automated orchestration with Kubernetes or Nomad
Combine strategies for comprehensive safety. For example, you can use blue/green deployment with canary testing: deploy to the green environment, route 10% of traffic to it for canary validation, then switch all traffic if the canary succeeds.
Deploy applications on virtual machines with load balancers
Blue/green and canary deployments work well for applications on virtual machines. Load balancers and reverse proxies manage traffic between blue and green environments, enabling you to direct a subset of users for canary testing and control traffic for rolling deployments.
Load balancers route traffic between application environments during updates, supporting blue/green deployments and canary releases. They allow you to gradually shift users to new versions while maintaining the ability to roll back if issues occur. By continuously monitoring application health and automatically routing around failed instances, load balancers increase service availability throughout the deployment process.
Regardless of your cloud provider, you can use Terraform to manage load balancers and proxies. Using Terraform for infrastructure as code allows you to version control your load balancer configurations alongside your application code, ensuring that changes are tracked, reviewed, and rolled back if needed. You can define target groups, health check parameters, routing rules, and SSL certificates declaratively, then apply these configurations automatically as part of your CI/CD pipeline.
The following example shows Terraform configuration for canary deployment using AWS Application Load Balancer with weighted target groups:
# Create target group for stable (blue) version
resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled           = true
    healthy_threshold = 2
    interval          = 30
    path              = "/health"
    timeout           = 5
  }
}

# Create target group for new (green) version
resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled           = true
    healthy_threshold = 2
    interval          = 30
    path              = "/health"
    timeout           = 5
  }
}

# Configure listener with weighted traffic distribution
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90 # 90% of traffic to stable version
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10 # 10% of traffic to canary version
      }

      stickiness {
        enabled  = false
        duration = 600
      }
    }
  }
}
The Terraform configuration creates two target groups and distributes traffic with a 90/10 split for canary testing. To gradually shift traffic, update the weight values (for example, to 50/50, then 0/100) and run terraform apply. The load balancer immediately adjusts traffic distribution without downtime.
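One way to make the traffic shift repeatable is to drive the weights from an input variable, so each stage of the canary rollout is a single terraform apply with a different value. The following is an illustrative sketch, not part of the tutorial configuration; the variable name is an assumption:

```hcl
# Hypothetical variable that controls the canary traffic split.
variable "green_weight" {
  description = "Percentage of traffic routed to the green (canary) target group"
  type        = number
  default     = 10

  validation {
    condition     = var.green_weight >= 0 && var.green_weight <= 100
    error_message = "green_weight must be between 0 and 100."
  }
}

# Reference the variable in the listener so one apply shifts traffic.
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100 - var.green_weight
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight
      }
    }
  }
}
```

You can then advance the rollout from the command line, for example with terraform apply -var="green_weight=50" and finally terraform apply -var="green_weight=100", without editing the configuration between stages.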
Canary testing workflow
After the green environment is ready, the load balancer sends a small fraction of traffic to the green environment (in this example, 10%).

If the canary test succeeds without errors, you can incrementally direct traffic to the green environment over time. In the end state, you redirect all traffic to the green environment. After verifying the new deployment, you can destroy the old blue environment. The green environment is now your current production service.

To learn how to implement canary deployments with AWS Application Load Balancers, follow the blue-green and canary deployments tutorial.
Deploy containerized applications with orchestration tools
Containers support rolling, blue/green, and canary deployments through orchestration tools like Nomad and Kubernetes. Orchestrators automate the deployment process, manage health checks, and handle traffic routing during updates.
The following deployment strategies lower downtime risk:
- Blue/green deployments: Provide instant rollback capability by maintaining two identical environments and switching traffic between them, ensuring zero downtime but requiring double the resources.
- Rolling deployments: Gradually replace instances one by one, minimizing resource usage while maintaining availability, making them efficient for resource-constrained environments.
- Canary deployments: Mitigate risk by releasing to a small subset of users first, allowing you to validate changes and catch issues before full rollout.
Rolling deployments with Nomad
Nomad supports rolling updates as a first-class feature. Use the update block to control how Nomad replaces old allocations with new ones during deployment.
The following example shows a Nomad job specification with rolling update configuration:
job "web-app" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel      = 2        # Update 2 instances at a time
    health_check      = "checks" # Wait for health checks to pass
    min_healthy_time  = "10s"    # Minimum time to be healthy
    healthy_deadline  = "5m"     # Maximum time to become healthy
    progress_deadline = "10m"    # Overall deployment timeout
    auto_revert       = true     # Automatically revert on failure
    canary            = 2        # Deploy 2 canary instances first
  }

  group "web" {
    count = 6 # Total 6 instances

    network {
      port "http" {
        to = 8080
      }
    }

    service {
      name = "web-app"
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myregistry/myapp:1.0.0"
        ports = ["http"]
      }
    }
  }
}
The Nomad job specification deploys 6 instances with rolling updates. Nomad first deploys 2 canary instances and waits for their health checks to remain passing for at least 10 seconds. Because the job does not set auto_promote, you must promote the deployment once the canaries are healthy; Nomad then progressively updates the remaining instances 2 at a time. If any instance fails its health checks, Nomad automatically reverts to the previous version. The progress_deadline ensures the entire deployment completes within 10 minutes or fails.
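If you prefer Nomad to promote healthy canaries automatically instead of waiting for a manual nomad deployment promote, the update block supports an auto_promote attribute. The following variant sketches that option for the same job:

```hcl
update {
  max_parallel      = 2
  health_check      = "checks"
  min_healthy_time  = "10s"
  healthy_deadline  = "5m"
  progress_deadline = "10m"
  auto_revert       = true
  canary            = 2
  auto_promote      = true # Promote canaries automatically once they pass health checks
}
```

With auto_promote enabled, the rolling replacement of the remaining instances begins as soon as both canaries report healthy, which suits automated pipelines; leave it disabled when you want a human to inspect canary metrics before promotion.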
To learn how to implement rolling and canary deployments with Nomad, follow the Nomad job updates tutorials.
Rolling deployments with Kubernetes
Kubernetes uses rolling updates by default. Kubernetes incrementally replaces current pods with new ones, scheduling new pods on nodes with available resources and waiting for them to become ready before removing old pods.
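If you manage Kubernetes resources with Terraform, you can express the rolling update strategy declaratively through the Kubernetes provider. The following is a minimal sketch, assuming the provider is already configured against your cluster; the names, image, and probe values are illustrative:

```hcl
resource "kubernetes_deployment_v1" "web" {
  metadata {
    name = "web-app"
  }

  spec {
    replicas = 6

    strategy {
      type = "RollingUpdate"

      rolling_update {
        max_surge       = "1" # Allow one extra pod during the update
        max_unavailable = "0" # Never drop below the desired replica count
      }
    }

    selector {
      match_labels = {
        app = "web-app"
      }
    }

    template {
      metadata {
        labels = {
          app = "web-app"
        }
      }

      spec {
        container {
          name  = "app"
          image = "myregistry/myapp:1.0.0"

          # Kubernetes only removes an old pod after the new one reports ready.
          readiness_probe {
            http_get {
              path = "/health"
              port = 8080
            }

            period_seconds  = 10
            timeout_seconds = 2
          }
        }
      }
    }
  }
}
```

Updating the image attribute and running terraform apply triggers the same incremental pod replacement that kubectl would perform, with the readiness probe gating each step.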
Both Nomad and Kubernetes support blue/green deployments. Before sending all traffic to your new deployment, use canary testing to validate that the new version works correctly with production traffic.
HashiCorp resources:
- Learn about zero-downtime deployment strategies overview
- Deploy blue/green infrastructure for zero downtime
- Deploy with traffic splitting using service mesh
- Implement atomic deployments for infrastructure
- Learn how to package applications for deployment
- Implement automated testing before deploying
- Use infrastructure as code to manage deployments
Terraform load balancer resources:
- Learn to use Application Load Balancers for blue-green and canary deployments
- Use AWS Load Balancer target groups for traffic routing
- Use AWS Load Balancer listener for traffic distribution
- Configure AWS Load Balancer listener rules for advanced routing
- Read the Terraform AWS provider documentation for additional resources
Nomad deployment resources:
- Nomad blue/green and canary deployments - Complete tutorial on Nomad deployment strategies
- Nomad rolling updates - Tutorial on rolling deployments
- Nomad update block reference - Configure rolling upgrades and canary deployments
- Nomad job updates tutorials - All Nomad deployment tutorials
- Read the Nomad documentation for comprehensive feature guide
- Learn about Nomad job specifications for application deployments
- Use the Nomad Terraform provider to manage jobs as code
Next steps
In this section of Zero-downtime deployments, you learned how to deploy application changes using blue/green, canary, and rolling strategies with load balancers and orchestrators. Zero-downtime deployments are part of the Define and automate processes pillar.
Refer to the following documents to learn more about deployment strategies: