
Terraform on AWS: Deploy a Highly Available Django App with Auto Scaling and Load Balancing

Deploying a Stateless Django Docker App on AWS with Terraform for Fault Tolerance and Auto Scaling

8 min read

In the world of cloud computing, terms like “highly available” and “scalable architecture” often float around in whitepapers, certification courses, and online tutorials. They’re buzzwords that sound impressive but can feel abstract until you’ve rolled up your sleeves and built one yourself. That’s exactly where I was before this project — a solid grasp of the individual pieces, but no hands-on experience seeing them orchestrate under real-world conditions.

I knew a load balancer distributes traffic to prevent any single server from becoming a bottleneck. I understood that Auto Scaling Groups (ASGs) dynamically adjust the number of instances based on demand. I’d read about private subnets shielding resources from the public internet and NAT gateways enabling outbound connections without exposing those resources. But reading is one thing; watching these elements collaborate to handle failures and spikes in load is entirely another.

This project was my opportunity to bridge that gap. I set out to deploy a simple web application on AWS using Terraform, focusing exclusively on networking, compute, scaling, and fault tolerance. No databases involved — just a stateless Django app running in Docker containers. The goal wasn’t to create a production-ready monolith but to observe how these components behave when pushed. I wanted to move from theoretical knowledge to empirical understanding, seeing the system self-heal and adapt in real time.

Designing the Foundation: Networking First

Every robust cloud architecture starts with a strong networking base, and in AWS, that means the Virtual Private Cloud (VPC). I began by provisioning a VPC with Terraform, ensuring DNS support and DNS hostnames were enabled. Why? Without these, even basic resolutions — like accessing the load balancer’s endpoint — could fail, leading to frustrating connectivity issues that derail the entire setup.
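In Terraform, both DNS flags live directly on the VPC resource. A minimal sketch (the resource name and CIDR block here are illustrative, not necessarily what I used):

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true # without this, internal name resolution fails
  enable_dns_hostnames = true # needed for resources to receive DNS hostnames

  tags = {
    Name = "django-ha-vpc"
  }
}
```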

To infuse redundancy, I divided the VPC into four subnets: two public and two private, each spanning a different Availability Zone (AZ). This wasn’t arbitrary; AZs are physically isolated data centers within a region, so spreading resources across them ensures that a failure in one — like a power outage or network disruption — doesn’t take down the whole application. If AZ1 goes dark, AZ2 picks up the slack seamlessly.
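The four-subnet layout can be expressed compactly with `count` and `cidrsubnet`. A sketch, assuming two AZs in `us-east-1` (region and naming are illustrative):

```hcl
locals {
  azs = ["us-east-1a", "us-east-1b"]
}

# Public subnets: one per AZ, for the ALB and NAT Gateways.
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = local.azs[count.index]
  map_public_ip_on_launch = true
}

# Private subnets: one per AZ, for the EC2 instances. No public IPs.
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = local.azs[count.index]
}
```

With `cidrsubnet`, each subnet gets a non-overlapping /24 carved out of the VPC's /16, so the AZ spread comes almost for free.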

The public subnets housed internet-facing resources: the Application Load Balancer (ALB) for handling incoming traffic and NAT Gateways for outbound connectivity from private resources. Meanwhile, the private subnets were dedicated to the EC2 instances running my web app. These instances were configured without public IP addresses, adhering to the principle of least privilege — exposing only what’s necessary to the outside world.

An Internet Gateway (IGW) was attached to the VPC to facilitate inbound and outbound traffic for public subnets. But the private EC2 instances still needed to reach the internet — for pulling Docker images from public repositories or fetching software updates. Enter NAT Gateways: I deployed one in each public subnet (one per AZ) to provide high-availability outbound routing. This setup routes traffic from private instances through the NAT, masking their origins and keeping them secure.
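Wiring this up means an IGW, one EIP-backed NAT Gateway per AZ, and a private route table per AZ pointing at its local NAT. A sketch building on the VPC and subnet resources described above:

```hcl
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

# One Elastic IP + NAT Gateway per AZ for high-availability outbound routing.
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

# Each private subnet routes outbound traffic through the NAT in its own AZ,
# so an AZ failure never severs the other AZ's internet egress.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```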

At this point, the networking layer felt tangible. Unlike my earlier toy projects with flat, single-subnet VPCs, this had deliberate separation of concerns, built-in redundancy, and a clear security posture. It was a foundation that screamed “enterprise-ready,” even for a modest app.

Putting the Load Balancer in Front

With the network skeleton in place, it was time to add the traffic director: the Application Load Balancer. Using Terraform, I configured the ALB to listen on port 80 (HTTP) and forward requests to a target group comprising my EC2 instances. The ALB was placed in the public subnets, making it the sole public entry point — users would never directly hit the backend servers.

Health checks were a critical detail here. I set them up on the root path (“/”) with a 200–299 success code threshold. This means the ALB periodically pings each instance; if it doesn’t get a healthy response, it stops sending traffic there. It’s a simple mechanism, but as I’d soon discover, it’s the linchpin for fault tolerance.

This configuration enhanced security by hiding the instances behind the ALB and improved resilience by intelligently routing around problems. Traffic flow: Internet → IGW → ALB → Healthy EC2 instances in private subnets. No direct exposure, no single point of failure — elegant in its simplicity.
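The ALB, target group, and listener boil down to three resources. A sketch (names are illustrative, and `aws_security_group.alb` is an assumed security group allowing inbound port 80):

```hcl
resource "aws_lb" "app" {
  name               = "django-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  security_groups    = [aws_security_group.alb.id]
}

resource "aws_lb_target_group" "app" {
  name     = "django-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/"
    matcher             = "200-299" # any 2xx response counts as healthy
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
```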

Launch Templates and Auto Scaling

Behind the ALB, the real magic happens with the compute layer. I created a Launch Template to define the blueprint for EC2 instances: an Amazon Linux 2 AMI, t3.micro instance type (cost-effective for testing), appropriate security groups (allowing inbound from the ALB on port 80), and a user data script.

The user data script was the automation glue. On boot, it installed Docker, pulled my Django app image from Docker Hub, and ran the container with port mapping — binding the container's port 8000 to the host's port 80. This ensured the ALB could communicate with the containerized app without fuss.
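In Terraform, the template and its bootstrap script look roughly like this. The Docker Hub image name is a placeholder, and `data.aws_ami.amazon_linux_2` and `aws_security_group.app` (inbound port 80 from the ALB only) are assumed to be defined elsewhere:

```hcl
resource "aws_launch_template" "app" {
  name_prefix            = "django-app-"
  image_id               = data.aws_ami.amazon_linux_2.id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.app.id]

  # Bootstrap: install Docker, pull the image, map container port 8000 to host port 80.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum update -y
    amazon-linux-extras install docker -y
    systemctl enable --now docker
    docker run -d --restart unless-stopped -p 80:8000 myuser/django-app:latest
  EOF
  )
}
```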

Next, the Auto Scaling Group. I configured it to use the Launch Template, spanning both private subnets (and thus both AZs). Minimum capacity: 1 instance; desired: 2; maximum: 4. Scaling policies were tied to CloudWatch metrics, specifically average CPU utilization. Scale out if CPU > 70% for 2 minutes; scale in if < 30% for 5 minutes. Warm-up and cooldown periods prevented rapid oscillations.
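The ASG and its scale-out half can be sketched as follows (the scale-in policy mirrors it with a lower threshold and a negative adjustment; names and exact periods are illustrative):

```hcl
resource "aws_autoscaling_group" "app" {
  min_size                  = 1
  desired_capacity          = 2
  max_size                  = 4
  vpc_zone_identifier       = aws_subnet.private[*].id
  target_group_arns         = [aws_lb_target_group.app.arn]
  health_check_type         = "ELB" # replace instances that fail ALB health checks
  health_check_grace_period = 180   # give the user-data bootstrap time to finish

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

resource "aws_autoscaling_policy" "scale_out" {
  name                   = "cpu-scale-out"
  autoscaling_group_name = aws_autoscaling_group.app.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300 # damp oscillation between scale events
}

# Alarm: average CPU above 70% across the group for two consecutive minutes.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "cpu-above-70"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 70
  evaluation_periods  = 2
  period              = 60
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  statistic           = "Average"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}
```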

On paper — or in Terraform code — this looked flawless. But infrastructure as code is only as good as its runtime behavior. Would it actually scale under load? Handle failures gracefully? That was the litmus test.

Watching Health Checks in Action

After a terraform apply, the stack came to life. The ALB’s DNS name resolved in my browser, serving the Django app. Success! But digging into the AWS Console revealed a hiccup: one instance in the target group was marked “unhealthy.”

Panic set in briefly — was my config broken? Then, the ASG kicked in: it terminated the unhealthy instance and spun up a replacement. The new one initialized, passed health checks after a few cycles, and joined the pool.

The culprit? Bootstrapping time. The user data script took ~2–3 minutes to install Docker, pull the image (a few hundred MB), and start the container. During this window, health checks failed, triggering the ASG’s replacement logic. It wasn’t a bug; it was the system self-healing.

This moment was revelatory. High availability isn’t about perfection — it’s about detection and recovery. The infrastructure wasn’t brittle; it was resilient, proactively addressing issues before they impacted users.

Applying Load with Apache JMeter

Theory validated, now for scalability. I fired up Apache JMeter on my local machine to simulate traffic. Starting with 10 concurrent users ramping to 100, I hammered the ALB endpoint with GET requests.

At first… crickets. No new instances launched. CloudWatch showed CPU peaking at 40% — below my 70% threshold. Lesson learned: Scaling isn’t triggered by “busy-ness” alone; it’s metric-driven. The app was handling the load efficiently, so why add capacity?

To force the issue, I temporarily lowered the threshold to 50% and re-ran the test. CPU spiked, alarms fired, and the ASG bumped desired capacity to 3. I watched in the console as the new instance launched, bootstrapped, registered with the target group, passed health checks, and started receiving traffic. The ALB distributed requests evenly, CPU stabilized, and the system hummed.

It was exhilarating — seeing abstract concepts like “elasticity” manifest in logs and metrics.

Observing Scale-In and Connection Draining

Post-test, I halted JMeter. CPU plummeted, triggering the scale-in policy. But termination wasn’t abrupt: the ASG initiated connection draining, giving active sessions up to 300 seconds to complete before deregistering the instance from the ALB.

No dropped connections, no user disruption — just a graceful contraction. This underscored controlled scaling: Not just growing, but shrinking efficiently to optimize costs without chaos.
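On the Terraform side, that draining window is the `deregistration_delay` attribute on the ALB target group (300 seconds happens to be the AWS default; shown explicitly here for clarity, with the other attributes elided):

```hcl
resource "aws_lb_target_group" "app" {
  # ...name, port, protocol, vpc_id, health_check as before...
  deregistration_delay = 300 # seconds the ALB waits for in-flight requests
}
```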

Understanding What “Highly Available” Really Means

Pre-project, I equated high availability with “multiple servers.” Now, I see it as a symphony of interdependent components:

  • The ALB distributes load and enforces health, acting as the vigilant gatekeeper.

  • The ASG monitors and adjusts capacity, replacing failures automatically.

  • CloudWatch metrics and alarms provide the intelligence for proactive decisions.

  • NAT Gateways ensure backend connectivity without compromising security.

  • Private subnets enforce isolation, minimizing attack surfaces.

It’s about clear roles and failover paths. AZ failure? Traffic reroutes. Instance crash? Auto-replacement. Load surge? Scale out. Downtime? Minimal, if any.

The beauty is in the predictability — no heroics required; the system adapts quietly.

Closing the Loop

With testing complete, a terraform destroy wiped the slate clean, reclaiming resources and closing the experiment. This project transformed my perspective: Cloud architecture isn’t rote memorization of services — it’s designing for behavior under stress.

For aspiring DevOps engineers or cloud architects, I recommend this: Build it, break it, observe it. Documentation pales against the clarity of real-time metrics and auto-recovery in action. High availability isn’t a checkbox; it’s a lived experience that builds intuition for crafting truly resilient systems.