Auto Scaling and Load Balancing: Building Resilient Applications
- Sujeet Prajapati

- Sep 19
- 9 min read
Publication Week: Week 3
In today's cloud-first world, building applications that can handle varying traffic loads while maintaining high availability is crucial. Auto scaling and load balancing are fundamental pillars of resilient architecture, ensuring your applications can automatically adapt to demand changes while distributing traffic efficiently across multiple instances.
Understanding Auto Scaling: The Foundation of Elasticity
Auto scaling is the practice of automatically adjusting the number of compute resources based on actual demand. Instead of manually provisioning servers or keeping them idle during low traffic periods, auto scaling ensures you have just the right amount of resources at any given time.
Why Auto Scaling Matters
Cost Optimization: Pay only for the resources you actually need, scaling down during off-peak hours and scaling up during traffic spikes.
Performance Consistency: Maintain response times and user experience even when traffic fluctuates dramatically.
Fault Tolerance: Automatically replace unhealthy instances, reducing the impact of hardware failures or software issues.
Operational Efficiency: Reduce manual intervention and the need for constant capacity planning.
Auto Scaling Groups: The Control Center
Auto Scaling Groups (ASGs) are the backbone of AWS auto scaling functionality. They define collections of EC2 instances that share similar characteristics and scaling requirements.
Key Components of Auto Scaling Groups
Desired Capacity: The ideal number of instances you want running at any given time.
Minimum Capacity: The lowest number of instances that should always be running, ensuring basic availability.
Maximum Capacity: The upper limit of instances to prevent runaway scaling and cost overruns.
Availability Zones: ASGs can span multiple AZs, providing geographic redundancy and fault tolerance.
Configuring Auto Scaling Groups
When setting up an ASG, you'll need to specify several critical parameters:
# Example ASG configuration using AWS CLI
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-web-app-asg \
--launch-template LaunchTemplateName=my-web-app-template,Version=1 \
--min-size 2 \
--max-size 10 \
--desired-capacity 3 \
--vpc-zone-identifier "subnet-12345,subnet-67890" \
--target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets \
--health-check-type ELB \
--health-check-grace-period 300
Scaling Policies: Smart Decision Making
Scaling policies determine when and how your ASG should add or remove instances. AWS offers three main types of dynamic scaling policies, each suited to different scenarios.
Target Tracking Scaling
Target tracking is the most straightforward approach, where you specify a target value for a specific metric, and AWS automatically adjusts capacity to maintain that target.
Common Target Metrics:
CPU Utilization (aim for 70% average)
Request Count per Target (maintain 1000 requests per instance)
Network In/Out (monitor bandwidth utilization)
Configuration Example:
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "ScaleOutCooldown": 300,
  "ScaleInCooldown": 300
}
Best Practices: Target tracking works well for predictable workloads where you want to maintain consistent performance levels. It's ideal for web applications with steady traffic patterns.
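Conceptually, target tracking behaves like a proportional controller: if the average metric runs above the target, capacity grows roughly in proportion to the overshoot. Here is a minimal Python sketch of that arithmetic (a simplification for intuition, not AWS's actual algorithm; the function name and clamping are illustrative):

```python
import math

def target_tracking_capacity(current_capacity: int, actual_metric: float,
                             target: float, min_size: int, max_size: int) -> int:
    """Approximate the capacity target tracking converges toward.

    Scaling is proportional: if average CPU is double the target,
    roughly double the fleet is needed to bring it back down.
    """
    if current_capacity == 0:
        return min_size
    desired = math.ceil(current_capacity * actual_metric / target)
    # Clamp to the ASG's configured bounds
    return max(min_size, min(max_size, desired))

# 3 instances averaging 90% CPU against a 70% target -> scale out to 4
print(target_tracking_capacity(3, 90.0, 70.0, 2, 10))  # 4
```

With 3 instances at 90% average CPU and a 70% target, this yields ceil(3 × 90 / 70) = 4 instances; at 30% average CPU it would shrink toward the minimum of 2.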
Step Scaling
Step scaling provides more granular control by defining different scaling actions based on the size of the alarm breach. This approach is perfect when you need different responses to various levels of demand.
Step Scaling Configuration:
{
  "AdjustmentType": "ChangeInCapacity",
  "StepAdjustments": [
    {
      "MetricIntervalLowerBound": 0,
      "MetricIntervalUpperBound": 20,
      "ScalingAdjustment": 1
    },
    {
      "MetricIntervalLowerBound": 20,
      "MetricIntervalUpperBound": 40,
      "ScalingAdjustment": 2
    },
    {
      "MetricIntervalLowerBound": 40,
      "ScalingAdjustment": 3
    }
  ],
  "Cooldown": 300
}
Use Cases: Step scaling excels in scenarios with unpredictable traffic spikes, such as news websites during breaking news or e-commerce sites during flash sales.
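The step table above is easiest to read as a lookup keyed on how far the metric has breached the alarm threshold. A small Python sketch of that lookup (the bounds and adjustments mirror the JSON policy; the function and variable names are illustrative, not an AWS API):

```python
def step_adjustment(breach: float, steps) -> int:
    """Return the capacity change for a given breach above the alarm threshold.

    `breach` is the metric value minus the alarm threshold; each step is a
    (lower_bound, upper_bound, adjustment) tuple, None meaning no upper bound.
    """
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # metric is back below the threshold: no action

# Mirrors the policy above: a breach of 0-20 adds 1 instance,
# 20-40 adds 2, and anything beyond 40 adds 3 at once.
STEPS = [(0, 20, 1), (20, 40, 2), (40, None, 3)]
print(step_adjustment(15, STEPS))  # 1
print(step_adjustment(55, STEPS))  # 3
```

This is what makes step scaling suitable for spiky traffic: a massive breach triggers a proportionally larger response in a single scaling action.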
Simple Scaling
Simple scaling triggers a single scaling action when an alarm threshold is breached. While less sophisticated than other methods, it's useful for basic scenarios or when you need predictable scaling behavior.
Simple Scaling Setup:
Create a CloudWatch alarm for your chosen metric
Define a single scaling adjustment
Set appropriate cooldown periods to prevent thrashing
When to Use: Simple scaling works well for applications with clear, binary scaling needs, such as batch processing workloads that need additional capacity when queue depth exceeds a threshold.
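The queue-depth scenario reduces to a single comparison plus a cooldown guard. A hedged Python sketch of that decision (the threshold, adjustment size, and cooldown here are illustrative values, not AWS defaults):

```python
def simple_scaling_action(queue_depth: int, threshold: int,
                          seconds_since_last_scale: float,
                          cooldown: float = 300.0,
                          adjustment: int = 2) -> int:
    """Simple scaling: one fixed adjustment when the alarm condition holds,
    and no action at all while the previous activity's cooldown is active."""
    if seconds_since_last_scale < cooldown:
        return 0  # still in cooldown: ignore the alarm to prevent thrashing
    if queue_depth > threshold:
        return adjustment  # alarm breached: apply the single configured step
    return 0

print(simple_scaling_action(queue_depth=500, threshold=100,
                            seconds_since_last_scale=400))  # 2
```

Note how the cooldown dominates: even a severe breach produces no action until the cooldown expires, which is exactly the predictable, binary behavior simple scaling is chosen for.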
Load Balancers: Distributing the Load
Load balancers act as traffic directors, distributing incoming requests across multiple healthy instances. AWS offers several load balancer types; the three covered here (Application, Network, and Classic) are each optimized for different use cases.
Application Load Balancer (ALB)
ALBs operate at the application layer (Layer 7) and are ideal for HTTP/HTTPS traffic. They offer advanced routing capabilities and are perfect for modern application architectures.
Key Features:
Host-based and path-based routing
Support for multiple target groups
Integration with AWS services like ECS and Lambda
Advanced request routing based on headers, query parameters, and more
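Host- and path-based routing boils down to evaluating listener rules in priority order until one matches, then falling through to a default target group. A simplified Python model of that evaluation (rule shapes and target group names are invented for illustration; real ALB rules support many more conditions, such as headers and query strings):

```python
def route_request(host: str, path: str, rules, default: str) -> str:
    """Pick a target group the way ALB listener rules do: the first
    matching rule wins; otherwise the listener's default action applies."""
    for rule_host, path_prefix, target_group in rules:
        host_ok = rule_host is None or host == rule_host
        if host_ok and path.startswith(path_prefix):
            return target_group
    return default

RULES = [
    ("api.example.com", "/", "api-targets"),   # host-based rule
    (None, "/images/", "static-targets"),      # path-based rule
]
print(route_request("www.example.com", "/images/logo.png",
                    RULES, "web-targets"))  # static-targets
```

The priority ordering matters: the host-based rule is checked first, so requests to api.example.com never fall through to the path-based rule.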
Configuration Example:
aws elbv2 create-load-balancer \
--name my-web-app-alb \
--subnets subnet-12345 subnet-67890 \
--security-groups sg-12345 \
--scheme internet-facing \
--type application \
--ip-address-type ipv4
Best Use Cases: Web applications, microservices architectures, container-based applications, and any scenario requiring content-based routing.
Network Load Balancer (NLB)
NLBs operate at the transport layer (Layer 4) and are designed for ultra-high performance and low latency requirements. They can handle millions of requests per second while maintaining extremely low latencies.
Key Features:
Static IP addresses and Elastic IP support
Extreme performance with minimal latency
TCP and UDP load balancing
Preserves source IP addresses
Performance Characteristics: NLBs can handle sudden traffic spikes and provide consistent performance even under extreme load conditions.
Ideal Scenarios: Gaming applications, IoT data ingestion, real-time communications, and any application requiring ultra-low latency.
Classic Load Balancer (CLB)
While considered legacy, CLBs still serve specific use cases and provide basic load balancing functionality across both Layer 4 and Layer 7.
When to Consider CLBs: Existing applications that haven't migrated to newer load balancer types, simple use cases without advanced routing needs, or applications requiring basic SSL termination.
Health Checks: Ensuring Instance Reliability
Health checks are the eyes and ears of your load balancing and auto scaling system. They continuously monitor instance health and ensure traffic only reaches healthy instances.
Types of Health Checks
EC2 Health Checks: Monitor the underlying EC2 instance status, detecting hardware failures and instance-level issues.
ELB Health Checks: More comprehensive application-level checks that verify your application is responding correctly to requests.
Configuring Effective Health Checks
{
  "Protocol": "HTTP",
  "Port": 80,
  "Path": "/health",
  "IntervalSeconds": 30,
  "TimeoutSeconds": 5,
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 3
}
Health Check Best Practices:
Create dedicated health check endpoints that verify critical application components
Set appropriate timeout and interval values based on your application's characteristics
Include dependency checks (database connectivity, external service availability) in your health endpoints
Monitor health check metrics to identify patterns and potential issues
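The HealthyThresholdCount and UnhealthyThresholdCount settings above mean a target only changes state after enough consecutive probe results, which filters out single flaky checks. A Python sketch of that state machine (a simplification of what the load balancer actually tracks per target):

```python
def track_health(results, healthy_threshold: int = 2,
                 unhealthy_threshold: int = 3,
                 initially_healthy: bool = True) -> bool:
    """Fold a sequence of probe results (True = check passed) into a final
    status, flipping state only after enough consecutive contrary results."""
    healthy = initially_healthy
    streak = 0  # consecutive results that disagree with the current state
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with the current state: reset
            continue
        streak += 1
        needed = unhealthy_threshold if healthy else healthy_threshold
        if streak >= needed:
            healthy = not healthy
            streak = 0
    return healthy

# Two failures are not enough to mark the target unhealthy...
print(track_health([False, False]))         # True
# ...but three consecutive failures are.
print(track_health([False, False, False]))  # False
```

This asymmetry (3 checks to fail, 2 to recover) is deliberate: it pulls a struggling instance out of rotation only when failure is sustained, and brings it back quickly once it is genuinely healthy again.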
Launch Templates vs Launch Configurations
Understanding the difference between launch templates and launch configurations is crucial for effective auto scaling implementation.
Launch Configurations (Legacy)
Launch configurations are the older method for defining instance parameters. While still functional, they have several limitations:
Immutable after creation
Limited feature support
No versioning capability
Cannot be used with mixed instance types
Launch Templates (Recommended)
Launch templates are the modern, feature-rich alternative that provides greater flexibility and functionality:
Advanced Features:
Version control and rollback capabilities
Support for mixed instance types and purchasing options
Enhanced networking configurations
Integration with newer AWS features
Launch Template Example:
{
  "LaunchTemplateName": "my-web-app-template",
  "LaunchTemplateData": {
    "ImageId": "ami-0123456789abcdef0",
    "InstanceType": "t3.medium",
    "KeyName": "my-keypair",
    "SecurityGroupIds": ["sg-12345"],
    "UserData": "base64-encoded-startup-script",
    "IamInstanceProfile": {
      "Name": "my-instance-role"
    },
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [
        {"Key": "Name", "Value": "WebServer"},
        {"Key": "Environment", "Value": "Production"}
      ]
    }]
  }
}
Migration Strategy: If you're currently using launch configurations, plan a migration to launch templates to take advantage of new features and better management capabilities.
Monitoring and Observability
Effective monitoring is essential for maintaining healthy auto scaling and load balancing operations.
Key Metrics to Monitor
Auto Scaling Metrics:
Group size changes over time
Scaling activity frequency and triggers
Instance launch and termination patterns
Load Balancer Metrics:
Request count and latency distribution
Target response times and error rates
Healthy vs unhealthy target counts
Application-Level Metrics:
Custom business metrics that drive scaling decisions
Resource utilization patterns
User experience indicators
CloudWatch Integration
Set up comprehensive CloudWatch dashboards and alarms to maintain visibility into your scaling operations:
aws cloudwatch put-metric-alarm \
--alarm-name "HighCPUUtilization" \
--alarm-description "Alarm when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2
Hands-on Lab: Building an Auto-Scaling Web Application
Let's put theory into practice by building a complete auto-scaling web application from scratch.
Step 1: Create the Launch Template
First, create a launch template that defines how your web server instances should be configured:
# Create the launch template
aws ec2 create-launch-template \
--launch-template-name WebAppTemplate \
--launch-template-data '{
  "ImageId": "ami-0abcdef1234567890",
  "InstanceType": "t3.micro",
  "KeyName": "my-key-pair",
  "SecurityGroupIds": ["sg-web-servers"],
  "UserData": "'$(base64 -w 0 user-data.sh)'",
  "IamInstanceProfile": {
    "Name": "WebServerRole"
  },
  "TagSpecifications": [{
    "ResourceType": "instance",
    "Tags": [
      {"Key": "Name", "Value": "AutoScaling-WebServer"},
      {"Key": "Project", "Value": "WebAppDemo"}
    ]
  }]
}'
Step 2: Set Up the Application Load Balancer
Create an ALB to distribute traffic across your instances:
# Create the load balancer
aws elbv2 create-load-balancer \
--name WebAppALB \
--subnets subnet-12345 subnet-67890 \
--security-groups sg-load-balancer \
--scheme internet-facing \
--type application
# Create target group
aws elbv2 create-target-group \
--name WebAppTargets \
--protocol HTTP \
--port 80 \
--vpc-id vpc-12345 \
--health-check-path /health \
--health-check-interval-seconds 30 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3
Step 3: Configure the Auto Scaling Group
Create an ASG that uses your launch template and targets your load balancer:
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name WebAppASG \
--launch-template LaunchTemplateName=WebAppTemplate,Version=1 \
--min-size 2 \
--max-size 6 \
--desired-capacity 2 \
--vpc-zone-identifier "subnet-12345,subnet-67890" \
--target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/WebAppTargets/abc123 \
--health-check-type ELB \
--health-check-grace-period 300 \
--tags Key=Name,Value=WebAppASG,PropagateAtLaunch=true
Step 4: Implement Scaling Policies
Set up target tracking scaling to maintain optimal CPU utilization:
aws autoscaling put-scaling-policy \
--auto-scaling-group-name WebAppASG \
--policy-name CPUTargetTrackingScalingPolicy \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "ScaleOutCooldown": 300,
  "ScaleInCooldown": 300
}'
Step 5: Test Your Auto Scaling Setup
Create a simple load testing script to verify your configuration:
#!/bin/bash
# Load test script
ALB_DNS="your-alb-dns-name.region.elb.amazonaws.com"
echo "Starting load test on $ALB_DNS"
for i in {1..1000}; do
  curl -s "$ALB_DNS" > /dev/null &
  if [ $((i % 50)) -eq 0 ]; then
    echo "Sent $i requests"
    sleep 1
  fi
done
wait
echo "Load test completed"
Step 6: Monitor and Validate
Use CloudWatch to monitor your application's behavior during the load test:
Check ASG activity history
Monitor CPU utilization across instances
Verify load balancer target health
Observe scaling actions and timing
Best Practices and Optimization Tips
Scaling Policy Optimization
Cooldown Periods: Set appropriate cooldown periods to prevent rapid scaling oscillations while ensuring responsiveness to legitimate demand changes.
Multiple Metrics: Consider using multiple scaling policies based on different metrics (CPU, memory, request count) to create more responsive scaling behavior.
Predictive Scaling: For predictable workloads, enable predictive scaling to provision capacity in advance of anticipated demand.
Cost Optimization Strategies
Mixed Instance Types: Use launch templates with multiple instance types to take advantage of spot instances and optimize costs.
Scheduled Scaling: Implement scheduled scaling for predictable patterns, such as business hours vs off-hours capacity needs.
Right-Sizing: Regularly review instance types and sizes to ensure you're using the most cost-effective options.
Security Considerations
Security Groups: Implement least-privilege security group rules, allowing only necessary traffic between load balancers and instances.
IAM Roles: Use IAM roles for instances instead of embedding credentials, and follow the principle of least privilege.
SSL/TLS: Implement SSL termination at the load balancer level and use ACM for certificate management.
Common Pitfalls and How to Avoid Them
Over-Aggressive Scaling
Problem: Scaling policies that react too quickly to short-term spikes, causing unnecessary costs and instability.
Solution: Implement appropriate evaluation periods and cooldowns, and use multiple data points before making scaling decisions.
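CloudWatch's evaluation periods implement the "multiple data points" part of this: with --evaluation-periods 2, as in the alarm created earlier, the alarm fires only when the last two consecutive datapoints breach the threshold. A tiny Python sketch of that evaluation (simplified; real alarms also support "M out of N" datapoints and missing-data handling):

```python
def alarm_fires(datapoints, threshold: float = 80.0,
                evaluation_periods: int = 2) -> bool:
    """True only when the most recent `evaluation_periods` datapoints
    all breach the threshold, so one-off spikes are ignored."""
    recent = datapoints[-evaluation_periods:]
    return (len(recent) == evaluation_periods
            and all(value > threshold for value in recent))

print(alarm_fires([60, 95, 70]))  # False - the spike was not sustained
print(alarm_fires([60, 85, 90]))  # True - two consecutive breaches
```

Combined with scale-out/scale-in cooldowns, this is usually enough to stop a short burst of traffic from triggering a scaling oscillation.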
Inadequate Health Checks
Problem: Health checks that don't accurately reflect application health, leading to traffic being sent to unhealthy instances.
Solution: Create comprehensive health check endpoints that verify all critical application components and dependencies.
Ignoring Dependencies
Problem: Scaling application tiers without considering database or external service limitations.
Solution: Implement holistic monitoring and consider the capacity of all system components when designing scaling policies.
Conclusion
Auto scaling and load balancing are essential components of modern, resilient applications. By implementing proper ASGs, choosing the right scaling policies, and selecting appropriate load balancer types, you can build applications that automatically adapt to changing demands while maintaining high availability and cost efficiency.
The key to success lies in understanding your application's specific requirements, implementing comprehensive monitoring, and continuously optimizing your scaling policies based on real-world performance data. Start with simple configurations and gradually add complexity as you gain experience and better understand your application's scaling patterns.
Remember that auto scaling and load balancing are not set-and-forget solutions. They require ongoing monitoring, tuning, and optimization to ensure they continue to meet your application's evolving needs. Regular review of metrics, scaling events, and cost impacts will help you maintain an efficient and effective scaling strategy.
As your applications grow and evolve, these foundational concepts will serve you well, enabling you to build truly resilient, scalable systems that can handle whatever demands your users place on them.
