Securing Subnet Access with NAT Gateways and Security Groups

In our previous lesson, we designed a multi-AZ VPC with segregated public, private, and database subnets. However, the stateless backend applications in the private subnets are still cut off from the outside world: they cannot download npm or Maven packages, fetch API keys from external providers, or communicate with external payment gateways like Stripe.

To solve this securely, we must implement NAT Gateways for outbound traffic and design highly restrictive Security Groups to control inbound traffic.

Understanding NAT Gateways (Network Address Translation)

A NAT Gateway is a managed AWS service that enables instances in a private subnet to connect to the internet or other AWS services, while preventing the internet from initiating a connection directly to those instances.

[ Private Instance: 10.0.11.5 ]
            │ (Inbound blocked, outbound allowed)
            ▼
[ NAT Gateway: 10.0.1.25 (Public Subnet) ]
            │ (Translates 10.0.11.5 -> Elastic IP: 54.120.32.4)
            ▼
[ Internet Gateway ] ──> [ External Internet (e.g. Stripe API) ]

Cost Optimization Architectural Choice:

Production: Provision one NAT Gateway per Availability Zone (Multi-AZ resilience). If one AZ goes down, private subnets in the remaining AZs keep routing through their own gateway.
Development/Staging: Provision a single NAT Gateway shared across all private subnets. Each gateway you skip saves ~$35/month, as NAT Gateways are billed at an hourly rate plus per-GB data processing fees. The trade-off is that the shared gateway's AZ becomes a single point of failure for outbound traffic.

Step 1: Provisioning the NAT Gateway & Elastic IP

Let's expand our VPC module to include the NAT Gateways:

# modules/vpc/main.tf (continued)

# Allocate Elastic IP (EIP) for NAT Gateways
resource "aws_eip" "nat" {
  count  = var.environment == "prod" ? length(var.availability_zones) : 1
  domain = "vpc"

  tags = {
    Name        = "${var.environment}-nat-eip-${count.index}"
    Environment = var.environment
  }
}

# Create the NAT Gateways in Public Subnets
resource "aws_nat_gateway" "nat" {
  count         = var.environment == "prod" ? length(var.availability_zones) : 1
  allocation_id = aws_eip.nat[count.index].id
  # NAT Gateway must be placed in a PUBLIC subnet
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name        = "${var.environment}-nat-gw-${count.index}"
    Environment = var.environment
  }

  depends_on = [aws_internet_gateway.igw]
}
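
The same count expression appears in both resources above, so it may be worth centralizing. The fragment below is a minimal sketch of that idea, plus an output that exposes the gateways' public IPs (useful when an external provider asks for an allowlist of source addresses). The names local.nat_gateway_count and nat_public_ips are hypothetical, as is the outputs.tf file placement.

```hcl
# modules/vpc/main.tf (sketch; local.nat_gateway_count is a hypothetical name)
locals {
  # One NAT Gateway per AZ in prod, a single shared gateway elsewhere
  nat_gateway_count = var.environment == "prod" ? length(var.availability_zones) : 1
}

# aws_eip.nat and aws_nat_gateway.nat would then both use:
#   count = local.nat_gateway_count

# modules/vpc/outputs.tf (hypothetical file and output name)
output "nat_public_ips" {
  description = "Elastic IPs that private-subnet traffic egresses from"
  value       = aws_eip.nat[*].public_ip
}
```

With this in place, changing the rule for when to run multiple gateways touches one line instead of two.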

Step 2: Configuring Route Tables for Private Subnets

Now we must route outbound private traffic (0.0.0.0/0) through our NAT Gateways:

# Create Private Route Tables
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  # In Dev/Staging, route all AZs to the single NAT Gateway [0]
  # In Prod, route each AZ to its corresponding NAT Gateway [count.index]
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = var.environment == "prod" ? aws_nat_gateway.nat[count.index].id : aws_nat_gateway.nat[0].id
  }

  tags = {
    Name        = "${var.environment}-private-rt-${var.availability_zones[count.index]}"
    Environment = var.environment
  }
}

# Associate Private Route Table to Private Subnets
resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
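
The conditional in the route block can also be written without naming the environment at all. The sketch below is an alternative to the inline route block, not an addition to it: the Terraform AWS provider documentation advises against combining inline routes and standalone aws_route resources on the same route table. The resource name private_nat is hypothetical.

```hcl
# Sketch: replaces the inline route block in aws_route_table.private
resource "aws_route" "private_nat" {
  count                  = length(var.availability_zones)
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"

  # prod: N gateways, so index % N == index (one gateway per AZ)
  # dev/staging: 1 gateway, so index % 1 == 0 (the shared gateway)
  nat_gateway_id = aws_nat_gateway.nat[count.index % length(aws_nat_gateway.nat)].id
}
```

The modulo keeps the routing correct for any number of gateways, so the route definition does not need to change if the gateway count rule does.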

Step 3: Security Groups vs. NACLs

AWS provides two layers of firewall protection:

Network ACLs (NACLs): Stateless, applied at the subnet boundary. Rules are evaluated in ascending rule-number order and the first match wins. Because NACLs do not track connections, return traffic must be allowed explicitly, so both inbound and outbound rules have to be configured.
Security Groups: Stateful, applied at the elastic network interface (ENI) of a specific resource. If an inbound request is authorized on port 80, the outbound response is automatically allowed, regardless of outbound rules.
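
To make the stateless behavior concrete, here is a minimal sketch of a NACL for the database subnets. It is illustrative only: the resource name, the aws_subnet.database reference, and the 10.0.0.0/16 VPC CIDR are assumptions, not values confirmed by this lesson.

```hcl
# Sketch: NACL for the database subnets (names and CIDR are assumptions)
resource "aws_network_acl" "database" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = aws_subnet.database[*].id # assumed name from the previous lesson

  # Inbound: Postgres connections from inside the VPC
  ingress {
    rule_no    = 100
    action     = "allow"
    protocol   = "tcp"
    cidr_block = "10.0.0.0/16" # assumed VPC CIDR
    from_port  = 5432
    to_port    = 5432
  }

  # Outbound: replies go to the client's ephemeral port, which a
  # stateless NACL does not allow automatically
  egress {
    rule_no    = 100
    action     = "allow"
    protocol   = "tcp"
    cidr_block = "10.0.0.0/16" # assumed VPC CIDR
    from_port  = 1024
    to_port    = 65535
  }
}
```

Without the egress rule, connections would be accepted inbound but every reply would be dropped at the subnet boundary. A Security Group needs no equivalent rule.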

Best Practice: Layered Security Groups

We will define strict, stateful Security Groups that chain access between the Application Load Balancer, the Application containers, and the RDS database.

graph LR
    User([User Request]) -->|Port 443| SG_ALB[ALB Security Group]
    SG_ALB -->|Port 8080| SG_App[Application Container SG]
    SG_App -->|Port 5432| SG_DB[RDS Database SG]

Let's express this layout in Terraform:

# main.tf

# 1. Security Group for Public ALB
resource "aws_security_group" "alb" {
  name        = "${var.environment}-alb-sg"
  description = "Allows public inbound traffic to Load Balancer"
  vpc_id      = var.vpc_id

  # Allow HTTP
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow HTTPS
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# 2. Security Group for stateless microservice containers
resource "aws_security_group" "app" {
  name        = "${var.environment}-app-sg"
  description = "Allows traffic from ALB Security Group only"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 8080 # App listening port
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # Reference ALB Security Group id
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# 3. Security Group for database
resource "aws_security_group" "database" {
  name        = "${var.environment}-db-sg"
  description = "Allows traffic from App Security Group only"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432 # Postgres port
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # Only App can talk to DB
  }

  # Block all egress from database: an explicit empty list leaves the group
  # with no outbound rules. Replies to the app tier still flow because
  # Security Groups are stateful.
  egress = []
}

Because the rules reference security groups rather than CIDR blocks, an attacker who gains control of a server in the public subnet still cannot connect to the database. The database accepts connections on port 5432 only from network interfaces that belong to aws_security_group.app, regardless of their IP address.
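
The same chaining can be applied to outbound rules. As written, the ALB group allows egress to anywhere. Narrowing it to the app group cannot be done with inline blocks, because the ALB group would reference the app group while the app group references the ALB group, and Terraform rejects that as a dependency cycle. Below is a minimal sketch using a standalone rule resource instead. It assumes the inline egress block is removed from aws_security_group.alb and that the AWS provider version in use includes this resource type. The provider documentation advises against mixing inline and standalone rules on one group, so in practice all of that group's rules would move to standalone resources. The name alb_to_app is hypothetical.

```hcl
# Sketch: restrict ALB egress to the app tier only.
# Assumes aws_security_group.alb no longer declares inline rules.
resource "aws_vpc_security_group_egress_rule" "alb_to_app" {
  security_group_id            = aws_security_group.alb.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
  description                  = "ALB may only reach app containers on 8080"
}
```

Because the rule is its own resource, both security groups can be created first and linked afterwards, which avoids the cycle.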

Next Steps

Our network architecture is robustly designed and secured. Now we must turn our attention to identity management. Before we can spin up applications that write to S3 buckets, publish events to SQS queues, or fetch secrets from Secrets Manager, we need to understand how AWS handles authentication and authorization.

In the next lesson, we will cover IAM Least Privilege, AssumeRole mechanics, and setting up secure OpenID Connect (OIDC) identities for our GitOps pipelines.
