
High-Availability Relational Databases: RDS Multi-AZ Setup

Provision a production-ready, highly available relational database on AWS RDS PostgreSQL with automatic failover and deletion protection.

Key Takeaways

  • Multi-AZ replication provides synchronous hot-standby database failover in a separate Availability Zone.
  • Always enable deletion_protection and automated backups (with a final snapshot on deletion) in production environments.
  • Deploy database instances in highly isolated database subnet groups with zero direct internet access routes.

Recommended Prerequisites

  • terraform-aws-09-secrets-manager-kms

Databases hold the state and source of truth for your entire application platform. If a web server goes down, you can spin up a new container in seconds. If your primary database crashes and no standby is ready to take over, the whole platform halts, costing you revenue and customer trust.

To protect our data layer, we will provision a resilient RDS PostgreSQL instance that uses Multi-AZ synchronous replication.


Multi-AZ vs. Read Replicas

Understanding the architectural difference is critical for both system design interviews and operational excellence:

Metric           | Multi-AZ (Active-Passive HA)                          | Read Replicas (Read Scaling)
Primary Use      | High Availability & Disaster Recovery                 | Read Throughput Scalability
Replication Type | Synchronous (no loss of committed writes)             | Asynchronous (Eventual Consistency)
Failover         | Automatic (DNS endpoint updates, typically 60-120s)   | Manual promotion is required
Deployment       | Standby instance in a separate AZ (hidden)            | Active read-only instances (each exposed via its own endpoint)

       [ App container ]
               │ (Writes to DNS endpoint: db.codesprintpro.com)
               ▼
   [ Primary RDS (AZ A) ] ─── Synchronous Replication ──► [ Standby RDS (AZ B) ]
   (Active - read/write)                                  (Passive hot-standby)
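
To make the right-hand column of the comparison concrete, here is a minimal sketch of how a read replica might be declared next to the primary instance. It assumes the primary is the `aws_db_instance.this` resource defined in Step 2 of this lesson; the resource name `read_replica` and the identifier suffix are hypothetical and not part of the original module.

```hcl
# Hypothetical read replica for read scaling (not a substitute for Multi-AZ).
# Assumes the primary instance is aws_db_instance.this from Step 2.
resource "aws_db_instance" "read_replica" {
  # Only create the replica in prod, mirroring the lesson's cost-saving conditionals
  count = var.environment == "prod" ? 1 : 0

  identifier          = "${var.environment}-app-db-replica-1"
  replicate_source_db = aws_db_instance.this.identifier # Same-region replica
  instance_class      = var.db_instance_class

  # Engine, database name, and credentials are inherited from the source,
  # so they are intentionally not set here.
  vpc_security_group_ids = [var.db_security_group_id]
  skip_final_snapshot    = true
}

# Replicas get their own endpoint; the application must route reads to it explicitly.
output "db_replica_addresses" {
  value       = aws_db_instance.read_replica[*].address
  description = "Hostnames of the read replicas (empty outside prod)"
}
```

The source instance needs automated backups enabled (a backup retention period greater than zero) before a replica can be created. Because replication to the replica is asynchronous, reads served from it can lag slightly behind the primary.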

Step 1: Provisioning the Subnet Group & Parameter Group

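Before launching our database, we need two supporting pieces. The isolated database subnet group was already created inside our VPC module, so this module receives its name through var.db_subnet_group_name. The parameter group, which lets us configure database engine settings safely, is defined here: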

# modules/rds/main.tf

# 1. Custom PostgreSQL Parameter Group
resource "aws_db_parameter_group" "postgres" {
  name   = "${var.environment}-postgres15-pg"
  family = "postgres15"

  # Force TLS connections for data-in-transit security
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }

  # Enable slow query logging
  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # Log query if execution exceeds 1000ms
  }
}
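
For reference, the subnet group and security group that this module expects from the VPC module could look roughly like the sketch below. This is an assumption about that earlier module, not its actual code: the resource names and the variables `database_subnet_ids`, `vpc_id`, and `app_security_group_id` are hypothetical.

```hcl
# Hypothetical VPC-module resources; names and variables are illustrative only.

# Groups the private database subnets (at least two AZs are needed for Multi-AZ)
resource "aws_db_subnet_group" "database" {
  name       = "${var.environment}-db-subnet-group"
  subnet_ids = var.database_subnet_ids

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

# Only the application tier may reach PostgreSQL; no CIDR-based ingress is defined
resource "aws_security_group" "database" {
  name_prefix = "${var.environment}-db-"
  description = "Allow PostgreSQL traffic from the application tier only"
  vpc_id      = var.vpc_id

  ingress {
    description     = "PostgreSQL from application tasks"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }
}
```

Referencing the application security group instead of an IP range keeps the rule valid as containers are replaced and their addresses change.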

Step 2: Provisioning the Production-Grade RDS Postgres Instance

Now, we write the resource definition. We use conditionals (var.environment == "prod") to enable production-only safeguards such as Multi-AZ, longer backup retention, and deletion protection. In sandbox/dev environments they stay off, which avoids paying for a standby instance and keeps teardown simple.

# modules/rds/main.tf (continued)

resource "aws_db_instance" "this" {
  identifier             = "${var.environment}-app-db"
  engine                 = "postgres"
  engine_version         = "15.4"
  instance_class         = var.db_instance_class # e.g. "db.r6g.large" for prod
  allocated_storage      = var.allocated_storage_gb
  max_allocated_storage  = var.max_allocated_storage_gb # Autoscales storage up to this limit
  storage_type           = "gp3"
  storage_encrypted      = true
  
  db_name  = "appdb"
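  username = "dbadmin" # "admin" is reserved by the RDS PostgreSQL engine and is rejected
  password = var.database_password # Sourced from Secrets Manager; note the value is also stored in Terraform state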

  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [var.db_security_group_id]
  parameter_group_name   = aws_db_parameter_group.postgres.name

  # High Availability Configuration
  multi_az = var.environment == "prod"

  # Backup Configuration
  backup_retention_period   = var.environment == "prod" ? 7 : 1 # Keep 7 days in prod
  backup_window             = "03:00-04:00"                     # Daily automated backup window (UTC)
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = var.environment == "prod" ? "${var.environment}-app-db-final-snapshot" : null

  # Deletion Protection: Block accidental CLI/Console deletion actions
  deletion_protection = var.environment == "prod"

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}
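
The resource above reads several input variables that the lesson does not show. A minimal sketch of a matching variables file might look like this; the file path, the defaults, and the list of allowed environment names are assumptions made for illustration.

```hcl
# modules/rds/variables.tf (hypothetical sketch; defaults are illustrative)

variable "environment" {
  type        = string
  description = "Deployment environment; prod enables the HA and safety features"

  validation {
    # The allowed names are an assumption; adjust to your own environments
    condition     = contains(["sandbox", "dev", "prod"], var.environment)
    error_message = "The environment must be one of: sandbox, dev, prod."
  }
}

variable "db_instance_class" {
  type        = string
  description = "RDS instance class, e.g. db.r6g.large for prod"
  default     = "db.t4g.micro"
}

variable "allocated_storage_gb" {
  type        = number
  description = "Initial storage size in GiB"
  default     = 20
}

variable "max_allocated_storage_gb" {
  type        = number
  description = "Upper limit for storage autoscaling in GiB"
  default     = 100
}

variable "database_password" {
  type        = string
  description = "Master password, supplied by the caller from Secrets Manager"
  sensitive   = true # Redacts the value in plan and apply output
}

variable "db_subnet_group_name" {
  type        = string
  description = "Name of the isolated database subnet group from the VPC module"
}

variable "db_security_group_id" {
  type        = string
  description = "ID of the security group attached to the database"
}
```

Marking the password as sensitive only hides it from CLI output. The value still lands in the Terraform state file, so the state backend should be encrypted and access-controlled.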

Step 3: Expose Connection Outputs

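Our application containers need the connection endpoint and port to reach the database instance. Note that the endpoint attribute is returned in address:port form; use the address attribute instead if you need the hostname alone.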

# modules/rds/outputs.tf

output "db_endpoint" {
  value       = aws_db_instance.this.endpoint
  description = "The connection endpoint for the RDS instance (address:port)"
}

output "db_port" {
  value       = aws_db_instance.this.port
  description = "The database connection port"
}
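
To show how these pieces fit together, here is a sketch of a root configuration that calls the module and passes the outputs along. The secret name, the module path, and the `module.vpc` output names are hypothetical; substitute whatever your Secrets Manager and VPC lessons actually produced.

```hcl
# Hypothetical root-module usage; secret name and module.vpc outputs are assumptions.

# Read the master password stored in Secrets Manager (see the prerequisite lesson)
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/app-db/master-password" # Hypothetical secret name
}

module "rds" {
  source = "./modules/rds"

  environment              = "prod"
  db_instance_class        = "db.r6g.large"
  allocated_storage_gb     = 100
  max_allocated_storage_gb = 500
  database_password        = data.aws_secretsmanager_secret_version.db_password.secret_string

  # Assumed outputs of the VPC module from the earlier lesson
  db_subnet_group_name = module.vpc.db_subnet_group_name
  db_security_group_id = module.vpc.db_security_group_id
}

# Pass the connection details on, e.g. to the compute layer
output "database_endpoint" {
  value       = module.rds.db_endpoint # Already in address:port form
  description = "Endpoint the application containers should connect to"
}
```

Because `db_endpoint` already includes the port, consumers should not append `db_port` to it when building a connection string.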

By encapsulating the subnet group association, synchronous replication, parameter group hardening, and strict deletion and backup controls in one module, you make your data layer durable, auditable, and resilient to the loss of a single Availability Zone.

Next Steps

Now that our database is securely deployed inside the isolated networking layer, we are ready to build our compute infrastructure. In the next lesson, we will provision AWS ECS Fargate clusters to run containerized backend microservices securely and scale them dynamically.
