High-Availability Relational Databases: RDS Multi-AZ Setup

Databases hold the state of your entire application platform and act as its source of truth. If a web server goes down, you can spin up a new container in seconds. If your primary database goes down, the entire platform halts until it recovers, and every minute of that outage costs revenue and customer trust.

To protect our data layer, we will provision an RDS PostgreSQL database that uses Multi-AZ synchronous replication.

Multi-AZ vs. Read Replicas

Understanding the architectural difference is critical for both system design interviews and operational excellence:

Metric	Multi-AZ (Active-Passive HA)	Read Replicas (Read Scaling)
Primary Use	High Availability (survives an instance or AZ failure)	Read Throughput Scalability
Replication Type	Synchronous (no committed data lost on failover)	Asynchronous (Eventual Consistency)
Failover	Automatic (DNS endpoint repoints to the standby, typically in 60-120s)	Manual promotion required
Deployment	Standby instance in a separate AZ (not reachable by clients)	Active read-only instances (each exposed via its own endpoint)
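To make the contrast concrete, here is a minimal sketch of what a read replica could look like in Terraform. It is not part of this lesson's module: the resource name read_replica and the identifier suffix are hypothetical, and it assumes the primary is the aws_db_instance.this resource defined in Step 2. Note that a replica is a separate, addressable instance created with replicate_source_db, whereas Multi-AZ is just a flag on the primary.

```hcl
# Hypothetical sketch: one asynchronous read replica, created only in prod.
# Engine, storage, and credentials are inherited from the source instance.
resource "aws_db_instance" "read_replica" {
  count = var.environment == "prod" ? 1 : 0

  identifier          = "${var.environment}-app-db-replica-1"
  replicate_source_db = aws_db_instance.this.identifier
  instance_class      = var.db_instance_class

  vpc_security_group_ids = [var.db_security_group_id]
  parameter_group_name   = aws_db_parameter_group.postgres.name

  # Replicas are disposable; no final snapshot is needed on destroy.
  skip_final_snapshot = true

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Role        = "read-replica"
  }
}
```

The source instance must have automated backups enabled (backup_retention_period greater than 0) before a replica can be created from it.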

Step 1: Provisioning the Subnet Group & Parameter Group

Before launching our database, we must associate it with the isolated database subnet group we created inside our VPC module and configure database parameters safely:

# modules/rds/main.tf

# 1. Custom PostgreSQL Parameter Group
resource "aws_db_parameter_group" "postgres" {
  name   = "${var.environment}-postgres15-pg"
  family = "postgres15"

  # Force TLS connections for data-in-transit security
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }

  # Enable slow query logging
  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # Log query if execution exceeds 1000ms
  }
}
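For reference, the database subnet group mentioned above might look roughly like the following inside the VPC module. This is a sketch under assumptions: the file path, the resource name database, and the variable database_subnet_ids are hypothetical and may differ from what your VPC module actually defines.

```hcl
# modules/vpc/main.tf (hypothetical excerpt)

# Groups the isolated database subnets so RDS knows where it may place
# the primary and, with Multi-AZ, the standby. The subnets must span at
# least two Availability Zones.
resource "aws_db_subnet_group" "database" {
  name       = "${var.environment}-db-subnet-group"
  subnet_ids = var.database_subnet_ids # assumed: list of isolated subnet IDs

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

# Exposed so the RDS module can receive it as var.db_subnet_group_name.
output "db_subnet_group_name" {
  value       = aws_db_subnet_group.database.name
  description = "Name of the DB subnet group for the isolated database subnets"
}
```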

Step 2: Provisioning the Production-Grade RDS Postgres Instance

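Now we write the resource definition. Conditionals on var.environment == "prod" enable production-only safeguards: Multi-AZ (which roughly doubles the instance cost because of the standby), longer backup retention, a final snapshot, and deletion protection. In sandbox/dev environments these are switched off to save costs and keep teardown simple.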

# modules/rds/main.tf (continued)

resource "aws_db_instance" "this" {
  identifier             = "${var.environment}-app-db"
  engine                 = "postgres"
  engine_version         = "15.4"
  instance_class         = var.db_instance_class # e.g. "db.r6g.large" for prod
  allocated_storage      = var.allocated_storage_gb
  max_allocated_storage  = var.max_allocated_storage_gb # Autoscales storage up to this limit
  storage_type           = "gp3"
  storage_encrypted      = true
  
  db_name  = "appdb"
  username = "dbadmin" # "admin" is a reserved word in RDS PostgreSQL and is rejected
  password = var.database_password # Supplied by the caller, e.g. read from Secrets Manager

  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [var.db_security_group_id]
  parameter_group_name   = aws_db_parameter_group.postgres.name

  # High Availability Configuration
  multi_az = var.environment == "prod"

  # Backup Configuration
  backup_retention_period   = var.environment == "prod" ? 7 : 1 # Keep 7 days in prod
  backup_window             = "03:00-04:00"                     # Daily automated backup window (UTC)
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = var.environment == "prod" ? "${var.environment}-app-db-final-snapshot" : null

  # Deletion Protection: Block accidental CLI/Console deletion actions
  deletion_protection = var.environment == "prod"

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}
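The resource above references several input variables that this lesson does not show. A minimal sketch of the matching declarations could look like this; the defaults and the allowed environment names are assumptions chosen for illustration, so adjust them to your own conventions.

```hcl
# modules/rds/variables.tf (hypothetical sketch)

variable "environment" {
  type        = string
  description = "Deployment environment; \"prod\" enables the HA and safety features"

  validation {
    # Assumed set of environment names.
    condition     = contains(["sandbox", "dev", "prod"], var.environment)
    error_message = "The environment must be one of: sandbox, dev, prod."
  }
}

variable "db_instance_class" {
  type    = string
  default = "db.t4g.micro" # assumed small default for non-prod
}

variable "allocated_storage_gb" {
  type    = number
  default = 20
}

variable "max_allocated_storage_gb" {
  type    = number
  default = 100
}

variable "database_password" {
  type      = string
  sensitive = true # keeps the value out of plan/apply output
}

variable "db_subnet_group_name" {
  type = string
}

variable "db_security_group_id" {
  type = string
}
```

Marking database_password as sensitive only redacts it from CLI output. The value is still written to the Terraform state, so the state backend itself needs to be encrypted and access-controlled.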

Step 3: Expose Connection Outputs

Our application containers need the connection endpoint and port to reach the database instance. Note that the endpoint attribute is returned in address:port format:

# modules/rds/outputs.tf

output "db_endpoint" {
  value       = aws_db_instance.this.endpoint
  description = "The connection endpoint for the RDS instance, in address:port format"
}

output "db_port" {
  value       = aws_db_instance.this.port
  description = "The database connection port"
}
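As an illustration of how a root configuration might consume this module, the sketch below reads the master password from Secrets Manager and wires the outputs into local values. The secret name, the module.vpc output names, and the sizing values are all hypothetical; only the module's variable and output names come from this lesson.

```hcl
# environments/prod/main.tf (hypothetical sketch)

# Assumed: a plain-string secret holding the master password already exists.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/app-db/master-password" # hypothetical secret name
}

module "rds" {
  source = "../../modules/rds"

  environment              = "prod"
  db_instance_class        = "db.r6g.large"
  allocated_storage_gb     = 100
  max_allocated_storage_gb = 500
  database_password        = data.aws_secretsmanager_secret_version.db_password.secret_string

  # Assumed outputs of the VPC module from the earlier lesson.
  db_subnet_group_name = module.vpc.db_subnet_group_name
  db_security_group_id = module.vpc.db_security_group_id
}

locals {
  # db_endpoint is "address:port"; split off the hostname for clients
  # that expect host and port as separate settings.
  database_host = split(":", module.rds.db_endpoint)[0]
  database_port = module.rds.db_port
}
```

If most consumers want the bare hostname, another option is to add an output based on aws_db_instance.this.address instead of splitting the endpoint string.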

By encapsulating the parameter group, synchronous Multi-AZ replication, storage encryption, and strict lifecycle controls in one module, you get a data layer that is durable, auditable, and able to survive the loss of a single Availability Zone.

Next Steps

Now that our database is securely deployed inside the isolated networking layer, we are ready to build our compute infrastructure. In the next lesson, we will provision AWS ECS Fargate clusters to run containerized backend microservices securely and scale them dynamically.
