Lesson 14 of 15 | 4 min | Advanced Track

Drift Detection, State Surgery, and Refactoring

Reconcile manual console overrides, safely import existing cloud resources, and execute complex state refactors without service interruption.

Key Takeaways

  • Infrastructure drift occurs when real resources stop matching your Terraform configuration and state, typically because someone changed them manually in the AWS Console.
  • Use terraform import to bring untracked cloud assets under Terraform management without recreating them.
  • Run terraform state mv to change the address of a resource or module in state without triggering resource destruction.

Recommended Prerequisites

terraform-aws-13-github-actions-gitops

No matter how disciplined your team is, reality eventually drifts from your codebase. During high-severity production incidents, an engineer might log into the AWS Management Console directly to increase a server's size or open a security group port.

When this happens, your infrastructure has drifted: the real resources no longer match what your configuration and state describe.

If you run terraform apply after a manual change, one of two bad things can happen:

  1. Terraform overwrites the manual change, reverting the resource to the configuration in code and potentially re-introducing the incident.
  2. The apply fails because a resource Terraform tries to create already exists, for example one that was created by hand under the same name.

To maintain integrity, a DevOps engineer must master Drift Detection, Resource Importing, and State Surgery.


Step 1: Automated Drift Detection

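To catch drift before it causes problems, run a refresh-only plan on a schedule. It compares your state with the real AWS resources and proposes updates to state only, so it never changes infrastructure: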

# Detect drift between state and actual AWS resources
terraform plan -refresh-only

If drift has occurred, the plan lists what changed. The output looks roughly like this (abbreviated and illustrative; exact wording varies by Terraform version):

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # aws_security_group.app has changed
  ~ ingress {
      ~ cidr_blocks = [
          + "0.0.0.0/0",  # Manual port opening detected!
        ]
        from_port   = 22
        to_port     = 22
    }

Resolution Strategy:

  1. Revert: If the manual change was a temporary fix or a mistake, run terraform apply to overwrite it and restore the configuration defined in your code.
  2. Reconcile: If the change should stay (e.g. a permanent instance resize), update your HCL to match the new values, then run terraform plan and confirm it reports no changes.
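
As a rough illustration of the Reconcile path, the sketch below assumes a hypothetical aws_instance.app whose size was raised by hand from t3.medium to t3.large. The resource name, the variable, and both instance sizes are assumptions for this example, not values from the lesson's codebase.

```hcl
# Hypothetical variable, declared here only so the sketch is self-contained.
variable "app_ami_id" {
  type        = string
  description = "AMI ID for the app server (assumed for this example)."
}

resource "aws_instance" "app" {
  ami = var.app_ami_id

  # The code previously said "t3.medium". An engineer resized the instance
  # to "t3.large" in the console, so the code is updated to match what is
  # actually running instead of reverting it.
  instance_type = "t3.large"
}
```

Once the code matches the console change, terraform plan should report no changes for this resource. If you also want the refreshed values written to state without touching infrastructure, terraform apply -refresh-only does that after asking for confirmation.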

Step 2: Importing Untracked Cloud Assets

Often, you must absorb resources created manually inside an old AWS account into your Terraform workspace without destroying them.

First, write a blank placeholder resource block in your code:

# main.tf

# Placeholder for existing manual S3 bucket
resource "aws_s3_bucket" "legacy_assets" {
  # Leave arguments empty during import stage
}

Now, map the existing AWS resource to this placeholder using its cloud identifier:

# Command: terraform import [resource_type].[resource_name] [aws_identifier]
terraform import aws_s3_bucket.legacy_assets my-manually-created-bucket-name

Terraform connects to AWS, reads the resource's current attributes, and records them in your state file. It does not write any HCL for you. Next, run terraform plan to see where your placeholder differs from the real resource, and update your code until terraform plan reports no changes, meaning your code matches what is deployed.
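
On Terraform 1.5 or later, the same import can also be declared in configuration with an import block, which lets it go through the normal plan and review cycle. This is a minimal sketch of that alternative, reusing the bucket name and resource address from the command above; it is not the workflow the lesson itself uses.

```hcl
# Requires Terraform 1.5 or later.
# Declares that the existing bucket should be adopted at this address.
import {
  to = aws_s3_bucket.legacy_assets
  id = "my-manually-created-bucket-name"
}

# The resource block the imported bucket is bound to.
resource "aws_s3_bucket" "legacy_assets" {
  bucket = "my-manually-created-bucket-name"
}
```

With this in place, terraform plan shows the pending import and terraform apply performs it. If you leave the resource block out, terraform plan -generate-config-out=generated.tf can draft one for you; that feature was introduced as experimental, so treat its output as a starting point to review. After a successful apply, the import block can be removed.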


Step 3: Refactoring State (State Surgery)

When you refactor code (e.g. rename a resource or move it into a module), the resource's address changes. Terraform sees the old address as removed and the new one as new, so it plans to destroy the existing resource and create a replacement. For databases or load balancers, that means avoidable downtime and, for stateful resources, possible data loss.

To rename a resource without destroying it, we perform State Surgery using the CLI:

# original.tf
resource "aws_s3_bucket" "old_name" { ... }

# After refactoring, we want it to be named:
resource "aws_s3_bucket" "new_name" { ... }

Before applying, use the CLI to change the resource's address in state (do not edit the state file by hand):

# Command: terraform state mv [old_path] [new_path]
terraform state mv aws_s3_bucket.old_name aws_s3_bucket.new_name
Successfully moved 1 object(s).

Now, run terraform plan. Terraform should report no changes, because nothing was modified in AWS. Only the address that maps the S3 bucket to your code was updated in the state file.
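
Terraform 1.1 and later also support a declarative alternative, the moved block, which records the rename in code so every workspace and teammate picks it up on the next plan. The sketch below reuses the addresses from this step; the module name storage in the comment is hypothetical.

```hcl
# Requires Terraform 1.1 or later.
# Tells Terraform that the object tracked at the old address now lives
# at the new address, so it updates state instead of planning a replacement.
moved {
  from = aws_s3_bucket.old_name
  to   = aws_s3_bucket.new_name
}

# The renamed block keeps the same arguments the old block had.
resource "aws_s3_bucket" "new_name" {
}

# Moving the resource into a module uses the same mechanism, for example
# (hypothetical module name):
#   to = module.storage.aws_s3_bucket.new_name
```

With a moved block, the plan should list the resource as moved rather than destroyed and created. A reasonable rule of thumb is to use moved blocks for refactors that ship with the code, and keep terraform state mv for one-off repairs.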

With these state operations, you can refactor and extend your infrastructure deliberately while greatly reducing the risk of accidental resource destruction. Always review the plan before applying.

Next Steps

We have reached the culmination of our DevOps pathway. In the final lesson, we will deploy our Capstone Project: designing and executing a fully automated, zero-downtime Blue-Green Infrastructure Upgrade on AWS.
