Terraform · Certification · IaC

Late Night Terraform: The Four Pillars

Defining the blocks: Terraform, providers, resources, and data sources.


🎙️ Opening Monologue

The language is no longer the problem.

The files are written, the structure makes sense, but nothing has moved yet. Words alone don’t change the world. Infrastructure doesn’t exist just because it’s described; it exists because something connects those descriptions to reality.

Late nights make this obvious. You can write the cleanest configuration imaginable and still be staring at an empty cloud if nothing is wired up properly. At some point, intention needs a conduit.

Tonight, we meet the parts of Terraform that make that connection possible — the pieces that turn static definitions into living infrastructure. This is where code stops being descriptive and starts becoming effective.

If grammar gives infrastructure a voice, these are the words that make it act.

🎯 Episode Objective

This episode aligns with the Terraform Associate (004) exam objectives listed below.

  • Use and differentiate resource and data blocks
  • Refer to resource attributes and create cross-resource references

The Anchor: Establishing Global Settings with the Terraform Block

The terraform block is a special top-level block used to configure the settings of Terraform itself. Think of it as the “System Settings” menu for your infrastructure.

One critical rule to remember for the 004 exam: The terraform block only accepts constant values. You cannot use variables, locals, or functions here. Why? Because Terraform needs to read this block to understand how to initialize the engine before it can even begin to process variables or functions.

Configuration Syntax

terraform {
  required_version = "<version>"
  required_providers {
    <PROVIDER> = {
      version = "<version-constraint>"
      source = "<provider-address>"
    }
  }
  provider_meta "<LABEL>" { 
    # Shown for completeness but only used for specific cases     
  }
  backend "<TYPE>" {        
    # `backend` is mutually exclusive with `cloud` 
    <ARGUMENTS>
  }
  cloud {                   
    # `cloud` is mutually exclusive with `backend` 
    organization = "<organization-name>"
    workspaces {
      # `tags` and `name` are mutually exclusive
      tags = [ "<tag>" ]
      name = "<workspace-name>"
      project = "<project-name>"
    }
    hostname = "app.terraform.io"
    token = "<TOKEN>"
  }
  experiments = [ "<feature-name>" ]
}

1. required_version

This is your safety rail. It specifies which version of the Terraform CLI is allowed to run your code.

  • The Enforcement: If a teammate tries to run your code with a version that doesn’t match your constraint (e.g., you require ~> 1.6.0 but they have 1.4.0), Terraform will print an error and refuse to run.
  • The Module Ripple: If you use child modules, and they also have required_version blocks, your CLI must satisfy all of them simultaneously. If there’s a conflict, Terraform exits.

2. required_providers

Terraform is a plugin-based system. This block acts as your “Dependency Manifest.”

  • Mapping: You give each provider a local name (like aws or mycloud) and map it to its source address (e.g., hashicorp/aws) and a version constraint.
  • Registry Connection: This tells terraform init exactly which binaries to download from the Terraform Registry.
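As a sketch, a minimal dependency manifest might look like this (the version numbers here are illustrative, not a recommendation):

```hcl
terraform {
  required_version = "~> 1.6.0"

  required_providers {
    # Local name "aws" mapped to its registry source and a version constraint
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```

When you run `terraform init`, this is the block Terraform consults to decide which provider binaries to download.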

3. provider_meta "<LABEL>"

This is a more advanced, niche argument. It allows individual modules to pass specific metadata fields that a provider might expect. This is independent of the general provider configuration and is only used by specific providers that require extra “out-of-band” context.

4. backend "<BACKEND_TYPE>"

This defines where your State File — the “Source of Truth” — lives.

  • Storage: You might use s3, azurerm, gcs, or local.
  • Exclusive Rule: You can only have one backend. Furthermore, you cannot use a backend block if you are using the cloud block.
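For example, a remote S3 backend might be configured like this (bucket, key, and region are placeholders; swap in your own):

```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state"        # hypothetical bucket name
    key    = "network/terraform.tfstate" # path to the state object
    region = "us-east-1"
  }
}
```

Remember: these values must be constants. You cannot compute the bucket name from a variable.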

5. cloud

The cloud block is the modern gateway to HCP Terraform (formerly Terraform Cloud) or Terraform Enterprise.

  • Functionality: It handles state storage, remote execution, and workspace management.
  • Constraints: Like the backend block, it cannot refer to variables or locals. It must be hard-coded so Terraform can establish the connection before doing anything else.

Merging and Overriding Behavior

When it comes to the terraform block, the merging rules are a bit more “surgical” than standard blocks. Terraform looks at the settings individually.

For required_version and the backend (either cloud or backend), the override file is the absolute boss.

  • If your main.tf has a required_version and your override.tf has one too, the original is completely ignored.
  • If your main.tf defines an S3 backend, but your override.tf defines an HCP Cloud block, Terraform will drop the S3 config entirely and use the Cloud block.

For required_providers, Terraform is more helpful. It merges on an element-by-element basis.

  • If your original code requires the aws and google providers, but your override file only mentions a new version for aws, your google provider settings stay exactly as they were. Only the aws constraint is updated.
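A sketch of that merge behavior (provider versions are illustrative):

```hcl
# main.tf
terraform {
  required_providers {
    aws    = { source = "hashicorp/aws", version = "~> 4.0" }
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
}

# override.tf — only the aws entry is replaced;
# the google entry survives untouched
terraform {
  required_providers {
    aws = { version = "~> 5.0" }
  }
}
```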

🌙 Late Night Recap

“A favorite exam scenario: What happens if two different modules have conflicting required_version constraints? The answer: Terraform will fail. The CLI version must satisfy EVERY constraint across the entire configuration tree, including all child modules. This is why we usually use flexible constraints like ~> or >= in modules!”

The Gateway: Negotiating Access with the Provider Block

The provider block is where you handle the “How” and “Where” of your infrastructure—specifically, how to authenticate and which region to target.

Configuration Syntax

Unlike the terraform block, you can use expressions here. You can reference variables or local values, but there is a catch: you can only use values known before the apply (no computed resource IDs!).

provider "<PROVIDER_NAME>" {
  <PROVIDER_ARGUMENTS>
  alias   = "<ALIAS_NAME>"
}

By default, you have one configuration per provider. But what if you need to deploy a VPC in us-east-1 and a backup bucket in us-west-2? You use an alias.

  • Default Configuration: A provider block without an alias argument. Every resource uses this by default.
  • Aliased Configuration: A block with an alias. Resources must explicitly opt-in to use this using the provider meta-argument.
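A concrete multi-region sketch (bucket name is hypothetical):

```hcl
# Default configuration — used by any resource that doesn't opt in
provider "aws" {
  region = "us-east-1"
}

# Aliased configuration — must be requested explicitly
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_s3_bucket" "backup" {
  provider = aws.west              # explicit opt-in to the alias
  bucket   = "my-backup-bucket"    # placeholder name
}
```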

Many providers support shell environment variables or other alternate sources for their configuration values, which helps keep credentials out of your version-controlled Terraform configuration.

Note: If you alias every provider block, Terraform creates an “implied empty” default. If a resource doesn’t specify an alias, it hits that empty config, which usually results in an authentication error.

Using Aliases in Child Modules

This is a high-level topic often seen in production environments. If a child module needs to use an aliased provider (like aws.west), it has to be “invited” in.

  1. Declaration: The child module must declare that it expects an alias using the configuration_aliases argument inside its required_providers block.
  2. Passing: The parent module then maps its alias to the child’s requirement when calling the module.
# In the Child Module
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [ aws.west ]
    }
  }
}
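The parent side of that handshake might look like this (the module source path is hypothetical):

```hcl
# In the Parent Module
module "network" {
  source = "./modules/network"   # hypothetical path

  providers = {
    aws.west = aws.west          # map the parent's alias to the child's requirement
  }
}
```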

The “Empty” Provider

Sometimes, you don’t need any special configuration. If you’re using a provider like random or local that doesn’t need a region or API key, you can simply write: provider "random" { }

In fact, if you forget the block entirely, Terraform assumes an empty default configuration anyway!

🌙 Late Night Recap

“A key takeaway for the 004 exam: Provider configurations never cross-pollinate between modules automatically if aliases are involved. You must use configuration_aliases to bridge that gap. Also, remember that the version argument inside a provider block is deprecated—always put your version constraints in the terraform block under required_providers instead!”

The Nouns of Infrastructure: Defining Reality with Resource Blocks

A resource represents a physical or logical object in your cloud environment. Whether it’s a sprawling Virtual Private Cloud (VPC) or a single DNS record, if you want Terraform to manage it, you must define it as a resource.

The Anatomy of a Resource

resource "<TYPE>" "<LABEL>" {
  <PROVIDER_ARGUMENTS>
  count = <NUMBER>      # `for_each` and `count` are mutually exclusive
  depends_on = [ <RESOURCE.ADDRESS.EXPRESSION> ]
  for_each = {          # `for_each` accepts a map ...
    <KEY> = <VALUE>
  }
  for_each = [          # ... or a set of strings (use only one form)
    "<VALUE>",
    "<VALUE>"
  ]
  provider = <REFERENCE.TO.ALIAS>
}

A resource block is defined by two distinct labels:

  • The Type (<TYPE>): This is defined by the provider (e.g., aws_instance). It tells Terraform what to build.
  • The Label (<LABEL>): This is a name you invent (e.g., web_server). It’s used to identify this specific resource within your code and state file. It does not change the name of the resource in the cloud.

To truly master resources, you have to understand the three different types of data associated with them:

  1. Arguments: Inputs you provide to configure the resource. Some are required, some are optional. Example: ami = "ami-12345"
  2. Attributes: Outputs generated by the cloud provider after creation. You read these; you don’t set them. Example: aws_instance.web.public_ip
  3. Meta-Arguments: Control Knobs built into Terraform itself. They change how Terraform manages the resource. Example: count, depends_on

Meta-arguments are “special powers” provided by the Terraform language. They work on any resource, regardless of the provider.

1. Scaling: count and for_each

These two are mutually exclusive (you can use one or the other, but never both).

  • count: Best for creating “identical twins.” You give it a number, and Terraform creates that many resources. (e.g., count = 3).
  • for_each: Best for “unique siblings.” You give it a map or a set of strings, and Terraform creates a resource for each item. This is much more flexible for complex scaling.
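Side by side, the two approaches look like this (AMI and bucket names are placeholders):

```hcl
# count: three identical twins
resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-12345"    # placeholder AMI
  instance_type = "t3.micro"
}

# for_each: one unique sibling per named environment
resource "aws_s3_bucket" "env" {
  for_each = toset(["dev", "staging", "prod"])
  bucket   = "my-app-${each.key}" # hypothetical naming scheme
}
```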

2. Ordering: depends_on

Terraform is usually smart enough to know that a Subnet must be created before a VM. This is called an Implicit Dependency. However, sometimes the link isn’t obvious.

  • Use depends_on to create an Explicit Dependency. Terraform will finish every operation on the “upstream” resource before it even touches the current one.
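A minimal sketch of an explicit dependency, assuming a Lambda function that needs an IAM policy fully in place even though it never references it:

```hcl
resource "aws_iam_role_policy" "lambda_logs" {
  # ... policy arguments ...
}

resource "aws_lambda_function" "worker" {
  # No attribute of the policy is referenced here, so the
  # dependency is invisible to Terraform without this hint.
  depends_on = [aws_iam_role_policy.lambda_logs]
  # ... function arguments ...
}
```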

3. Targeting: provider

As we discussed in the last blog, if you have multiple aliases for a provider (like one for us-east-1 and one for us-west-2), you use this meta-argument to tell the resource which one to use.

  • Syntax: provider = aws.west

Special Resource: terraform_data

Sometimes you need a “resource” that doesn’t actually exist in the cloud — perhaps to trigger a script or store a value for later.

  • terraform_data replaced the older null_resource. It’s a built-in type that allows you to store data and trigger lifecycle actions (like provisioners) without needing any cloud provider at all.

Defining Operation Timeouts

Terraform usually has a built-in “patience” level for how long it waits for a resource to be created, updated, or deleted. However, some heavy-hitters — like Managed Databases (RDS) or Kubernetes Clusters — can take significantly longer than the default.

If Terraform gives up too early, it might mark a resource as “tainted” or fail the apply, even though the cloud is still working on it. To prevent this, many resources support the timeouts block.

How it Works

The timeouts block is a child block inside a resource that allows you to specify a duration for specific lifecycle stages:

  • create: How long to wait for the resource to become active.
  • update: Time allowed for modifying the resource.
  • delete: How long to wait for a clean teardown.
  • read: (Less common) Time allowed for refreshing the state.

The Syntax: Durations are written as strings with a unit suffix: "60m" (60 minutes), "10s" (10 seconds), or "2h" (2 hours).
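For example, a managed database might be given more patience like this (durations are illustrative; check your resource's documentation for which stages it supports):

```hcl
resource "aws_db_instance" "main" {
  # ... database arguments ...

  timeouts {
    create = "60m"   # large RDS instances can take close to an hour
    update = "40m"
    delete = "2h"
  }
}
```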

Precision: Using Provider Aliases

By default, Terraform is a “Name Matcher.” If you define an aws_instance, it looks for a provider block named aws.

But what if your architecture spans across multiple AWS regions or even multiple AWS accounts? You can’t just have one “default” provider. This is where Provider Aliases come into play.

The Meta-Argument: provider

When you have multiple provider configurations (aliased in the provider block), you use the provider meta-argument inside your resource to tell it exactly which “translator” to use.

🌙 Late Night Recap

“A tip for the real world: Timeouts are your best friend for complex migrations. There’s nothing worse than a 45-minute database deployment failing at the 20-minute mark because of a default timeout. Also, remember that the provider meta-argument is a hard link. If you specify an alias that doesn’t exist, Terraform won’t ‘fall back’ to the default—it will simply error out during validation.”

While most resources represent heavy-duty cloud infrastructure, Terraform also provides “Local-only” or “Built-in” resources. These are incredibly useful for handling logic, triggers, and data that exist only within your Terraform state, rather than your cloud console.

terraform_data Resource

The terraform_data resource is a unique, built-in tool that doesn’t require any provider configuration. It replaced the older null_resource and is essentially a “blank slate” that follows the standard resource lifecycle (Create, Read, Update, Delete).

Why use it?

  • Storage: Storing a calculated value that needs to be preserved in the state.
  • Triggering Provisioners: If you need to run a local script (local-exec) but don’t have a specific VM or Database to attach it to.
  • Orchestration: Using it as a “checkpoint” in your dependency graph.

Arguments & Attributes:

  • input: A value you want to store. Whatever you put here is exported as the output attribute after the apply.
  • triggers_replace: This is the “reset button.” If the value you put here changes, Terraform will destroy and recreate this resource (and rerun any provisioners attached to it).
resource "terraform_data" "cluster_initializer" {
  input = var.cluster_id

  # If the version changes, recreate this resource to trigger a new script run
  triggers_replace = [
    var.bootstrap_version
  ]

  provisioner "local-exec" {
    command = "echo Initializing cluster ${self.output}"
  }
}

Local-Only Resources

Beyond the built-in terraform_data, there is a whole category of resources that calculate values locally. These belong to providers like random, local, or tls.

Key Characteristics:

  • State-Only: They live only in your terraform.tfstate file.
  • No Cloud Footprint: Destroying a random_id resource doesn’t delete anything in AWS; it simply removes that specific string from your Terraform state.
  • Utility: They are perfect for generating unique bucket names, creating temporary SSH keys, or generating passwords.

Example: random_id

If you need to ensure an S3 bucket has a globally unique name, you can use a local-only resource to generate a suffix:

resource "random_id" "bucket_suffix" {
  byte_length = 4
}
resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-app-storage-${random_id.bucket_suffix.hex}"
}

🌙 Late Night Recap

“If you’re taking the exam, remember: terraform_data is part of the ‘built-in’ provider. You don’t need to declare it in your required_providers block. It’s always there, waiting in the wings to help you manage ‘orphaned’ provisioners or state values that don’t have a home in the cloud.”

Merging and Overriding Resources

As we discussed in the HCL anatomy, an override.tf file can surgically strike your resource blocks. However, for resources, the rules are specific to ensure the “Source of Truth” remains stable.

Unlike most nested blocks that are replaced wholesale, the lifecycle block is smarter. It merges on an argument-by-argument basis.

  • Example: If your original code has ignore_changes = [tags] and your override file adds create_before_destroy = true, the resulting resource will have both settings active.

Some components are too complex to merge. If an override block contains these, the original is completely discarded:

  • provisioner blocks: If the override adds even one provisioner, all original provisioners are ignored.
  • connection blocks: The override configuration entirely replaces the original.

You cannot use the depends_on meta-argument in an override block. Doing so will trigger a syntax error. Terraform requires dependencies to be clear and established in the primary configuration to avoid “dependency cycles” that are impossible to resolve during an override.

Resource Dependencies: The Secret Sauce

Terraform is an “Ordered Engine.” It doesn’t just throw code at the cloud; it builds a Dependency Graph to determine the correct sequence.

1. Implicit Dependencies (The Default):

Most of the time, you don’t have to do anything. If Resource B references an attribute of Resource A (like an ID or an IP), Terraform automatically understands that A must exist before B can start.

  • Example: Attaching a Security Group to an EC2 instance by referencing aws_security_group.allow_tls.id.
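That example looks like this in practice (AMI is a placeholder); the attribute reference alone is enough to order the graph:

```hcl
resource "aws_security_group" "allow_tls" {
  name = "allow_tls"
}

resource "aws_instance" "web" {
  ami           = "ami-12345"    # placeholder
  instance_type = "t3.micro"

  # Referencing the security group's id creates the implicit dependency:
  # Terraform knows the group must exist before the instance.
  vpc_security_group_ids = [aws_security_group.allow_tls.id]
}
```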

2. Explicit Dependencies (depends_on)

Sometimes, a dependency is “invisible” to the code. For example, a Lambda function might need a specific IAM Role to be fully active and propagated across the cloud’s global database before it can successfully execute, even if the Lambda code doesn’t directly reference an attribute of that role.

  • The Use Case: Use depends_on only when there is a hidden requirement that isn’t captured by an attribute reference.
  • The Rule: Terraform will complete all operations (create/update) on the upstream resource before touching the one with the depends_on flag.

🌙 Late Night Recap:

“If you’re aiming for the 004, remember this: Implicit dependencies are always preferred. Over-using depends_on makes your code rigid and harder to maintain. Only reach for the ‘Explicit’ hammer when the Cloud API doesn’t provide a direct attribute link between two resources.”

A resource address is a unique string that points to zero or more instances. You use these addresses every day when you want to target a specific resource for a plan or move a resource in the state file.

An address follows a simple hierarchical formula: _[module path][resource spec]_

The Module Path:

This identifies where the resource lives in your module tree.

  • Syntax: module.<module_name>[module_index]
  • The “Root” Exception: If you omit the module path, Terraform assumes you are talking about the Root Module.
  • Nesting: For deep architectures, you chain them together: module.parent.module.child.
  • Indexing: If you used count or for_each on a module call, you must specify which one you mean, e.g., module.vpc[0].

The Resource Spec:

Once you are “inside” the module, the resource spec identifies the exact object.

  • Syntax: <resource_type>.<resource_name>[instance_index]
  • resource_type: The provider-defined type (e.g., aws_instance).
  • resource_name: Your user-defined label (e.g., web).

Addressing Multiple Instances (Indexes)

When you scale resources using count or for_each, the address requires an Index to distinguish between siblings.

For count: Use a numerical index starting at 0.

  • Address: aws_instance.web[1] (Targets the second instance).

For for_each: Use an alphanumerical key in quotes.

  • Address: aws_instance.web["api"] (Targets the instance keyed “api”).

Bulk Addressing: If you omit the index (e.g., aws_instance.web), the command will target every instance under that name.
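Putting addresses to work on the command line might look like this (the resource and module names are hypothetical):

```shell
# Quote the address so the shell doesn't expand the square brackets
terraform plan -target='aws_instance.web[1]'        # count index
terraform plan -target='aws_instance.web["api"]'    # for_each key
terraform state show 'module.vpc[0].aws_subnet.public'
```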

🌙 Late Night Recap

“If you’re prepping for the 004, remember that addressing is case-sensitive and literal. If your for_each key is Prod, your address must be ["Prod"], not ["prod"]. Also, when using these in a shell (like Bash or Zsh), you often need to wrap the address in single quotes to stop the shell from misinterpreting the square brackets!”

The Reading Lens: Querying Existing Infrastructure with Data Sources

A Data Source allows Terraform to fetch information from external APIs, other Terraform workspaces, or local files. It is a read-only operation; a data source will never create, modify, or destroy your infrastructure.

The Anatomy of a Data Block

The syntax mirrors the resource block but uses the data keyword. You must provide a Type (defined by the provider) and a Label (your unique name for this query).

data "aws_vpc" "selected" {
  filter {
    name   = "tag:Name"
    values = ["production-vpc"]
  }
}
# Reference the data using: data.aws_vpc.selected.id

Specialized & Local Data Sources

Not all data comes from the cloud. Some sources act as internal utilities to process data during the Terraform run:

  • template_file: Renders text with variables (e.g., creating a dynamic startup script).
  • local_file: Reads a file from your actual hard drive into the configuration.
  • aws_iam_policy_document: A helper that converts HCL into a properly formatted JSON policy for AWS.

The Data Lifecycle: Plan vs. Apply

One of the most critical concepts for the 004 exam is understanding when Terraform reads this data.

Plan-Time (The Default):

If all arguments for the data source are known (like a hardcoded string or a simple variable), Terraform fetches the data during the refresh phase. This allows the values to be used in the plan output so you can see exactly what IDs or IPs are being used.

Apply-Time (Deferred Reading):

Terraform defers reading the data source until the apply phase if the query depends on something that hasn’t been built yet.

  • Scenario: You are creating a new VPC and then trying to use a data source to look up subnets inside that VPC.
  • The Result: The plan will show (known after apply) because the VPC ID doesn’t exist yet, so the search can’t start until the VPC is finished.

Lifecycle and Custom Conditions

Data blocks support the lifecycle block, which allows you to add Preconditions and Postconditions. These act as “sanity checks” for your automation.

  • precondition: Evaluated before the data source is read. (e.g., “Check if the variable env is ‘prod’ before searching for the production VPC”).
  • postcondition: Evaluated after the data source is read. (e.g., “The VPC we found must have the ‘Project: Omega’ tag, or else fail the run”).

Tip: Use these to fail fast! It’s better to have Terraform stop with a custom error message than to deploy a resource into the wrong VPC because your data source filter was too broad.
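A sketch combining both checks on the earlier VPC lookup (this assumes a var.env input variable exists):

```hcl
data "aws_vpc" "selected" {
  filter {
    name   = "tag:Name"
    values = ["production-vpc"]
  }

  lifecycle {
    precondition {
      condition     = var.env == "prod"        # checked before the read
      error_message = "This lookup is only valid in the prod environment."
    }
    postcondition {
      condition     = self.tags["Project"] == "Omega"  # checked after the read
      error_message = "The selected VPC is missing the required Project tag."
    }
  }
}
```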

Meta-Arguments for Data Sources

Data sources are fully featured and support standard meta-arguments to handle complexity:

  • count / for_each: Allows you to perform multiple lookups at once. The results are accessed via index (data.aws_vpc.main[0]) or key (data.aws_vpc.main["api"]).
  • depends_on: Forces Terraform to wait for a specific resource to be fully provisioned before attempting to read the data source.
  • provider: Directs the query to a specific regional alias (e.g., searching for an AMI in aws.west).

🌙 Late Night Recap

“If you’re building a reusable module, Data Sources are your best friends for portability. Instead of asking the user for a Subnet ID, ask them for a Subnet Name and use a data source to find the ID. It makes the user’s life easier and your code much more resilient to change!”

Named Values are expressions that reference an associated value. You can use them standalone or combine them to compute something new.

Note: While these look like object paths (e.g., _var.name_), they are strictly defined. You cannot use bracket notation like _var["name"]_ for these top-level paths—you must use the exact syntax provided.

The Core References

  • Resources: <TYPE>.<NAME> accesses attributes like aws_instance.web.id.
  • Input Variables: var.<NAME> accesses user-provided values. Always follows the type constraint.
  • Local Values: local.<NAME> accesses temporary variables defined in a locals block.
  • Data Sources: data.<TYPE>.<NAME> accesses information fetched from the cloud.
  • Module Outputs: module.<NAME>.<OUTPUT> accesses the values exported by a child module.

Handling Lists and Maps (Count & For_Each)

When you use scaling meta-arguments, the way you reference them changes.

Resources with count: The reference becomes a List.

  • aws_instance.web[*].id (Splat expression) gives you a list of all IDs.
  • aws_instance.web[0].id gives you just the first one.

Resources with for_each: The reference becomes a Map.

  • aws_instance.web["api"].id gives you the ID of the specific “api” instance.
  • To get a list of IDs from a map, you use the values() function first: values(aws_instance.web)[*].id.
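Two separate scenarios, sketched as outputs (a single resource can use count or for_each, never both — these are alternative configurations):

```hcl
# Scenario A — aws_instance.web was created with count
output "all_ids" {
  value = aws_instance.web[*].id         # splat: a list of every ID
}

# Scenario B — aws_instance.web was created with for_each
output "api_id" {
  value = aws_instance.web["api"].id     # one keyed instance
}

output "all_ids_from_map" {
  value = values(aws_instance.web)[*].id # convert the map to a list first
}
```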

The Filesystem & Workspace Info

Terraform provides built-in “Context” values so your code knows where it is:

  • path.module: The path to the current module. Great for reading local config files but be careful with write operations!
  • path.root: The path to the root module where you ran terraform init.
  • terraform.workspace: The name of the active workspace (e.g., “prod” vs “dev”).

Sensitive Attributes & “Known After Apply”

Not everything in Terraform is visible or predictable. If a provider marks an attribute as sensitive (like a database password), Terraform will hide it behind a (sensitive value) tag in your terminal.

  • Propagation: If you use a sensitive value to calculate an output, that output must also be marked as sensitive = true, or Terraform will error.
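A minimal sketch of that propagation rule (the database resource is hypothetical):

```hcl
output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true   # required — omitting this makes Terraform error out
}
```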

The Mystery of the “Unknown”

During a plan, you will often see (known after apply).

  • Why? This happens for values the Cloud API generates on the fly, like a public IP or a unique ID.
  • The Constraint: You cannot use an unknown value for the count argument. Terraform needs to know exactly how many resources to create during the planning phase, not after!

🌙 Late Night Recap

“If you’re studying for the 004, memorize the Splat ([*]) vs Index ([0]) syntax. A common point of failure for beginners is trying to access aws_instance.web.id when they’ve used count = 5. Because count makes it a list, you’ll get an error unless you specify WHICH index or use a Splat!”

Closing Credits: The Core Four

That wraps up what is undoubtedly one of the weightiest chapters in the Late Night Terraform series. We’ve journeyed through the entire “Writing” foundation — from the system settings of the Terraform Block to the “translators” in Providers, the “nouns” in Resources, and the “eyes” of Data Sources.

It was a massive amount of ground to cover in a single sitting, but understanding how these four pillars lock together is the only way to move from “copy-pasting HCL” to “architecting infrastructure.”

We’ve now established the static skeleton of our infrastructure. We know how to:

  • Configure the engine and set version constraints.
  • Authenticate with cloud providers using aliases and inheritance.
  • Provision resources while managing their unique lifecycles and timeouts.
  • Query existing infrastructure to make our code aware of its environment.

🌙 Late-Night Reflection

There is a profound shift that happens when you realize you aren’t just writing a document but managing a life cycle. Every resource you define is a commitment you’re making to the future. Respecting the connection between your code and the real world is what keeps automation from becoming a liability.

✅ Key Takeaways

  • Terraform configurations are built around four core blocks: terraform, provider, resource, and data.
  • The terraform block configures Terraform itself, not infrastructure, and only accepts constant values.
  • required_version and required_providers act as safety rails, preventing incompatible CLI or provider usage.
  • Provider blocks define authentication and targeting, and can use variables and locals — unlike the terraform block.
  • Provider aliases allow multi-region and multi-account deployments, but must be explicitly passed to child modules.
  • Resources represent managed infrastructure, defined by type and local name — not by cloud-side naming.
  • Arguments are inputs, attributes are outputs, and meta-arguments control lifecycle behavior.
  • count and for_each enable scaling, but are mutually exclusive and change how resources are referenced.
  • Implicit dependencies are preferred; depends_on should only be used when the API relationship is invisible.
  • terraform_data replaces null_resource and is useful for orchestration and state-only workflows.
  • Local-only providers (random, local, tls) generate values that exist only in state, not in the cloud.
  • Data sources are strictly read-only and may be evaluated at plan time or apply time.
  • References (var, local, data, resource, module) form Terraform’s dependency graph.
  • Unknown values (“known after apply”) cannot be used for count, because Terraform must know scale at plan time.
  • Resource addressing is precise and case-sensitive, especially when using for_each.


🎬 What’s Next

The structure is standing — but it’s rigid, frozen in place. Change shouldn’t feel dangerous.

We’ll learn how to make infrastructure adaptable instead of brittle.

This post is part of a series
Late Night Terraform