Using Terraform to create an AWS infrastructure

Terraform is an open source tool for configuring a cloud hosting infrastructure. It uses a declarative language to describe the configuration of cloud services. Through a long list of plugins, called providers, it has support for a variety of cloud services. In this chapter, we’ll use Terraform to describe AWS infrastructure deployments.

Once installed, you can view the Terraform help with the following command:

$ terraform help
Usage: terraform [-version] [-help] <command> [args]

The available commands for execution are listed below.
The most common, useful commands are shown first, followed by
less common or more advanced commands. If you're just getting
started with Terraform, stick with the common commands. For the
other commands, please read the help and docs before usage.

Common commands:
    apply              Builds or changes infrastructure
    console            Interactive console for Terraform interpolations
    destroy            Destroy Terraform-managed infrastructure
    init               Initialize a Terraform working directory
    output             Read an output from a state file
    plan               Generate and show an execution plan
    providers          Prints a tree of the providers used in the configuration

Terraform files have a .tf extension and use a fairly simple, easy-to-understand declarative syntax. Terraform doesn’t care which filenames you use or the order in which you create the files. It simply reads all the files with a .tf extension and looks for resources to deploy. These files do not contain executable code, but declarations. Terraform reads these files, constructs a graph of dependencies, and works out how to implement the declarations on the cloud infrastructure being used.

An example declaration is as follows:

variable "base_cidr_block" { default = "10.1.0.0/16" }

resource "aws_vpc" "main" {
  cidr_block = var.base_cidr_block
}

The first word, resource or variable, is the block type, and in this case, we are declaring a resource and a variable. Within the curly braces are the arguments to the block, and it is helpful to think of these as attributes.

Blocks have labels—in this case, the labels are aws_vpc and main. We can refer to this specific resource elsewhere by joining the labels together as aws_vpc.main. The name, aws_vpc, comes from the AWS provider and refers to VPC elements. In many cases, a block—be it a resource or another kind—will support attributes that can be accessed. For example, the CIDR for this VPC can be accessed as aws_vpc.main.cidr_block.
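As a quick illustration, here is how that attribute could be surfaced as an output (this particular output block is our own example rather than part of the scripts we'll build):

output "main_vpc_cidr" {
  value = aws_vpc.main.cidr_block
}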

The general structure is as follows:

<BLOCK TYPE> "<BLOCK LABEL>" "<BLOCK LABEL>" {
  # Block body
  <IDENTIFIER> = <EXPRESSION> # Argument
}

The block types include resource, which declares something related to the cloud infrastructure, variable, which declares a named value, output, which declares a result from a module, and a few others.

The structure of the block labels varies depending on the block type. For resource blocks, the first block label refers to the kind of resource, while the second is a name for the specific instance of that resource.

The type of arguments also varies depending on the block type. The Terraform documentation has an extensive reference to every variant.

A Terraform module is a directory containing Terraform scripts. When the terraform command is run in a directory, it reads every script in that directory to build a tree of objects.

Within modules, we are dealing with a variety of values. We’ve already discussed resources, variables, and outputs. A resource is essentially a value that is an object related to something on the cloud hosting platform being used. A variable can be thought of as an input to a module because there are multiple ways to provide a value for a variable. The output values are, as the name implies, the output from a module. Outputs can be printed on the console when a module is executed, or saved to a file and then used by other modules. The code relating to this can be seen in the following snippet:

variable "aws_region" {
  default     = "us-west-2"
  type        = string
  description = "Where in the AWS world the service will be hosted"
}

output "vpc_arn" { value = aws_vpc.notes.arn }

This is what the variable and output declarations look like. Every value has a data type. For variables, we can attach a description to aid in their documentation. The declaration uses the word default rather than value because there are multiple ways to specify a value for a variable; users can override the default in several ways, such as the -var and -var-file command-line options.
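For example, here is a sketch of both override mechanisms (the region value and the region.tfvars filename are illustrative):

$ terraform apply -var 'aws_region=us-east-1'

Or, collecting values in a file and passing that file on the command line:

$ cat region.tfvars
aws_region = "us-east-1"
$ terraform apply -var-file=region.tfvars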

Another type of value is local. Locals exist only within a module because they are neither input values (variables) nor output values, as illustrated in the following code snippet:

locals {
  vpc_cidr     = "10.1.0.0/16"
  cidr_subnet1 = cidrsubnet(local.vpc_cidr, 8, 1)
  cidr_subnet2 = cidrsubnet(local.vpc_cidr, 8, 2)
  cidr_subnet3 = cidrsubnet(local.vpc_cidr, 8, 3)
  cidr_subnet4 = cidrsubnet(local.vpc_cidr, 8, 4)
}

In this case, we've defined several locals holding the CIDRs of subnets to be created within a VPC. The cidrsubnet function calculates subnet CIDRs such as 10.1.1.0/24.
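You can check such calculations interactively with terraform console (exact output formatting varies by Terraform version):

$ terraform console
> cidrsubnet("10.1.0.0/16", 8, 1)
10.1.1.0/24
> cidrsubnet("10.1.0.0/16", 8, 4)
10.1.4.0/24

The function takes a prefix, the number of additional bits to extend it by (16 + 8 = 24), and the subnet number to encode in those bits.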

Another important feature of Terraform is the provider plugin. Each cloud system supported by Terraform requires a plugin module that defines the specifics of using Terraform with that platform.

One effect of the provider plugins is that Terraform makes no attempt to be platform-independent. Instead, all declarable resources for a given platform are unique to that platform. You cannot directly reuse Terraform scripts for AWS on another system such as Azure because the resource objects are all different. What you can reuse is the knowledge of how Terraform approaches the declaration of cloud resources.

It is also worth looking for a Terraform extension for your programming editor. Several editors have Terraform support, with syntax coloring, checking for simple errors, and even code completion.

That’s enough theory, though. To really learn this, we need to start using Terraform. In the next section, we’ll begin by implementing the VPC structure within which we’ll deploy the Notes application stack.

1. Configuring an AWS VPC with Terraform

An AWS Virtual Private Cloud (VPC) is what it sounds like: a service within AWS to hold cloud services that you've defined. The AWS team designed the VPC service to look something like what you would construct in your own data center, but implemented on the AWS infrastructure.

In this section, we will construct a VPC consisting of a public subnet and a private subnet, an internet gateway, and security group definitions.

In the project work area, create a directory, terraform-swarm, that is a sibling to the notes and users directories.

In that directory, create a file named main.tf containing the following:

provider "aws" {
  profile = "notes-app"
  region  = var.aws_region
}

This says to use the AWS provider plugin. It also configures this script to execute using the named AWS profile. Clearly, the AWS provider plugin requires AWS credential tokens in order to use the AWS API. It knows how to access the credentials file set up by aws configure.

As shown here, the AWS plugin will look for the AWS credentials file in its default location, and use the notes-app profile name.

In addition, we have specified which AWS region to use. The reference, var.aws_region, is a Terraform variable. We use variables for any value that can legitimately vary. Variables can be easily customized to any value in several ways.

To support the variables, we create a file named variables.tf, starting with this:

variable "aws_region" { default = "us-west-2" }

The default attribute sets a default value for the variable. As we saw earlier, the declaration can also specify the data type for a variable, and a description.

With this, we can now run our first Terraform command, as follows:

$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "aws" (hashicorp/aws) 2.56.0...

The following providers do not have any version constraints in
configuration, so the latest version was installed.

* provider.aws: version = "~> 2.56"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan"
to see any changes that are required for your infrastructure. All
Terraform commands should now work.

This initializes the current directory as a Terraform workspace. You'll see that it creates a directory, .terraform, where provider plugins are stored. Once you deploy, Terraform also records what it knows about the deployment in a file named terraform.tfstate. The .tfstate files are what is known as state files.

These are in JSON format and store the data Terraform collects from the platform (in this case, AWS) regarding what has been deployed. State files must not be committed to source code repositories because it is possible for sensitive data to end up in those files. Therefore, a .gitignore file listing the state files is recommended.
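A minimal .gitignore for a Terraform workspace might look like this (a common convention rather than anything mandated by Terraform):

.terraform/
*.tfstate
*.tfstate.backup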

The instructions say we should run terraform plan, but before we do that, let’s declare a few more things.

To declare the VPC and its related infrastructure, let's create a file named vpc.tf, starting with the following:

resource "aws_vpc" "notes" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = var.enable_dns_support
  enable_dns_hostnames = var.enable_dns_hostnames

  tags = {
    Name = var.vpc_name
  }
}

This declares the VPC. This will be the container for the infrastructure we’re creating.

The cidr_block attribute determines the IPv4 address space that will be used for this VPC. The CIDR notation is an internet standard; an example would be 10.0.0.0/16, which covers every IP address whose first two octets are 10.0.

The enable_dns_support and enable_dns_hostnames attributes determine whether Domain Name System (DNS) names will be generated for certain resources attached to the VPC. DNS names can assist with one resource finding other resources at runtime.

The tags attribute attaches name/value pairs to resources. AWS generates an unfriendly identifier for every resource, a long coded string, and of course we humans need user-friendly names for things. The Name tag fills that role, and the AWS Management Console uses it as the display name in its dashboards.

In variables.tf, add this to support these resource declarations:

variable "enable_dns_support"   { default = true }
variable "enable_dns_hostnames" { default = true }
variable "project_name"         { default = "notes" }
variable "vpc_name"             { default = "notes-vpc" }
variable "vpc_cidr"             { default = "10.0.0.0/16" }

These values will be used throughout the project. For example, var.project_name will be widely used as the basis for creating name tags for deployed resources. Add the following to vpc.tf:

data "aws_availability_zones" "available" {
  state = "available"
}

Where resource blocks declare something on the hosting platform (in this case, AWS), data blocks retrieve data from the hosting platform. In this case, we are retrieving the list of availability zones (AZs) for the currently selected region. We'll use this later when declaring certain resources.
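If you want to see what this data block returns, one option is a temporary output (this block is our own illustration):

output "azs" { value = data.aws_availability_zones.available.names }

For us-west-2, this prints a list along the lines of ["us-west-2a", "us-west-2b", "us-west-2c", "us-west-2d"].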

1.1. Configuring the AWS gateway and subnet resources

Remember that a public subnet is associated with an internet gateway, and a private subnet is associated with a NAT gateway. The difference determines what type of internet access devices attached to each subnet have.

Create a file named gw.tf containing the following:

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.notes.id

  tags = {
    Name = "${var.project_name}-IGW"
  }
}

resource "aws_eip" "gw" {
  vpc        = true
  depends_on = [ aws_internet_gateway.igw ]

  tags = {
    Name = "${var.project_name}-EIP"
  }
}

resource "aws_nat_gateway" "gw" {
  subnet_id     = aws_subnet.public1.id
  allocation_id = aws_eip.gw.id

  tags = {
    Name = "${var.project_name}-NAT"
  }
}

This declares the internet gateway and the NAT gateway. Remember that internet gateways are used with public subnets, and NAT gateways are used with private subnets.

An Elastic IP (EIP) resource is how a public internet IP address is assigned. Any device that is to be visible to the public must be on a public subnet and have an EIP. Because the NAT gateway faces the public internet, it needs an EIP to supply its public IP address.

For the subnets, create a file named subnets.tf containing the following:

resource "aws_subnet" "public1" {
  vpc_id            = aws_vpc.notes.id
  cidr_block        = var.public1_cidr
  availability_zone = data.aws_availability_zones.available.names[0]

  tags = {
    Name = "${var.project_name}-net-public1"
  }
}

resource "aws_subnet" "private1" {
  vpc_id            = aws_vpc.notes.id
  cidr_block        = var.private1_cidr
  availability_zone = data.aws_availability_zones.available.names[0]

  tags = {
    Name = "${var.project_name}-net-private1"
  }
}

This declares the public and private subnets. Notice that these subnets are assigned to a specific AZ. It would be easy to expand this to support more subnets by adding subnets named public2, public3, private2, private3, and so on. If you do so, it would be helpful to spread these subnets across AZs, because deploying to multiple AZs means that if one AZ goes down, the application keeps running in the others.

The [0] notation is exactly what it looks like: array indexing. The value data.aws_availability_zones.available.names is a list, and [0] accesses its first element, just as you'd expect. Lists are just one of the data structures offered by Terraform.
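As a sketch of the expansion suggested a moment ago, a second public subnet in the next AZ might look like this (the public2_cidr variable, perhaps 10.0.2.0/24, is one we would have to add ourselves):

resource "aws_subnet" "public2" {
  vpc_id            = aws_vpc.notes.id
  cidr_block        = var.public2_cidr
  availability_zone = data.aws_availability_zones.available.names[1]

  tags = {
    Name = "${var.project_name}-net-public2"
  }
}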

Each subnet has its own CIDR (IP address range). To support this, we need the CIDR assignments in variables.tf (vpc_cidr is already there from earlier), as follows:

variable "vpc_cidr"      { default = "10.0.0.0/16" }
variable "public1_cidr"  { default = "10.0.1.0/24" }
variable "private1_cidr" { default = "10.0.3.0/24" }

These are the CIDRs corresponding to the resources declared earlier.

For these pieces to work together, we need appropriate routing tables to be configured. Create a file named routing.tf containing the following:

resource "aws_route" "route-public" {
  route_table_id         = aws_vpc.notes.main_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.notes.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.gw.id
  }

  tags = {
    Name = "${var.project_name}-rt-private"
  }
}

resource "aws_route_table_association" "public1" {
  subnet_id      = aws_subnet.public1.id
  route_table_id = aws_vpc.notes.main_route_table_id
}

resource "aws_route_table_association" "private1" {
  subnet_id      = aws_subnet.private1.id
  route_table_id = aws_route_table.private.id
}

To configure routing for the public subnet, we modify the VPC's main route table, adding a rule that says public internet traffic is to be sent to the internet gateway. We also have a route table association declaring that the public subnet uses this route table.

For aws_route_table.private, the routing table for private subnets, the declaration says to send public internet traffic to the NAT gateway. In the route table associations, this table is used for the private subnet.

Earlier, we said the difference between a public and private subnet is whether public internet traffic is sent to the internet gateway or the NAT gateway. These declarations are how that’s implemented.

In this section, we’ve declared the VPC, subnets, gateways, and routing tables—in other words, the infrastructure within which we’ll deploy our Docker Swarm.

Before attaching the EC2 instances in which the swarm will live, let’s deploy this to AWS and explore what gets set up.

2. Deploying the infrastructure to AWS using Terraform

We have now declared the bones of the AWS infrastructure we’ll need. This is the VPC, the subnets, and routing tables. Let’s deploy this to AWS and use the AWS console to explore what was created.

Earlier, we ran terraform init to initialize Terraform in our working directory. When we did so, it suggested that we run the following command:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_availability_zones.available: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_eip.gw will be created
  + resource "aws_eip" "gw" {
      + allocation_id     = (known after apply)
      + association_id    = (known after apply)
      + customer_owned_ip = (known after apply)
      + domain            = (known after apply)
      + id                = (known after apply)
      + instance          = (known after apply)
      + network_interface = (known after apply)
      + private_dns       = (known after apply)
      + private_ip        = (known after apply)
      + public_dns        = (known after apply)
      + public_ip         = (known after apply)
      + public_ipv4_pool  = (known after apply)
      + tags              = {
          + "Name" = "notes-EIP"
        }
      + vpc               = true
    }

This command scans the Terraform files in the current directory and first determines that everything has the correct syntax, that all the values are known, and so forth. If any problems are encountered, it stops right away with error messages such as the following:

Error: Reference to undeclared resource

  on outputs.tf line 8, in output "subnet_public2_id":
   8: output "subnet_public2_id" { value = aws_subnet.public2.id }

A managed resource "aws_subnet" "public2" has not been declared in the root module.

Terraform's error messages are usually self-explanatory. In this case, the project originally had two public and two private subnets, and this output declaration was left over after the decision to use only one of each. The error referred to stale code that was easy to remove.
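We have not shown outputs.tf itself; based on the values printed by terraform apply below, a plausible reconstruction is as follows (the exact file may differ):

output "aws_region"         { value = var.aws_region }
output "vpc_id"             { value = aws_vpc.notes.id }
output "vpc_arn"            { value = aws_vpc.notes.arn }
output "vpc_cidr"           { value = aws_vpc.notes.cidr_block }
output "vpc_name"           { value = var.vpc_name }
output "igw_id"             { value = aws_internet_gateway.igw.id }
output "subnet_public1_id"  { value = aws_subnet.public1.id }
output "subnet_private1_id" { value = aws_subnet.private1.id }
output "public1_cidr"       { value = var.public1_cidr }
output "private1_cidr"      { value = var.private1_cidr }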

The other thing terraform plan does is construct a graph of all the declarations and print out a listing. This gives you an idea of what Terraform intends to deploy on to the chosen cloud platform. It is therefore your opportunity to examine the intended infrastructure and make sure it is what you want to use.

Once you’re satisfied, run the following command:

$ terraform apply

data.aws_availability_zones.available: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

Plan: 10 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

Apply complete! Resources: 10 added, 0 changed, 0 destroyed.

Outputs:

aws_region = us-west-2
igw_id = igw-006eb101f8cb423d4
private1_cidr = 10.0.3.0/24
public1_cidr = 10.0.1.0/24
subnet_private1_id = subnet-0a9044daea298d1b2
subnet_public1_id = subnet-07e6f8ed6cc6f8397
vpc_arn = arn:aws:ec2:us-west-2:098106984154:vpc/vpc-074b2dfa7b353486f
vpc_cidr = 10.0.0.0/16
vpc_id = vpc-074b2dfa7b353486f
vpc_name = notes-vpc

With terraform apply, the report shows the difference between the actual deployed state and the desired state reflected in the Terraform files. In this case, there is no deployed state, so everything in the files will be deployed. If you had already deployed the system and then changed the scripts, Terraform would instead work out which changes need to be applied. Once it calculates that, Terraform asks for permission to proceed. Finally, after we answer yes, it launches the desired infrastructure.

Once finished, it tells you what happened. One result is the values of the output declarations in the scripts. These are printed on the console and saved in the backend state file.
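Because the outputs live in the state file, they can be read back at any time with the terraform output command we saw in the help listing; for example (quoting of the result varies by Terraform version):

$ terraform output vpc_id
vpc-074b2dfa7b353486f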

To see what was created, let's head to the AWS console and navigate to the VPC dashboard. Compare the VPC ID shown there with the one in the Terraform output, and you'll see that they match, along with the main route table, the CIDR, and the other settings we made in our scripts. Every AWS account has a default VPC that's presumably meant for experiments; it is better practice to create a VPC for each project so that each project's resources are kept separate from the others.

The sidebar contains links to further dashboards for subnets, route tables, and other resources. The NAT gateway dashboard, for example, shows the one created for this project.

Another way to explore is with the AWS CLI tool. Just because we have Terraform doesn’t mean we are prevented from using the CLI. Have a look at the following code block:

$ aws ec2 describe-vpcs --vpc-ids vpc-074b2dfa7b353486f
{
    "Vpcs": [
        {
            "CidrBlock": "10.0.0.0/16",
            "DhcpOptionsId": "dopt-e0c05d98",
            "State": "available",
            "VpcId": "vpc-074b2dfa7b353486f",
            "OwnerId": "098106984154",
            "InstanceTenancy": "default",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-0f827bcc4fbb9fd62",
                    "CidrBlock": "10.0.0.0/16",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "IsDefault": false,
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "notes-vpc"
                }
            ]
        }
    ]
}

This lists the parameters of the VPC that was created.

Remember to either configure the AWS_PROFILE environment variable or use --profile on the command line.
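For example, either of the following forms works:

$ export AWS_PROFILE=notes-app
$ aws ec2 describe-vpcs --vpc-ids vpc-074b2dfa7b353486f

$ aws ec2 describe-vpcs --profile notes-app --vpc-ids vpc-074b2dfa7b353486f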

To list data on the subnets, run the following command:

$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-074b2dfa7b353486f"
{
    "Subnets": [
        { ... },
        { ... }
    ]
}

To focus on the subnets for a given VPC, we use the --filters option, passing in the filter named vpc-id and the VPC ID for which to filter.
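The --query option, which takes a JMESPath expression, is useful for trimming the output down to just what you need. For instance, to list only the subnet IDs (these are the IDs from our deployment; the ordering may differ):

$ aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=vpc-074b2dfa7b353486f" \
    --query "Subnets[].SubnetId"
[
    "subnet-07e6f8ed6cc6f8397",
    "subnet-0a9044daea298d1b2"
]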

The AWS CLI tool has an extensive list of sub-commands and options. These are enough to almost guarantee getting lost, so read carefully.

In this section, we learned how to use Terraform to set up the VPC and related infrastructure resources, and we also learned how to navigate both the AWS console and the AWS CLI to explore what had been created.

Our next step is to set up an initial Docker Swarm cluster by deploying an EC2 instance to AWS.

Source: Herron David (2020), Node.js Web Development: Server-side web development made easy with Node 14 using practical examples, Packt Publishing.
