iac-env / Part 6 - Provision AWS resources

In this part we will take a small step forward with our automation, creating an Amazon S3 bucket which we’ll use to store our Terraform state and leaning on Terraform workspaces to help us. This will demonstrate how to orchestrate the testing and deployment of an AWS resource.


Why remote state?

At the time of authoring these articles my experience with cloud was right at the beginner end of the spectrum, and I’ve been trying to digest as many sources of input as I can to learn more. A theme I’ve picked up on so far is that managing and orchestrating infrastructure through code (such as with Terraform) has a whole bunch of unique challenges and constraints compared to other forms of software development.

Among the various articles and blogs I’ve read, I found this one quite useful in understanding the typical journey newcomers go through when starting out: https://www.hashicorp.com/resources/evolving-infrastructure-terraform-opencredo/. While the article is probably a little old now (the code looks like it uses an older form of Terraform), the messaging around the problems of scaling this kind of work across a team felt quite solid.

I won’t be trying to reach the end state suggested in the article above, but I’ll take one of the baby steps by provisioning an Amazon S3 bucket where we can start storing our Terraform state files. What better way to provision our S3 bucket to store state than with our Jenkins pipeline!

S3 as remote state repository

This Terraform document has a reasonable explanation of how an S3 bucket can be used to store remote state: https://www.terraform.io/docs/backends/types/s3.html. Before using an S3 bucket to store Terraform state, we first have to create the S3 bucket itself.
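For context, once such a bucket exists, a consuming project would point at it with a backend block along these lines (just a sketch - the key is a placeholder, and the bucket name matches the one we’ll publish later in this article):

terraform {
  backend "s3" {
    bucket = "iac-env-state-default"
    key    = "example-project/terraform.tfstate"
    region = "ap-southeast-2"
  }
}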

We will create the bucket in its own Bitbucket repository and with its own build pipeline.

Important: from this point on you will need an active AWS account available to use. Follow this guide to get credentials set up on your machine: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html.
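If you just want the end result, that guide boils down to a ~/.aws/credentials file shaped like this (the values below are placeholders for your own keys):

[default]
aws_access_key_id     = AKIAEXAMPLEKEYID
aws_secret_access_key = your-secret-access-key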

Start off by creating a new repository within the Infrastructure as Code project in Bitbucket named terraform-remote:

In your local project create a new folder named terraform-remote containing the following structure and files:

.
├── .gitignore
├── Jenkinsfile
├── kitchen.yml
├── main.tf
├── readme.md
└── test
    ├── fixture
    │   └── main.tf
    └── integration
        └── default
            ├── controls
            │   └── verify_s3_bucket.rb
            └── inspec.yml

Copy the .gitignore and Jenkinsfile from our existing example project - we can update them later if needed.

Enter the following into each of the files:

kitchen.yml

---
driver:
  name: terraform
  root_module_directory: test/fixture

provisioner:
  name: terraform

verifier:
  name: terraform
  systems:
    - name: aws
      backend: aws
      controls:
        - verify_remote_state_s3_bucket

platforms:
  - name: terraform

suites:
  - name: default

We are now using the aws backend system in the verifier section and have renamed the control to run to verify_remote_state_s3_bucket, which lives in the test/integration/default/controls/verify_s3_bucket.rb file.

We’ve also renamed the test suite to default and collapsed our test fixture into test/fixture.

main.tf

terraform {
  required_providers {
    aws = "~> 2.57.0"
  }
}

provider "aws" {
  region = "ap-southeast-2"
}

resource "aws_s3_bucket" "remote_state_bucket" {
  bucket = "iac-env-state-${terraform.workspace}"
  acl    = "private"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

This time we are using the aws provider and we’ll create an S3 bucket with the following name:

iac-env-state-${terraform.workspace}

Note that I’m using ${terraform.workspace} as part of the name. This allows us to use Terraform workspaces to scope the resource. When we run the Kitchen test suite, it automatically generates a Terraform workspace to use, then destroys it at the end.
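To make that concrete, here is how the bucket name resolves in the two workspaces we’ll see in this article:

default                              -> iac-env-state-default
kitchen-terraform-default-terraform  -> iac-env-state-kitchen-terraform-default-terraform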

Important: There is a gotcha with S3 buckets - their name can only be up to 63 characters long but must be GLOBALLY unique!!! Crazy business! Luckily Terraform will complain during validation if the name is invalid, and the apply will fail if the name is already taken. You’ll probably need to change the S3 bucket name prefix from iac-env-state to something unique as well!

We are also making the S3 bucket private, with versioning enabled and default (AES256) encryption turned on.

readme.md

Update the readme.md to something more appropriate - I’ll leave it up to you :)

test/fixture/main.tf

This is the test fixture that will be invoked to run our test suite against. It simply wraps our main Terraform code as a module, very much like in our example project.

module "remote_state_bucket_test" {
  source = "../.."
}
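If the root module later grows input variables, this fixture is also where we would supply test values for them - a hypothetical sketch (bucket_prefix is not a real variable in our code):

module "remote_state_bucket_test" {
  source = "../.."

  # Hypothetical input, forwarding a test-specific value to the root module:
  # bucket_prefix = "iac-env-state-test"
}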

test/integration/default/inspec.yml

Pretty much the same as for our example project:

---
name: default

test/integration/default/controls/verify_s3_bucket.rb

This is a control file which contains our actual tests. In the main kitchen.yml file you might remember we specified which controls to run for the aws system in our verifier.

# frozen_string_literal: true

control "verify_remote_state_s3_bucket" do
    describe aws_s3_bucket(bucket_name: 'iac-env-state-kitchen-terraform-default-terraform') do
        it { should exist }
        it { should_not be_public }
    end
end

In this test we are simply verifying that the AWS S3 bucket with name iac-env-state-kitchen-terraform-default-terraform exists and is not marked as public. We could write more tests using the available InSpec DSL https://www.inspec.io/docs/reference/resources/aws_s3_bucket/ but for this experiment I’ll leave it with just a couple of simple tests to show the concept.
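We could, for example, also assert the versioning and encryption settings we declared in main.tf - a sketch using matchers documented for the inspec-aws aws_s3_bucket resource, which would slot straight into the existing describe block:

it { should have_versioning_enabled }
it { should have_default_encryption_enabled }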

Test drive locally

We will make some changes to our Jenkinsfile shortly, but we are actually at a stage where we can test drive this code using Kitchen. Run our iac-env image within the terraform-remote folder then execute kitchen converge:

Note: I am deliberately using kitchen converge instead of kitchen verify so we can hop over to AWS and actually see the S3 bucket that Kitchen created before letting it continue with its verification. You could just run kitchen verify, which converges and then runs the tests, but converging first lets you adjust the Terraform code and re-apply it easily while testing.
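For reference, the Kitchen lifecycle commands map to Terraform operations roughly like this:

kitchen converge   # select/create the test workspace and apply the fixture
kitchen verify     # run the InSpec controls against the applied infrastructure
kitchen destroy    # destroy the resources and delete the test workspace

(kitchen test runs the whole cycle end to end.)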

----------------------------------------------------------------------
█ ▄▀█ █▀▀ ▄▄ █▀▀ █▄░█ █░█
█ █▀█ █▄▄ ░░ ██▄ █░▀█ ▀▄▀ v1.0.0
----------------------------------------------------------------------
https://github.com/MarcelBraghetto/iac-env
Working directory mounted at /iac-env
Enter 'iac-env-help' for help, 'exit' to leave.

$ kitchen converge
-----> Starting Test Kitchen (v2.5.0)
-----> Creating <default-terraform>...
$$$$$$ Verifying the Terraform client version is in the supported interval of < 0.13.0, >= 0.11.4...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.24
       + provider.aws v2.57.0
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Upgrading modules...
       - remote_state_bucket_test in ../..
       
       Initializing the backend...
       
       Initializing provider plugins...
       - Checking for available provider plugins...
       - Downloading plugin for provider "aws" (hashicorp/aws) 2.57.0...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory.
$$$$$$ Creating the kitchen-terraform-default-terraform Terraform workspace...
       Created and switched to workspace "kitchen-terraform-default-terraform"!
       
       You're now on a new, empty workspace. Workspaces isolate their state,
       so if you run "terraform plan" Terraform will not see any existing state
       for this configuration.
$$$$$$ Finished creating the kitchen-terraform-default-terraform Terraform workspace.
       Finished creating <default-terraform> (0m2.36s).
-----> Converging <default-terraform>...
$$$$$$ Verifying the Terraform client version is in the supported interval of < 0.13.0, >= 0.11.4...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.24
       + provider.aws v2.57.0
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Selecting the kitchen-terraform-default-terraform Terraform workspace...
$$$$$$ Finished selecting the kitchen-terraform-default-terraform Terraform workspace.
$$$$$$ Downloading the modules needed for the Terraform configuration...
       - remote_state_bucket_test in ../..
$$$$$$ Finished downloading the modules needed for the Terraform configuration.
$$$$$$ Validating the Terraform configuration files...
       Success! The configuration is valid.
       
$$$$$$ Finished validating the Terraform configuration files.
$$$$$$ Building the infrastructure based on the Terraform configuration...
       module.remote_state_bucket_test.aws_s3_bucket.remote_state_bucket: Creating...
       module.remote_state_bucket_test.aws_s3_bucket.remote_state_bucket: Creation complete after 7s [id=iac-env-state-kitchen-terraform-default-terraform]
       
       Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
$$$$$$ Finished building the infrastructure based on the Terraform configuration.
$$$$$$ Reading the output variables from the Terraform state...
$$$$$$ Finished reading the output variables from the Terraform state.
$$$$$$ Parsing the Terraform output variables as JSON...
$$$$$$ Finished parsing the Terraform output variables as JSON.
$$$$$$ Writing the output variables to the Kitchen instance state...
$$$$$$ Finished writing the output variables to the Kitchen instance state.
$$$$$$ Writing the input variables to the Kitchen instance state...
$$$$$$ Finished writing the input variables to the Kitchen instance state.
       Finished converging <default-terraform> (0m17.71s).
-----> Test Kitchen is finished. (0m21.06s)

Ok cool, so what just happened? In basic terms, Kitchen ran our Terraform code via the test/fixture/main.tf module and created all the resources that were declared. If we log into the AWS console and view the S3 buckets, we can see the bucket that was created:

Sweet huh? So our tests actually create real AWS resources - keep that in mind!

We can also use the AWS command line tool within our iac-env session to show the list of S3 buckets:

$ aws s3 ls
iac-env-state-kitchen-terraform-default-terraform

Let’s continue our Kitchen test by running kitchen verify.

$ kitchen verify
-----> Starting Test Kitchen (v2.5.0)
-----> Setting up <default-terraform>...
       Finished setting up <default-terraform> (0m0.00s).
-----> Verifying <default-terraform>...
$$$$$$ Reading the Terraform input variables from the Kitchen instance state...
$$$$$$ Finished reading the Terraform input variables from the Kitchen instance state.
$$$$$$ Reading the Terraform output variables from the Kitchen instance state...
$$$$$$ Finished reading the Terraform output variables from the Kitchen instance state.
$$$$$$ Verifying the systems...
$$$$$$ Verifying the 'aws' system...

Profile: default
Version: (not specified)
Target:  aws://

  ✔  verify_remote_state_s3_bucket: S3 Bucket iac-env-state-kitchen-terraform-default-terraform
     ✔  S3 Bucket iac-env-state-kitchen-terraform-default-terraform is expected to exist
     ✔  S3 Bucket iac-env-state-kitchen-terraform-default-terraform is expected not to be public


Profile Summary: 1 successful control, 0 control failures, 0 controls skipped
Test Summary: 2 successful, 0 failures, 0 skipped
$$$$$$ Finished verifying the 'aws' system.
$$$$$$ Finished verifying the systems.
       Finished verifying <default-terraform> (0m2.46s).
-----> Test Kitchen is finished. (0m3.45s)

Nice! You can see that our verify_remote_state_s3_bucket control was run, and the two assertions it made succeeded.

Finally, we should always destroy our Kitchen test environment when it’s finished, which removes whatever resources were created for the tests:

$ kitchen destroy
-----> Starting Test Kitchen (v2.5.0)
-----> Destroying <default-terraform>...
$$$$$$ Verifying the Terraform client version is in the supported interval of < 0.13.0, >= 0.11.4...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.24
       + provider.aws v2.57.0
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Initializing modules...
       
       Initializing the backend...
       
       Initializing provider plugins...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory.
$$$$$$ Selecting the kitchen-terraform-default-terraform Terraform workspace...
$$$$$$ Finished selecting the kitchen-terraform-default-terraform Terraform workspace.
$$$$$$ Destroying the Terraform-managed infrastructure...
       module.remote_state_bucket_test.aws_s3_bucket.remote_state_bucket: Refreshing state... [id=iac-env-state-kitchen-terraform-default-terraform]
       module.remote_state_bucket_test.aws_s3_bucket.remote_state_bucket: Destroying... [id=iac-env-state-kitchen-terraform-default-terraform]
       module.remote_state_bucket_test.aws_s3_bucket.remote_state_bucket: Destruction complete after 1s
       
       Destroy complete! Resources: 1 destroyed.
$$$$$$ Finished destroying the Terraform-managed infrastructure.
$$$$$$ Selecting the default Terraform workspace...
       Switched to workspace "default".
$$$$$$ Finished selecting the default Terraform workspace.
$$$$$$ Deleting the kitchen-terraform-default-terraform Terraform workspace...
       Deleted workspace "kitchen-terraform-default-terraform"!
$$$$$$ Finished deleting the kitchen-terraform-default-terraform Terraform workspace.
       Finished destroying <default-terraform> (0m15.19s).
-----> Test Kitchen is finished. (0m16.21s)

After Kitchen has finished, you can see that I no longer have the S3 bucket that it created:

Jenkins build pipeline concurrency

While a common recommendation is to use a DynamoDB table to lock the Terraform state so that only one user can change it at a time (with the s3 backend this is the dynamodb_table attribute of the backend block we sketched earlier), I’m not going to implement that in my experiment, though it sounds like a good idea. Instead we will use Jenkins to provide locking around any stages that read or change the remote Terraform state, using the Lockable Resources plugin: https://github.com/jenkinsci/lockable-resources-plugin.

Hop over to your Jenkins instance to set up our lockable resource - navigate to Manage Jenkins -> Configure System and find the Lockable Resources Manager section. Select Add Lockable Resource and enter the following:

Select the Save button (lower left) to save your changes. We can now refer to terraform-remote-state as a resource that build jobs must queue for before they can proceed.

Jenkins AWS credentials

We have been able to run Kitchen tests and the AWS command line tools locally because we set up AWS credentials on our local machine. Our Jenkins build jobs, however, will not have access to those credentials, so we will add credentials into Jenkins that can be loaded into our build pipeline for AWS access.

Go into Jenkins and navigate to Credentials -> System -> Global credentials (unrestricted) then select the Add Credentials option.

Enter the following:

Leave the Scope field as default and enter your own AWS access key ID into the Secret field then select OK.

Create another credential for the AWS secret access key:

Again, leave the Scope field and enter your own AWS secret access key into the Secret field then select OK.

Note: These credentials will have whatever permissions you have granted them in AWS - be aware that build jobs could use them to create or destroy real resources. There are no doubt more robust mechanisms for securing secrets in Jenkins pipelines, but for the sake of our local experiment this approach is fine.

Having these credentials in Jenkins allows us to inject them as environment variables into our build jobs, which is how our builds will authenticate with AWS.

Jenkinsfile updates

Update the Jenkinsfile in our terraform-remote folder with some adjustments to use our lockable resource, ensuring that only one build job can do remote Terraform operations at a time:

final def IS_PULL_REQUEST_BUILD = env.BRANCH_NAME.startsWith('PR-')
final def IS_MASTER_BUILD = env.BRANCH_NAME == 'master'
final def DOCKER_IMAGE_NAME = 'iac-env:latest'
final def DOCKER_IMAGE_ARGS = '-it'
final def TERRAFORM_LOCK = 'terraform-remote-state'

pipeline {
    agent none

    stages {
        stage("Start: Verification") {
            agent { docker image: DOCKER_IMAGE_NAME, args: DOCKER_IMAGE_ARGS }

            environment {
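                // Terraform only checks that TF_IN_AUTOMATION is non-empty; the actual value is arbitrary.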
                TF_IN_AUTOMATION = 'TF_IN_AUTOMATION'
                AWS_ACCESS_KEY_ID = credentials('jenkins-aws-secret-key-id')
                AWS_SECRET_ACCESS_KEY = credentials('jenkins-aws-secret-access-key')
                AWS_DEFAULT_REGION = 'ap-southeast-2'
            }

            options {
                lock(TERRAFORM_LOCK)
            }

            stages {
                stage("Basic Validation") { steps {
                    sh 'terraform fmt -check -recursive -list=true -diff'
                    sh 'tflint'
                    sh 'terraform init -input=false'
                    sh 'terraform plan -out=tfplan -input=false'
                    sh 'terraform validate'
                    sh 'terraform graph | dot -Tpng > terraform-graph.png'
                    archiveArtifacts artifacts: 'terraform-graph.png'
                }}

                stage("Integration Tests") {
                    when { expression { IS_PULL_REQUEST_BUILD || IS_MASTER_BUILD } }

                    steps {
                        sh 'kitchen verify'
                    }

                    post {
                        always {
                            sh 'kitchen destroy'
                        }
                    }
                }

                stage("End: Verification") { steps {
                    milestone(label: 'End: Verification', ordinal: 1)
                }}
            }

            post {
                cleanup {
                    cleanWs()
                }
            }
        }

        stage("Publish Approval") {
            when { expression { IS_MASTER_BUILD } }

            steps {
                input message: "Deploy these changes?"
            }
        }

        stage("Start: Publish") {
            when { expression { IS_MASTER_BUILD } }

            agent { docker image: DOCKER_IMAGE_NAME, args: DOCKER_IMAGE_ARGS }
            
            environment {
                TF_IN_AUTOMATION = 'TF_IN_AUTOMATION'
                AWS_ACCESS_KEY_ID = credentials('jenkins-aws-secret-key-id')
                AWS_SECRET_ACCESS_KEY = credentials('jenkins-aws-secret-access-key')
                AWS_DEFAULT_REGION = 'ap-southeast-2'
            }
            
            options {
                lock(TERRAFORM_LOCK)
            }

            stages {
                stage("Deploy") {
                    steps {
                        milestone(label: 'Deployment started', ordinal: 2)
                        sh 'terraform init -input=false'
                        sh 'terraform plan -out=tfplan -input=false'
                        sh 'terraform apply -input=false tfplan'
                    }
                }

                stage ("End: Publish") {
                    steps {
                        milestone(label: 'Deployment complete', ordinal: 3)
                    }
                }
            }
            
            post {
                cleanup {
                    cleanWs()
                }
            }
        }
    }
}

The Jenkinsfile is almost the same as before with only some minor differences:

  1. The main stages now have a lock option, associated with our terraform-remote-state lockable resource.
  2. The environment block in each build section now injects the AWS credentials to use during the builds - allowing our build tools to authenticate with AWS.

Commit code

git init
git add --all
git commit -m "Initial Commit"
git remote add origin http://localhost:7990/scm/ic/terraform-remote.git
git push -u origin master

The terraform-remote repository should now be populated with all of our new code:

Update Bitbucket

Our terraform-remote repository will need the same configuration applied to it as our example project, in particular:

  1. Setup branch permissions to stop changes other than via pull requests.
  2. Add jenkins user with Read access.
  3. Setup the Post Webhook to fire events to Jenkins.

This was well explained in an earlier article so just follow those steps now to apply the changes.

Add Jenkins job

We also need to add a Jenkins Multibranch Pipeline job for our new repository, otherwise no builds will happen.

In Jenkins, select New Item, with the name terraform-remote and type of Multibranch Pipeline.

Next enter the following details:

Add Source -> Bitbucket:

Hit Save and immediately Jenkins will spring to life and start building our master branch:

This time, the integration tests actually created and destroyed a real AWS S3 bucket. The master branch build will now be waiting for our permission to Proceed or Abort before genuinely deploying our S3 bucket to AWS. Let’s hit Proceed and see if the stars align and we end up with an S3 bucket!

Nice! And if we look in our actual AWS S3 console page we can see our new bucket waiting to be used:

From this point on we can still run our integration tests and they won’t collide with our ‘production’ S3 bucket. Note also that our bucket has the suffix default, which is the Terraform workspace that was used to publish it. In the future we could use Terraform workspaces in a more advanced way to create dev, staging, prod and so on.
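As a rough sketch (the workspace names here are hypothetical), promoting this pattern to multiple environments could look like:

$ terraform workspace new staging      # scopes state and bucket name to 'staging'
$ terraform plan -out=tfplan -input=false
$ terraform apply -input=false tfplan  # creates iac-env-state-staging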

Thanks for coming!

I’ll stop the experiment at this point - the next steps would be to use some of these concepts to provision more infrastructure and to adopt more resilient workflows across teams of people.

I’d recommend reading/watching this link again - https://www.hashicorp.com/resources/evolving-infrastructure-terraform-opencredo/ - which I found insightful for roadmapping the activities needed to reach a place where infrastructure code can be changed and automated by teams of people.

Source code can be found at: https://github.com/MarcelBraghetto/iac-env

End of part 6