3ware/gitops-2024

More Than Certified GitOps MiniCamp 2024

The main purpose of this mini camp is to build a GitOps pipeline to deploy resources, managed by terraform, to AWS using GitHub Actions.

Requirements

| Section | Task | Self-Reported Status | Notes |
| --- | --- | --- | --- |
| Setup | Main branch is protected | | |
| | Cannot merge to main with failed checks | | |
| | State is stored remotely | | |
| | State Locking mechanism is enabled | | |
| Design and Code | Confirm Account Number | | allowed_account_ids provider argument |
| | Confirm Region | | variable validation |
| | Add Default Tags | | added to provider block |
| | Avoid Hardcoded Values | | |
| | No plaintext credentials | | Environment variables set by OIDC |
| | Pipeline in GitHub Actions only | | |
| Validate | terraform fmt pre-commit hook | | Git Hooks managed by trunk-io |
| | pre-commit hooks are in repo | | Git Hooks managed by trunk-io |
| Test and Review | Pipeline works on every PR | | on: pull_request trigger |
| | Linter | | TFLint configured with aws plugin and deep check |
| | terraform fmt | | See PR #5 |
| | terraform validate | | See PR #5 |
| | terraform plan | | See PR #5 |
| | Infracost with comment | | See PR #4 |
| | Open Policy Agent fail if cost > $10 | | See PR #6 |
| Deploy | terraform apply with human intervention | | Applied when PR is merged |
| | Deploy to production environment | | Matrix strategy |
| Operate and Monitor | Scheduled drift detection | | |
| | Scheduled port accessibility check | | |
| Readme | Organized Structure | | |
| | Explains all workflows | | |
| | Link to docs for each action | | |
| | Contribution Instructions | | |
| | Explains merging strategy | | |
| Bonus | Deploy to multiple environments | | See PR #35 |
| | Ignore non-terraform changes | | Workflow trigger uses a paths filter for tf and tfvars files |
| | Comment PR with useful plan information | | See PR #7 |
| | Comment PR with useful Linter information | | See PR #5 |
| | Open an Issue if Drifted | | See Issue #20 |
| | Open an issue if port is inaccessible | | |
| | Comment on PR to apply | | See PR #32 |

Workflow

  • Create feature branch off main
  • Commit change locally and push to remote
  • Create a draft pull request that targets the main branch: gh pr create --draft --base main

Important

Pull requests must be opened as drafts so that CODEOWNER reviewers are not assigned until the pull request is ready. Draft cannot be set as the default (see open discussion). It also cannot be automated: action runners that authenticate with GITHUB_TOKEN are unable to run gh pr ready --undo because the integration is unavailable (see open discussion).

  • The workflow will run through the tests (fmt, validate, TFLint), then run terraform plan and post the plan to the pull request and workflow job summary.
  • To approve the plan, approve the pull request and add it to the merge queue.

When to apply?

The debate rumbles on. The merge queue does a pretty good job of addressing this. If apply is triggered using the merge_group event, the workflow will attempt to apply the plan from the PR and then merge the PR. If the apply fails for any reason, then the PR is not merged.
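
As a rough illustration of the triggers this implies, a workflow could listen for both events as sketched below. This is not the repository's actual workflow file: the workflow name and the path globs are assumptions, based only on the note above about filtering for tf and tfvars files.

```yaml
# Sketch only: event names are standard GitHub Actions events,
# everything else here is illustrative.
name: Terraform CI

on:
  # Validation and plan run for pull requests that touch terraform files.
  pull_request:
    branches: [main]
    paths:
      - "**.tf"
      - "**.tfvars"
  # Fired when a pull request is added to the merge queue.
  # Path filters are not available for this event.
  merge_group:
```

Because the merge only completes once the merge_group run succeeds, a failed apply leaves the pull request unmerged.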

Directories vs Workspaces for multiple environments

Another debate. The best argument I have heard for directories was in the Q&A session on 19/10/2024:

"anyone should be able to cd into a terraform working directory and simply run terraform plan without have to worry about workspaces and variable files"

The workflow uses changed-files to find the directories containing terraform changes. The output of this job is used to define the matrix strategy for the terraform workflow.

Each directory is mapped to an environment which achieves 2 things:

  • Secrets, in this case the AWS roles, are stored in the environment, not the repository.
  • Deployments to production require additional approval.
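
A minimal sketch of the directory-to-environment mapping follows. The directory names, the secret name AWS_ROLE_ARN and the variable name AWS_REGION are hypothetical; the only point is that naming the environment after the matrix value makes the job read that environment's secrets and respect its approval rules.

```yaml
# Sketch only: names marked "hypothetical" are not taken from this repository.
jobs:
  terraform:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        directory: [development, production]   # hypothetical directory names
    # The environment name matches the directory name, so secrets and
    # protection rules (such as required reviewers) come from that environment.
    environment: ${{ matrix.directory }}
    permissions:
      id-token: write   # needed to request an OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}   # hypothetical, defined per environment
          aws-region: ${{ vars.AWS_REGION }}            # hypothetical, defined per environment
      - name: Show which role was assumed
        run: aws sts get-caller-identity
```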

Branching Strategy

---
config:
  theme: base
---
gitGraph
  commit id: "prev" tag: "v1.0.0"
  branch feature
  switch feature
  commit id: "Terraform Changes"
  commit id: "Bug Fix"
  commit id: "Plan Diff fix"
  switch main
  merge feature
  commit id: "new" tag: "v1.1.0"

Diagram

---
config:
  look: handDrawn
  theme: neo
---
flowchart LR
  subgraph Fail Checks
    direction LR
    Fail("`**Fail Required Checks**
    PR Cannot be merged`")
  end
  subgraph Pass Checks
    direction LR
    noTFPass("`**Met Required Checks**`") -->merge(Merge PR to main branch)
  end
  subgraph Pass Terraform Checks
    direction LR
    TFPass("`**Met Required Checks**
        Add to Merge Queue`") -->apply{terraform apply}
    apply -->dev(Development) -->prd(Production) -->tfMerge(Complete Merge)
    tfMerge -->docs(Run terraform-docs) -->rel(Generate a release)
    apply -->|Fail|Fail
  end
  subgraph Infracost
    direction LR
    ic{"`**Infracost**
        Infracost fail if > $10`"} -->|Fail|Fail
  end
  subgraph Targets
    direction LR
    target{"`**Terraform Targets**
        Search for terraform changes and output the directory name(s)`"} -->|No Changes|noTFPass
  end
  subgraph Deploy Development
    direction LR
    devSetup("`**Setup**
        AWS Credentials
        Install and Initialise TFLint
        with AWS Plugin`") -->
    devValidate{"`**Validate**
        terraform fmt
        terraform validate
        tflint`"} -->|Fail|Fail
    devValidate -->|Pass|devPlan(terraform plan)
  end
  subgraph Deploy Production
    direction LR
    prdSetup("`**Setup**
        AWS Credentials
        Install and Initialise TFLint
        with AWS Plugin`") -->
    prdValidate{"`**Validate**
        terraform fmt
        terraform validate
        tflint`"} -->|Fail|Fail
    prdValidate -->|Pass|prdPlan(terraform plan)
  end
PR(Draft Pull Request) -->target & ic
target -->|Job Matrix|devSetup
devPlan -->prdSetup
prdPlan -->|Approve PR|TFPass

Workflows

Actions Used

Infracost

Infracost runs on pull requests when they are opened or synchronized. The workflow generates a cost difference of the resources between the main branch and the proposed changes on the feature branch.

This workflow also flags any policy violations defined in infracost-policy.rego. See an example in this pull request.
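
The outline below follows the pattern from Infracost's published GitHub Actions examples as I understand them; it is not copied from this repository. The secret name INFRACOST_API_KEY and the use of the repository root as the terraform path are assumptions, and the --policy-path flag should be checked against the Infracost CLI version in use.

```yaml
# Sketch only: secret name and paths are hypothetical.
name: Infracost

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write   # needed to post the cost comment

jobs:
  infracost:
    runs-on: ubuntu-latest
    steps:
      - uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}   # hypothetical secret name
      - name: Check out the base branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}
      - name: Estimate the baseline cost
        run: infracost breakdown --path=. --format=json --out-file=/tmp/infracost-base.json
      - name: Check out the pull request branch
        uses: actions/checkout@v4
      - name: Compare the proposed changes with the baseline
        run: infracost diff --path=. --format=json --compare-to=/tmp/infracost-base.json --out-file=/tmp/infracost.json
      - name: Comment on the pull request and evaluate the policy
        run: |
          infracost comment github --path=/tmp/infracost.json \
            --repo="$GITHUB_REPOSITORY" \
            --github-token="${{ github.token }}" \
            --pull-request="${{ github.event.pull_request.number }}" \
            --behavior=update \
            --policy-path=infracost-policy.rego
```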

Terraform CI

Targets

The initial job of the workflow uses changed-files to output the directories where terraform changes have been made. This output is used as the matrix strategy for the deploy job.
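
A possible shape for this hand-off is sketched here. It assumes that "changed-files" refers to the tj-actions/changed-files action and uses its files, dir_names and matrix inputs as I understand them; the job names, step id and version tag are illustrative, and the action is better pinned to a commit SHA in practice.

```yaml
# Sketch only: job names, ids and the version tag are hypothetical.
jobs:
  targets:
    runs-on: ubuntu-latest
    outputs:
      # JSON array of directories, for example ["development","production"]
      directories: ${{ steps.changed.outputs.all_changed_files }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: changed
        uses: tj-actions/changed-files@v45
        with:
          files: |
            **.tf
            **.tfvars
          dir_names: true   # report directories rather than file names
          matrix: true      # emit JSON suitable for a matrix

  deploy:
    needs: targets
    # Skip the job entirely when no terraform files changed.
    if: needs.targets.outputs.directories != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        directory: ${{ fromJSON(needs.targets.outputs.directories) }}
    steps:
      - run: echo "Running in ${{ matrix.directory }}"
```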

Validate

Uses a matrix strategy to run in each directory identified in the targets job.

Important

The strategy has a max-parallel value of 1, which means the jobs are run sequentially.

  • Set up AWS credentials using configure-aws-credentials, which uses OIDC to assume a role and sets the authentication parameters as environment variables on the runner. This step is required when TFLint deep checking for the AWS rule plugin is enabled.
  • Install terraform using setup-terraform. Although terraform is preinstalled on the runners, apply jobs were failing because of version differences between the plan runner and the apply runner.
  • Run terraform fmt
  • Run terraform init
  • Run terraform validate
  • Install TFLint using setup-tflint
  • Initialise TFLint to download the AWS plugin rules.
  • Run tflint
  • Update the PR comments if any of the steps fail and exit the workflow on failure.
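
The steps above could be laid out roughly as follows. This is a sketch under assumptions rather than the repository's workflow: the directory names, the secret and variable names, and the pinned terraform version are hypothetical, and the step that updates the pull request comment on failure is left out.

```yaml
# Sketch only: names and versions marked "hypothetical" are illustrative.
jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1   # run the matrix jobs one at a time
      matrix:
        directory: [development, production]   # hypothetical
    permissions:
      id-token: write   # needed for OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}   # hypothetical secret name
          aws-region: ${{ vars.AWS_REGION }}            # hypothetical variable name
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.8"   # hypothetical pin, same for plan and apply
      - name: terraform fmt
        working-directory: ${{ matrix.directory }}
        run: terraform fmt -check
      - name: terraform init
        working-directory: ${{ matrix.directory }}
        run: terraform init -input=false
      - name: terraform validate
        working-directory: ${{ matrix.directory }}
        run: terraform validate
      - uses: terraform-linters/setup-tflint@v4
      - name: Initialise TFLint (downloads the AWS plugin)
        working-directory: ${{ matrix.directory }}
        env:
          GITHUB_TOKEN: ${{ github.token }}   # avoids API rate limits on plugin download
        run: tflint --init
      - name: tflint
        working-directory: ${{ matrix.directory }}
        run: tflint
```

Any step that exits non-zero fails the job, which stops the plan from running for that directory.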

Plan

Once the validation steps have succeeded, terraform plan is run. A conditional statement restricts plan to pull_request events. The workflow uses TF-via-PR, which adds a high-level plan and a detailed, collapsible plan to the workflow summary and updates the pull request with a comment.

Apply

After terraform plan has been run, assuming the plan is accurate, approve the PR, and click merge when ready. This adds the pull request to the merge queue. The conditional statement in the workflow will run terraform apply on a merge_group event.
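
The event-based split between plan and apply can be sketched with plain terraform CLI steps, as below. Note the swap: this repository uses TF-via-PR for both steps, whereas this fragment calls the CLI directly and re-plans at apply time instead of reusing the plan from the pull request.

```yaml
# Sketch only: a fragment of a job's steps, using the terraform CLI
# in place of the TF-via-PR action.
steps:
  - name: Plan
    # Runs for pull requests only.
    if: github.event_name == 'pull_request'
    run: terraform plan -input=false

  - name: Apply
    # Runs only once the pull request has been added to the merge queue.
    if: github.event_name == 'merge_group'
    run: terraform apply -input=false -auto-approve
```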

Enforce All Checks

The only required check for the pull request.

Uses Wait for Status Checks to poll the checks API for the status of the other running checks. This overcomes the situation where a required check may not run. For example, if Terraform CI were made a required check, it would be skipped whenever the workflow does not run, and the required check would never be met. This workflow detects that Terraform CI has been skipped and reports success itself, so the required check passes.
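
The "skipped counts as passed" idea can be shown with a simpler technique than the one used here. The sketch below does not use Wait for Status Checks or poll the checks API; it is a gate job inside a single workflow that inspects needs.<job>.result, and the job names are hypothetical.

```yaml
# Sketch only: a same-workflow gate, not the polling approach described above.
jobs:
  terraform:
    # Hypothetical condition under which the job may be skipped.
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - run: echo "terraform checks would run here"

  enforce:
    needs: [terraform]
    # always() makes this job run even when terraform failed or was skipped.
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Pass only if terraform succeeded or was skipped
        env:
          RESULT: ${{ needs.terraform.result }}
        run: |
          echo "terraform job result: $RESULT"
          if [ "$RESULT" != "success" ] && [ "$RESULT" != "skipped" ]; then
            exit 1
          fi
```

Making only the gate job a required check gives the same outcome within one workflow. Polling the checks API is still needed when the checks live in separate workflows, as they do in this repository.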

Terraform Docs

Terraform docs will run when the pull request is merged. This only needs to run once, following the apply, and not on every commit to a pull request. Updating the README on every commit generates a lot of unnecessary commits, and you have to pull the updated README before the next push to avoid conflicts.

I use my own Terraform Docs reusable workflow which adds job summaries and verified commits to the terraform-docs gh-action.

Release

Generates a CHANGELOG and version tag using semantic-release.

To do list

  • Grafana Port Check
  • Fix drift detection for multiple environments

Contributions

  • Special mention to the maintainer of TF-via-PR for responding to queries quickly and proactively suggesting workflow improvements.