presentations/terraform/terraform.md
2019-03-28 16:18:11 +01:00

8.2 KiB

terraform logo

Theory and Practice


Content

  • About me
  • Infrastructure as Code
  • What is Terraform?
  • narando Backstory
  • Terraform at narando
  • What is planned?
  • Recap

About me

picture of me

Julian Tölle
Developer @ narando & TrackCode
Backend Development & DevOps

--

About us

narando Logo

  • ~ 5 developers
  • crowd-sourced audio production for blogs
  • Node.js on AWS

Infrastructure as Code

Domain

  • Provisioning servers
  • Configuring databases
  • Firewall rules
  • DNS
  • configuring and deploying applications

--

Infrastructure as Code

Before IaC

  • servers and applications are configured manually
  • new environments take days/weeks to be running
  • copy&paste
  • search&replace

--

Infrastructure as Code

Definition

Infrastructure as code (IaC) is the process of managing [...] data centers through machine-readable definition files, rather than [...] interactive configuration tools.

From Wikipedia

--

Infrastructure as Code

Benefits

  • automation saves (expensive) time
  • avoids human errors
  • easier to review/verify

--

Infrastructure as Code

Tools

  • Chef
  • Puppet
  • Ansible
  • AWS CloudFormation
  • Terraform

What is Terraform?

  • open-source toolchain for IaC
  • declarative approach
  • providers for every big cloud

--

What is Terraform?

HCL

provider "aws" {
  access_key = "ACCESS_KEY_HERE"
  secret_key = "SECRET_KEY_HERE"
  region     = "eu-central-1"
}

resource "aws_instance" "www" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}

--

What is Terraform?

Resource Graph

resource "aws_instance" "www" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}

resource "cloudflare_record" "www" {
  domain = "example.com"
  type   = "A"
  value  = "${aws_instance.www.public_ip}"
}

--

What is Terraform?

Modules

  • *.tf in a folder
  • share a namespace
  • input is called variable
  • output is called output
variable "env" {
  type    = "string"
  default = "prod"
}

output "public_ip" {
  type  = "string"
  value = "${aws_instance.www.public_ip"
}

--

What is Terraform?

State
  • contains all managed resources - "why does the container not receive traffic?"
    • might be AWS specific
  • used to produce "execution plan"
  • JSON format
  • can be local or remote (S3, TF Enterprise, etc.)

--

What is Terraform?

terraform plan

  1. Verify variables
  2. Load state
  3. Refresh resources
  4. Diff state against desired state
  5. Plan actions to reach desired state

--

What is Terraform?

terraform apply

  1. Load generated plan
  2. Apply actions in sequence

narando Backstory

Development History

  • 2014-2015: initial app
  • 2016: Maintenance Mode
  • January 2017: Rewrite
  • ~ May 2017: live traffic for new services

--

narando Backstory

Initial App

  • Built by previous co-founder
  • Ruby on Rails
  • Heroku
    • Addons for DB, Cache, DNS, Logs, Monitoring
  • Only prod environment

"Let's test this in production!"

--

narando Backstory

Rewrite

  • Requirement: multiple environments
  • Node.js services
  • Docker + Elastic Container Service
  • All resources in AWS

Terraform at narando

Initial Struggles

  • AWS resources managed by hand

  • scripts for deployments

  • manually duplicating setup for prod

  • new system went live in May 2017
  • modifications became dangerous

--

Terraform at narando

Rev. I - Proof of Concept

  • one file with all resources
    • ECS, EC2, VPC, RDS, Elasticache, IAM, ALB, Route53
    • each with 1~5 TF resources
  • only prod environment
  • 554 LOC

--

Terraform at narando

Rev. II - Adding Structure

  • modules/
    • encapsulates concern
    • "central" modules like modules/vpc, modules/cluster, modules/iam
    • "once-per-service" - module/service
      • Container definition
      • Loadbalancer Target
      • DNS Entry
      • Firewall Rules

--

Terraform at narando

Rev. II - Adding Structure

  • env/$ENV
    • includes all modules
    • root for state
    • difference between environments
      • prod: Elasticache
      • dev: custom Redis
      • same inputs/outputs
      • can be easily swapped

--

Terraform at narando

Rev. II - Directory Tree

├── env
│   ├── dev
│   │   ├── main.tf
│   │   ├── services.tf
│   │   └── vars.tf
│   └── prod
│       └── same as dev
└── modules
    ├── cluster
    │   ├── vars.tf
    │   ├── outputs.tf
    │   └── main.tf
    ├── service
    ├── dns
    └── vpc

--

Terraform at narando

Rev. II - services.tf

  • one module/service per service
  • similar services
    • db
    • cache
    • hostname
    • container definition
  • different services in dev/prod

--

Terraform at narando

Rev. III - Splitting of services.tf

  • services became more custom, more resources
  • moved into own module
  • 1 file per service
└── env
    └── dev (prod)
        └── services
            ├── feed-fetcher.tf
            ├── feeds.tf
            ├── notifier.tf
            ├── narrator.tf
            ├── publisher.tf
            ├── vars.tf
            └── outputs.tf

--

Terraform at narando

Pain Points (Structure)

  • state updates are slow
  • infrastructure is seperate from code (+CI/CD)
  • not all values are auto-generated(/-filled)
  • service definitions duplicated

--

Terraform at narando

Pain Points (Process)

  • manual apply by me
    • repo might differ from deployed infrastructure
  • CI does not create plans to review
    • reviewers need to do this themselves
  • iteration is slow
    • missing playground environment
    • smoke tests?

--

Terraform at narando

Pain Points (Terraform)

  • not all service providers have terraform providers
    • drone / CICD
    • gitea / VCS
    • pointDNS
  • some validations are only done during apply

What is planned?

Upgrade Modules to support cross-repo setups

  • Using data resources and naming schemas
  • allows moving service-specific code into repos
data "aws_ecs_cluster" "cluster" {
  cluster_name = "${var.env}"
}

--

What is planned?

Move service definitions into service repos

  • integrated with usual review and deployment processes
  • improves state update time
  • single PR for related changes
    • e.g. implement a new batch task (service repo)
    • and configure the cron trigger (currently infrastructure repo)

--

What is planned?

Automatic apply

  • implement CD for core infrastructure repo
  • probably with manual review at first

--

What is planned?

Move Root DNS

  • to Service Provider with Terraform Provider
    • AWS / OVH / Cloudflare
  • less manual work

Recap

Things I like

  • Exhaustive documentation
  • huge list of offical providers
  • community-built providers for new products
  • declarative approach like K8s
  • minial code, all configuration
  • Hcl looks like Go and feels like Go

--

Recap

Things I do not like

  • side-effects between resources are not always documented
    • searching in the dark for errors
    • "why does the container not receive traffic?"
    • might be AWS specific
  • lots of concepts to learn
  • failures do not roll back
    • changes must be self-contained and backwards compatible