mirror of
https://github.com/apricote/presentations.git
synced 2026-01-13 13:01:03 +00:00
Theory and Practice
Content
- About me
- Infrastructure as Code
- What is Terraform?
- narando Backstory
- Terraform at narando
- What is planned?
- Recap
About me
Julian Tölle
Developer @ narando & TrackCode
Backend Development & DevOps
--
About us
- ~ 5 developers
- crowd-sourced audio production for blogs
- Node.js on AWS
Infrastructure as Code
Domain
- Provisioning servers
- Configuring databases
- Firewall rules
- DNS
- configuring and deploying applications
--
Infrastructure as Code
Before IaC
- servers and applications are configured manually
- new environments take days/weeks to be running
- copy&paste
- search&replace
--
Infrastructure as Code
Definition
Infrastructure as code (IaC) is the process of managing [...] data centers through machine-readable definition files, rather than [...] interactive configuration tools.
From Wikipedia
--
Infrastructure as Code
Benefits
- automation saves (expensive) time
- avoids human errors
- easier to review/verify
--
Infrastructure as Code
Tools
- Chef
- Puppet
- Ansible
- AWS CloudFormation
- Terraform
What is Terraform?
- open-source toolchain for IaC
- declarative approach
- providers for every big cloud
--
What is Terraform?
HCL
provider "aws" {
  access_key = "ACCESS_KEY_HERE"
  secret_key = "SECRET_KEY_HERE"
  region     = "eu-central-1"
}

resource "aws_instance" "www" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}
--
What is Terraform?
Resource Graph
resource "aws_instance" "www" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}

# The record depends on the instance through the interpolated public_ip.
resource "cloudflare_record" "www" {
  domain = "example.com"
  name   = "www"
  type   = "A"
  value  = "${aws_instance.www.public_ip}"
}
--
What is Terraform?
Modules
- *.tf in a folder
- share a namespace
- input is called variable
- output is called output
variable "env" {
  type    = "string"
  default = "prod"
}

output "public_ip" {
  value = "${aws_instance.www.public_ip}"
}
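A minimal sketch of how such a module might be consumed, assuming the files above live in a hypothetical folder ./modules/www. The module name, the path and the output name www_public_ip are illustrative and not taken from the narando setup.

```hcl
# Hypothetical caller; "./modules/www" is an assumed path.
module "www" {
  source = "./modules/www"

  # Sets the module's "env" variable, overriding its default of "prod".
  env = "dev"
}

# Re-export the module's "public_ip" output from the calling configuration.
output "www_public_ip" {
  value = "${module.www.public_ip}"
}
```

Inside the module, resources share one namespace. From the outside, only variables and outputs are visible.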
--
What is Terraform?
State
- contains all managed resources
- used to produce "execution plan"
- JSON format
- can be local or remote (S3, TF Enterprise, etc.)
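As a sketch of the remote option, an S3 backend could be configured roughly like this. The bucket name and key are placeholders, not the values used at narando.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "prod/terraform.tfstate"  # path of the state object inside the bucket
    region = "eu-central-1"
  }
}
```

Without a backend block, Terraform keeps the state in a local terraform.tfstate file.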
--
What is Terraform?
terraform plan
- Verify variables
- Load state
- Refresh resources
- Diff state against desired state
- Plan actions to reach desired state
--
What is Terraform?
terraform apply
- Load generated plan
- Apply actions in sequence
narando Backstory
Development History
- 2014-2015: initial app
- 2016: Maintenance Mode
- January 2017: Rewrite
- ~ May 2017: live traffic for new services
--
narando Backstory
Initial App
- Built by previous co-founder
- Ruby on Rails
- Heroku
- Addons for DB, Cache, DNS, Logs, Monitoring
- Only prod environment
"Let's test this in production!"
--
narando Backstory
Rewrite
- Requirement: multiple environments
- Node.js services
- Docker + Elastic Container Service
- All resources in AWS
Terraform at narando
Initial Struggles
- AWS resources managed by hand
- scripts for deployments
- manually duplicating setup for prod
- new system went live in May 2017
- modifications became dangerous
--
Terraform at narando
Rev. I - Proof of Concept
- one file with all resources
- ECS, EC2, VPC, RDS, Elasticache, IAM, ALB, Route53
- each with 1~5 TF resources
- only prod environment
- 554 LOC
--
Terraform at narando
Rev. II - Adding Structure
modules/
- encapsulates concern
- "central" modules like modules/vpc, modules/cluster, modules/iam
- "once-per-service": modules/service
- Container definition
- Loadbalancer Target
- DNS Entry
- Firewall Rules
--
Terraform at narando
Rev. II - Adding Structure
env/$ENV
- includes all modules
- root for state
- difference between environments
- prod: Elasticache
- dev: custom Redis
- same inputs/outputs
- can be easily swapped
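A sketch of what "same inputs/outputs" could look like for the cache. The module paths cache-elasticache and cache-redis and the output name endpoint are assumptions for illustration; they do not appear in the original repository layout.

```hcl
# env/prod/main.tf (hypothetical module path)
module "cache" {
  source = "../../modules/cache-elasticache"
  env    = "prod"
}

# env/dev/main.tf (a separate root, so the module name "cache" can be reused)
module "cache" {
  source = "../../modules/cache-redis"
  env    = "dev"
}

# In either environment, consumers only rely on the shared interface, e.g.:
#   redis_host = "${module.cache.endpoint}"
```

Because both modules expose the same output, swapping the implementation only changes the source line.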
--
Terraform at narando
Rev. II - Directory Tree
├── env
│   ├── dev
│   │   ├── main.tf
│   │   ├── services.tf
│   │   └── vars.tf
│   └── prod
│       └── same as dev
└── modules
    ├── cluster
    │   ├── vars.tf
    │   ├── outputs.tf
    │   └── main.tf
    ├── service
    ├── dns
    └── vpc
--
Terraform at narando
Rev. II - services.tf
- one modules/service per service
- similar services
- db
- cache
- hostname
- container definition
- different services in dev/prod
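One entry in services.tf might have looked roughly like the sketch below. All argument names (name, hostname, image, container_port) and their values are assumptions; the real interface of modules/service is not shown in the slides.

```hcl
# env/dev/services.tf -- argument names are hypothetical
module "feeds" {
  source = "../../modules/service"

  name           = "feeds"
  hostname       = "feeds.dev.example.com"
  image          = "example/feeds:latest" # input for the container definition
  container_port = 3000
}
```

Each additional service repeats this block with different values, which is where the later duplication pain point comes from.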
--
Terraform at narando
Rev. III - Splitting of services.tf
- services became more custom, more resources
- moved into own module
- 1 file per service
└── env
    └── dev (prod)
        └── services
            ├── feed-fetcher.tf
            ├── feeds.tf
            ├── notifier.tf
            ├── narrator.tf
            ├── publisher.tf
            ├── vars.tf
            └── outputs.tf
--
Terraform at narando
Pain Points (Structure)
- state updates are slow
- infrastructure is separate from code (+CI/CD)
- not all values are auto-generated(/-filled)
- service definitions duplicated
--
Terraform at narando
Pain Points (Process)
- manual apply by me
- repo might differ from deployed infrastructure
- CI does not create plans to review
- reviewers need to do this themselves
- iteration is slow
- missing playground environment
- smoke tests?
- missing
--
Terraform at narando
Pain Points (Terraform)
- not all service providers have terraform providers
- drone / CICD
- gitea / VCS
- pointDNS
- some validations are only done during apply
What is planned?
Upgrade Modules to support cross-repo setups
- Using data resources and naming schemas
- allows moving service-specific code into repos
data "aws_ecs_cluster" "cluster" {
  cluster_name = "${var.env}"
}
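A sketch of how a service repo could then attach to the shared cluster without sharing state with the core infrastructure. The service name feeds and the variable task_definition_arn are assumptions for illustration.

```hcl
# Used together with the data "aws_ecs_cluster" "cluster" block above.

variable "env" {
  default = "dev"
}

# Hypothetical input: ARN of a task definition managed in the service repo.
variable "task_definition_arn" {}

resource "aws_ecs_service" "feeds" {
  name            = "feeds"
  cluster         = "${data.aws_ecs_cluster.cluster.arn}"
  task_definition = "${var.task_definition_arn}"
  desired_count   = 1
}
```

The lookup only works if the cluster name follows the agreed naming schema, here simply the environment name.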
--
What is planned?
Move service definitions into service repos
- integrated with usual review and deployment processes
- improves state update time
- single PR for related changes
- e.g. implement a new batch task (service repo)
- and configure the cron trigger (currently infrastructure repo)
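As an illustration of the cron trigger half, the sketch below assumes the trigger is a CloudWatch Events rule that starts an ECS task; the slides do not say which mechanism is actually used. All names and variables are hypothetical.

```hcl
# Hypothetical inputs, e.g. looked up via data sources or passed in by CI.
variable "cluster_arn" {}
variable "task_definition_arn" {}
variable "events_role_arn" {}

resource "aws_cloudwatch_event_rule" "nightly" {
  name                = "feed-fetcher-nightly"
  schedule_expression = "cron(0 3 * * ? *)" # every day at 03:00 UTC
}

resource "aws_cloudwatch_event_target" "run_task" {
  rule     = "${aws_cloudwatch_event_rule.nightly.name}"
  arn      = "${var.cluster_arn}"
  role_arn = "${var.events_role_arn}"

  ecs_target {
    task_count          = 1
    task_definition_arn = "${var.task_definition_arn}"
  }
}
```

With this living next to the task's code, the new task and its schedule can be reviewed in a single PR.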
--
What is planned?
Automatic apply
- implement CD for core infrastructure repo
- probably with manual review at first
--
What is planned?
Move Root DNS
- to Service Provider with Terraform Provider
- AWS / OVH / Cloudflare
- less manual work
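If the zone moved to AWS, a record could be managed roughly as follows. This is a sketch under that assumption; example.com and the variable www_ip are placeholders.

```hcl
# Hypothetical input: the address the record should point to.
variable "www_ip" {}

resource "aws_route53_zone" "root" {
  name = "example.com"
}

resource "aws_route53_record" "www" {
  zone_id = "${aws_route53_zone.root.zone_id}"
  name    = "www.example.com"
  type    = "A"
  ttl     = "300"
  records = ["${var.www_ip}"]
}
```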
Recap
Things I like
- Exhaustive documentation
- huge list of official providers
- community-built providers for new products
- declarative approach like K8s
- minimal code, all configuration
- HCL looks like Go and feels like Go
--
Recap
Things I do not like
- side-effects between resources are not always documented
- searching in the dark for errors
- "why does the container not receive traffic?"
- might be AWS specific
- lots of concepts to learn
- failures do not roll back
- changes must be self-contained and backwards compatible
