Infrastructure as Code (IaC): How to Implement Best Practices
In modern computing, repeatability, auditability, and simplicity are core tenets of many flourishing tech companies. Application code is stored in Git or SVN, and code reviews are performed. Deployments happen in an automated fashion, enabling developers to move software and features to 'done,' thereby delivering value to customers. For the longest time, infrastructure was left out of this practice. We managed servers manually in our data centers until configuration management and infrastructure as code (IaC) were introduced.
Today, much of the world's infrastructure is hosted in data centers owned by cloud providers, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. As we launch new infrastructure, we're actually interacting with the robust APIs that each platform provides behind the scenes. This opens the door to a totally new paradigm: infrastructure can be deployed and configured through API calls, which leaves a gap for Terraform, CloudFormation, Pulumi, and other IaC tools to fill!
What is Infrastructure as Code (IaC)?
Now that we understand the progression of how infrastructure is managed by businesses, let's talk about what Infrastructure as Code is, exactly.
Just as we check our application code into Git repos, which serve as the source of truth, we can describe the shape and characteristics of our cloud infrastructure using code. In Terraform, this means writing HCL (HashiCorp Configuration Language) and checking that code into source control.
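As a minimal sketch of what "describing infrastructure using code" can look like, here is a small Terraform configuration that declares one SQS queue. The queue name, region, tag, and provider version constraint are illustrative assumptions, not values from a real project.

```hcl
# Pin the AWS provider so every engineer and CI run resolves the same major version.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint; use whatever your team has standardized on
    }
  }
}

provider "aws" {
  region = "us-east-1" # hypothetical region
}

# One queue, described declaratively. Terraform works out which API calls
# are needed to make the real queue match this description.
resource "aws_sqs_queue" "orders" {
  name                      = "orders" # hypothetical queue name
  message_retention_seconds = 345600   # 4 days

  tags = {
    ManagedBy = "terraform"
  }
}
```

Once a file like this is committed, a change to the retention period becomes a one-line diff that can be reviewed like any other code change.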
When infrastructure changes are desired, one simply opens a pull request against a "Terraform" Git repo, goes through the review process, and the changes are eventually applied to the cloud platform or platforms of your choice (multi-cloud).
You might ask: how does a tool like Terraform keep track of what it manages, and how does it prevent multiple engineers from applying their changes at the same time? Typically, in the case of AWS, one would have an S3 bucket that's designated for "remote state." This is where Terraform stores its view of the infrastructure it manages for a given set of resources. (CloudFormation, by contrast, tracks this state inside the CloudFormation service itself, as stacks.) To prevent multiple engineers from applying changes at the same time, one might use the state locking in Terraform's S3 backend, which uses a DynamoDB table as the locking mechanism.
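The remote state and locking arrangement described above could be configured roughly as follows. This is a sketch under assumptions: the bucket name, state key, table name, and region are hypothetical, and the lock table is usually created once in a separate bootstrap configuration rather than alongside the resources it protects.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"      # hypothetical bucket designated for remote state
    key            = "networking/terraform.tfstate" # hypothetical path of this state file in the bucket
    region         = "us-east-1"
    encrypt        = true                           # encrypt the state object at rest
    dynamodb_table = "example-terraform-locks"      # table used to hold the state lock
  }
}

# The lock table itself. The S3 backend expects a string partition key
# named exactly "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

While one engineer's apply holds the lock, a second apply against the same state fails fast instead of racing. Recent Terraform releases also document an S3-native lock file option for this backend, so it is worth checking the S3 backend documentation for the version you run before standardizing on DynamoDB.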
All of this drives toward a reliable, repeatable, secure, and auditable representation of the business's cloud infrastructure.
What Problems Does Infrastructure as Code Solve?
I can personally share some experiences managing on-premises and cloud infrastructure. Setting up infrastructure manually in the AWS console, or even via one-off scripts, is cumbersome and doesn't scale, and configuration drift happens so subtly that it's easy to lose track of deviations without a tool that tracks the lifecycle of one's dynamic infrastructure.
Using a tool like Terraform in combination with version control, we're able to track each change that gets made to our infrastructure. Before IaC, we'd frequently use a configuration management tool such as Puppet or Ansible and bend it to solve problems it wasn't best equipped to solve.
In today's businesses, new services need to be launched rapidly, and the time between launching the first environment and production is getting shorter and shorter, as developer productivity increases. In order for DevOps to scale, we need tools that provide us with repeatable, scalable, and transparent ways of managing our cloud infrastructure. Standing up a second environment should be a matter of copying a variables file, tweaking it slightly, and then deploying it. This greatly contrasts with the old way of doing things: going through the console, clicking around, and hoping you didn't miss anything while configuring a plethora of cloud resources with a multitude of potential configuration options. Infrastructure as code (IaC) is the way to make managing distributed and complex infrastructure sane.
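To make the "copy a variables file and tweak it" idea concrete, here is one possible shape for it. The variable names, allowed environment values, and defaults are assumptions chosen for illustration; the per-environment file contents are shown as comments because they live in separate .tfvars files.

```hcl
# variables.tf: the knobs that differ between environments.
variable "environment" {
  type        = string
  description = "Name of the environment being deployed"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "db_instance_class" {
  type        = string
  description = "RDS instance size for this environment"
  default     = "db.t3.micro"
}

variable "db_allocated_storage_gb" {
  type        = number
  description = "Database storage in GiB"
  default     = 20
}

# staging.tfvars (a separate file) would then contain only:
#   environment             = "staging"
#   db_instance_class       = "db.t3.medium"
#   db_allocated_storage_gb = 50
#
# prod.tfvars is a copy of that file with the values adjusted.
```

Each environment is then deployed by pointing Terraform at its file, for example with `terraform apply -var-file=staging.tfvars`, ideally with a separate state per environment.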
Benefits and Best Practices of Using Infrastructure as Code
The power of infrastructure as code comes from the agility it imbues engineers with. You can't copy and paste SQS queues, RDS instances, Redshift clusters, etc. in the console, and managing them directly via the SDKs ourselves isn't the best use of time. With IaC, you can leverage open source modules on GitHub, as well as in the HashiCorp Terraform Registry. This significantly reduces the amount of HCL code needed to launch new components in your infrastructure. When applying your changes, Terraform can create independent resources concurrently, which reduces the time it takes to actually deploy them. HCL is also easier to read and work with than one giant JSON or YAML blob. How many times have we seen an issue due to malformed JSON?
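As a rough illustration of reusing a Registry module, the sketch below pulls in the community-maintained SQS module instead of hand-writing the queue resources. The version constraint, the input names, and the queue name are assumptions; check the module's documentation for the inputs your chosen version actually accepts.

```hcl
module "orders_queue" {
  # Community module published on the Terraform Registry.
  source  = "terraform-aws-modules/sqs/aws"
  version = "~> 4.0" # assumed constraint; pin to a version you have reviewed

  name = "orders" # hypothetical queue name

  tags = {
    ManagedBy = "terraform"
  }
}
```

Pinning the version matters as much here as it does for application dependencies: an unpinned module can change underneath you between two otherwise identical applies.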
Configuration drift across multiple environments can be a nightmare for DevOps and software engineers alike. Questions such as, "Why does this work fine in staging, but not production?" can become a thing of the past with CloudFormation, Terraform, and other infrastructure automation tools. When leveraging a GitOps-style tool to roll out changes to your managed resources, you can view the "plan" before a change is applied and the "results" afterwards. This information can be referred to later on, should any issues come up.
In modern businesses, the word accountability can mean many things. Many of us strive to embody a blameless culture in our businesses, but this doesn't mean accountability isn't important.
Since IaC lives in source control, as the single source of truth, we can track who made a specific change to a given infrastructure component, just like we may git blame a line of code in our applications after a regression is introduced. This fosters better communication since half the battle is knowing who made the change, so we can understand why.
More Efficiency During the Whole Software Development Life Cycle
Development teams are gaining velocity through new tools, streamlined processes, and more robust Continuous Integration (CI) pipelines. Without infrastructure as code, it can be challenging to keep up with the feature velocity of your teams. How often have you heard, "We need 3 environments stood up by tomorrow for this new application"? Using a tool like Terraform, AWS CloudFormation, or Pulumi, we can write the infrastructure component code once, and use dynamic naming to avoid naming conflicts between environments. What once took 3 days to do in the AWS console can now be accomplished in a few hours. Who doesn't love feeling empowered to do more in less time?
Another great benefit of infrastructure as code is that developers can be more involved in the review process. A great example case would be an SQS queue or Kinesis stream. There are quite a few configurations at the infrastructure level that directly impact the behavior of the application. Using standard pull requests, we can gain confidence in our infrastructure changes right at the beginning. Getting feedback sooner is paramount in true Continuous Delivery (CD), as Jez Humble and Dave Farley called out in their book, "Continuous Delivery."
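Both points in this section, per-environment naming and queue settings that developers will want to review, can be sketched in one small configuration. The names and the specific timeout, retention, and retry values are assumptions for illustration only.

```hcl
variable "environment" {
  type        = string
  description = "Environment name, for example dev, staging, or prod"
}

locals {
  # Every resource name carries the environment, so the same code can be
  # applied several times in one account without naming conflicts.
  name_prefix = "orders-${var.environment}"
}

# Dead-letter queue that receives messages the application repeatedly fails to process.
resource "aws_sqs_queue" "orders_dlq" {
  name = "${local.name_prefix}-dlq"
}

resource "aws_sqs_queue" "orders" {
  name = local.name_prefix

  # Settings that directly change how the consuming application behaves:
  visibility_timeout_seconds = 60      # time a consumer has to finish before the message reappears
  message_retention_seconds  = 1209600 # 14 days, the SQS maximum
  receive_wait_time_seconds  = 20      # enable long polling

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5 # attempts before a message moves to the dead-letter queue
  })
}
```

A developer reviewing this pull request can immediately see, for instance, whether a 60-second visibility timeout is long enough for the slowest message handler, which is exactly the kind of early feedback the review process is meant to surface.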
When all of your cloud infrastructure is managed in a single place, the cost of that infrastructure becomes much easier to reason about. Depending on the cloud vendor and IaC tool you choose to use, there are plugins out there, such as Infracost, which calculate the cost impact of each modification made in the infrastructure as code source control repository. No more needing to manually use a cost calculator for each change set that was invariably spurred by another team. This quick feedback loop ties back into the speed and efficiency benefits provided by IaC as well!
Something that isn't frequently highlighted, but is a great benefit of using a cloud-vendor-agnostic IaC tool, is the number of external systems the tool works with. Configuration drift spans beyond just Amazon Web Services, Google Cloud Platform, and Microsoft Azure. How do we update our observability systems to reflect the changes to our infrastructure? How do we create service accounts in Kubernetes? IaC tools like Terraform allow us to centrally manage so much more than just our cloud computing resources. I personally have used Terraform to manage Auth0 settings, Datadog monitors and maintenance modes, PagerDuty configurations, and much more. Take a look at the list of providers offered by Terraform to get a clearer picture of the power these tools imbue our teams with.
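To give a feel for managing non-cloud systems from the same codebase, the sketch below declares a Kubernetes service account and a Datadog monitor side by side. The resource names, namespace, kubeconfig path, and the monitor's metric query and threshold are all illustrative assumptions rather than a recommended setup.

```hcl
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    datadog = {
      source = "DataDog/datadog"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config" # assumes a local kubeconfig is available
}

# With no arguments, the Datadog provider reads its credentials from the
# DD_API_KEY and DD_APP_KEY environment variables.
provider "datadog" {}

# Answers "how do we create service accounts in Kubernetes?"
resource "kubernetes_service_account" "orders_worker" {
  metadata {
    name      = "orders-worker" # hypothetical service account
    namespace = "default"
  }
}

# Keeps observability in step with the infrastructure it watches.
resource "datadog_monitor" "orders_queue_depth" {
  name    = "Orders queue backlog"
  type    = "metric alert"
  message = "The orders queue is backing up."
  query   = "avg(last_5m):avg:aws.sqs.approximate_number_of_messages_visible{queuename:orders} > 100"
}
```

Because the monitor sits next to the infrastructure definitions, renaming or removing a queue and updating its alert can happen in the same pull request.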
Depending on the extent to which a company aims to embody the spirit of DevOps, opening up IaC to developers may be something you're interested in. DevOps may be the gatekeepers for the Terraform or CloudFormation version control repositories, but ultimately, with the right guardrails, change control processes, and tooling/automation, you can have developers, SREs, and DevOps contribute to the same declarative infrastructure definitions. Imagine if developers could add support for that new SQS queue or other infrastructure components all on their own!
Infrastructure as Code: The Conclusion
To summarize, IaC tools such as Terraform, Pulumi, and CloudFormation can give your team a multitude of benefits, including speed, security, and reliability. Implementing a tool like this can greatly improve the productivity of your teams, as they invariably need cloud infrastructure to meet the business's technical needs. It's important to note that choosing the right tool from the beginning will help keep doors unlocked for you going forward as you introduce new third-party solutions and APIs.
For some further reading, why not take a look at our Terraform 201 Tutorial? It's chock-full of info on a topic you're already researching!