This is a complete overview of Verified GitOps. We will start with an overview of GitOps, the benefits, and the drawbacks. Next, we will take a deeper dive into what GitOps actually looks like in practice. Lastly, we will go over what Verified GitOps is and how it looks in practice.
Overview of GitOps
Containerization! Cloud-Native! CI/CD! Other marketing buzzwords!
It’s amazing how most technological concepts start and grow: they begin with engineers, eventually become a best practice, turn into marketing and sales buzzwords, and are then enforced by management on their teams, which in turn leads to new technological concepts that solve the issues or gaps that remain. It is a vicious cycle that describes how OOP, functional programming, cloud computing, virtualization, and many others became popular.
One technological concept which was practiced before it was officially termed is GitOps. The idea of keeping a codified version of something in a source code management solution using git has been a part of Application Development for a very long time.
Eventually, users began leveraging declarative languages to represent infrastructure in code in order to make provisioning more scalable, repeatable, and reliable. The terminology came into the market when Weaveworks created a tool that takes the declarative application code for Kubernetes (Kubernetes Specs, Helm, Kustomize, KSonnet) and ensures that what is in the code is accurately represented in the cluster.
This leads to some major benefits for the SDLC process, specifically allowing the user to quickly spin up a new cluster in a Disaster Recovery scenario.
Breakdown of GitOps
Let’s quickly break down what GitOps looks like currently:
The whole process starts with an engineer who makes a change to the application or infrastructure specification code and gets it into the git repo through a commit (Pull Request). Then, depending on the method of GitOps sync, either a webhook pushes the changes to the application or infrastructure, or an agent/operator pulls them. This makes the git repo the source of truth, with the application/infrastructure a reflection of the git repo.
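To make the pull-based variant concrete, here is a minimal sketch of what an agent might compute on each sync. The in-memory dictionaries stand in for the git checkout and the cluster, the names `plan_sync` and `apply_plan` are made up for illustration, and the sketch assumes the agent also removes resources that are absent from the repo, which not every sync setup does.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins: resource name -> spec. A real agent
# would read these from the git checkout and from the cluster API.
State = Dict[str, dict]


def plan_sync(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return (action, resource name) pairs that make `actual` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Assumption: anything not declared in git gets removed.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_plan(desired: State, actual: State) -> State:
    """Apply the plan to a copy of `actual` and return the resulting state."""
    result = dict(actual)
    for action, name in plan_sync(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


repo = {
    "web": {"image": "web:1.1", "replicas": 2},
    "api": {"image": "api:2.0", "replicas": 3},
}
cluster = {
    "web": {"image": "web:1.0", "replicas": 2},
    "cron": {"image": "cron:0.9", "replicas": 1},
}
print(plan_sync(repo, cluster))
```

After `apply_plan(repo, cluster)` the environment equals the repo, which is the "reflection of the git repo" idea in its simplest form.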
Benefits of GitOps
There are some significant benefits to GitOps, especially over what came before:
Leverage Developer Native Tooling
Because developers are already using Git for their application code, and maybe even Infrastructure code, allowing them to handle deployment specifications in their git repo makes it easy to train and use.
Because the infrastructure and application specification code is stored in a git repo, if a cluster goes down, everything can be spun up very quickly because it already exists in code.
Since engineers are already required to authenticate into their git repo, that authentication and authorization can be leveraged for deployment as well.
Git Blame and Versioning
If the declarative code for application configurations and infrastructure is stored in git, then you have native versioning of the code, which allows for reverting as needed, and also the ability to find out who made the changes.
Easier Knowledge Transfer
Since everything exists in a directory structure for git, it is easy to see how things relate to other applications or files.
Issues in GitOps
Auto-deploying to the infrastructure or application the moment that the git repo is updated is not a good process for any environment above Development.
A solution for this is making an air-gap between each repo and manually pushing the change up the repo chain when it is approved.
By defaulting to Helm/Kustomize/Kubernetes Specs and letting those files dictate what is deployed and how, it is near-impossible to add things like pauses, approvals, testing, etc. in the process since there is no way of representing those requirements natively.
A solution for this is to set up the air-gap model mentioned before which allows you to move from one environment deployment to the other by implementing a manual pause.
When you get to the point of setting up the git repo to best integrate with a GitOps process, the use of a tool for GitOps is almost pointless, other than getting a visual understanding of what is being deployed and its current state.
Additional tooling might represent a vertical definition of GitOps.
Advanced technological understanding of the Application, Infrastructure, and Architecture is required to make sure that the git repo is ready for GitOps processes, which alienates anyone who is not well versed in the underlying technology.
The only solution for this is to bring in more people with advanced knowledge and build out tribal knowledge internally.
Current GitOps oriented tooling only supports Kubernetes.
If the company is using anything else (serverless, other container orchestration tools, traditional applications, etc.) then they need to set up multiple tools and processes.
Automated Rollbacks result in Configuration Drift.
If the solution being used offers automated rollbacks, then the configuration in the infrastructure or application will be the previous version, compared to the git repo.
The potential workaround is to not have automated rollbacks, but rather that a notification is sent when there is an issue, requiring an end-user to revert the changes in the git repo, which will then sync the changes down.
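The rollback drift described above can be shown with a toy model. This is only a sketch: the `Environment` class and its methods are hypothetical, and versions are plain strings rather than real manifests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Environment:
    """Hypothetical model: git history plus the version running in the cluster."""
    commits: List[str] = field(default_factory=list)
    deployed: str = ""

    def commit_and_sync(self, version: str) -> None:
        # A commit lands in git and the sync deploys it.
        self.commits.append(version)
        self.deployed = version

    def automated_rollback(self) -> None:
        # The tool restores the previous version in the cluster only;
        # git is left untouched.
        self.deployed = self.commits[-2]

    def revert_in_git(self) -> None:
        # Workaround: a person reverts in git. The revert is a new commit
        # with the previous content, and the sync deploys it.
        self.commit_and_sync(self.commits[-2])

    def has_drift(self) -> bool:
        return self.deployed != self.commits[-1]


auto = Environment()
auto.commit_and_sync("v1")
auto.commit_and_sync("v2")
auto.automated_rollback()

manual = Environment()
manual.commit_and_sync("v1")
manual.commit_and_sync("v2")
manual.revert_in_git()

print(auto.has_drift(), manual.has_drift())
```

In the first case the cluster runs `v1` while the head of the repo still says `v2`; in the second, the revert goes through git, so the two stay aligned.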
Summary of High-Level GitOps
Being able to codify the infrastructure and application specifications can be very useful. The ability to have repeatable configurations without the need for tribal knowledge is not just helpful in GitOps; it is one of the main reasons why technologies like Kubernetes are so popular and heavily used!
Onboarding new people also becomes easy, since the configuration code is declarative and easily readable, allowing those with access to the Git repo to easily see what the environment is intended to look like.
Yet, even with these benefits, there are significant drawbacks to GitOps which make it hard to implement properly at an enterprise scale.
Verified GitOps in Practice
The GitOps process is intended to start when a user makes a change to the configuration code that is stored in the git repo, which then executes an auto-sync between the git repo and the environment. This allows for the git repo to be the source of truth on what should be in the end environment.
The Beginning of GitOps in Practice
At this point, everything around GitOps has been hypothetical, but what does it look like in practice?
To start your GitOps process, there are three prerequisites:
First, you will need a working version of your Kubernetes manifests.
Kustomize in VSCode
A cluster that is already set up, or tested and approved Infrastructure-as-Code that can set one up.
Cluster setup on Minikube
A solution that will enable you to synchronize your git repo with the cluster.
VSCode RunOnSave Extension
The idea of GitOps is to allow the git repo to be the source of truth and enable an auto-sync (auto-deploy) to the endpoint (cluster) whenever a file has been committed to the repository. In this example, the saving of the file emulates the committing of the file to the repo, which then triggers the auto-sync (you can run something like a GitHub Action for this in the git repo).
Original Kustomize file
Changed file and save
Auto-sync of the kustomize file with the cluster
The outcome of the auto-sync
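The save-triggered sync in this walkthrough can be approximated in a few lines. The sketch below is an assumption about how such a trigger could work, not a description of the RunOnSave extension: it hashes the file content and only builds a `kubectl apply -k` command when the content has changed. The function names and the `overlays/dev` directory are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def digest(content: bytes) -> str:
    """Fingerprint the file content so unchanged saves can be skipped."""
    return hashlib.sha256(content).hexdigest()


def sync_command(kustomize_dir: str) -> List[str]:
    # `kubectl apply -k <dir>` applies a kustomization directory.
    return ["kubectl", "apply", "-k", kustomize_dir]


def command_if_changed(
    content: bytes, last_digest: Optional[str], kustomize_dir: str
) -> Tuple[Optional[List[str]], str]:
    """Return (command to run or None, digest of the current content)."""
    current = digest(content)
    if current == last_digest:
        return None, current
    return sync_command(kustomize_dir), current


cmd, seen = command_if_changed(b"replicas: 2\n", None, "overlays/dev")
print(cmd)
```

A watcher loop would pass the returned command to something like `subprocess.run(cmd, check=True)` and keep the digest for the next comparison.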
This is a rather basic setup of GitOps, where the main point is to auto-sync to the cluster when the file is changed. There is another part of GitOps which is an appliance that does a check on what is in the cluster and what is in the git repo to make sure that the two are equal.
This is commonly seen as an operator, which does an internal audit between what is in the cluster and what is in git. There could also be an external solution that accomplishes the same process as the operator.
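As a rough illustration of that audit, the following sketch compares a desired object from git with the live object field by field. The function name `find_drift` and the sample objects are hypothetical, and a real audit would also have to ignore fields that the cluster fills in on its own, such as status.

```python
from typing import Any, Dict, List


def find_drift(desired: Dict[str, Any], live: Dict[str, Any], prefix: str = "") -> List[str]:
    """List dotted paths where the live object differs from the desired one."""
    drift: List[str] = []
    for key in sorted(set(desired) | set(live)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in live:
            drift.append(f"{path}: missing in cluster")
        elif key not in desired:
            drift.append(f"{path}: not in git")
        elif isinstance(desired[key], dict) and isinstance(live[key], dict):
            # Recurse into nested sections such as metadata and spec.
            drift.extend(find_drift(desired[key], live[key], path))
        elif desired[key] != live[key]:
            drift.append(f"{path}: git={desired[key]!r} cluster={live[key]!r}")
    return drift


in_git = {"metadata": {"name": "web"}, "spec": {"replicas": 2, "image": "web:1.1"}}
in_cluster = {"metadata": {"name": "web"}, "spec": {"replicas": 5, "image": "web:1.1"}}
print(find_drift(in_git, in_cluster))
```

An empty list means the two are equal; anything else is what the operator would either report or correct.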
What is not shown here is the Infrastructure-as-Code process of GitOps, but there are few to no solutions that implement this capability, especially when considering the full context.
The furthest any solution would be able to go, with regard to Infrastructure-as-Code, would be basic infrastructure configuration changes (e.g. services, configmaps, secrets, ingress).
The other option would be to have the pre-approved and tested Infrastructure Code stored in the git repo, which would allow for disaster recovery processes to be more easily done, or simply the desire to scale the infrastructure.
Benefits of GitOps in Practice
Some of the benefits of GitOps were discussed in the overview above. This list will specifically address the benefits that can be seen from the practical walkthrough that was done.
One such benefit is that triggering a deployment via automated syncing is a very simple and powerful process. Through the use of the git repo, the engineers would have a more familiar process, and GitOps would be an extension of that process.
Another benefit is that most teams already have a process of approvals, code checks, feature branching, etc. that can be leveraged with GitOps as well.
The last benefit from the walkthrough is the confidence that the user can get by knowing that what they put in the git repo is active in the cluster. However, this is where the drawbacks start to come into play.
Drawbacks of GitOps in Practice
One of the most obvious drawbacks with GitOps is that the git repo is the source of truth. Therefore, if a misconfiguration happens in the git repo, the deployment will still go through, but the error could be rather devastating:
The misconfigured image leads to deployment issue in cluster
Imagine if the misconfiguration was with the name of the deployment or replica count? A whole cluster could be taken down. And, to make matters worse, what would the solution be when live traffic is impacting the cluster?
You can’t roll back at this point, considering all of the potential issues with other dependent resources in the cluster. It would almost be better to spin up a brand new cluster and blow the problematic one away.
What if there was an issue with the YAML looking correct, but Kubernetes doesn’t recognize it? You could have a deployment that should be taking place, but it doesn’t actually go through, and then there are failures without a feedback loop (Kubernetes is not the best when it comes to error messages).
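One way to get a feedback loop before the sync happens is a small check on the manifest itself. The sketch below is purely illustrative: the rule set is made up and deliberately incomplete, it is not the Kubernetes schema, and `validate_deployment` is a hypothetical helper that works on an already-parsed manifest.

```python
from typing import Any, Dict, List

# Hypothetical, incomplete rule set for illustration; not the Kubernetes schema.
REQUIRED_TOP_LEVEL = ("apiVersion", "kind", "metadata", "spec")
KNOWN_SPEC_KEYS = {"replicas", "selector", "template", "strategy"}


def validate_deployment(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    errors: List[str] = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in manifest:
            errors.append(f"missing field: {key}")
    if not manifest.get("metadata", {}).get("name"):
        errors.append("metadata.name is empty")
    spec = manifest.get("spec", {})
    # A misspelled key is still valid YAML, so it has to be caught explicitly.
    for key in sorted(set(spec) - KNOWN_SPEC_KEYS):
        errors.append(f"unrecognized spec field: {key}")
    replicas = spec.get("replicas", 1)
    if not isinstance(replicas, int) or replicas < 0:
        errors.append(f"invalid replicas: {replicas!r}")
    return errors


typo = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replcias": 2, "selector": {}, "template": {}},
}
print(validate_deployment(typo))
```

The point is less the specific rules than where the check runs: before the commit is synced, so that the error reaches the engineer instead of the cluster.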
Another major issue is the rollback process. What is the source of truth for the rollback? Is it the git repo? The cluster? A person? For example:
The name of the deployment has now changed.
The new deployment shows up under the wrong name and we want to roll back. The code is reverted to the old name and saved again with the following outcome:
Reverted the deployment name
New deployment goes into effect and the old deployment is still there
If the users were under the impression that the rollback would actually delete the deployment and its resources, they would be in for a rude awakening when they actually looked at the cluster.
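The reason both deployments survive can be reproduced with a toy model of an apply that only creates or updates. This is a sketch under that assumption; the dictionary stands in for the cluster and the names `apply` and `orphans` are hypothetical.

```python
from typing import Dict, List

Cluster = Dict[str, dict]


def apply(cluster: Cluster, manifests: Dict[str, dict]) -> None:
    """Create or update every named object; never delete anything."""
    for name, spec in manifests.items():
        cluster[name] = spec


def orphans(cluster: Cluster, manifests: Dict[str, dict]) -> List[str]:
    """Objects that run in the cluster but are no longer declared in git."""
    return sorted(set(cluster) - set(manifests))


cluster: Cluster = {}
apply(cluster, {"web": {"image": "web:1.0"}})          # original deployment
apply(cluster, {"web-renamed": {"image": "web:1.0"}})  # name changed in git
reverted = {"web": {"image": "web:1.0"}}
apply(cluster, reverted)                                # name reverted in git

print(sorted(cluster))
print(orphans(cluster, reverted))
```

Reverting the file never tells the cluster to remove `web-renamed`, so cleaning it up needs either a manual delete or a sync that compares against what is declared, as `orphans` does here.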
Another glaring issue is that every step related to a deployment process (testing, approvals, change requests, etc.) is not able to be a part of this automated process (you can’t add a testing portion to your Kubernetes Manifest without getting super-hacky).
Therefore, GitOps now requires those in charge of the deployment process to leverage the GitOps process simply as the deployer, essentially removing the requirement to log into the cluster and run the kubectl commands themselves. Then, as each additional portion of the deployment process comes into scope, everything becomes extremely manual again.
The last and most important of the drawbacks of GitOps is the prerequisites. The cluster MUST be locked down in order to funnel all potential changes through the git repo and keep configuration drift from negatively affecting the process.
Additionally, everything must already exist for GitOps to be leveraged. Those who want to implement GitOps are required to have a fully functioning git repo, fully tested Kubernetes manifests, well-tuned Kubernetes clusters, and a user or set of users who are Kubernetes experts to troubleshoot and maintain the Kubernetes process outside of the GitOps scope (you can’t really test how new deployment pipelines would work by using GitOps processes; you have to test them externally first and then import them into the process once they are vetted).
Summary of GitOps in Practice
GitOps has some very good practices that should be adhered to: storing the codified versions of the application, infrastructure, and application configuration in a git repo for collaboration and version control.
However, the git repo acting as the source of truth for the Kubernetes deployments is not the best scenario for those who are wanting to have reliable processes across production and the whole enterprise.
Verified GitOps can be the optimal process for every company, and the landscape of solutions enables that company to truly leverage Verified GitOps.
Above and Beyond GitOps
When we originally discussed GitOps, we started talking about how different concepts start at the grassroots level and then grow to the point where the original concept is forgotten and marketing teams use the terminology to drive traffic to their website. We want to try and break the cycle, start a concept outside of the grassroots process, and get the original purpose and concept into the world before it can be diluted!
It should be common sense that not every company in the world is solely using Kubernetes, which means that a Kubernetes-only solution is not very helpful when it comes to non-Kubernetes pieces.
Additionally, there are no companies that are okay with their deployments to Production being fully automated when the git repo is updated. Rather, everyone has some form of testing, check, approval, verification, etc. that is a part of the process.
And for good reason! Can you imagine someone pushing a replica count of 20 instead of 2 and it gets merged and deployed automatically?
What the Verified GitOps practice intends to do is codify every piece and interaction of the process. In the case of CI/CD, it is not just that the code itself lives in the git repo, but the process of getting the code to the repo, the code checks that will happen on the git repo, the approvals, the testing, the building of the artifact, the ticketing and change management, and the list goes on.
Breaking Down Verified GitOps
Let’s quickly break down what Verified GitOps looks like:
It probably seems more like what the typical process is already, right? That is because the goal of Verified GitOps is not to introduce a simple deployment tool, but rather to create a codified, repeatable process from start to finish, which allows for scalability and security as everything should be automated (or close to it).
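A codified process of this kind can be pictured as an ordered list of steps where each one must pass before the next runs. The following is only a sketch: the step names, the fields of the `change` dictionary, and the "no more than double the replicas" rule are all assumptions chosen to mirror the replica count example mentioned earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    check: Callable[[Dict], bool]  # returns True when the step passes


def run_pipeline(steps: List[Step], change: Dict) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure."""
    log: List[str] = []
    for step in steps:
        passed = step.check(change)
        log.append(f"{step.name}: {'ok' if passed else 'failed'}")
        if not passed:
            return False, log
    return True, log


# Hypothetical steps; real ones would call the CI, test, and ticketing systems.
PIPELINE = [
    Step("code checks", lambda c: c["lint_clean"]),
    Step("tests", lambda c: c["tests_passed"]),
    Step("replica sanity", lambda c: c["new_replicas"] <= 2 * c["old_replicas"]),
    Step("approval", lambda c: bool(c["approved_by"])),
    Step("change ticket", lambda c: bool(c["ticket"])),
    Step("deploy", lambda c: True),
]

change = {
    "lint_clean": True,
    "tests_passed": True,
    "old_replicas": 2,
    "new_replicas": 20,
    "approved_by": "",
    "ticket": "",
}
ok, log = run_pipeline(PIPELINE, change)
print(ok, log)
```

Because the deploy is just the last step in the list, a change that fails an earlier gate never reaches the cluster, and the log records where it stopped.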
More Benefits to Verified GitOps
There are some obvious benefits to Verified GitOps:
- Leverage tools that everyone in the process is already familiar with.
- The process extends to much more than just Kubernetes and deployments
- Codifying each part of the process allows for easy scalability and reliability
- Security can now have better control, input, and auditability over all of the process
Yet, there are some not-so-obvious advantages of Verified GitOps:
- Each portion can be placed under RBAC
- Onboarding of new features, functionality, and people becomes extremely easy
- If implemented correctly, additions/removals/exchanges of parts of the process will not have any significant ripple effect on the rest of the process
- Opinionation and Configurability can be harmonized depending on how the company wants to implement the process
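To illustrate the RBAC point, here is a minimal sketch of per-step permissions. The roles, step names, and the `PERMISSIONS` table are hypothetical; a real setup would pull them from whatever identity system the company already uses.

```python
from typing import Dict, List, Set

# Hypothetical roles and step names.
PERMISSIONS: Dict[str, Set[str]] = {
    "developer": {"code checks", "tests"},
    "release-manager": {"approval", "deploy"},
    "security": {"approval", "audit"},
}


def can_run(role: str, step: str) -> bool:
    """True when the role is allowed to trigger or sign off on the step."""
    return step in PERMISSIONS.get(role, set())


def allowed_roles(step: str) -> List[str]:
    """All roles that may act on a given step, in a stable order."""
    return sorted(role for role, steps in PERMISSIONS.items() if step in steps)


print(can_run("developer", "deploy"))
print(allowed_roles("approval"))
```

Since the table is just data, it can live in the same repo as the rest of the process and go through the same review.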
More Issues with Verified GitOps
As great as the process sounds, there are some drawbacks:
- Not everyone can implement it right away
- How many different processes or variations of processes exist in Engineering?
- How much of the process will be enforced and how many variations will be allowed?
- Who will maintain the process?
- There are currently no tools in the market that can do full Verified GitOps. So what do you do?
- Do you combine tools?
- Do you buy and build?
- Do you just build?
Verified GitOps Design
Now that we have discussed some of the benefits and issues with Verified GitOps, there are some major decisions to make before implementing it, starting with this one: How will the configuration be designed?
The first major decision for the configuration design is the language style in which the design will be constructed. A declarative style of language would be the best choice, but which markup language should be used? YAML? JSON? XML?
It is important to decide based on the current usage of markup languages in the organization, but also on where the company will inevitably want to go. Don’t pick XML because that’s where every engineer in the company has lived for the past 10 years.
Rather, pick something that is both readable and easy to put together (which still might be XML for your purposes). Keep in mind that whatever choice is made, there will be training requirements; this includes the intermittent user, the daily user, the advanced user, and the maintainers of the process.
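To get a feel for the readability trade-off, the sketch below renders one made-up step definition as JSON and as XML using only the Python standard library. The step and its fields are hypothetical, and YAML is left out only because the standard library does not ship a YAML module.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical step definition used only for comparison.
step = {"name": "approval", "type": "manual", "timeout_minutes": 60}


def to_json(data: dict) -> str:
    return json.dumps(data, indent=2)


def to_xml(data: dict) -> str:
    root = ET.Element("step")
    for key, value in data.items():
        # XML has no native types, so every value becomes text.
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_json(step))
print(to_xml(step))
```

Note that the XML version loses the distinction between the number `60` and the string `"60"`, which is the kind of detail that feeds into the training requirements mentioned above.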
The Approach to Verified GitOps
The next major decision is whether you take the monofile approach, the file-reference approach, or attempt a hybrid.
- Monofile approach. As can be assumed from the name, this approach combines all of the declarative language for the whole process in one long file (hence, monofile). If you look back at the diagram above, imagine that each step, variable, configuration requirement, etc. are laid out sequentially in a single file.
  - Pro: It is very easy to read through the process and understand what is happening
  - Pro: Onboarding new people is simple since they only need to read through the file to get an understanding of what does and does not happen
  - Pro: No extra orchestration files are needed
  - Con: It can get very long, very fast. Depending on how many pieces make up your Verified GitOps process, you could have hundreds of lines of declarative language to parse through.
  - Con: Any variation that needs to be supported leads to another monofile, regardless of how big or small the variation is.
- File-reference approach. This is the opposite of the monofile approach: each piece or part of the process is contained in its own file, and other files reference those files.
  - Pro: Significantly smaller configuration files are needed
  - Pro: Easy to read and comprehend
  - Pro: Changes and variations in different parts of the process are easily handled through making copies of the individual files without major ripple effects throughout the rest of the process
  - Con: Can become very intricate, requiring multiple files to be examined to understand how the flow of information should be handled
  - Con: Orchestration files are required to piece together the different parts of the process
  - Con: Harder to onboard people if they are required to navigate the file structure
- Hybrid approach. This method is a combination of both the file-reference and monofile approaches. The actual design of this approach is up to the end users, but the recommendation would be that the majority is monofile. Additionally, for the pieces that are referenced, the main part of the reference in the monofile would be the variables that need to be overwritten (if any).
  - Pro: Fewer files to parse through than the file-reference approach
  - Pro: No orchestration files are required
  - Pro: Small changes and variations are less impactful than with the monofile approach
  - Con: There are still file references that can lead to a rabbit-hole of clicking in order to get context
  - Con: The main file can become very long still, depending on how the hybrid approach is implemented
  - Con: A UI will be required to make sense of this set-up. It is not as readable as the monofile approach is.
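The hybrid idea of a mostly inline main file with a few references and variable overrides might look roughly like this. Everything in the sketch is an assumption made for illustration: the `ref` and `vars` keys, the file names, and the use of a dictionary in place of real files on disk.

```python
import copy
from typing import Dict, List

# Stand-in for separate files in the repo: file name -> step definition.
# File names, keys, and values are hypothetical.
FRAGMENTS: Dict[str, dict] = {
    "steps/approval.json": {"name": "approval", "type": "manual", "approvers": 1},
    "steps/deploy.json": {"name": "deploy", "type": "kubernetes", "namespace": "dev"},
}


def resolve(pipeline: List[dict], fragments: Dict[str, dict]) -> List[dict]:
    """Expand `ref` entries into full steps, applying any `vars` overrides."""
    resolved: List[dict] = []
    for entry in pipeline:
        if "ref" in entry:
            # Copy so the shared fragment is never modified by an override.
            step = copy.deepcopy(fragments[entry["ref"]])
            step.update(entry.get("vars", {}))
        else:
            step = copy.deepcopy(entry)
        resolved.append(step)
    return resolved


# Hybrid main file: one inline step, two references with overrides.
main_file = [
    {"name": "tests", "type": "script", "command": "make test"},
    {"ref": "steps/approval.json", "vars": {"approvers": 2}},
    {"ref": "steps/deploy.json", "vars": {"namespace": "prod"}},
]

for step in resolve(main_file, FRAGMENTS):
    print(step)
```

A pure monofile would contain only inline entries, and a pure file-reference setup only `ref` entries, so the same resolver covers all three approaches.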
Summary of Verified GitOps
The ability to codify every portion of the CI/CD process is essential to future-proofing the business. Repeatable and scalable configurations without the need for tribal knowledge are not just helpful; they are one of the main reasons why technologies like Kubernetes or Terraform are so popular and heavily used!
In contrast to GitOps, Verified GitOps extends beyond just Kubernetes and allows for the engineering teams to truly understand all of the intricacies of their current process. Even with the drawbacks mentioned above, Verified GitOps is the best way forward for any company that has a CI/CD practice.