This article on pipeline performance was co-written by Naidu Annepu, Prashant Pal, and Sahil Hindwani.
Harness is growing quickly in the DevOps space, and with higher growth comes the need for better scalability and performance. To meet the needs of our rapidly expanding customer base, we ran a dedicated effort around pipeline execution performance. We load tested our services and made changes along the way to improve pipeline performance further. The main goal of this activity was to make our system more performant and scalable.
With these optimizations, we:
Curious how we achieved this? Read on to find out more about our approach and learnings.
Before starting our performance experiment, we gathered data on how much scale we should aim for. We calculated the average number of builds we generate, the number of PR checks we run, the number of deployments we do, and other interactions with our platform. From this, we estimated the scale of a mid-size organization, then extrapolated those figures and tested our platform at 100x the current load.
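The extrapolation itself is simple arithmetic. As a rough sketch (the baseline figures below are hypothetical placeholders, not our actual measurements), it looks something like this:

```python
# Hypothetical daily baseline for a mid-size organization (illustrative only).
baseline_per_day = {
    "builds": 500,
    "pr_checks": 2000,
    "deployments": 100,
}

SCALE_FACTOR = 100  # we tested at 100x the current load

# Target load for the experiment, per day and per minute.
for activity, count in baseline_per_day.items():
    target_per_day = count * SCALE_FACTOR
    print(f"{activity}: {target_per_day} per day (~{target_per_day / 1440:.1f} per minute)")
```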
We were aiming for the following scale:
Once our scale estimation exercise was over, we decided to put some load on our services. We analyzed several load-testing tools and decided to use Locust.
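For readers unfamiliar with Locust, a load test is just a Python class describing simulated user behavior. Here's a minimal sketch of the kind of test we ran; the endpoint path and payload are hypothetical, not Harness's actual pipeline API:

```python
from locust import HttpUser, task, between


class PipelineUser(HttpUser):
    # Each simulated user waits 1-5 seconds between triggering executions.
    wait_time = between(1, 5)

    @task
    def trigger_pipeline(self):
        # Hypothetical endpoint and payload for triggering a pipeline execution.
        self.client.post(
            "/pipeline/api/executions",
            json={"pipelineIdentifier": "build_and_deploy"},
        )
```

Running `locust -f locustfile.py --host https://your-target-host` then lets you ramp up concurrent users from the Locust web UI and watch throughput and response times as the load grows.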
Adding stress to our system is meaningless unless we have monitoring in place. For this, we used our Continuous Verification module to see how services behaved under load. We also used OpenCensus to publish metrics to GCP, which helped us understand where we were spending the most time and where we could optimize.
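To give a flavor of the approach, here is a minimal sketch using OpenCensus's Python SDK with its Stackdriver (Google Cloud Monitoring) exporter; the measure name and bucket boundaries are hypothetical, and the exporter assumes GCP credentials and a project are already configured:

```python
from opencensus.ext.stackdriver import stats_exporter
from opencensus.stats import aggregation as aggregation_module
from opencensus.stats import measure as measure_module
from opencensus.stats import stats as stats_module
from opencensus.stats import view as view_module
from opencensus.tags import tag_map as tag_map_module

# A latency measure and a distribution view over it (hypothetical name and buckets).
latency_ms = measure_module.MeasureFloat(
    "pipeline/step_latency", "Time spent executing a pipeline step", "ms")

latency_view = view_module.View(
    "pipeline/step_latency_distribution",
    "Distribution of pipeline step latency",
    [],
    latency_ms,
    aggregation_module.DistributionAggregation([25, 100, 250, 500, 1000, 2500]))

# Register the view and export it to Google Cloud Monitoring.
stats = stats_module.stats
stats.view_manager.register_exporter(stats_exporter.new_stats_exporter())
stats.view_manager.register_view(latency_view)

# Record one sample; in a real service this would wrap the timed code path.
mmap = stats.stats_recorder.new_measurement_map()
mmap.measure_float_put(latency_ms, 123.4)
mmap.record(tag_map_module.TagMap())
```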
We ran numerous rounds of load testing with Locust, repeatedly putting our system under stress. After each round, we made a few optimizations to our services.
Some screenshots from our monitoring dashboards are below.
First run of the experiment:
Final run of the experiment:
At Harness, our executions are completely event-driven. At the start of this activity, our event framework was built on legacy MongoDB queues, which we had inherited and adapted to our use case by building a wrapper and framework around them.
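For context, a document-based queue of this kind typically claims messages with an atomic find-and-modify. The sketch below shows the general pattern with pymongo, not Harness's actual wrapper; collection names and fields are hypothetical:

```python
import datetime

from pymongo import MongoClient, ReturnDocument

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
queue = client.events.queue


def publish(topic, payload):
    # Each event is a document; consumers poll for unclaimed ones.
    queue.insert_one({
        "topic": topic,
        "payload": payload,
        "claimed_until": datetime.datetime.min,
        "created_at": datetime.datetime.utcnow(),
    })


def claim_next(topic, lease_seconds=30):
    # Atomically claim the oldest unclaimed (or lease-expired) message.
    now = datetime.datetime.utcnow()
    return queue.find_one_and_update(
        {"topic": topic, "claimed_until": {"$lte": now}},
        {"$set": {"claimed_until": now + datetime.timedelta(seconds=lease_seconds)}},
        sort=[("created_at", 1)],
        return_document=ReturnDocument.AFTER,
    )
```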
Though Mongo queues had functioned well in the past, we ran into certain performance limitations with them for our use case:
With the above limitations, Mongo wasn't a good fit for our use case, so we evaluated other queuing systems such as Kafka and Redis Streams. We decided on Redis Streams, as it fit our use case perfectly and we were already using Redis for other things.
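As a rough illustration of what an event queue on Redis Streams looks like with redis-py (the stream, group, and consumer names below are hypothetical, not our production identifiers):

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical connection details

STREAM = "pipeline-events"
GROUP = "pipeline-service"

# Create the consumer group once; ignore the error if it already exists.
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass

# Producer: append an event to the stream.
r.xadd(STREAM, {"type": "EXECUTION_START", "planExecutionId": "abc123"})

# Consumer: read a batch of new events for this group member, then acknowledge.
entries = r.xreadgroup(GROUP, "consumer-1", {STREAM: ">"}, count=10, block=5000)
for _, messages in entries:
    for message_id, fields in messages:
        # Process the event here, then ack so it is not redelivered to the group.
        r.xack(STREAM, GROUP, message_id)
```

Consumer groups give each service instance its own share of the stream, and unacknowledged messages stay pending so another consumer can pick them up after a failure.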
We have always heard that nobody is perfect, and neither was Redis in our case. Redis provides a lot out of the box, but we faced some issues while migrating to it.
Despite these issues, we completed the migration successfully. We then reran our experiment against Redis and, to our surprise, observed a huge performance gain.
A few gotchas we encountered when using Redis:
Here's a nifty graph of performance pre- and post-optimization. As you can see, the results are impressive!
We’re thrilled with the results we received after optimizing for pipeline performance. It’s amazing what load testing and Redis can do! Are you looking for a performant CI/CD solution? Take Harness out for a spin today.
Enjoyed reading this blog post or have questions or feedback?
Share your thoughts by creating a new topic in the Harness community forum.