At Harness, our complete codebase is in a single GitHub repo. It comprises around 5 million lines of code with 100+ interdependent modules. We started observing an increase in build times every week and our existing build tool (Maven) was unable to scale with our growing needs. In this blog post, we'll go over our challenges with Maven, why we decided on Bazel, how we migrated, and what results we saw.
Challenges With Maven
Lack of incremental build support: When we made a small change in one of our 100+ modules, Maven often required a clean build. As a fast-growing organization, we add new modules at a steady pace, and as the code grew, the time taken for each build added up significantly.
Local development issues: For local development, switching branches and then rebuilding the whole project was a pain point. Developers had to wait around 20-25 minutes for the project to sync.
Time taken by the unit-tests jobs: The unit-tests jobs take up the largest share of our Continuous Integration time. When a developer makes a small change, ideally only the unit tests that depend on it should run. But Maven executes all unit tests, no matter what changed in the code. For example, if a developer updates the README file, there is no need to run any unit tests, yet developers had to wait for all the tests to pass regardless. We were running our unit tests in three batches: unit-tests-0, unit-tests-1, and unit-tests-2. As the code and the number of unit tests grew, the time taken by the unit-tests jobs kept increasing.
At this juncture, we realized that we needed to look into an alternate option that could serve our future needs as well.
Bazel is an open-source build and test tool that Google released in 2015. We chose Bazel as our build system for the following properties:
Fast: By analyzing the dependency graph, Bazel knows exactly what needs to be rebuilt. It caches all previously completed work, rebuilds only what is needed, and can build independent parts of the project in parallel. For example, if we have three modules named A, B, and C, where A depends on B and B depends on C, and we change module B, Bazel will rebuild only modules A and B, not C.
Correct/reproducible builds: If you build the same code with the same arguments, you always get the same outputs.
Fewer intermittent test failures: Bazel runs tests in a sandbox, so tests are unlikely to collide with each other, which reduces the chance of intermittent test failures.
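The A, B, C example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Bazel's actual implementation; the `DEPS` dictionary and the `modules_to_rebuild` function are hypothetical names introduced here.

```python
from collections import deque

# Hypothetical module graph from the example: A depends on B, B depends on C.
DEPS = {"A": ["B"], "B": ["C"], "C": []}


def modules_to_rebuild(deps, changed):
    """Return the changed module plus every module that transitively depends on it."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = {}
    for module, module_deps in deps.items():
        for dep in module_deps:
            dependents.setdefault(dep, []).append(module)

    rebuild = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in rebuild:
                rebuild.add(dependent)
                queue.append(dependent)
    return rebuild


print(sorted(modules_to_rebuild(DEPS, "B")))  # ['A', 'B']
```

Module C is left alone because nothing it depends on has changed, which is exactly the saving a dependency-aware build tool gives over rebuilding everything.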
Proof of Concept (POC)
To confirm, we took two modules from Maven - modules A and B, where B depends on A (B->A) - and tested Bazel changes there. We also created a bucket on GCP for cache. Here are our findings:
Running both modules for the first time: Both modules were built and all tests ran without any cache.
Making changes in module A: Both modules were rebuilt and all tests ran.
Making changes in module B: Module A came from the cache, and only module B was rebuilt. Tests for module B ran, while test results for module A were served from the cache. The whole process took less time than in the two cases above.
No changes in any module: No module was built. Modules A and B both came from the cache, as did the test results for both. This case took the least time of all. To test the cache, we also added a sleep statement to one of the tests for module B. Without the cache the test run took close to 30 seconds, and with the cache it took less than a second.
We knew we had some modules that take a long time to build and test, and caching them would save a lot of time. Hence, we decided to move from Maven to Bazel.
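The four POC cases can be reproduced with a small content-hash cache model. This is a deliberately simplified sketch: it assumes a module's cache key is just the hash of its own sources plus the keys of its dependencies, whereas Bazel's real keys also cover things like command lines and toolchains. The names `action_key`, `build`, and the in-memory `cache` set (standing in for the GCP bucket) are hypothetical.

```python
import hashlib


def action_key(module, sources, deps, keys):
    """Hash a module's own source text together with the keys of its dependencies."""
    digest = hashlib.sha256(sources[module].encode())
    for dep in sorted(deps[module]):
        digest.update(keys[dep].encode())
    return digest.hexdigest()


def build(sources, deps, order, cache):
    """Build modules in dependency order; return the modules that were actually rebuilt."""
    keys, rebuilt = {}, []
    for module in order:
        keys[module] = action_key(module, sources, deps, keys)
        if keys[module] not in cache:
            cache.add(keys[module])  # stand-in for storing the real build output
            rebuilt.append(module)
    return rebuilt


deps = {"A": [], "B": ["A"]}  # B depends on A, as in the POC
order = ["A", "B"]            # dependencies first
cache = set()                 # stand-in for the remote cache bucket
sources = {"A": "a-v1", "B": "b-v1"}

first = build(sources, deps, order, cache)      # cold cache
no_change = build(sources, deps, order, cache)  # nothing edited
sources["B"] = "b-v2"
changed_b = build(sources, deps, order, cache)  # only B edited
sources["A"] = "a-v2"
changed_a = build(sources, deps, order, cache)  # A edited, so B's key changes too

print(first, no_change, changed_b, changed_a)  # ['A', 'B'] [] ['B'] ['A', 'B']
```

Because B's key includes A's key, a change in A invalidates B as well, while a change in B leaves A's cached result usable.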
One of the challenges we faced while migrating was that we had more than 100 modules to migrate, and it was impossible to migrate all of them in one go. So, we decided to adopt a hybrid approach, meaning that we used both Maven and Bazel: the modules converted to Bazel were built through Bazel, while the rest were built through Maven. We cover this in more detail later in this post.
Here, we are going to discuss how we can migrate a simple module from Maven to Bazel.
The WORKSPACE file can be compared with the central pom.xml file in Maven. In Bazel, this file sits at the root of the project and is where we define our external dependencies, much as we do in Maven’s pom.xml.
We use rules_jvm_external for external Maven dependencies. This is an external library that fetches these dependencies transitively.
The pattern followed for adding dependency is: groupId:artifactId:version
In the WORKSPACE file, we first load rules_jvm_external and then use maven_install to fetch the external Maven dependencies.
A BUILD.bazel file can be created at the module level or at the file level. Since our codebase contains a lot of files, we decided to adopt module-level Bazel migration, which means we created one BUILD.bazel file per module.
BUILD.bazel can be defined as follows:
java_library: A Java rule in Bazel that compiles a set of Java source files and creates a jar.
name: A unique name for this target. It is used to refer to the target while building.
@maven: Here, maven is the name of the repository we defined in WORKSPACE through maven_install. A dependency is referenced by joining its groupId and artifactId with an underscore (_), after replacing each . with _ in both the groupId and the artifactId.
srcs: The set of Java source files we want to include in this target and build together.
deps: All the dependencies of srcs. These can be external dependencies as well as other targets from the project.
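The naming rule for @maven dependencies can be checked with a small helper. This is a sketch, not part of rules_jvm_external: `maven_label` is a hypothetical function, the coordinates are only illustrative, and replacing hyphens with underscores is an assumption about how rules_jvm_external names targets that the description above does not spell out.

```python
import re


def maven_label(coordinate, repo="maven"):
    """Convert 'groupId:artifactId:version' into the Bazel label used under deps."""
    group_id, artifact_id = coordinate.split(":")[:2]
    # Replace '.' (and, by assumption, '-') with '_' in both parts, then join with '_'.
    name = re.sub(r"[.\-]", "_", f"{group_id}_{artifact_id}")
    return f"@{repo}//:{name}"


print(maven_label("com.google.guava:guava:31.1-jre"))
# @maven//:com_google_guava_guava
print(maven_label("org.apache.commons:commons-lang3:3.12.0"))
# @maven//:org_apache_commons_commons_lang3
```

Note that the version is dropped: the label identifies the artifact, and the version is pinned once in WORKSPACE.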
After creating the BUILD.bazel file, we can run the below command to build this target:
bazel build path_to_directory:module
Here, path_to_directory is the relative path from the project root to the directory containing the BUILD.bazel file.
Obstacles During Migration
Here, we discuss the challenges we faced while migrating from Maven to Bazel and how we solved them.
We have a large codebase and more than 100 interdependent modules. As such, we could not migrate the whole repository in one go.
As discussed earlier, we needed a strategy that allowed migrating iteratively and that wouldn’t affect current development work much. For this, we came up with an approach that we called the hybrid approach.
Hybrid Approach: We had a requirement that a module being migrated to Bazel could depend only on modules already built by Bazel (it could depend on external libraries, but not on local Maven modules). So, we started our migration with the independent/leaf modules.
We wrote a script and hooked it into Maven’s pom.xml file. When we run a Maven build, this script is executed first. It builds all the Bazel modules and installs their artifacts in the local Maven repository (~/.m2/repository in our case).
Then, Maven continues building the Maven modules, which can depend on Bazel modules. Maven treats the Bazel modules as external dependencies. Since the Bazel module’s artifacts are already installed in the local Maven repo, Maven simply does the build without caring if those dependencies are external libraries or Bazel-built artifacts.
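The leaf-first rule amounts to migrating modules in topological order of the dependency graph. The sketch below shows one way to compute such an order with Python's standard graphlib module (Python 3.9+). The module names and the helper functions are hypothetical; this is not the script hooked into pom.xml.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each module maps to the local modules it depends on.
MODULE_DEPS = {
    "app": ["service", "commons"],
    "service": ["persistence", "commons"],
    "persistence": ["commons"],
    "commons": [],
}


def migration_order(module_deps):
    """List modules so that every module comes after all of its dependencies."""
    return list(TopologicalSorter(module_deps).static_order())


def can_migrate(module, module_deps, migrated):
    """A module is eligible once all of its local dependencies are built by Bazel."""
    return all(dep in migrated for dep in module_deps[module])


print(migration_order(MODULE_DEPS))
# ['commons', 'persistence', 'service', 'app']
print(can_migrate("service", MODULE_DEPS, migrated={"commons"}))
# False
```

At any point during the migration, everything earlier in the list is built by Bazel and everything later is still built by Maven, which is what lets the two tools coexist.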
Bazel does not have a built-in rule for running a set of unit tests in one target, so a developer has to write one target for each unit-test file. Given the size of our codebase, we would have had to write a very large number of unit-test targets. That would have been repetitive work and would have made our build files very large and hard to maintain.
To overcome this, we wrote a macro that instantiates a java_test rule for each unit-test file when we run Bazel actions. This made our BUILD.bazel files smaller, cleaner, and easier to maintain. If we want to change an argument for all unit tests, we can simply make that change in the macro and it is reflected in all test targets.
In our macro, we set runtime_deps = ["tests"]. Here, "tests" is a java_library target that should contain all the dependencies needed to run the unit tests of that module.
Here’s the command to run the Bazel test: bazel test //relative_path_to_module_directory:fqn_of_test_class
Fixing Unit Tests
One of the biggest challenges was fixing the tests after migrating a module from Maven to Bazel. About 90% of the failing tests failed because of path issues. Bazel runs tests in its own private sandbox, so a test that depends on a resource file fails when that file is not present in the sandbox.
To fix this, we created a BUILD.bazel file in every resource folder so that the resource files are exposed as a target. A test target depends on that target whenever any of its tests requires a resource file.
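Why an undeclared resource file goes missing can be shown with a toy sandbox. This is a simplified model that copies declared inputs into a temporary directory; Bazel's real sandbox works differently under the hood. The names `run_in_sandbox` and `read_fixture`, and the file `resources/fixture.json`, are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_sandbox(workspace, declared_inputs, action):
    """Copy only the declared inputs into a fresh directory and run the action there."""
    with tempfile.TemporaryDirectory() as sandbox:
        sandbox = Path(sandbox)
        for relative in declared_inputs:
            target = sandbox / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(Path(workspace) / relative, target)
        return action(sandbox)


def read_fixture(root):
    """Stand-in for a test that loads a resource file."""
    return (root / "resources/fixture.json").read_text()


with tempfile.TemporaryDirectory() as workspace:
    fixture = Path(workspace) / "resources/fixture.json"
    fixture.parent.mkdir(parents=True)
    fixture.write_text("{}")

    # Declared as an input: the file is present in the sandbox.
    print(run_in_sandbox(workspace, ["resources/fixture.json"], read_fixture))

    # Not declared: the file is missing, just like in a failing migrated test.
    try:
        run_in_sandbox(workspace, [], read_fixture)
    except FileNotFoundError:
        print("resource missing from sandbox")
```

Declaring the resource folder as a target and depending on it plays the role of `declared_inputs` here.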
Bazel does not have built-in support for checkstyle, so we needed to integrate it and other static checks manually. First, we created a java_binary using the external checkstyle jar and our custom checkstyle rules.
Then, we created a genrule in Bazel that takes the checkstyle binary and forms a shell command to execute the binary after applying checkstyle arguments.
Now, we can invoke this genrule from a module-level build file to run checkstyle on all files in that module.
PMD checks can be integrated into Bazel in the same way.
Our Experience After Migrating from Maven to Bazel
After completely migrating from Maven to Bazel, we saw both pros and cons. The migration resolved the challenges with Maven described above.
However, we also ran into some new problems after the migration. We discuss both the advantages and the disadvantages below.
Unit-tests jobs: There was a significant improvement in running the test jobs. Now, the time taken by the unit-tests jobs depends on the module in which the developer is making the change. As of July 21st, we boast the following time records for the unit-tests jobs:
Local devs working across branches: When we switch branches and do a sync, Bazel doesn’t compile/sync the whole repo again if we have already synced it before. Bazel reuses the previously cached results and compiles only what is required. This makes it easy for developers to work on multiple branches simultaneously.
Fewer intermittent test failures: Bazel runs tests in a sandbox, so tests are unlikely to collide with each other. This reduces the chance of intermittent test failures.
Less support in the IntelliJ Bazel plugin: The Bazel plugin for IntelliJ supports fewer features than the Maven plugin. Maven is mature and its IntelliJ support is well established, while Bazel is a much younger technology and will take some time to reach that level of support.
First project sync takes more time: Bazel works at a fine granularity, so it has a large number of actions to perform. It generates the build graph, which helps determine what to rebuild after a change. All these actions take time and make the first/clean build slow in Bazel. However, incremental and no-op builds are much faster, which makes our builds faster overall.
We are thrilled with the results we got after migrating from Maven to Bazel. The migration has significantly reduced our build and test times, which in turn has improved developer productivity.
This article was written in collaboration by Prashant Sharma and Brijesh Dhakar.
Prashant Sharma is a Software Backend Engineer at Harness. He was a core member of the team charged with implementing and migrating to Bazel, and takes part in building new iterations of Harness pipelines.
Brijesh Dhakar is a Software Engineer with a passion for technology. He works at Harness, building the premier software delivery platform to solve industry-wide problems.