December 31, 2024

Scaling Bazel builds @Harness


Authors: Gaurav Nanda, Udham Singh



 


At Harness, we believe that developer productivity is one of the key pillars of software development. We are always actively looking for improvements to ensure developers at Harness have a great experience.

To that end, last quarter we identified build times as a key area for improvement. Historically, our repository, harness-core, had fat Bazel modules, with one BUILD file per top-level directory. This approach has many shortcomings, one of them being slower incremental builds. To fix this, we are moving toward Bazel’s recommendation of creating one BUILD file per directory.

As the number of BUILD files has grown, we have run into several scaling issues. In this post, we go over those issues and how we debugged and fixed them.


“Class not found”

Issue

One of the first issues we ran into was a “Class not found” error when trying to run applications.

lua

Could not


This started to happen a couple of months after we began refactoring to create smaller build targets. Eventually, we reached a point where adding new dependencies led to a build failure!

Debugging

The error message suggested that the class was missing from the classpath. We noticed that the error occurred only when the classpath length exceeded 120K characters.

Solution

As a quick workaround, we specified a higher CLASSPATH_LIMIT of 400K. The underlying cause was still unclear at this point; it was resolved when we fixed the next issue, discussed below.

“Argument list too long”

Issue

Soon after the classpath limit fix, we started to run into the “Argument list too long” error.

bash

Executing tests from //batch-processing/service:io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest
-----------------------------------------------------------------------------
/tmp/sandbox/processwrapper-sandbox/5627/execroot/harness_monorepo/bazel-out/k8-fastbuild/bin/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest.runfiles/harness_monorepo/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest: line 366: /tmp/sandbox/processwrapper-sandbox/5627/execroot/harness_monorepo/bazel-out/k8-fastbuild/bin/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest.runfiles/local_jdk/bin/java: Argument list too long
/tmp/sandbox/processwrapper-sandbox/5627/execroot/harness_monorepo/bazel-out/k8-fastbuild/bin/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest.runfiles/harness_monorepo/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest: line 366: /tmp/sandbox/processwrapper-sandbox/5627/execroot/harness_monorepo/bazel-out/k8-fastbuild/bin/batch-processing/service/io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest.runfiles/local_jdk/bin/java: Success
[root@harnessci-tin0-xvss4fzd io.harness.batch.processing.config.k8s.recommendation.WorkloadCostServiceTest]# grep ARG_MAX /usr/include/linux/limits.h


Debugging

This error suggested that we were exceeding Linux’s 128K limit on argument length. It also meant that our original fix, raising CLASSPATH_LIMIT beyond 128K, was not a great idea.
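For context, besides the total size of all arguments, the Linux kernel caps each individual argument string at 32 pages (the MAX_ARG_STRLEN constant), which is 128 KiB with 4 KiB pages. Since the whole classpath is handed to java as one argument, that per-argument cap is probably the one being hit here. The following is a minimal sketch of the arithmetic; the helper names are illustrative and not part of Bazel, and the 4096-byte page size in the loop is an assumption.

```python
import os

# Linux defines MAX_ARG_STRLEN as 32 pages; it caps one argv string.
PAGES_PER_ARG = 32


def max_single_arg_bytes(page_size=None):
    """Largest single argument, including its trailing NUL, that execve accepts."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return PAGES_PER_ARG * page_size


def fits_in_one_arg(arg, page_size=None):
    """True if `arg` can be passed as one command-line argument on Linux."""
    # +1 accounts for the terminating NUL byte, which the kernel counts.
    return len(arg.encode("utf-8")) + 1 <= max_single_arg_bytes(page_size)


# Stand-ins for a joined classpath at the default limit and at our workaround.
for size in (120_000, 400_000):
    classpath = "x" * size
    print(size, fits_in_one_arg(classpath, page_size=4096))
```

Under these assumptions, a classpath at the default 120K threshold still fits in one argument, while one allowed to grow toward 400K does not.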

Digging deeper into the Bazel codebase, we found that Bazel already appeared to handle long arguments in java_stub_template.txt. As per the following snippet, if the Java classpath argument’s length exceeds 120K (8K on Windows), Bazel should wrap all classpath entries inside a JAR file using the “Class-Path” manifest header.

bash

# If the user didn't specify a --classpath_limit, use the default value.
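The classpath-JAR technique described above can be sketched independently of Bazel. The idea is to write a JAR that contains only a manifest whose Class-Path header lists the real entries as space-separated URLs, and then pass that single JAR as the classpath. This is a minimal illustration, not Bazel's implementation; the function names are hypothetical and the entries are assumed to be ASCII URLs.

```python
import zipfile


def manifest_with_class_path(entries):
    """Return MANIFEST.MF bytes whose Class-Path header lists `entries`.

    The JAR specification caps manifest lines at 72 bytes; a longer value
    continues on following lines that each start with a single space.
    """
    data = ("Class-Path: " + " ".join(entries)).encode("utf-8")
    lines = [data[:72]]
    rest = data[72:]
    while rest:
        # One byte of each continuation line is taken by the leading space.
        lines.append(b" " + rest[:71])
        rest = rest[71:]
    body = b"\r\n".join(lines)
    # A blank line terminates the main section of the manifest.
    return b"Manifest-Version: 1.0\r\n" + body + b"\r\n\r\n"


def write_classpath_jar(jar_path, entries):
    """Write a JAR holding only a manifest that points at the real classpath."""
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest_with_class_path(entries))
```

With such a file, the command line carries one short path (for example `java -classpath classpath.jar com.example.Main`, where both names are placeholders) no matter how many entries the manifest lists.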


However, this was not working for us. A little more research suggested that Bazel’s earlier implementation had a bug, which was fixed in the 5.x.x versions.

Solution

Therefore, to address this issue, we upgraded Bazel to 5.0.0, which took care of the underlying problem. We also reverted our CLASSPATH_LIMIT change so that we never exceed the operating system’s argument limits.

“Too many open files”

Issue

Recently, many developers started to complain about Bazel build failures with “Too many open files” errors on their Mac machines.

Debugging

macOS limits the number of open file descriptors (the limit can be checked by running the ulimit -n command). While running “bazel build” on the harness-core monorepo, Bazel’s main process and its worker threads tried to open more files than the limit allows, and the build was terminated with the error “Too many open files”.

  • The first thing we tried was to raise the OS file limit using the ulimit command. This did not work, as recent macOS versions use launchctl¹ to set the limit on the maximum number of open files². To our surprise, even after using launchctl, we saw no change in behavior, and builds were still failing beyond ~10K open file descriptors.
  • We also learned that the JVM sets its own file descriptor limit and that we can pass the -XX:-MaxFDLimit argument³ to make it ignore that limit. However, there is no option in the java build target to accept JVM arguments.
  • Defining the flag “build --jvmopt='-XX:-MaxFDLimit'” in bazelrc files did not work for us either, and worker threads were still crashing.
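When debugging limits like these, it helps to see what a given process actually gets, since ulimit -n reports only the soft limit of the current shell. The sketch below reads both the soft and hard limits on open file descriptors through Python's standard resource module (Unix only). It is just an inspection aid under that assumption and is not part of the Bazel setup described here.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping the RLIM_INFINITY sentinel to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nofile_limits()
print(f"soft={describe(soft)} hard={describe(hard)}")
```

Running it from the same shell that launches the build shows whether a ulimit or launchctl change actually reached child processes.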

Solution

  • We discovered that the child Java processes spawned by Bazel take their JVM arguments directly from the default Java toolchain.
  • Hence, we extended the default Java toolchain and added “-XX:-MaxFDLimit” to its jvm_opts. This ensured that we used the system limits rather than the JVM-defined limits. Here is the PR for reference.

perl

load( "@bazel_tools//tools/jdk:default_java_toolchain.bzl"


If you are also excited about the developer productivity domain and would like to contribute to solving such interesting problems and making an impact, feel free to take a look at Harness’ career page.

¹ How to change default ulimit values in Mac OS X 10.6? (superuser.com)
² Maximum limits (in macOS file descriptors) (wilsonmar.github.io)
³ Add `-XX:-MaxFDLimit` to builder binaries to allow it use system open files limit by arunkumar9t2 ·… (github.com)

Gaurav Nanda

Gaurav Nanda is a platform and infrastructure engineer focused on building secure, scalable systems at Databricks. He works at the intersection of networking, developer experience, and distributed systems — simplifying complex infrastructure into intuitive workflows. Passionate about platform evolution, he shares insights on scaling engineering teams through better abstractions, tooling, and culture.
