December 31, 2024

šŸš€ Scaling Pipeline Execution Search: Why We Moved Beyond MongoDB

Co-author: Archit Singla


At Harness, we prioritize performance and user experience in everything we build. One area that needed a major upgrade was the pipeline executions search experience — a core capability our users depend on to track deployments, triage failures, and analyze patterns.

Imagine searching through millions of deployment records and waiting minutes for results. That was the reality we faced at Harness.
Initially powered by MongoDB, the search system struggled as data volumes grew. Complex filters, regex queries, and aggregations all began to slow down. So, we decided to evolve.

This post walks through:

āœ… Why MongoDB was no longer a good fit for search
āœ… What we needed in a modern search engine
āœ… How Elasticsearch solved our problems
āœ… The architecture and indexing strategy we implemented
āœ… Real performance gains after the migration

šŸ› ļø What Are Pipelines and Executions atĀ Harness?

At Harness, a pipeline defines a sequence of automated steps to build, test, deploy, and manage software across environments. Think CI/CD pipelines, but with built-in governance, automatic rollback, and advanced deployment strategies like canary, blue/green, and more — all integrated out of the box. And that’s just the start — Harness pipelines come packed with many more features to simplify and scale your delivery workflows.

Each execution represents a single run of a pipeline. It carries metadata like:

  • Pipeline name and identifier
  • Execution status
  • Trigger type and triggering user
  • Environment and infrastructure details
  • Custom metadata tags
  • Timestamps

Teams may run millions of executions per month across thousands of pipelines. Searching through this at scale is critical for debugging, audit trails, and analytics.

šŸ‘‰ For more details, refer to the Pipeline Execution History documentation.

Harness Execution List View: Displays real-time metadata for each pipeline execution, including pipeline name, execution status, triggering user, environment, and timestamps. This interface enables engineers to monitor, filter, and debug deployment workflows efficiently.


šŸ“– Background

In our original setup, all pipeline execution metadata was stored in MongoDB. A typical execution document contained fields of various types — for example:

  • Map<String,CustomObject>
  • List<CustomObject>

Each of these fields included multiple levels of nested, user-defined objects to capture all relevant metadata. Over time, this led to execution documents becoming large and deeply structured, which posed serious challenges as we scaled.
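
For illustration (the field names here are hypothetical, not our exact schema), an execution document's shape looked roughly like this:

```python
# Hypothetical sketch of an execution document's shape.
execution_doc = {
    "_id": "exec_12345",
    "accountId": "acct_1",
    "pipelineIdentifier": "deploy_payments",
    "status": "RUNNING",
    # Map<String, CustomObject>: per-module metadata, nested several levels deep
    "moduleInfo": {
        "cd": {"serviceInfo": {"name": "payments", "artifact": {"tag": "v1.4.2"}}},
        "ci": {"buildInfo": {"branch": "main", "commitId": "a1b2c3d"}},
    },
    # List<CustomObject>: user-defined metadata tags
    "tags": [{"key": "team", "value": "platform"}],
}
```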

Performing searches over these documents became increasingly difficult for a few key reasons:

  • šŸ” Deep nesting made indexing inefficient
    Many queries relied on deeply nested fields, requiring complex compound indexes that consumed significant memory and were difficult to manage.
  • 🧩 Impossible to flatten effectively
    Flattening the document structure would’ve either broken the logical relationships between fields or introduced redundant data — neither of which was acceptable for our use case.

Despite these challenges, users expected to filter executions based on:

  • Trigger type or who triggered it
  • Status, environment, infrastructure, etc.
  • Custom metadata tags
  • Full-text search on pipeline name or identifier

Meeting these expectations pushed MongoDB past its comfort zone when it came to query performance, index management, and search flexibility.

😬 The Problem with MongoDB for Search


In just six months, we had indexed over 9 million pipeline executions (~90 GB). As our user base expanded, the insertion rate climbed to 4.5 million records per month. While MongoDB’s document-oriented model offered flexibility, it began to show signs of strain under this scale.

āŒ Case-insensitive and Regex Filtering

Users often filter pipeline executions by:

  • Partial artifact names or tags
  • Environment types
  • Triggers like ā€œACLā€, ā€œaclā€, or ā€œACL_Prodā€

In MongoDB, such queries need:

  • regex or $text searches
  • Case-insensitive flags like i

Issue: case-insensitive regex queries can't use standard B-tree indexes, so MongoDB falls back to full collection scans, and query times balloon as the dataset grows.
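
A minimal sketch of the problem using pymongo (collection and field names are assumptions for illustration):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
executions = client["pipelines"]["executions"]  # hypothetical collection

# A case-insensitive, unanchored regex cannot use a standard B-tree index,
# so MongoDB has to examine every candidate document to evaluate the match.
cursor = executions.find({
    "accountId": "acct_1",
    "pipelineName": {"$regex": "acl", "$options": "i"},
})
for doc in cursor:
    print(doc["pipelineName"])
```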

āŒ Complex Aggregations and Filtering

Listing executions involved:

  • Filtering by multiple fields
  • Application-level joins

MongoDB supports aggregations, but:

  • Joins were expensive
  • Aggregation of large collections became slow

āŒ Index Size and VerticalĀ Limits

With MongoDB Atlas:

  • Replica sets scale vertically (up to 4 TB storage)
  • Our indexes were already large (especially on full-text fields)
  • Next step: sharding, which comes with added operational complexity

šŸ“Œ What We Needed

We needed something more purpose-built:

  • Fast full-text, case-insensitive, and partial-match search
  • Efficient aggregations over large datasets
  • Horizontal scalability for read-heavy workloads
  • Built-in data lifecycle and retention management

šŸ” Why Elasticsearch?

Elasticsearch is a search engine built on Lucene, optimized for full-text queries, filtering, and analytics at scale.

āœ… Full-text and Fuzzy Search

  • Inverted indexing for fast term lookups
  • Support for fuzzy, wildcard, regex, and match_phrase_prefix
  • Native tokenization, stemming, and case normalization
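
As a sketch of what this buys us, here is the earlier ā€œaclā€ filter expressed as an Elasticsearch query using the Python client (index and field names are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Text fields are lowercased at index time, so prefix matching is
# case-insensitive by default and served from the inverted index.
resp = es.search(
    index="executions-completed",  # hypothetical index name
    query={
        "bool": {
            "filter": [{"term": {"accountId": "acct_1"}}],
            "must": [{"match_phrase_prefix": {"pipelineName": "acl"}}],
        }
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["pipelineName"])
```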

āœ… Aggregation Performance

  • Aggregations use columnar storage (doc_values)
  • Extremely efficient for grouping, histograms, etc.
  • Handles high-cardinality fields well
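
For example, grouping executions by status is a single terms aggregation (again with hypothetical index and field names):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# size=0 skips fetching documents; the aggregation itself runs on
# doc_values, Elasticsearch's columnar storage, so it stays fast at scale.
resp = es.search(
    index="executions-completed",
    size=0,
    query={"term": {"accountId": "acct_1"}},
    aggs={"by_status": {"terms": {"field": "status"}}},
)
for bucket in resp["aggregations"]["by_status"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```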

āœ… Index Lifecycle Management (ILM)

  • Rollover indices based on age or size
  • Move data from hot → warm → cold → delete
  • Define retention windows per index pattern

šŸ¤” Why Not Atlas Search?

Since our existing system was built on MongoDB, we also evaluated MongoDB Atlas Search before choosing Elasticsearch. Atlas Search is a full-text search engine built directly into MongoDB Atlas and powered by Apache Lucene. It removes the need for managing a separate search system, making it an appealing option at first glance.

Here’s a quick breakdown of its pros and cons in our context:

  • āœ… Pros: built directly into MongoDB Atlas, powered by Lucene, with no separate search system to deploy or keep in sync
  • āŒ Cons: limited flexibility and isolation at scale, with less control over indexing, retention policies, and scaling for read-heavy workloads

Summary:
While Atlas Search offers a tightly integrated experience with MongoDB, we needed more flexibility and isolation at scale. Elasticsearch gave us better control over indexing, retention, and scaling for read-heavy workloads.

🧠 Our New Architecture: MongoDB + Elasticsearch (CQRS)

To balance writes and reads, we adopted a CQRS (Command Query Responsibility Segregation) pattern.

šŸ“Œ How it works:

  1. Writes go to MongoDB (source of truth)
  2. Execution events are transformed and indexed into Elasticsearch
  3. Reads/searches happen from Elasticsearch only
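
In simplified form, the write path looks something like the sketch below (in production this runs through an event pipeline; the names here are illustrative):

```python
from elasticsearch import Elasticsearch
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch("http://localhost:9200")

def save_execution(doc: dict) -> None:
    # 1. MongoDB stays the source of truth for the full, nested document.
    mongo["pipelines"]["executions"].insert_one(doc)

    # 2. Project only the searchable fields into a flat search document.
    search_doc = {
        "accountId": doc["accountId"],
        "pipelineName": doc["pipelineName"],
        "status": doc["status"],
        "startTs": doc["startTs"],
    }

    # 3. Index it into Elasticsearch; all reads will be served from here.
    es.index(index="executions-running", id=doc["_id"], document=search_doc)
```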

Architecture (Writes)

Pipeline Execution Flow: This diagram illustrates the end-to-end process for managing pipeline executions. It starts with saving and updating execution data in MongoDB, followed by transforming the data for efficient indexing in Elasticsearch. Key steps include marking the execution state in the RUNNING index, updating MongoDB, and finalizing the execution in the COMPLETED index to optimize performance and search efficiency.


Architecture (Reads)

Optimized Execution Listing Flow in Pipeline-Service: This diagram illustrates the new execution retrieval strategy where filtering and sorting are delegated to Elasticsearch for performance, and full execution data is fetched from MongoDB based on the ExecutionIDs returned. The final result is ordered and returned to the user, ensuring both speed and data completeness.

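A condensed sketch of that read path (index, collection, and field names are assumptions):

```python
from elasticsearch import Elasticsearch
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
es = Elasticsearch("http://localhost:9200")

# 1. Elasticsearch does the filtering and sorting, returning only IDs.
resp = es.search(
    index="executions-completed",
    query={"term": {"accountId": "acct_1"}},
    sort=[{"startTs": "desc"}],
    source=False,  # we only need document IDs, not the search documents
    size=20,
)
ids = [hit["_id"] for hit in resp["hits"]["hits"]]

# 2. MongoDB returns the full execution documents for those IDs.
docs = {d["_id"]: d for d in mongo["pipelines"]["executions"].find({"_id": {"$in": ids}})}

# 3. Re-apply Elasticsearch's ordering, since $in does not preserve it.
ordered = [docs[i] for i in ids if i in docs]
```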

šŸ—‚ Indexing Strategy: Running + Completed Executions

We split indexes into two types: a RUNNING index for in-flight executions and a COMPLETED index for finished ones. When an execution finishes, its document moves from the RUNNING index to the COMPLETED index.
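
A minimal sketch of the handoff when an execution finishes (index names hypothetical):

```python
from elasticsearch import Elasticsearch, NotFoundError

es = Elasticsearch("http://localhost:9200")

def finalize_execution(execution_id: str, final_doc: dict) -> None:
    # Write the final state into the COMPLETED index first...
    es.index(index="executions-completed", id=execution_id, document=final_doc)
    # ...then remove the stale entry from the RUNNING index.
    try:
        es.delete(index="executions-running", id=execution_id)
    except NotFoundError:
        pass  # already removed; nothing to clean up
```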

šŸ”„ ILM Policy (Index Lifecycle Management)

To keep our Elasticsearch storage efficient and cost-effective as data grows, we use Index Lifecycle Management (ILM). ILM allows us to automatically move data through different phases (called tiers) based on its age and index size. This ensures frequently accessed data stays fast, while older data is kept on cheaper, slower storage — or eventually deleted.

Here’s how our ILM policy is structured: indexes start in the hot tier on fast storage, roll over and move through the warm and cold tiers as they age, and are deleted once they pass the retention window.

šŸ“ Note: We retain data in each tier for +1 month longer than usual to account for discrepancies between the index creation timestamp and the actual data age. This avoids premature rollovers that could impact query behavior or retention guarantees.

🧭 Performance Optimization: Shard Routing by Account

To further optimize performance and reduce query latency, we implemented custom shard routing using the accountId as the routing key. In Elasticsearch, routing helps determine which shard a document (or query) should be sent to. By default, Elasticsearch distributes data randomly across shards, but with custom routing, we can control this distribution.

Why accountId?
Most queries in our system are scoped to a specific account. By routing documents and queries using accountId, we ensure that:

  • āœ… Queries are routed to a single shard instead of all shards — reducing overhead
  • šŸš€ Read performance improves significantly, especially for large datasets
  • šŸ“‰ Cluster resource usage goes down, making the system more scalable

This optimization significantly reduces search latency — especially for high-volume accounts handling millions of executions.
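
In practice this just means passing the same routing value on both the index and search calls, roughly like so (names hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# All documents for acct_1 land on the same shard...
es.index(
    index="executions-completed",
    id="exec_12345",
    routing="acct_1",
    document={"accountId": "acct_1", "pipelineName": "deploy_payments", "status": "SUCCESS"},
)

# ...so account-scoped searches hit one shard instead of fanning out to all of them.
resp = es.search(
    index="executions-completed",
    routing="acct_1",
    query={"term": {"accountId": "acct_1"}},
)
```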

šŸ“ˆ Results

Before vs After (6-Month Dataset)

On the same six-month dataset (~9 million executions, ~90 GB), searches that previously took minutes in MongoDB now complete orders of magnitude faster in Elasticsearch.

āš–ļø Tradeoffs

No migration is free. The CQRS split means writes now fan out to two systems, search results can briefly lag behind MongoDB while indexing catches up (eventual consistency), and we operate and monitor an Elasticsearch cluster alongside our existing database.

🧵 Final Thoughts

Migrating to Elasticsearch wasn’t just a performance fix — it unlocked new search capabilities, reduced query times by orders of magnitude, and made our architecture more scalable for future growth.

If you’re facing:

  • Regex queries that time out
  • Case-insensitive searches that crawl
  • Joins and filters that slow dashboards to a halt…

It’s probably time to introduce a purpose-built search layer like Elasticsearch.

Rishabh Gupta

He is an enthusiastic Software Engineer who is passionate about coding and continuously improving his technical skills. He has experience across a wide range of technologies, including Golang, Python, C++, Java, Flask, HTML, and CSS. Achievements: Built the entire control plane for an eBPF-based Layer 4 load balancer deployed across multiple private cloud data centers — an effort recognized with a Bravo Award. Received the Tech>FWD Award for contributions to a mission-critical queueing system that powered Walmart’s Black Friday sales.
