From dbt to SQLMesh: A Deep Dive into Data Transformation Evolution

The advent of dbt marked a pivotal moment in the data engineering world, not so much for its revolutionary approach to SQL templating — which, frankly, had been around in various forms for years — but for its comprehensive packaging and the vibrant community it fostered. dbt’s success lay in its ability to encapsulate a set of practices around data transformation in a way that resonated deeply within the burgeoning field of data engineering. Its timing couldn’t have been more fortuitous, aligning perfectly with the maturation of cloud data warehouses that sought to leverage SQL’s power in more dynamic and scalable ways.

What dbt offered was a cohesive framework that standardized data transformation tasks. This standardization was critical in a field experiencing exponential growth and facing a scarcity of seasoned professionals capable of navigating its complexities. dbt’s project structure, CLI runner, Jinja-powered templating, and integration capabilities for testing and CI/CD pipelines provided a solid foundation that many data teams quickly adopted. It offered a semblance of order and consistency in a rapidly evolving landscape, ensuring projects from different organizations bore a familiar structure and workflow.

However, as with any tool that gains widespread adoption, dbt’s limitations began to surface, magnified by its own success and the diverse challenges faced by data teams at scale. The reliance on Jinja for templating, while offering modularity, introduced a layer of complexity that could obfuscate SQL logic and make maintenance challenging. Furthermore, the process of managing state and dependencies in dbt, although manageable on a small scale, became cumbersome as projects grew in complexity and size. These challenges underscored a fundamental issue: dbt’s approach, while innovative, had not fundamentally reimagined the process of data transformation but had instead offered a polished experience atop existing practices.

Enter SQLMesh, a tool that builds upon the foundation laid by dbt but with a fundamentally different philosophy at its core. SQLMesh represents not just an evolution in tooling but a rethinking of how data transformation should be managed, tracked, and executed. At first glance, SQLMesh and dbt share similarities in project structure and functionality, easing the transition for teams considering a switch. Yet, it’s beneath the surface that SQLMesh distinguishes itself, offering a more refined approach to managing data transformations.

SQLMesh treats state management as a core feature rather than an afterthought. This shift in perspective means that SQLMesh operates with an awareness of the current state of your data models and the desired end state, actively managing the path between the two. This approach is akin to how Terraform manages infrastructure, focusing on the desired state and automating the process to achieve it. SQLMesh’s plan/apply paradigm streamlines the development process, reducing the manual oversight required to manage state and dependencies. This not only simplifies the workflow but also minimizes the potential for errors during complex transformations.
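
To make that concrete, here is a minimal sketch of an incremental SQLMesh model (the model and column names are hypothetical). The MODEL block declares everything the planner needs, including its kind, time column, and schedule, so SQLMesh can track what has been computed and fill in the time bounds on each run:

```sql
MODEL (
  name analytics.events_daily,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column event_date
  ),
  start '2024-01-01',
  cron '@daily'
);

SELECT
  event_date,
  user_id,
  COUNT(*) AS event_count
FROM raw.events
-- SQLMesh substitutes these bounds on each run from the state it tracks,
-- so backfills and catch-ups are planned for you rather than hand-managed.
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY event_date, user_id
```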

Moreover, SQLMesh addresses some of the core criticisms of dbt, particularly around templating, environment management, and incremental models. The tool’s use of SQLGlot for parsing and transpiling SQL allows for more sophisticated macro functionalities without compromising the clarity of the SQL itself. This, coupled with SQLMesh’s robust support for what they have termed Virtual Data Environments, makes it a compelling choice for data teams looking for greater efficiency and reliability in their workflows.

In transitioning from dbt to SQLMesh, my team and I embarked on a journey that was as much about adopting a new tool as it was about embracing a new philosophy towards data transformation. The migration process illuminated the inherent limitations of dbt’s approach, which, despite years of development and a strong community backing, seemed to invest in areas that didn’t fully address the growing complexities of data engineering.

SQLMesh, with its proactive approach to state management and its emphasis on minimizing manual intervention, offered a glimpse into a more streamlined future for data transformations. The tool’s philosophy resonates with the need for a more deterministic, state-aware system that can handle the intricacies of modern data workflows with greater precision and less overhead. A system that scales and just works the way you want it to out of the box. The dynamics are similar to Jenkins vs. Harness.

Navigating the Transition to SQLMesh at Harness: An Exploration

Unveiling a New Paradigm in Data Engineering

The migration from dbt to SQLMesh at Harness was not just a shift in tooling but a paradigmatic transition towards a more streamlined, efficient, and transparent approach to data engineering. This journey unveiled the hidden complexities and inefficiencies embedded in our previous workflows and offered a path both to overcoming these challenges and to refining our approach to data transformation. As I step through some of the key points, remember that they stem from years of experience; I hope they resonate with others on similar journeys.

Phased Migration: A Tale of Careful Planning and Execution

The migration to SQLMesh at Harness was a deliberate and methodically orchestrated process, designed to ensure continuity and minimize disruption. This journey underscored the value of a phased approach, allowing us to transition our workflows to SQLMesh with precision and care. It’s worth prefacing that I did NOT choose to “get by” running my existing dbt project on SQLMesh, but opted to go much deeper and build out a pure SQLMesh project. This provides a pathway to drop the redundant dependency and avoid the pitfalls and shortcomings of the dbt-loader-based approach. Furthermore, we would proceed only if SQLMesh was able to meet and exceed the value of dbt from an operational and functional perspective. By combining this approach with the risk mitigation outlined below, we positioned ourselves for the best possible outcome.

At the outset, we recognized the importance of maintaining operational integrity. To this end, we devised a strategy that would allow both dbt and SQLMesh projects to coexist, facilitating a seamless transition. This approach not only provided a safety net but also enabled our team to gradually acclimate to SQLMesh’s environment and its paradigms.

Prioritizing Staging Models: A Strategic Move

One of the critical decisions in our phased migration plan was to prioritize the migration of staging models. This decision was guided by several key considerations. Staging models, by their nature, are foundational to our data transformation pipeline, serving as the initial layer where raw data is prepared and standardized for downstream processing. Given their fundamental role, ensuring that these models were successfully migrated and fully operational in SQLMesh was a high-impact first step. Staging models are also traditionally quite simple, serving more as a dependency inversion layer than a complex web of logic.

The migration of staging models also presented an opportunity to gauge the compatibility of our existing SQL logic with SQLMesh’s framework. To our relief and satisfaction, we found that these models required essentially no adjustments beyond unwinding Jinja. This compatibility not only expedited the migration process but also served as a testament to SQLMesh’s flexibility and its capacity to accommodate a wide range of SQL patterns, dialects, and practices. I’d take it a step further: the flexibility of SQLGlot’s parsing, which let us use shorthand casts (::) and lateral column references (SELECT 1 AS a, a * 2 AS b) in BigQuery, was a massive win.
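
As an illustration (with hypothetical table and column names), both shorthand forms below parse cleanly, and SQLGlot renders BigQuery-legal SQL from them, rewriting the :: casts and expanding the alias reference before execution:

```sql
SELECT
  raw_amount::NUMERIC AS amount,  -- shorthand cast, rendered as CAST(raw_amount AS NUMERIC)
  amount * 2 AS amount_doubled    -- lateral reference to the alias defined just above
FROM raw.orders
```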

The smooth transition of staging models was a morale booster for our team. It provided a tangible proof of concept for SQLMesh’s efficacy and set a positive tone for the remainder of the migration process. Moreover, this initial phase allowed us to refine our migration strategy, identifying best practices and optimizations that could be applied in subsequent phases.

Incremental Migration: Ensuring Continuity and Stability

Following the successful migration of staging models, we adopted an incremental approach to migrate the remaining parts of our data pipeline. This approach involved segmenting our models based on dependencies and operational criticality, ensuring that each segment’s migration could be carried out with minimal impact on dependent processes.

This careful segmentation allowed us to maintain a continuous flow of operations, with SQLMesh and dbt models coexisting harmoniously during the transition period. Each phase of the migration provided valuable insights into SQLMesh’s capabilities, allowing us to leverage its features more effectively and streamline our data transformation processes further.

The Unwinding of Jinja and Non-Deterministic Macros

One of the most striking realizations during our migration was the inherent complexity and opacity introduced by Jinja templating and non-deterministic macros. These elements, once considered indispensable for dynamic SQL generation, had inadvertently become sources of unpredictability and technical debt. The extensive use of dbt_utils and macros like star and pivot, which relied on fetching unique values dynamically, epitomized this issue. While such utilities promised convenience and DRY (Don’t Repeat Yourself) principles, they often led to non-deterministic behaviors, inefficiencies, and an over-reliance on the tool’s black-box operations.
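
As an illustration of the problem, consider a pivot in dbt (a sketch, assuming a hypothetical stg_payments model): dbt_utils.get_column_values runs a query against the warehouse at compile time, so the rendered SQL, and even the output schema, depends on whatever rows happen to exist at that moment:

```sql
-- dbt model: the pivoted columns are discovered by querying the warehouse
-- during compilation, which makes rendering non-deterministic.
SELECT
  order_id,
  {{ dbt_utils.pivot(
      'payment_method',
      dbt_utils.get_column_values(ref('stg_payments'), 'payment_method')
  ) }}
FROM {{ ref('stg_payments') }}
GROUP BY order_id
```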

SQLMesh’s introduction into our workflow was eye-opening. By leveraging SQLGlot macros, we found a powerful alternative that maintained the flexibility and dynamism we needed without the drawbacks of non-determinism and complexity. This shift was not just about replacing one set of tools with another; it was about embracing a philosophy that prioritized clarity, determinism, and efficiency. The migration process urged us to critically reevaluate the merits of dbt’s touted utilities like dbt_utils.star and any other low-value / database connection-reliant macros. We discovered that the allure of these utilities — reducing keystrokes and supposedly simplifying SQL — was overshadowed by their hidden costs: the maintenance burden, the opacity of the transformations, and the challenges in debugging and optimization.
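
Here is a rough sketch of the same pivot in SQLMesh’s macro system (the macro syntax follows my reading of the SQLMesh docs, and the names are hypothetical): the value list is declared in the model itself, so rendering is pure and the output schema is stable regardless of what data is present:

```sql
-- SQLMesh model: the set of payment methods lives in code, not in the data.
SELECT
  order_id,
  @EACH(
    ['credit_card', 'bank_transfer', 'gift_card'],
    m -> SUM(CASE WHEN payment_method = m THEN amount ELSE 0 END) AS @{m}_amount
  )
FROM stg_payments
GROUP BY order_id
```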

Streamlining SQL with SQLMesh’s Macros and Constructs

SQLMesh’s macros and constructs, particularly @EACH and @REDUCE, offered us a new lens through which to view SQL transformation. These constructs allowed us to express complex logic in a more structured and readable format, a significant departure from the ugly Jinja loops that once plagued our codebase (or any dbt user’s). Furthermore, the introduction of @DEF for setting macro variables directly within our models was a boon. This feature eliminated cumbersome Jinja {% set var = … %} expressions and brought a level of inline parametrization and clarity that was previously obfuscated by markup. We have been able to adjust and fine-tune parameters for certain models in real time, collaboratively with the domain experts, thanks to the pragmatic use of this feature in places that matter; I highly recommend leveraging it.
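
Here is a sketch of how these pieces fit together in a single model (names and values are hypothetical; syntax follows the SQLMesh macro docs as I understand them):

```sql
MODEL (
  name analytics.orders_filtered,
  kind FULL
);

-- Tunable parameter, visible right where it is used: no Jinja
-- {% set %} block and no separate YAML entry required.
@DEF(min_order_value, 25);

SELECT
  order_id,
  customer_id,
  amount
FROM analytics.orders
WHERE amount >= @min_order_value
  -- Fold a list of exclusion flags into a single AND-ed predicate.
  AND @REDUCE(
    @EACH([is_test_order, is_internal_order], c -> NOT c),
    (l, r) -> l AND r
  )
```

Because @DEF lives inline, changing min_order_value is a one-line diff that SQLMesh can categorize and plan like any other model change.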

The Liberating Elimination of YAML

One underrated but liberating aspect of our migration was the elimination of YAML configurations. This move not only simplified our project structure but also rectified the dual-source-of-truth problem that frequently led to discrepancies and manual synchronization efforts. With SQLMesh, SQL became the unequivocal source of truth, enhancing our confidence in the integrity and reliability of our data transformations. Trust me, I have tried to keep the YAML in sync with the SQL (I wrote https://github.com/z3z1ma/dbt-osmosis), but that was fighting a symptom, not the cause.

If you are going to document your transformations, which I recommend, my learnings are as follows: do it concisely, and do it inline in one place, such that the transformation code is a singular self-contained, self-describing unit. That is the ideal, and it was straightforward in the new framework. If dbt docs are the primary consideration between the two approaches (SQLMesh does not have static documentation), remember that they don’t solve the bigger problem: enabling business users with context on the source system to collaborate with the engineers or analysts writing the code. For that, I’d recommend a proper data catalog such as Datahub or OpenMetadata; both do significantly more than dbt docs. In lieu of that, the SQLMesh UI offers a neat way to explore your documentation too.
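
Here is a sketch of that self-describing pattern (a hypothetical model; as I recall, MODEL accepts a description property, and column-level comments sit right next to the logic they describe):

```sql
MODEL (
  name staging.stg_payments,
  kind VIEW,
  description 'One row per payment, standardized from the raw source.'
);

SELECT
  id::TEXT AS payment_id,      -- surrogate key from the source system
  order_id::TEXT AS order_id,  -- join key to stg_orders
  amount::NUMERIC AS amount    -- payment amount in account currency
FROM raw.payments
```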

Embracing the Plan-Apply Workflow: A Game-Changer

Adopting SQLMesh’s plan/apply workflow fundamentally changed how we approached model changes. This workflow, akin to Terraform’s infrastructure management approach, imbued our team with the confidence to make bold changes, safe in the knowledge that SQLMesh’s deterministic nature and robust state & environment management would guide us to the desired outcome. The integration of features like table_diff further empowered us, providing clear insights into the impacts of our changes across environments. Having this all in one tool is a superpower.

Figure 1: Breaking vs. non-breaking changes
Change categorization coupled with plan/apply allows fine-grained control

CI/CD Simplification and Accelerated Development

The simplification of our CI/CD pipeline was another significant milestone in our migration journey. With SQLMesh, the complexity of our deployment processes was drastically reduced, often to a single command. This efficiency gain was not trivial: it represented a broader shift towards a more agile, responsive, and best-practice-aligned workflow. The adoption of SQLMesh’s improved paradigm for incremental models further underscored this shift, offering a robust framework that facilitated faster, more reliable data transformations. By utilizing Harness CI, we automated SQLMesh deployments, seamlessly transitioning code into development, staging, and production environments. This automation, especially in per-PR environments and upon merging to the main branch, is incredibly powerful. The synergy between SQLMesh and Harness not only accelerated our development cycles but also allowed us to leverage more intelligent workflows in Harness CI, including automated rollbacks.

Reflections on the Migration Journey and its Challenges

The migration to SQLMesh at Harness was a journey of discovery and transformation. It challenged our preconceptions, uncovered inefficiencies in our workflows, and presented a new way of thinking about and executing data transformations. The journey from dbt to SQLMesh was not just about adopting a new set of tools; it was about embracing a philosophy that prioritizes clarity, efficiency, and determinism in data engineering.

I would highly recommend that any team working with a legacy dbt project NOT just get it running with SQLMesh, but consider a phased migration that lets you drop the dbt dependency entirely if you so wish.

While it might all sound like sunshine and rainbows, transitioning to a new system, especially one as transformative as SQLMesh, was not without its challenges. As one of the earliest adopters of SQLMesh, at the very forefront of this migration, I found myself in a unique position. Not only did I navigate the expected complexities of transitioning between two fundamentally different systems, but I also worked closely with the SQLMesh team to identify and squash bugs as they arose.

This collaboration, though immensely rewarding, shed light on the teething problems inherent to adopting new technology at its nascent stage. The SQLMesh ecosystem, while robust in its vision and capabilities, was still evolving. Encountering bugs and issues was par for the course — a natural part of the development cycle for any cutting-edge technology. These ranged from minor inconveniences to more significant challenges that required immediate attention and creative problem-solving.

Despite encountering bugs and challenges along the way, it’s crucial to highlight that I have never worked with a more responsive open-source software (OSS) team than the SQLMesh team. Their engagement, quick turnaround on issues, and openness to feedback and feature suggestions were unparalleled. This level of responsiveness reinforced our confidence in SQLMesh as a long-term solution for our data engineering needs. The team’s commitment to collaboration and continuous improvement was a standout aspect of our transition journey, showcasing the potential of effective OSS collaboration in driving technological evolution.

Closing Thoughts

The migration from dbt to SQLMesh is emblematic of the broader evolution in data engineering. It reflects a growing recognition that the tools and methodologies we adopt must evolve in tandem with our ambitions and challenges. SQLMesh represents both an alternative to dbt and a new vision for data transformation, one that prioritizes efficiency, flexibility, and developer experience.

As we look to the future, the lessons learned from this migration journey resonate deeply. They remind us that progress often requires questioning the status quo, exploring new horizons, and being willing to embrace change. In SQLMesh, we have found a catalyst for reimagining what’s possible in the realm of data engineering.

At Harness, innovation is in our nature. Check out our careers page to stay up to date on any potential opportunities to join our growing team!
