Unlocking Data Orchestration: How to Orchestrate Fivetran and dbt with Dagster

Jordan Ilyat
Rittman Analytics Blog
8 min read · Oct 6, 2023

--

In today’s data-centric world, organisations generate large volumes of data from many sources, and most tools in the modern data stack add their own stream of information. Data projects span multiple teams and systems, leaving organisations without a global view of their most valuable assets.

This fragmented approach to data management results in a complex maze of siloed sources, convoluted transformations, and missed opportunities. The key to informed decision-making lies in the ability to harness the full potential of your data.

“Imagine a world where data flows seamlessly from source to analysis, where transformations are executed with precision, and where data engineers and analysts collaborate effortlessly.”

This is the world of orchestrating Fivetran and dbt with Dagster, an ecosystem that showed me the real power of Dagster: focusing on the end products of data transformation rather than on the process itself.

Is Dagster just an orchestrator?

In a well-orchestrated modern data stack, visibility extends across teams, ensuring that valuable insights are accessible throughout the organisation’s data landscape. This accessibility ensures that opportunities for data-guided decision-making are not overlooked, ultimately contributing to the success of strategies and outcomes.

The Orchestration Journey

Data is an invaluable asset, but its value to an organisation remains untapped until it is harnessed, transformed, and made accessible to those who need it most.

The orchestration journey involves a series of steps, from the moment data is ingested to when it is finally transformed into meaningful insights.

Fivetran: Streamlining Data Ingestion

“Fivetran solves the data engineering problem of ensuring data flows effortlessly into your centralised data warehouse.”

dbt: Transforming Data with Confidence

“dbt enables you to transform your raw data into structured, organised, and reliable data.”

Dagster: Orchestrating the flow

“Dagster is experiencing an evolution, transitioning from an orchestration tool to becoming a true data control plane — a pivotal shift in the role of data orchestrators within the Modern Data Stack”

These three components — Fivetran, dbt, and Dagster — work in harmony to streamline data operations, enable confident data transformations, and orchestrate the flow of software-defined assets throughout the organisation.

How do you orchestrate Fivetran and dbt with Dagster?

Prerequisites: you’re already using Fivetran, dbt, and a Dagster Cloud Serverless deployment.

Step 1: Structuring your Dagster project

When setting up your Dagster project, it’s essential to consider ease of navigation, especially for new developers joining your team. A well-organised project structure enhances the developer experience and accelerates onboarding.

We’ve found that structuring your project with the following format is highly effective:

  • Each folder contains a module (indicated by an __init__.py file).
  • These submodules are then imported into the top-level Dagster orchestration module.

Dagster project structure

This approach not only simplifies the project’s architecture but also ensures that developers can quickly locate and access the components they need for their tasks. It fosters a smooth onboarding experience and promotes efficient collaboration within your team.
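As a rough illustration, the layout described above could be scaffolded with a few lines of Python. This is a sketch rather than the tooling used for the original project: the `scaffold_project` helper is hypothetical, and the sub-module names (`assets`, `jobs`, `schedules`, `resources`) are taken from the paths and concepts used in the steps of this walkthrough.

```python
from pathlib import Path

# Sub-modules named after the Dagster concepts used in this walkthrough.
SUBMODULES = ("assets", "jobs", "schedules", "resources")


def scaffold_project(root: Path, package: str = "dagster_orchestration") -> list:
    """Create the top-level package plus one sub-package per concept.

    Every folder gets an __init__.py so that it is importable as a module.
    Returns the list of __init__.py paths that now exist.
    """
    package_dir = root / package
    init_files = []
    for folder in [package_dir, *(package_dir / name for name in SUBMODULES)]:
        folder.mkdir(parents=True, exist_ok=True)
        init_file = folder / "__init__.py"
        init_file.touch(exist_ok=True)  # leaves existing files untouched
        init_files.append(init_file)
    return init_files
```

Calling `scaffold_project(Path("."))` from the repository root would create `dagster_orchestration/` with its four sub-packages; the top-level `__init__.py` is where the sub-modules are later imported.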

Step 2: Managing your Fivetran assets

When integrating Fivetran into your Dagster project, you have two options: you can either load all your Fivetran assets or explicitly declare the specific assets you require. We prefer the latter approach, as it enhances governance over the assets your package syncs.

To implement this approach effectively, follow these steps:

  • Add the “dagster-fivetran” package to your setup.py file.
  • In your Dagster project, create a dedicated file for your Fivetran assets, such as ‘dagster_orchestration/assets/fivetran_sources.py’.
./dagster_orchestration/assets/fivetran_sources.py

This strategy not only provides more control over your defined assets but also simplifies the management of your Fivetran integration within your Dagster project.

Step 3: Generate Fivetran API credentials

Dagster relies on the Fivetran API to synchronise connectors seamlessly. To establish this connection, you’ll need to generate API credentials and incorporate them into your Dagster environment variables. Here’s how to do it:

  • Generate API credentials: Go to the Fivetran API configuration page at https://fivetran.com/dashboard/user/api-config. There, you can generate the necessary API key and secret.
  • Incorporate credentials: Once you have your API key and secret, add them to your Dagster environment variables. Dagster can then authenticate against the Fivetran API and synchronise your connectors without the credentials appearing in your code.
Generate Fivetran API credentials

Step 4: Dagster jobs

In Dagster, jobs serve as the primary unit of execution and monitoring. The essential components of a job are its name and the asset selection to materialise.

Here are the steps to create a job within your Dagster project:

  • Organise your job module: Begin by organising your job module within your Dagster project. Create a dedicated file for your job module, such as ‘dagster_orchestration/jobs/__init__.py’
  • Import necessary components: In this module, import the required components and modules containing your Fivetran and dbt assets. This step ensures that your job has access to the necessary resources for execution.

It’s worth noting that these jobs provide the flexibility to update dbt or Fivetran independently. This means you can trigger updates directly from the user interface, offering you control over your data pipeline in one place (Dagster).

./dagster_orchestration/jobs/__init__.py

Step 5: Dagster schedules

Jobs can be launched in a few different ways. A schedule is a Dagster definition that is used to execute a job at a fixed interval.

Here are the steps to create a schedule within your Dagster project:

  • Organise your schedule module: Begin by structuring your schedule module within your Dagster project. Create a dedicated file for your schedules module, such as ‘dagster_orchestration/schedules/__init__.py’
  • Import necessary components: In this module, import the required components and the module containing your jobs. This step ensures that your schedule has access to the necessary resources for execution.

It’s important to note that, in our example, we create a schedule specifically for the ‘everything’ job. However, additional schedules can be added as needed to accommodate different job execution patterns or intervals.

./dagster_orchestration/schedules/__init__.py

Step 6: Managing Dependencies

In Dagster, it’s essential to establish clear dependencies between your dbt assets and Fivetran assets. This ensures that your data transformations align seamlessly with your data sources. While the approach below is effective, we’re always exploring more elegant alternatives. To manage dependencies, follow these steps:

  • Modify Your dbt Configuration: In your dbt project folder, open the relevant dbt .yml file(s) where you define your models and transformations.
  • Specify Dependencies: Within these files, specify the dependencies between your dbt models and the corresponding Fivetran assets. This step helps Dagster understand the relationships between data sources and transformations.
./dbt_project/models/staging/stg_ad_reporting/src_google_ads.yml
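One mechanism that `dagster-dbt` supports for this is a `meta.dagster.asset_key` entry on a dbt source table, which tells Dagster which upstream asset the source corresponds to. It is shown here as an assumption about how the dependency could be declared, not necessarily the exact contents of the file above. The sketch gives the parsed shape of such a source entry as a Python dictionary (the YAML nesting is the same) together with a hypothetical helper, `declared_asset_keys`, that reports which tables have a key declared. The source name, table names, and key values are made up for illustration.

```python
from typing import Dict, List, Optional

# Parsed form of one entry under `sources:` in a dbt .yml file.
# All names and keys below are hypothetical.
google_ads_source = {
    "name": "google_ads",
    "tables": [
        {
            "name": "campaign_stats",
            # Points this dbt source at the Dagster asset with the same key,
            # for example one produced by a Fivetran connector.
            "meta": {"dagster": {"asset_key": ["google_ads", "campaign_stats"]}},
        },
        {"name": "ad_group_stats"},  # no explicit key declared
    ],
}


def declared_asset_keys(source: dict) -> Dict[str, Optional[List[str]]]:
    """Map each table name to its declared Dagster asset key, or None."""
    keys = {}
    for table in source.get("tables", []):
        dagster_meta = table.get("meta", {}).get("dagster", {})
        keys[table["name"]] = dagster_meta.get("asset_key")
    return keys
```

Tables that come back as `None` fall back to the default key that `dagster-dbt` assigns to the source, which may or may not line up with the key of the corresponding Fivetran asset, so a check like this can be a quick way to spot missing mappings.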

Step 7: Definitions object

In Dagster, the Definitions object unifies the core concepts of Dagster. To finalise your orchestration project, follow these steps:

  • Organise your project module: Start by organising your project effectively. Create a dedicated file for your project module, such as ‘dagster_orchestration/__init__.py’
  • Import necessary components: Within this module, import all the essential components and sub-modules containing your assets, resources, jobs, and schedules. This step ensures that your project has access to the required resources and configurations.
  • Aggregate your definitions: You might have numerous components within your project. To improve readability and maintain a structured codebase, consider aggregating your project definitions.
  • Define the Definitions object: At the heart of your orchestration project is the Definitions object. It brings together all the key components and configurations, providing a clear and cohesive view of your data pipeline.
./dagster_orchestration/__init__.py

By following these steps, you’ll create a well-structured project in Dagster.

What to expect?

When you orchestrate Fivetran and dbt with Dagster, you open the door to a world of possibilities. Dagster provides a comprehensive overview of your data and analytics infrastructure, breaking down silos and providing insights across teams and systems.

“When something breaks, Dagster enables practitioners to understand where, when, and why it broke”

If we change asset ‘x’, what does this impact downstream?

These assets could be notebooks, app interfaces, or a simple CSV file. As data platform engineers, these are the goods we are bringing to the table, regardless of how clever our engineering is behind the scenes.

By orchestrating Fivetran and dbt with Dagster, you are not only transforming your data infrastructure but also setting the stage for data excellence and agility.

Conclusion

In today’s dynamic landscape of data analytics, the path to success hinges on the efficient orchestration of your data infrastructure. What was once considered a luxury has evolved into a necessity.

It’s not just about managing data — it’s about harnessing its full potential. With the right tools and strategies in place, you can turn raw data into actionable insights that fuel your organisation’s growth and success.

Without an orchestrated approach, organisations find themselves grappling with a tangled web of data sources, inefficient processes, and a lack of collaboration.

The orchestration of Fivetran and dbt with Dagster isn’t just a technological innovation; it’s a strategic advantage. It empowers you to leverage your data as a strategic asset, making it the driving force behind your organisation’s data analytics function.

In conclusion, the choice is clear. Inaction comes with risks, while orchestration offers unparalleled benefits. It’s time for organisations to take the bold step towards orchestrating their data infrastructure.

Ready to unlock the power of orchestration?

Whether you’re looking to implement a proven data and analytics infrastructure, need assistance in achieving strategic objectives, or are considering what’s next, our team would love to hear from you. Contact us.
