airflow-dbt-python v1.0.0 #93

Merged
merged 28 commits into from
Feb 20, 2023

Conversation

tomasfarias
Owner

@tomasfarias tomasfarias commented Feb 15, 2023

Over the last few months I've been working on a heavy refactoring of the internals of airflow-dbt-python. Life has consistently gotten in the way, but the finish line is now clear. The main goal of the refactor was to formalize the responsibilities of operators, hooks, and utilities to allow me to commit to a v1.0.0 release:

airflow-dbt-python operators:

  • Define what dbt command/s to execute.
  • Expose configuration parameters.
  • Push/return execution results.

airflow-dbt-python hooks:

  • Define how to execute a dbt command.
  • Set up dbt for execution by interacting with remotes.
  • Enable Airflow features, like connections, to be used with dbt.

airflow-dbt-python utilities:

  • Misc. utilities
  • Configurations to map arguments exposed by operators to configuration values used by dbt.
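
To make the split above concrete, here is a minimal sketch of the three layers. It does not use Airflow or dbt, and every name in it (SketchConfig, SketchHook, SketchRunOperator) is hypothetical, made up for illustration rather than taken from the airflow-dbt-python code base. The only thing it tries to show is who owns which responsibility.

```python
from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    """Utility layer (hypothetical): maps operator arguments to command arguments."""

    command: str
    project_dir: str
    flags: dict = field(default_factory=dict)

    def to_args(self) -> list:
        # Turn keyword-style flags into CLI-style arguments, in a stable order.
        args = [self.command, "--project-dir", self.project_dir]
        for name, value in sorted(self.flags.items()):
            args += ["--" + name.replace("_", "-"), str(value)]
        return args


class SketchHook:
    """Hook layer (hypothetical): decides HOW a command is executed."""

    def run(self, config: SketchConfig) -> dict:
        # A real hook would prepare the project and invoke dbt here.
        # This sketch only reports what it would have executed.
        return {"args": config.to_args(), "success": True}


class SketchRunOperator:
    """Operator layer (hypothetical): decides WHAT to execute and exposes parameters."""

    command = "run"

    def __init__(self, project_dir: str, **flags):
        self.project_dir = project_dir
        self.flags = flags
        self.hook = SketchHook()

    def execute(self) -> dict:
        config = SketchConfig(self.command, self.project_dir, self.flags)
        # The operator returns the results; it does not know how the run happens.
        return self.hook.run(config)
```

With this layout, adding a new command would mostly mean adding a new operator with a different `command` value, while anything about directories or remotes stays inside the hook.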

Moreover, besides the items in the Roadmap (see the bottom), the release of v1.0.0 signifies I have no more ideas for features to implement in airflow-dbt-python. I'll be happy to resume maintenance of the issue board which has gone a bit stale at this point, and I'll rely on the community to propose new ideas for features to develop.

Due to the sheer size of this refactor, once merged I'll make a beta release for version 1.0.0. I encourage folks to try out the beta version and report any issues if they can.

Changes:

  • airflow-dbt-python operators no longer handle temporary directories.
    • airflow-dbt-python operators should be about defining what to execute, not how.
    • airflow-dbt-python hooks are the ones that should worry about setting up directories.
    • This significantly reduces the complexity in operators.
  • Refactored the dbt remote interface (previously dbt backends).
    • The interface has been simplified to two methods: upload and download.
    • Now utilizes a more specific URL class for all URL-like arguments.
    • All dbt remotes are now hooks too, which means they can use Airflow Connections.
  • Implemented a new DbtRemoteGitHook to utilize git repositories as remotes.
  • tar file is a new supported archive format for dbt projects.
  • Moved dbt configurations to the utilities module.
  • Much better understanding of how dbt logs things (as in, we now don't log things multiple times).
  • Support for Python 3.11.
  • Updated documentation.
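
As a rough illustration of the two-method remote idea and of hooks owning temporary directories, here is a sketch that only uses the standard library. SketchRemote, LocalDirRemote and run_with_remote are hypothetical names and are not the classes shipped in this PR; the real remotes are Airflow hooks and take URL-like arguments, which this sketch replaces with plain paths.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchRemote(ABC):
    """Hypothetical remote: anything that can download and upload a project."""

    @abstractmethod
    def download(self, source: str, destination: Path) -> Path:
        """Copy the project at source into the local destination directory."""

    @abstractmethod
    def upload(self, source: Path, destination: str) -> None:
        """Copy the local source directory back to the remote destination."""


class LocalDirRemote(SketchRemote):
    """Treats another local directory as the 'remote' (illustration only)."""

    def download(self, source: str, destination: Path) -> Path:
        shutil.copytree(source, destination, dirs_exist_ok=True)
        return destination

    def upload(self, source: Path, destination: str) -> None:
        shutil.copytree(source, destination, dirs_exist_ok=True)


def run_with_remote(remote: SketchRemote, project_url: str, action):
    """Hook-like helper: owns the temporary directory, not the caller."""
    with tempfile.TemporaryDirectory() as tmp:
        local_project = remote.download(project_url, Path(tmp) / "project")
        result = action(local_project)
        # Push any files produced by the action back to the remote.
        remote.upload(local_project, project_url)
        return result
```

The caller only passes in what to do with the project (`action`); where the files come from and where the temporary directory lives stays hidden behind the remote and the helper.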

Breaking changes:

  • Dropped support for Airflow major version 1.
    • It required too much work to maintain, and AWS MWAA has offered Airflow >= 2.x for a while now.

Roadmap (things that I didn't get to do but may do in the future):

  • Deferrable operators: dbt workflows are, by definition, heavily I/O bound. Airflow released deferrable operators to better utilize resources in tasks that do a lot of waiting, like heavy I/O bound tasks. These operators utilize Python's asyncio which unfortunately dbt does not currently use (dbt is thread-based). It may be possible to use deferrable operators with an asyncio dbt adapter, but I have not had the time to turn these ideas into anything concrete.
  • Airflow provider packages implement a set of interfaces that cover mostly hook attributes and methods. Although we have covered some, we don't cover all of the requirements yet.

dbt-core v1.4 broke us. It's unclear why, but as this feature is not
well documented, tested, or even fully ironed out yet, I'm allowing
this test to fail.

My hope is that as development of dbt-library advances, there will be
a cleaner way to do this.
@tomasfarias tomasfarias marked this pull request as ready for review February 20, 2023 00:16
@tomasfarias tomasfarias merged commit 06e6025 into master Feb 20, 2023
@tomasfarias tomasfarias deleted the feat/v1.0.0 branch February 20, 2023 00:17