No datasets emitted in DbtLocalBaseOperator after Cosmos upgrade to 1.3.0 #796

Closed
jakob-hvitnov-telia opened this issue Jan 12, 2024 · 3 comments · Fixed by #810
Labels
area:datasets Related to the Airflow datasets feature/module dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc

jakob-hvitnov-telia commented Jan 12, 2024

After updating Cosmos from 1.1.3 to 1.3.0, we have run into an issue where datasets are no longer emitted when using the DbtRunLocal operator.
As part of the same update, we went from Airflow 2.6.3 to 2.7.2.

Dag log output (from dbt_run task) before the update:

[2024-01-04 12:49:06] {{local.py:238}} INFO - Inlets: [Dataset(uri='correct_uri_to_inlet01_anonymized', extra=None), Dataset(uri='correct_uri_to_inlet02_anonymized', extra=None), Dataset(uri='correct_uri_to_inlet03_anonymized', extra=None), Dataset(uri='correct_uri_to_inlet04_anonymized', extra=None)]
[2024-01-04, 23:10:00] CET {{local.py:239}} INFO - Outlets: [Dataset(uri='correct_uri_to_outlet01_anonymized', extra=None), Dataset(uri='correct_uri_to_outlet01_anonymized', extra=None), Dataset(uri='correct_uri_to_outlet01_anonymized', extra=None), Dataset(uri='correct_uri_to_outlet01_anonymized', extra=None)]

Dag log output after the update:

[2024-01-05 10:26:17] {{local.py:238}} INFO - Inlets: []
[2024-01-05 10:26:17] {{local.py:239}} INFO - Outlets: []

We are running on AWS MWAA.

I suspect the change has something to do with this commit, which introduces the get_datasets method. Its docstring clarifies that the method relies on:

  • profiles
  • {project_dir}/target/manifest.json
  • {project_dir}/target/run_results.json

It is not quite clear to me if these are generated by our call to DbtRunLocal at the moment.
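
One way to check this is to look for the artifact files after a run. The following is a minimal sketch, not part of Cosmos: the helper name `missing_artifacts` is hypothetical, and it assumes dbt writes to `target` under the directory you pass in. If the operator runs dbt from a temporary copy of the project, that copy is the directory to inspect.

```python
from pathlib import Path

# File names taken from the get_datasets docstring quoted above.
ARTIFACTS = ("manifest.json", "run_results.json")


def missing_artifacts(project_dir: str) -> list[str]:
    """Return the artifact files that are absent from {project_dir}/target."""
    target = Path(project_dir) / "target"
    return [name for name in ARTIFACTS if not (target / name).is_file()]
```

An empty list means both files are present; any names returned are the ones that get_datasets would not find.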
There might be configuration options I am overlooking:

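import os
from pathlib import PurePath

from cosmos.operators.local import DbtRunLocalOperator

# profile_config is defined elsewhere in the DAG file.
dbt_run = DbtRunLocalOperator(
    profile_config=profile_config,
    # The dbt project path is resolved relative to this DAG file.
    project_dir=os.path.join(PurePath(__file__).parent.parent.parent.parent, "dbt"),
    task_id="dbt_run",
    select="path:path/to/our/dbt/models",
    install_deps=False,
    emit_datasets=True,
)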

Docstring typo

Also in DbtLocalBaseOperator there seems to be an error in the docstring (lines 97-98) where it says:

:param install_deps: If true, install dependencies before running the command
:param install_deps: If true, the operator will set inlets and outlets

I think the phrasing should probably be:

:param install_deps: If true, install dependencies before running the command
:param emit_datasets: If true, the operator will set inlets and outlets

A fix for this is in https://github.com/astronomer/astronomer-cosmos/pull/797

dosubot bot commented Jan 12, 2024

🤖

That sounds like a great proposal! It's clear that you've put thought into the issue and have a solid plan for addressing it. Encouraging user contributions is a fantastic way to improve Cosmos for everyone, and your proposed solution seems like a step in the right direction. I encourage you to go ahead and create a pull request with your proposed changes. Your contribution will be valuable to the Cosmos community. Keep up the great work!


@dosubot dosubot bot added area:datasets Related to the Airflow datasets feature/module dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Jan 12, 2024
@jakob-hvitnov-telia jakob-hvitnov-telia changed the title Reintroduce emit_datasets in DbtLocalBaseOperator No datasets emitted in DbtLocalBaseOperator after Cosmos upgrade to 1.3.0 Jan 18, 2024

tatiana commented Jan 18, 2024

@jakob-hvitnov-telia do things work as expected if you install apache-airflow-providers-openlineage or astronomer-cosmos[openlineage]?

jakob-hvitnov-telia commented Jan 19, 2024

@tatiana installing apache-airflow-providers-openlineage did the trick!

So in the end it seems the Airflow update was more the culprit than the Cosmos update.

Thank you very much for your help!
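
For anyone seeing the same empty inlets and outlets, a quick check of whether the OpenLineage provider is importable in the Airflow environment could look like the sketch below. The module name `airflow.providers.openlineage` is an assumption about what `apache-airflow-providers-openlineage` installs, and `has_module` is a hypothetical helper, not a Cosmos or Airflow function.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is not installed.
        return False


if __name__ == "__main__":
    # Assumed module path for the apache-airflow-providers-openlineage package.
    print("OpenLineage provider found:", has_module("airflow.providers.openlineage"))
```

If this prints `False` on Airflow 2.7, installing `apache-airflow-providers-openlineage` or `astronomer-cosmos[openlineage]`, as suggested above, is the first thing to try.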

tatiana added a commit that referenced this issue Jan 23, 2024
Remove incorrect docstring from `DbtLocalBaseOperator` (related to
#796)

---------

Co-authored-by: Justin Bandoro <[email protected]>
Co-authored-by: Tatiana Al-Chueyr <[email protected]>
tatiana added a commit that referenced this issue Jan 26, 2024
[Cosmos docs](https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html)
stated that users didn't have to install any dependency to use
OpenLineage with Cosmos.

However, for inlets and outlets to be emitted, Airflow 2.7 users must
install `apache-airflow-providers-openlineage` or
`astronomer-cosmos[openlineage]`.

Closes: #796
tatiana pushed a commit that referenced this issue Jan 26, 2024
Remove incorrect docstring from `DbtLocalBaseOperator` (related to
#796)

---------

Co-authored-by: Justin Bandoro <[email protected]>
Co-authored-by: Tatiana Al-Chueyr <[email protected]>
(cherry picked from commit ef2c7bb)
tatiana added a commit that referenced this issue Jan 26, 2024
[Cosmos docs](https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html)
stated that users didn't have to install any dependency to use
OpenLineage with Cosmos.

However, for inlets and outlets to be emitted, Airflow 2.7 users must
install `apache-airflow-providers-openlineage` or
`astronomer-cosmos[openlineage]`.

Closes: #796
(cherry picked from commit fe01237)