Using Airflow Dataset in inlets and outlets breaks Datahub Airflow plugin #7809

Closed
error418 opened this issue Apr 13, 2023 · 4 comments · Fixed by #8853
Labels
bug Bug report

error418 commented Apr 13, 2023

Describe the bug
Using object instances other than datahub_provider.entities.Dataset in inlets or outlets breaks the lineage emission of the Datahub Airflow plugin and prevents the use of Data-aware Scheduling.

To Reproduce
Steps to reproduce the behavior:

  1. Create an Airflow DAG with a task emitting Datasets in its outlets
  2. Add an outlet item of type airflow.datasets.Dataset
  3. Run DAG
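
The steps above could look roughly like the following DAG. This is a minimal sketch, assuming Airflow 2.4 or later (where airflow.datasets.Dataset exists) and the datahub_provider package; the DAG id, task id, bucket URI and table name are hypothetical placeholders, not values from the affected deployment.

```python
from datetime import datetime


def build_repro_dag():
    """Build a DAG whose single task mixes both Dataset types in its outlets."""
    # Imports are kept local so this file can be loaded without Airflow installed.
    from airflow import DAG
    from airflow.datasets import Dataset as AirflowDataset
    from airflow.operators.bash import BashOperator
    from datahub_provider.entities import Dataset as DatahubDataset

    with DAG(
        dag_id="datahub_airflow_dataset_repro",  # hypothetical name
        start_date=datetime(2023, 4, 1),
        schedule=None,  # trigger the run manually
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="produce",  # hypothetical name
            bash_command="echo producing",
            outlets=[
                # Understood by the Datahub plugin: exposes a .urn attribute.
                DatahubDataset("snowflake", "mydb.schema.tableA"),
                # Needed for Data-aware Scheduling: has .uri but no .urn.
                AirflowDataset("s3://example-bucket/output.csv"),
            ],
        )
    return dag
```

In an actual DAG file the result would be assigned at module level, for example `dag = build_repro_dag()`, so that the scheduler picks it up.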

Lineage emit will fail due to missing attribute urn on airflow.datasets.Dataset instances:

[2023-04-13, 09:48:51 CEST] {_plugin.py:147} INFO - Emitting Datahub Dataflow: DataFlow(urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f8582489540>, id='***', orchestrator='airflow', cluster='***', name=None, description='***', properties={***} INFO - Exception: Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/datahub_provider/_plugin.py", line 284, in custom_on_success_callback
    datahub_task_status_callback(context, status=InstanceRunResult.SUCCESS)
  File "/home/airflow/.local/lib/python3.10/site-packages/datahub_provider/_plugin.py", line 163, in datahub_task_status_callback
    datajob.outlets.append(outlet.urn)
AttributeError: 'Dataset' object has no attribute 'urn'

Expected behavior
Adding an airflow.datasets.Dataset should not affect the functionality of Airflow or the Datahub plugin. This error also makes it impossible to use the Data-aware Scheduling feature of Airflow.

Airflow Datasets should be ignored when the plugin processes inlets and outlets.

Additional context

Points of failure:

datajob.inlets.append(inlet.urn)

datajob.outlets.append(outlet.urn)

Solution

The plugin must not assume that every item in inlets and outlets is of type datahub_provider.entities.Dataset.
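
One way to express that guard is sketched below. It is an illustration only and not necessarily how the plugin was eventually fixed: the two dataclasses are hypothetical stand-ins for datahub_provider.entities.Dataset and airflow.datasets.Dataset so the snippet runs without either package, and `collect_urns` is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class DatahubDatasetStandIn:
    """Hypothetical stand-in for datahub_provider.entities.Dataset (exposes .urn)."""
    urn: str


@dataclass
class AirflowDatasetStandIn:
    """Hypothetical stand-in for airflow.datasets.Dataset (has .uri, no .urn)."""
    uri: str


def collect_urns(items: Iterable[Any]) -> List[str]:
    """Return the urn of every item that has one and skip everything else."""
    urns: List[str] = []
    for item in items:
        urn = getattr(item, "urn", None)
        if urn is None:
            # For example an Airflow Dataset: ignore it for lineage purposes.
            continue
        urns.append(urn)
    return urns


outlets = [
    DatahubDatasetStandIn("urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableA,PROD)"),
    AirflowDatasetStandIn("s3://example-bucket/output.csv"),
]
lineage_urns = collect_urns(outlets)
```

The sketch checks for the attribute rather than the class; an `isinstance` check against the Datahub entity class would be an equally reasonable choice. Either way, the Airflow Dataset stays in the task's outlets, so Data-aware Scheduling is unaffected.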

@error418 error418 added the bug Bug report label Apr 13, 2023
@github-actions

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label May 15, 2023
@error418
Author

This still affects Datahub v0.10.2, the latest release at the time of writing.

@github-actions github-actions bot removed the stale label May 16, 2023
@github-actions github-actions bot added the stale label Jun 15, 2023
@github-actions

This issue was closed because it has been inactive for 30 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Jul 15, 2023
@hsheth2 hsheth2 reopened this Oct 4, 2023
@github-actions github-actions bot removed the stale label Oct 4, 2023