DatasetManager.register_dataset_change(): Fall back to datasets' "extra" if arg extra is empty #37006
Comments
See #37810 and the issues and PRs linked in it (including in comments). Doing this fallback would go against the original design vision.
I'm sorry, but if I understand the original discussion correctly, that is completely different from what we are trying to accomplish. I view the extra as a way to qualify the URI further. The URI identifies the dataset; the extra contains specific information on how the data was used, or what was used from it, when the task was triggered. In our case this is usually the data version. I obviously can't force you to implement it like this, but for us the URI identifies which dataset is involved and triggers any scheduling, while the extra identifies exactly which version of the data was used and is unique to every task that uses it as an inlet or outlet. So I would argue that this data is part of the dataset and not XCom.
I don't think it's a matter of forcing anyone or convincing a single person. It looks like you just want to propose a different way of treating extras, or maybe a completely new feature of datasets. The right way of doing that, @Blizzke, is as usual: open a devlist discussion (not a GitHub issue) where you explain your rationale, present your proposal, and convince people on the devlist of your idea, and in case of a bigger change write an Airflow Improvement Proposal describing your (apparently) design change or addition to the Dataset feature. Then either you are able to reach consensus that your idea is good or, when you can't reach consensus, you call for a vote. This is how it works when you propose a design change to an important feature of Airflow that impacts everyone using it; there is no other way. You will find all the links at https://airflow.apache.org/community/. Whether others agree depends entirely on the arguments you present, how convincing you are, and how complete your proposal is. It's entirely in your hands; you just need to put in the effort to convince those from the community who take part in the discussion.
Description
I would like to know if it is possible to change the behavior of the register_dataset_change function to copy the extra from the specified dataset if the actual extra argument is empty?
Use case/motivation
We have a lot of datasets where, within the dataset, we have versions of the data. Because we want to keep the URI the same for all of those (for lineage etc.; it is the same dataset after all), we use the extra argument to pass along the version number etc. Unfortunately, after a change is registered, we lose that information.
It seems logical to us that when no extra is specified (which is actually impossible as far as we can see, since the current calling locations don't even allow it), you fall back to the extra of the dataset. Our override is as simple as
Related issues
No response
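For readers who want to see the fallback requested under Use case/motivation in concrete terms, here is a minimal, self-contained sketch. It is not the override mentioned in the issue (that snippet did not survive on this page), and it does not use Airflow's real classes or the actual DatasetManager.register_dataset_change signature: the Dataset and DatasetEvent dataclasses, the standalone function, and the example URI are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in: a URI plus optional extra metadata."""
    uri: str
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DatasetEvent:
    """Hypothetical stand-in for the record created when a change is registered."""
    uri: str
    extra: Dict[str, Any]


def register_dataset_change(
    dataset: Dataset, extra: Optional[Dict[str, Any]] = None
) -> DatasetEvent:
    # Fall back to the dataset's own extra only when the caller passed
    # nothing (None or an empty dict); a non-empty extra always wins.
    effective_extra = extra if extra else dataset.extra
    # Copy so that later changes to the dataset do not alter the event.
    return DatasetEvent(uri=dataset.uri, extra=dict(effective_extra))


# Same URI for every version; the version travels in the extra.
sales_v3 = Dataset(uri="s3://warehouse/sales", extra={"version": 3})
event = register_dataset_change(sales_v3)
print(event.extra)
```

In this sketch an explicitly passed, non-empty extra still takes precedence, so the fallback only changes behavior for callers that pass nothing.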