Add `kedro catalog rank` command #2848
Signed-off-by: Ahdra Merali <[email protected]>
I also think we should add these CLI commands to the dataset factories docs too (https://docs.kedro.org/en/latest/data/data_catalog.html#load-multiple-datasets-with-similar-configuration-using-dataset-factories), but that can be done after `kedro catalog resolve`. Tested the implementation, all good! :)
Looks great! ⭐
Were the changes in
Without exceeding a command depth of three words the changes are no longer necessary. I didn't include them here as I was unsure if there would be any effect on
Got it! Totally agree that we should remove the duplicated code instead, but of course as part of another PR.
Great work! Thanks for also updating the docs in this PR ⭐
Takes over #2796
Description
This PR introduces the command `kedro catalog rank`, which ranks all dataset factories in the catalog config in the order in which they are matched (i.e. if an explicitly named dataset could match several of the factories in the catalog, it is resolved with the first one on the list).
The priority is determined as such:
Added note on streamlining the terminology
It looks like `dataset factory` is equivalent to `factory pattern`, `catalog factory`, `dataset pattern`, `dataset factory pattern` and others that have been used interchangeably. As per #2670, any mention of a catalog entry that makes use of placeholders will be referred to as a `dataset factory`. `Catalog factories` refers to all dataset factories within a specific catalog.
What this means in practice / Why this is useful
As seen in the test case, a catalog is created with the following patterns:
1. `an_example_{place}_{holder}`
2. `an_example_{placeholder}`
3. `an_{example_placeholder}`
4. `on_{example_placeholder}`

The explicitly declared `an_example_data_set` could match any of patterns 1, 2, or 3: `an_example_{data}_{set}`, `an_example_{data_set}`, `an_{example_data_set}`. Priority matching means that it will be resolved with the first pattern. `kedro catalog rank` allows users to confirm the order in which factory patterns are considered for matching.
Checklist
`RELEASE.md` file
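To make the matching example from "What this means in practice" concrete, here is a minimal Python sketch of first-match resolution over an already ranked list of patterns. It is not Kedro's own matching code: the helper names `pattern_to_regex`, `matching_patterns` and `first_match` are hypothetical, and translating each `{placeholder}` into a regex group is an assumption made purely for illustration. The pattern list is taken from the test case described above, in the order given there.

```python
import re
from typing import List, Optional

# Patterns from the test case, listed in the order given in the description.
PATTERNS = [
    "an_example_{place}_{holder}",
    "an_example_{placeholder}",
    "an_{example_placeholder}",
    "on_{example_placeholder}",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: turn a factory pattern into a regex.

    Each {placeholder} becomes a group that matches one or more characters;
    the literal text around the placeholders is escaped.
    """
    literals = re.split(r"\{\w+\}", pattern)
    return re.compile("(.+)".join(re.escape(lit) for lit in literals))


def matching_patterns(dataset_name: str, ranked: List[str]) -> List[str]:
    """Return every pattern that could match the dataset name, in ranked order."""
    return [p for p in ranked if pattern_to_regex(p).fullmatch(dataset_name)]


def first_match(dataset_name: str, ranked: List[str]) -> Optional[str]:
    """Resolve the dataset name with the first matching pattern, if any."""
    candidates = matching_patterns(dataset_name, ranked)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Patterns 1, 2 and 3 all match; pattern 4 does not.
    print(matching_patterns("an_example_data_set", PATTERNS))
    # The first pattern on the ranked list wins.
    print(first_match("an_example_data_set", PATTERNS))
```

The sketch only shows why the order of the list matters: with three candidate patterns for `an_example_data_set`, whichever is ranked first decides how the dataset is resolved.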