This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Improving the I/O transparency with kedro run
#1691
Comments
I would like to say that |
The compiled filepath is actually already available in the data catalog; it's just quite hidden away:
Fundamental issue
I think this and #1580 are actually just symptoms of a more fundamental underlying issue: the API and underlying workings of I don't think they're massively wrong as it stands, but I think it would be a good exercise to go through them and work out exactly what functionality we should expose in the API and how we might like to rework them. e.g. in the case raised here there is quite a bit of confusion about how to get the filepath:
So I think we should look holistically at the structures involved here and work out what the API should look like so there's one, clear way to access the things that people need to access. I actually don't think this is such a huge task. Then we can tell much more easily whether we need any new functionality in these structures (like a |
Curious what @deepyaman thinks of this. He may well have the honour of being the most familiar person playing around with |
Introduction
With a highly parameterized configuration (Jinja, hydra or OmegaConf), it is not easy to troubleshoot data. Often, it is useful to get the full path so users can inspect the data manually. Currently, users need to hack into context and do yaml.dump to get this information.
i.e. s3://{base_path}//{special_parameter} -> should be compiled to s3://prefix/filename
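To make the example concrete, here is a minimal sketch of what "compiling" a templated path means. It uses plain Python str.format as a stand-in for Jinja, hydra or OmegaConf. The function name compile_path, the parameter values, and the single-slash template are assumptions made for illustration; none of this is Kedro API.

```python
def compile_path(template: str, params: dict) -> str:
    """Fill every {placeholder} in the template with its value from params.

    Hypothetical helper: str.format stands in for the real templating tool.
    """
    return template.format(**params)


# Hypothetical parameter values, chosen to match the example in the issue.
params = {"base_path": "prefix", "special_parameter": "filename"}
compiled = compile_path("s3://{base_path}/{special_parameter}", params)
print(compiled)
```

The compiled string is what a user would want to see in logs, since it can be pasted directly into a tool that inspects the data.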
Ultimately, the goal is to provide full transparency about the I/O within a kedro run. Users should be able to get this information for logging or for reproducing a particular experiment.
Background
Related Issues:
1. catalog.dumps, which is more suitable for a Jupyter workflow. (It should log the full path in case a relative path is used.)
2. load_version isn't available to users with VersionedDataSet, and it's something that we need to fix.
3. kedro run - potentially with some DEBUG level message
Rollout strategy
There should be no breaking changes. 1 & 2 can be done in parallel. For 3, we can default to the current behaviour and optionally expose more verbose logging.
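As a rough illustration of the optional verbose logging for 3, the sketch below reports the full path of each dataset at DEBUG level, so nothing extra is printed unless the user opts in. The logger name, the helpers full_path and log_io, and the dataset names are all hypothetical; this is not how Kedro's runner is implemented.

```python
import logging
from pathlib import Path

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("io_transparency_sketch")


def full_path(filepath: str) -> str:
    """Return remote URIs unchanged; expand local paths to absolute ones."""
    if "://" in filepath:
        return filepath
    return str(Path(filepath).resolve())


def log_io(action: str, dataset_name: str, filepath: str) -> str:
    """Emit a DEBUG message naming the dataset and its full path."""
    message = f"{action} {dataset_name} ({full_path(filepath)})"
    logger.debug(message)
    return message


# Opt in to verbose output; without this line the messages stay hidden.
logging.basicConfig(level=logging.DEBUG)
log_io("Loading", "companies", "data/01_raw/companies.csv")
log_io("Saving", "model_input", "s3://prefix/filename")
```

Keeping the messages at DEBUG level matches the "no breaking changes" constraint: the default output of a run stays the same, and the extra detail is available on request.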