-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add preview to datasets as specified in the Kedro catalog under metadata #1374
Conversation
… merged Signed-off-by: Rashida Kanchwala <[email protected]>
My suggested syntax for this is:
This would match You'd call the preview function as There's a question about what the default behaviour should be though. If I don't specify
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @rashidakanchwala all looks good to me, just left 2 small comments but nothing seems to be an issue to me.
@@ -46,7 +46,7 @@ const MetadataModal = ({ metadata, onToggle, visible }) => { | |||
</div> | |||
{hasPreview && ( | |||
<div className="pipeline-metadata-modal__preview-text"> | |||
Previewing first 40 rows only | |||
Previewing first {metadata.preview.data.length} rows only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this copy line might not make sense anymore. We use it originally because we were only going to show the first 40 rows. But now since users can define it themself, do we need it anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so. but should we still show it, it's nice to give them a number of rows of what they are viewing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still useful to have the message but maybe rephrase it to: "Previewing first ... rows"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Merel here. Let's still give them the number.
|
||
|
||
|
||
# companies: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we not going to remove these completely? if not then it might be helpful to have a comment to say why we just commented them out here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for testing purpose, I will remove them before the merge. thanks :)
I agree with @antonymilne 's comment above.
IMO if the user doesn't specify |
The preview shouldn't show at all. Basically, if you want to show preview then you need to have the preview_args in your paml and nrows would follow. Currently, we don't have any other options. But I guess if the user doesn't add nrows, it would default to 40, which is what is specified already in kedro-datasets(https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/pandas/csv_dataset.py#L197O) . I will still test and make sure it works |
@@ -595,7 +600,7 @@ def __post_init__(self, data_node: DataNode): | |||
self.tracking_data = dataset.load() | |||
elif data_node.is_preview_node(): | |||
try: | |||
self.preview = dataset._preview() # type: ignore | |||
self.preview = dataset._preview(data_node.get_preview_nrows()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it still trigger the _preview
method if the row = 0?
def is_preview_node(self): | ||
"""Checks if the current node has a preview""" | ||
return hasattr(self.kedro_obj, "_preview") | ||
metadata = getattr(self.kedro_obj, "metadata", {}) or {} | ||
return bool(metadata.get("kedro-viz", {}).get("preview")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like it shouldn't be that complicated.
Many questions here:
- Does
preview: 0
count asTrue
orFalse
? It will beFalse
now, then the `get("preview", 0) path will never be executed.
The same pattern accessing the nested dict appears in multiple places, maybe refactor it with a small util function
Signed-off-by: Rashida Kanchwala <[email protected]>
I think the problem with this approach is that the user always has to specify
... but as it stands there's no way to actually use that default value, since as soon as you write For the case of pandas datasets it's not too much effort for the user, but in future maybe we'll have It's not a big thing so I'm ok with this if it's what we want to do, but I'd probably make The alternative solution is maybe we have two keys:
where Or another option is that Overall if we haven't already done so I think we need @yetudada and/or @NeroOkwa to weigh in on what the expected behaviour would be for this feature because I think there's a few legitimate options. |
@antonymilne We discussed this in stand up last week and agreed that the user needs to be specific about what needs to be previewed. So they will have to specify |
Signed-off-by: Rashida Kanchwala <[email protected]>
try: | ||
viz_metadata = metadata["kedro-viz"] | ||
except (AttributeError, KeyError): | ||
return None | ||
return viz_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be replaced with the getattr
method since it's not a nested key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional comment: Would it be better to just set this as a property viz_metadata
instead of a get method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did try this but I got some weird errors, so changed it to get_ . but i agree, it's better as a property. I will ping u again, if u have a couple of min to help me with the errors that would be appreciated. let me try doing it again.
Update - it works now, i will push :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Antony suggested doing it this way based on EAFP
try:
viz_metadata = metadata["kedro-viz"]
except (AttributeError, KeyError):
Signed-off-by: Rashida Kanchwala <[email protected]>
…ro-viz into preview-datasets-2 Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rashidakanchwala I'll message you on Slack as well because I'm a bit confused now about the implementation: are we using preview_args
or preview
with a numeric value? It seems mixed in the changes you've made..
metadata: | ||
kedro-viz: | ||
layer: feature | ||
preview: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we would just do the preview_args
and not the preview
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just old code. i removed it. thanks for flagging
try: | ||
self.viz_metadata = metadata["kedro-viz"] | ||
except (AttributeError, KeyError): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful here to emit an error message for debugging purposes?
return False | ||
return is_preview | ||
|
||
def get_preview_nrows(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets all preview_args
right? So I think it makes more sense to name the function get_preview_args
Signed-off-by: Rashida Kanchwala <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
return is_preview | ||
|
||
def get_preview_args(self): | ||
"""Gets the number of rows for the preview dataset""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Gets the number of rows for the preview dataset""" | |
"""Gets the preview arguments for a dataset""" |
@@ -368,6 +369,17 @@ def test_data_node_metadata(self): | |||
assert data_node_metadata.filepath == "/tmp/dataset.csv" | |||
assert data_node_metadata.run_command == "kedro run --to-outputs=dataset" | |||
|
|||
def test_get_preview_nrows(self): | |||
metadata = {"kedro-viz": {"preview": 3}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this test needs to be updated to use preview_args
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this. i missed it. the tests are still failing but that's because of the kedro.extras and that should go away once the other PR is merged.
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
Signed-off-by: Rashida Kanchwala <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! ⭐ Don't forget to add the change to the release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small suggestions but generally this looks great ⭐
try: | ||
is_preview = bool(self.viz_metadata["preview_args"]) | ||
except (AttributeError, KeyError): | ||
return False | ||
return is_preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok how it is, but I don't understand why we need the viz_metadata
property. This isn't used for anything other than preview args, so why can't we just do this?
def is_preview_node(self):
try:
self.kedro_obj.metadata["kedro-viz"]["preview_args"]
except (AttributeError, KeyError): # pragma: no cover
logger.debug("Kedro-viz preview args not found for %s", self.full_name)
return False
else:
return True
def get_preview_args(self):
"""Gets the preview arguments for a dataset"""
return self.kedro_obj.metadata["kedro-viz"]["preview_args"]
But it's not a big thing so if you think it's clearer how you have it then don't worry about it 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I see this is probably the opposite of what @noklam was suggesting, so feel free to ignore and just do whatever you think is best 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think viz_metadata might be eventually used for more than preview. for e.g. looking at this (#662) ticket. if we were to also show table statistics we could probably have
metadata :
kedro-viz:
preview_args :
nrows=4
profiler_args:
show = true..
something like it
@@ -368,6 +369,18 @@ def test_data_node_metadata(self): | |||
assert data_node_metadata.filepath == "/tmp/dataset.csv" | |||
assert data_node_metadata.run_command == "kedro run --to-outputs=dataset" | |||
|
|||
def test_get_preview_args(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can we also have a test for when preview_args
does not exist to check assert not data_node.is_preview_node()
?
layer=None, | ||
dataset=dataset, | ||
) | ||
assert data_node.is_preview_node() is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert data_node.is_preview_node() is True | |
assert data_node.is_preview_node() |
Signed-off-by: Rashida Kanchwala <[email protected]>
Description
This ticket is part 2 of the preview dataset feature development. In the first part, we enabled preview functionality for CSV and Excel datasets in the Kedro Framework. In Kedro-viz, we showed previews for all CSV and Excel datasets when the user turned on the preview option under the settings panel.
In this ticket, we are removing the preview option from the settings panel. Now, if the user wants to preview a dataset, they need to specify it via the catalog.yml file.
In the above the user wants to see the first 3 rows of the dataset. Previously we always showed the first 40 rows, but now we show the number of rows specified by the user in the catalog.yml file.
Development notes
QA notes
The tests and linter is currently failing but the changes to this are already part of the first PR related to layers so it should be fixed once that's merged. So we can ignore that in this PR.
Checklist
RELEASE.md
file