-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add preview to datasets as specified in the Kedro catalog under metadata #1374
Changes from 1 commit
e2086a3
859c359
1257a66
bee0771
71ff583
dd168ae
faac571
9346fb8
0c98271
a241bf5
e604031
d5ed369
90ea841
4d85ddb
4ee8367
6cbd29f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,41 @@ | ||
companies: | ||
type: pandas.CSVDataSet | ||
filepath: ${base_location}/01_raw/companies.csv | ||
layer: raw | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview: 5 | ||
|
||
reviews: | ||
type: pandas.CSVDataSet | ||
filepath: ${base_location}/01_raw/reviews.csv | ||
layer: raw | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview: 10 | ||
|
||
shuttles: | ||
type: pandas.ExcelDataSet | ||
filepath: ${base_location}/01_raw/shuttles.xlsx | ||
layer: raw | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview: 15 | ||
|
||
|
||
|
||
|
||
# companies: | ||
# type: pandas.CSVDataSet | ||
# filepath: ${base_location}/01_raw/companies.csv | ||
# layer: raw | ||
|
||
# reviews: | ||
# type: pandas.CSVDataSet | ||
# filepath: ${base_location}/01_raw/reviews.csv | ||
# layer: raw | ||
|
||
# shuttles: | ||
# type: pandas.ExcelDataSet | ||
# filepath: ${base_location}/01_raw/shuttles.xlsx | ||
# layer: raw |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,9 @@ | |
feature_importance_output: | ||
type: pandas.CSVDataSet | ||
filepath: ${base_location}/04_feature/feature_importance_output.csv | ||
layer: feature | ||
metadata: | ||
kedro-viz: | ||
layer: feature | ||
preview: 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought we would just do the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is just old code. i removed it. thanks for flagging |
||
|
||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -488,7 +488,12 @@ def is_tracking_node(self): | |||||
|
||||||
def is_preview_node(self): | ||||||
"""Checks if the current node has a preview""" | ||||||
return hasattr(self.kedro_obj, "_preview") | ||||||
metadata = getattr(self.kedro_obj, "metadata", {}) or {} | ||||||
return bool(metadata.get("kedro-viz", {}).get("preview")) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels like it shouldn't be that complicated.
The same pattern accessing the nested dict appears in multiple places, maybe refactor it with a small util function |
||||||
|
||||||
def get_preview_nrows(self): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This gets all |
||||||
"""Gets the number of rows for the preview dataset""" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
return int(self.kedro_obj.metadata.get("kedro-viz", {}).get("preview", 0)) | ||||||
|
||||||
|
||||||
@dataclass | ||||||
|
@@ -595,7 +600,7 @@ def __post_init__(self, data_node: DataNode): | |||||
self.tracking_data = dataset.load() | ||||||
elif data_node.is_preview_node(): | ||||||
try: | ||||||
self.preview = dataset._preview() # type: ignore | ||||||
self.preview = dataset._preview(data_node.get_preview_nrows()) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should it still trigger the |
||||||
except Exception as exc: # pylint: disable=broad-except # pragma: no cover | ||||||
logger.warning( | ||||||
"'%s' could not be previewed. Full exception: %s: %s", | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
# pylint: disable=too-many-public-methods | ||
import base64 | ||
from functools import partial | ||
from pathlib import Path | ||
|
@@ -368,6 +369,17 @@ def test_data_node_metadata(self): | |
assert data_node_metadata.filepath == "/tmp/dataset.csv" | ||
assert data_node_metadata.run_command == "kedro run --to-outputs=dataset" | ||
|
||
def test_get_preview_nrows(self): | ||
metadata = {"kedro-viz": {"preview": 3}} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm guessing this test needs to be updated to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks for this. i missed it. the tests are still failing but that's because of the kedro.extras and that should go away once the other PR is merged. |
||
dataset = CSVDataSet(filepath="test.csv", metadata=metadata) | ||
data_node = GraphNode.create_data_node( | ||
full_name="dataset", | ||
tags=set(), | ||
layer=None, | ||
dataset=dataset, | ||
) | ||
assert data_node.get_preview_nrows() == 3 | ||
|
||
def test_preview_data_node_metadata(self): | ||
mock_preview_data = { | ||
"columns": ["id", "company_rating", "company_location"], | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,7 +46,7 @@ const MetadataModal = ({ metadata, onToggle, visible }) => { | |
</div> | ||
{hasPreview && ( | ||
<div className="pipeline-metadata-modal__preview-text"> | ||
Previewing first 40 rows only | ||
Previewing first {metadata.preview.data.length} rows only | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this copy line might not make sense anymore. We use it originally because we were only going to show the first 40 rows. But now since users can define it themself, do we need it anymore? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought so. but should we still show it, it's nice to give them a number of rows of what they are viewing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's still useful to have the message but maybe rephrase it to: "Previewing first ... rows" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree with Merel here. Let's still give them the number. |
||
</div> | ||
)} | ||
</div> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we not going to remove these completely? if not then it might be helpful to have a comment to say why we just commented them out here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for testing purpose, I will remove them before the merge. thanks :)