Add preview to datasets as specified in the Kedro catalog under metadata #1374

Merged 16 commits on Jun 8, 2023
33 changes: 30 additions & 3 deletions demo-project/conf/base/catalog_01_raw.yml
@@ -1,14 +1,41 @@
 companies:
   type: pandas.CSVDataSet
   filepath: ${base_location}/01_raw/companies.csv
-  layer: raw
+  metadata:
+    kedro-viz:
+      layer: raw
+      preview: 5

 reviews:
   type: pandas.CSVDataSet
   filepath: ${base_location}/01_raw/reviews.csv
-  layer: raw
+  metadata:
+    kedro-viz:
+      layer: raw
+      preview: 10

 shuttles:
   type: pandas.ExcelDataSet
   filepath: ${base_location}/01_raw/shuttles.xlsx
-  layer: raw
+  metadata:
+    kedro-viz:
+      layer: raw
+      preview: 15
+
+
+
+
+# companies:
Contributor:
Are we not going to remove these completely? If not, it would help to add a comment explaining why they are only commented out here.

Contributor Author:
This is just for testing purposes; I will remove them before the merge. Thanks :)

+# type: pandas.CSVDataSet
+# filepath: ${base_location}/01_raw/companies.csv
+# layer: raw
+
+# reviews:
+# type: pandas.CSVDataSet
+# filepath: ${base_location}/01_raw/reviews.csv
+# layer: raw
+
+# shuttles:
+# type: pandas.ExcelDataSet
+# filepath: ${base_location}/01_raw/shuttles.xlsx
+# layer: raw
5 changes: 4 additions & 1 deletion demo-project/conf/base/catalog_04_feature.yml
@@ -29,6 +29,9 @@
 feature_importance_output:
   type: pandas.CSVDataSet
   filepath: ${base_location}/04_feature/feature_importance_output.csv
-  layer: feature
+  metadata:
+    kedro-viz:
+      layer: feature
+      preview: 2
Member:
I thought we would just use preview_args and not preview?

Contributor Author:
This is just old code. I removed it. Thanks for flagging.



12 changes: 6 additions & 6 deletions package/kedro_viz/integrations/kedro/data_loader.py
@@ -80,9 +80,9 @@ def load_data(

         with KedroSession.create(
             project_path=project_path,
-            env=env,  # type: ignore
+            env=env,
             save_on_close=False,
-            extra_params=extra_params,  # type: ignore
+            extra_params=extra_params,
         ) as session:
             context = session.load_context()
             session_store = session._store
@@ -100,9 +100,9 @@ def load_data(

         with KedroSession.create(
             project_path=project_path,
-            env=env,  # type: ignore
+            env=env,
             save_on_close=False,
-            extra_params=extra_params,  # type: ignore
+            extra_params=extra_params,
         ) as session:
             context = session.load_context()
             session_store = session._store
@@ -118,9 +118,9 @@ def load_data(
         with KedroSession.create(
             package_name=metadata.package_name,
             project_path=project_path,
-            env=env,  # type: ignore
+            env=env,
             save_on_close=False,
-            extra_params=extra_params,  # type: ignore
+            extra_params=extra_params,
         ) as session:
             context = session.load_context()
             session_store = session._store
9 changes: 7 additions & 2 deletions package/kedro_viz/models/flowchart.py
@@ -488,7 +488,12 @@ def is_tracking_node(self):

     def is_preview_node(self):
         """Checks if the current node has a preview"""
-        return hasattr(self.kedro_obj, "_preview")
+        metadata = getattr(self.kedro_obj, "metadata", {}) or {}
+        return bool(metadata.get("kedro-viz", {}).get("preview"))
Contributor:
It feels like this shouldn't be so complicated.
A few questions here:

  • Does preview: 0 count as True or False? It evaluates to False now, so the `get("preview", 0)` default path will never be executed.
  • The same nested-dict access pattern appears in multiple places; maybe refactor it into a small util function.
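
The util function suggested above could look something like the sketch below. This is only an illustration: `get_viz_metadata` and `FakeDataSet` are hypothetical names that do not appear in the PR, and the stand-in class carries nothing but a `metadata` attribute. The helper returns `None` for a missing key, so an explicit `preview: 0` stays distinguishable from no preview setting at all, and the caller can decide how to treat it.

```python
from typing import Any


def get_viz_metadata(dataset: Any, key: str, default: Any = None) -> Any:
    """Look up metadata["kedro-viz"][key] on a dataset-like object.

    Hypothetical helper, not part of Kedro-Viz. Returns `default` when the
    dataset has no metadata, no "kedro-viz" section, or no such key.
    """
    metadata = getattr(dataset, "metadata", None) or {}
    viz_section = metadata.get("kedro-viz") or {}
    return viz_section.get(key, default)


class FakeDataSet:
    """Stand-in for a Kedro dataset; it only carries a `metadata` attribute."""

    def __init__(self, metadata=None):
        self.metadata = metadata


configured = FakeDataSet({"kedro-viz": {"layer": "raw", "preview": 5}})
zero_rows = FakeDataSet({"kedro-viz": {"preview": 0}})
no_metadata = FakeDataSet()

# An explicit 0 and a missing key stay distinguishable (0 versus None),
# so the caller decides whether "preview: 0" should enable the preview.
print(get_viz_metadata(configured, "preview"))
print(get_viz_metadata(zero_rows, "preview"))
print(get_viz_metadata(no_metadata, "preview"))
```

With a helper of this shape, both the "has a preview" check and the row-count lookup would share one access path instead of repeating the nested `.get` chain.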


+    def get_preview_nrows(self):

Member:
This gets all preview_args, right? So I think it makes more sense to name the function get_preview_args.

+        """Gets the number of rows for the preview dataset"""

Member:
Suggested change
-        """Gets the number of rows for the preview dataset"""
+        """Gets the preview arguments for a dataset"""

+        return int(self.kedro_obj.metadata.get("kedro-viz", {}).get("preview", 0))


 @dataclass
@@ -595,7 +600,7 @@ def __post_init__(self, data_node: DataNode):
             self.tracking_data = dataset.load()
         elif data_node.is_preview_node():
             try:
-                self.preview = dataset._preview()  # type: ignore
+                self.preview = dataset._preview(data_node.get_preview_nrows())
Contributor:
Should it still trigger the _preview method if the row count is 0?
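
One possible answer to this question is to skip the call entirely when no rows are requested. The sketch below assumes, as in the diff above, that the dataset exposes `_preview(nrows)`. `RecordingDataSet` and `load_preview` are hypothetical names used only for illustration, and the shape of the returned dictionary is an assumption.

```python
from typing import Any, Optional


class RecordingDataSet:
    """Stand-in dataset that records every _preview call (illustration only)."""

    def __init__(self):
        self.calls = []

    def _preview(self, nrows: int) -> dict:
        # The returned shape is an assumption made for this sketch.
        self.calls.append(nrows)
        return {"columns": ["id"], "data": [[i] for i in range(nrows)]}


def load_preview(dataset: Any, nrows: int) -> Optional[dict]:
    """Return the preview, or None without touching the dataset if nrows <= 0."""
    if nrows <= 0:
        return None
    return dataset._preview(nrows)


dataset = RecordingDataSet()
skipped = load_preview(dataset, 0)   # _preview is not called
loaded = load_preview(dataset, 3)    # _preview is called once with nrows=3
```

Guarding at the call site keeps the dataset from doing any I/O for a preview that would be empty anyway; whether that is the desired behaviour is a design decision for the PR authors.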

             except Exception as exc:  # pylint: disable=broad-except # pragma: no cover
                 logger.warning(
                     "'%s' could not be previewed. Full exception: %s: %s",
12 changes: 12 additions & 0 deletions package/tests/test_models/test_flowchart.py
@@ -1,3 +1,4 @@
+# pylint: disable=too-many-public-methods
 import base64
 from functools import partial
 from pathlib import Path
@@ -368,6 +369,17 @@ def test_data_node_metadata(self):
         assert data_node_metadata.filepath == "/tmp/dataset.csv"
         assert data_node_metadata.run_command == "kedro run --to-outputs=dataset"

+    def test_get_preview_nrows(self):
+        metadata = {"kedro-viz": {"preview": 3}}
Member:
I'm guessing this test needs to be updated to use preview_args, right?

Contributor Author (@rashidakanchwala, Jun 6, 2023):
Thanks for this, I missed it. The tests are still failing, but that's because of kedro.extras, and that should go away once the other PR is merged.

+        dataset = CSVDataSet(filepath="test.csv", metadata=metadata)
+        data_node = GraphNode.create_data_node(
+            full_name="dataset",
+            tags=set(),
+            layer=None,
+            dataset=dataset,
+        )
+        assert data_node.get_preview_nrows() == 3
+
     def test_preview_data_node_metadata(self):
         mock_preview_data = {
             "columns": ["id", "company_rating", "company_location"],
2 changes: 1 addition & 1 deletion src/components/metadata-modal/metadata-modal.js
@@ -46,7 +46,7 @@ const MetadataModal = ({ metadata, onToggle, visible }) => {
           </div>
           {hasPreview && (
             <div className="pipeline-metadata-modal__preview-text">
-              Previewing first 40 rows only
+              Previewing first {metadata.preview.data.length} rows only
Contributor:
I think this copy line might not make sense anymore. We used it originally because we were only going to show the first 40 rows. But now that users can define the number themselves, do we still need it?

Contributor Author:
I thought so too, but should we still show it? It's nice to tell users how many rows they are viewing.

Member:
I think it's still useful to have the message, but maybe rephrase it to: "Previewing first ... rows"

Member:
Agree with Merel here. Let's still give them the number.

             </div>
           )}
         </div>
4 changes: 1 addition & 3 deletions src/components/metadata/metadata.js
@@ -29,7 +29,6 @@ import './styles/metadata.css';
  * Shows node meta data
  */
 const MetaData = ({
-  flags,
   isPrettyNameOn,
   metadata,
   onToggleCode,
@@ -58,7 +57,7 @@ const MetaData = ({
   const hasPlot = Boolean(metadata?.plot);
   const hasImage = Boolean(metadata?.image);
   const hasTrackingData = Boolean(metadata?.trackingData);
-  const hasPreviewData = Boolean(metadata?.preview) && flags.previewDataSet;
+  const hasPreviewData = Boolean(metadata?.preview);
   const isMetricsTrackingDataset = nodeTypeIcon === 'metricsTracking';
   const hasCode = Boolean(metadata?.code);
   const isTranscoded = Boolean(metadata?.originalType);
@@ -326,7 +325,6 @@ const MetaData = ({
 };

 export const mapStateToProps = (state, ownProps) => ({
-  flags: state.flags,
   isPrettyNameOn: state.prettyName,
   metadata: getClickedNodeMetaData(state),
   theme: state.theme,
6 changes: 0 additions & 6 deletions src/config.js
@@ -63,12 +63,6 @@ export const flags = {
     default: false,
     icon: '🔛',
   },
-  previewDataSet: {
-    name: 'Preview datasets',
-    description: 'Enable dataset previews in the metadata panel',
-    default: true,
-    icon: '🗂',
-  },
 };

 export const settings = {