Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preview-csv-dataset #129

Merged
merged 20 commits into from
Mar 24, 2023
Merged

preview-csv-dataset #129

merged 20 commits into from
Mar 24, 2023

Conversation

Huongg
Copy link
Contributor

@Huongg Huongg commented Mar 16, 2023

Description

As a part of the discussion from #907, the preview function only applies to the CSV, Excel and Parquet Dataset.

To simplify the approach, we agree that every time the user clicks on CSVDataset in viz, it will load the first 10 rows in the metadata panel. You can check out more from this viz PR #1288.

There will be more discussion to follow on how we can allow users to define the preview method and the number of rows that they want to preview through the framework side.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@Huongg Huongg changed the title preview func for csv dataset preview-csv-dataset Mar 16, 2023
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. thanks. we need to write a unit test as well in this file - https://github.com/kedro-org/kedro/blob/main/tests/extras/datasets/pandas/test_csv_dataset.py

@Huongg
Copy link
Contributor Author

Huongg commented Mar 16, 2023

looks good. thanks. we need to write a unit test as well in this file - https://github.com/kedro-org/kedro/blob/main/tests/extras/datasets/pandas/test_csv_dataset.py

hey thanks @rashidakanchwala , i was going to ask about the test. Should I write it in the kedro-plugins instead as i thought the one in kedro repo is not in use anymore?

@noklam
Copy link
Contributor

noklam commented Mar 16, 2023

@Huongg All datasets change should go with kedro-plugins.

@merelcht
Copy link
Member

Woohoo you're a Python dev now! 🐍
This looks good, let me know if you need any help with tests!

@tynandebold tynandebold removed their request for review March 17, 2023 12:53
Signed-off-by: huongg <[email protected]>
@Huongg Huongg requested a review from noklam March 20, 2023 13:51
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congratulations on your first kedro core PR! 😀 Generally looks good but I think there's a few things to deal with here. Let me know if you need a hand with it.

kedro-datasets/kedro_datasets/pandas/csv_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/kedro_datasets/pandas/parquet_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/kedro_datasets/pandas/csv_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/tests/pandas/test_csv_dataset.py Outdated Show resolved Hide resolved
@Huongg Huongg requested review from noklam and antonymilne March 23, 2023 09:02
@Huongg Huongg requested a review from rashidakanchwala March 23, 2023 09:02
Huongg added 2 commits March 23, 2023 09:17
Signed-off-by: huongg <[email protected]>
Signed-off-by: huongg <[email protected]>
Comment on lines +261 to +267
def _preview(self, nrows: int = 40) -> Dict:
# Create a copy so it doesn't contaminate the original dataset
dataset_copy = self._copy()
dataset_copy._load_args["nrows"] = nrows # pylint: disable=protected-access
data = dataset_copy.load()

return data.to_dict(orient="split")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know do if we already have a test covering this. Will be good to have (in framework) but not a must for this PR.

Something like this

def test_copy_not_mutate_original_dataset(raw_dataset):
  copy = raw_dataset._copy()
  copy.load()
  raw_dataset == deepcopy(raw_dataset)  # This won't work yet but the idea is to check the copy doesn't interfere the raw_dataset

Copy link
Contributor Author

@Huongg Huongg Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion, if we can do it separately it would be great as i think we will do the release very soon (hope so). Shall I create a ticket/issue for this?

Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff 👍 ⭐

@noklam noklam self-requested a review March 24, 2023 11:19
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats for your first PR😄

@Huongg Huongg merged commit 7b5f222 into main Mar 24, 2023
@Huongg Huongg deleted the preview-csv-dataset branch March 24, 2023 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants