preview-csv-dataset #129
Signed-off-by: huongg <[email protected]>
Looks good, thanks. We also need to write a unit test in this file: https://github.com/kedro-org/kedro/blob/main/tests/extras/datasets/pandas/test_csv_dataset.py
hey thanks @rashidakanchwala , i was going to ask about the test. Should I write it in the |
@Huongg All datasets change should go with |
Woohoo you're a Python dev now! 🐍
Congratulations on your first kedro core PR! 😀 Generally looks good, but I think there are a few things to deal with here. Let me know if you need a hand with it.
Co-authored-by: Antony Milne <[email protected]>
…ins into preview-csv-dataset Signed-off-by: huongg <[email protected]>
    def _preview(self, nrows: int = 40) -> Dict:
        # Create a copy so it doesn't contaminate the original dataset
        dataset_copy = self._copy()
        dataset_copy._load_args["nrows"] = nrows  # pylint: disable=protected-access
        data = dataset_copy.load()

        return data.to_dict(orient="split")
I don't know if we already have a test covering this. It would be good to have (in the framework), but it's not a must for this PR.
Something like this:
    def test_copy_not_mutate_original_dataset(raw_dataset):
        copy = raw_dataset._copy()
        copy.load()
        # This won't work yet, but the idea is to check that the copy
        # doesn't interfere with the raw_dataset
        raw_dataset == deepcopy(raw_dataset)
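For reference, the idea in the snippet above could be sketched roughly as follows. This is only an illustration under assumptions: `ToyCSVDataset` is a hypothetical stdlib stand-in, not the kedro `CSVDataSet`, and it compares a snapshot of `_load_args` taken before the preview rather than comparing whole dataset objects.

```python
import copy
import csv
import io
from typing import Any, Dict, List, Optional


class ToyCSVDataset:
    """Hypothetical stand-in for a CSV dataset; NOT the kedro class."""

    def __init__(self, text: str, load_args: Optional[Dict[str, Any]] = None) -> None:
        self._text = text
        self._load_args = dict(load_args or {})

    def _copy(self) -> "ToyCSVDataset":
        # Deep copy so that changes to the copy's load args stay on the copy.
        return copy.deepcopy(self)

    def load(self) -> List[List[str]]:
        rows = list(csv.reader(io.StringIO(self._text)))
        header, body = rows[0], rows[1:]
        nrows = self._load_args.get("nrows")
        return [header] + (body if nrows is None else body[:nrows])

    def _preview(self, nrows: int = 40) -> List[List[str]]:
        # Same pattern as in the diff: copy first, then set nrows on the copy.
        dataset_copy = self._copy()
        dataset_copy._load_args["nrows"] = nrows
        return dataset_copy.load()


def test_preview_does_not_mutate_original() -> None:
    raw_dataset = ToyCSVDataset("a,b\n1,2\n3,4\n5,6\n")
    load_args_before = copy.deepcopy(raw_dataset._load_args)

    preview = raw_dataset._preview(nrows=2)

    assert len(preview) == 3  # header plus two data rows
    assert raw_dataset._load_args == load_args_before  # original untouched
    assert len(raw_dataset.load()) == 4  # full load still returns every row


test_preview_does_not_mutate_original()
```

Snapshotting the load args sidesteps the need for an `__eq__` on the dataset itself, which is presumably why the direct `==` comparison "won't work yet".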
Thanks for the suggestion. If we can do it separately, that would be great, as I think we will do the release very soon (hope so). Shall I create a ticket/issue for this?
Great stuff 👍 ⭐
Congrats on your first PR 😄
…w-csv-dataset Signed-off-by: huongg <[email protected]>
Description
Following the discussion in #907, the preview function applies only to the CSV, Excel and Parquet datasets.
To simplify the approach, we agreed that every time the user clicks on a CSVDataset in viz, it will load the first 10 rows in the metadata panel. See viz PR #1288 for more details.
There will be more discussion to follow on how we can allow users to define the preview method and the number of rows that they want to preview through the framework side.
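As a rough illustration of the preview payload described here, the sketch below builds a "split"-style dictionary (`index`, `columns`, `data`) from the first rows of a CSV, mirroring the layout that `to_dict(orient="split")` produces in the diff. It uses only the standard library, so `preview_csv_split` is a hypothetical helper, and the values stay as strings, whereas pandas would infer column types.

```python
import csv
import io
from itertools import islice
from typing import Any, Dict


def preview_csv_split(text: str, nrows: int = 10) -> Dict[str, Any]:
    """Return the first `nrows` data rows in a 'split'-style layout.

    Hypothetical helper for illustration only; not part of kedro or kedro-viz.
    """
    reader = csv.reader(io.StringIO(text))
    columns = next(reader, [])  # header row, or empty if there is no input
    data = list(islice(reader, nrows))  # stop reading after nrows rows
    return {"index": list(range(len(data))), "columns": columns, "data": data}


# A 25-row sample; only the first 10 rows end up in the preview.
sample = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(25))
payload = preview_csv_split(sample)
print(payload["columns"])
print(len(payload["data"]))
```

Reading only the first rows keeps the preview cheap for large files, which is the same reason the diff passes `nrows` through the load arguments instead of loading everything and slicing.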
Checklist
RELEASE.md
file