preview-csv-dataset #129

Huongg · 2023-03-16T10:55:01Z

Description

As part of the discussion from #907, the preview function applies only to the CSV, Excel and Parquet datasets.

To simplify the approach, we agreed that every time the user clicks on a CSVDataset in viz, the first 10 rows will be loaded in the metadata panel. See viz PR #1288 for more details.

Further discussion will follow on how users can define the preview method and the number of rows to preview from the framework side.
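As a rough illustration of the behaviour described above, the sketch below reads only the first few rows of a CSV with pandas and returns them as a plain dictionary. The function name `preview_csv` and the sample data are hypothetical and not part of this PR. The row count is left as a parameter because the description mentions 10 rows while the diff quoted later in this thread defaults to 40.

```python
import io

import pandas as pd


def preview_csv(source, nrows: int = 10) -> dict:
    """Hypothetical helper: load at most `nrows` rows and return them as a dict."""
    # `nrows` tells pandas to stop parsing after that many data rows.
    df = pd.read_csv(source, nrows=nrows)
    # orient="split" yields the keys "index", "columns" and "data".
    return df.to_dict(orient="split")


# Made-up sample data, used only to show the shape of the result.
csv_text = "id,name\n1,a\n2,b\n3,c\n"
preview = preview_csv(io.StringIO(csv_text), nrows=2)
print(preview["columns"], len(preview["data"]))
```

Limiting rows at read time means the preview does not have to parse the whole file before truncating it.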

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes

Signed-off-by: huongg <[email protected]>

rashidakanchwala

Looks good, thanks. We also need to write a unit test in this file: https://github.com/kedro-org/kedro/blob/main/tests/extras/datasets/pandas/test_csv_dataset.py

Huongg · 2023-03-16T11:44:38Z

> looks good. thanks. we need to write a unit test as well in this file - https://github.com/kedro-org/kedro/blob/main/tests/extras/datasets/pandas/test_csv_dataset.py

Hey, thanks @rashidakanchwala, I was going to ask about the test. Should I write it in kedro-plugins instead? I thought the one in the kedro repo is no longer in use.

noklam · 2023-03-16T12:56:39Z

@Huongg All dataset changes should go in kedro-plugins.

merelcht · 2023-03-17T10:45:36Z

Woohoo you're a Python dev now! 🐍
This looks good, let me know if you need any help with tests!

antonymilne

Congratulations on your first kedro core PR! 😀 Generally looks good but I think there's a few things to deal with here. Let me know if you need a hand with it.

noklam · 2023-03-23T12:52:54Z

kedro-datasets/kedro_datasets/pandas/excel_dataset.py

+    def _preview(self, nrows: int = 40) -> Dict:
+        # Create a copy so it doesn't contaminate the original dataset
+        dataset_copy = self._copy()
+        dataset_copy._load_args["nrows"] = nrows  # pylint: disable=protected-access
+        data = dataset_copy.load()
+
+        return data.to_dict(orient="split")


I don't know if we already have a test covering this. It would be good to have (in the framework), but it is not a must for this PR.

Something like this:

def test_copy_not_mutate_original_dataset(raw_dataset):
    copy = raw_dataset._copy()
    copy.load()
    # This won't work yet, but the idea is to check that the copy doesn't interfere with raw_dataset
    raw_dataset == deepcopy(raw_dataset)

Huongg · 2023-03-23

Thanks for the suggestion. It would be great if we could do it separately, as I think we will do the release very soon (I hope so). Shall I create a ticket/issue for this?
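A runnable version of the test idea discussed above might look like the following minimal sketch. `ToyCSVDataset` is a hypothetical stand-in written with the standard library only. It is not the kedro-datasets class, and its `_copy`, `load` and `_preview` methods merely imitate the pattern shown in the diff.

```python
import copy
import csv
import io
from typing import Any, Dict, Optional


class ToyCSVDataset:
    """Hypothetical stand-in for a dataset that keeps load arguments."""

    def __init__(self, text: str, load_args: Optional[Dict[str, Any]] = None):
        self._text = text
        self._load_args = dict(load_args or {})

    def _copy(self) -> "ToyCSVDataset":
        # A deep copy, so changes to the copy's load args cannot leak back.
        return copy.deepcopy(self)

    def load(self):
        rows = list(csv.reader(io.StringIO(self._text)))
        header, body = rows[0], rows[1:]
        # When "nrows" is unset, the slice bound is None and all rows are kept.
        return header, body[: self._load_args.get("nrows")]

    def _preview(self, nrows: int = 40) -> Dict[str, Any]:
        # Same idea as the diff: set nrows on a copy, never on the original.
        dataset_copy = self._copy()
        dataset_copy._load_args["nrows"] = nrows
        header, body = dataset_copy.load()
        return {"index": list(range(len(body))), "columns": header, "data": body}


def test_preview_does_not_mutate_original():
    dataset = ToyCSVDataset("a,b\n1,2\n3,4\n5,6\n")
    load_args_before = copy.deepcopy(dataset._load_args)

    preview = dataset._preview(nrows=2)

    assert len(preview["data"]) == 2              # the preview is truncated
    assert dataset._load_args == load_args_before  # the original is untouched
    assert len(dataset.load()[1]) == 3            # a full load still sees all rows


test_preview_does_not_mutate_original()
```

Comparing the load arguments before and after the call sidesteps the problem noted in the suggestion, where comparing whole dataset objects would require the class to define equality.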

antonymilne

Great stuff 👍 ⭐

noklam

Congrats for your first PR😄

preview func for csv dataset

906c040

Signed-off-by: huongg <[email protected]>

Huongg changed the title ~~preview func for csv dataset~~ preview-csv-dataset Mar 16, 2023

Huongg requested review from antonymilne, merelcht, rashidakanchwala and tynandebold March 16, 2023 11:01

rashidakanchwala reviewed Mar 16, 2023

preview for excel and parquet dataset

bfcb4d4

Signed-off-by: huongg <[email protected]>

tynandebold removed their request for review March 17, 2023 12:53

tests for _preview

635a2e4

Signed-off-by: huongg <[email protected]>

Huongg requested a review from noklam March 20, 2023 13:51

noklam reviewed Mar 20, 2023

kedro-datasets/kedro_datasets/pandas/csv_dataset.py (outdated, resolved)

kedro-datasets/tests/pandas/test_csv_dataset.py (outdated, resolved)

Huongg added 2 commits March 20, 2023 17:24

update var name in the loop

746835e

Signed-off-by: huongg <[email protected]>

create a copy of data and re-use _load

df7d6ba

Signed-off-by: huongg <[email protected]>

antonymilne reviewed Mar 21, 2023

Huongg and others added 10 commits March 21, 2023 21:10

remove parquet dataset preview

947934d

Signed-off-by: huongg <[email protected]>

default value for nrows

f957693

Co-authored-by: Antony Milne <[email protected]>

Merge branch 'preview-csv-dataset' of github.com:kedro-org/kedro-plugins into preview-csv-dataset

2dee922

Signed-off-by: huongg <[email protected]>

set defaul val for excel nrows

064d3f4

Signed-off-by: huongg <[email protected]>

use orient='tight' in to_dict()

5a93cfe

Signed-off-by: huongg <[email protected]>

use split instead of tight as current pandas does not support it

5b52b5e

Signed-off-by: huongg <[email protected]>

add preview tests for excel and csv

888f4aa

Signed-off-by: huongg <[email protected]>

formatting

59d547e

Signed-off-by: huongg <[email protected]>

remove unused urllib

d41fc43

Signed-off-by: huongg <[email protected]>

pylint: disable=protected-access

eda8126

Signed-off-by: huongg <[email protected]>

Huongg requested review from noklam and antonymilne March 23, 2023 09:02

Huongg requested a review from rashidakanchwala March 23, 2023 09:02

Huongg added 2 commits March 23, 2023 09:17

ignore import error

81727df

Signed-off-by: huongg <[email protected]>

formatting

bb4373a

Signed-off-by: huongg <[email protected]>

noklam reviewed Mar 23, 2023

antonymilne approved these changes Mar 23, 2023

remove ignore import error as it happens locally only

5fa3084

Signed-off-by: huongg <[email protected]>

noklam self-requested a review March 24, 2023 11:19

noklam approved these changes Mar 24, 2023

Huongg added 2 commits March 24, 2023 11:24

fix lint error

4d386e1

Signed-off-by: huongg <[email protected]>

Merge branch 'main' of github.com:kedro-org/kedro-plugins into preview-csv-dataset

2202031

Signed-off-by: huongg <[email protected]>

Huongg merged commit 7b5f222 into main Mar 24, 2023

Huongg deleted the preview-csv-dataset branch March 24, 2023 12:45

AhdraMeraliQB mentioned this pull request May 16, 2023

feat: Add metadata attribute to datasets #189

Merged

4 tasks
