Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets): Extend preview mechanism #595

Merged
merged 40 commits into from
Apr 3, 2024
Merged

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented Mar 5, 2024

Description

Related to: kedro-org/kedro-viz#1622

This PR introduces an update to extend the preview functionality across more kedro-datasets, extending support for different data formats.

List of Datasets we are aiming to support in this PR:

  • ParquetDataset
  • SQLTableDataset
  • pandas.JSONDataset
  • JSONDataset

Development notes

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@SajidAlamQB SajidAlamQB changed the title Extend preview mechanism feat(dataset) Extend preview mechanism Mar 15, 2024
@SajidAlamQB SajidAlamQB changed the title feat(dataset) Extend preview mechanism feat(datasets): Extend preview mechanism Mar 15, 2024
@noklam
Copy link
Contributor

noklam commented Mar 20, 2024

@SajidAlamQB Is this still a draft or waiting for something?

@SajidAlamQB
Copy link
Contributor Author

@SajidAlamQB Is this still a draft or waiting for something?

Its ready for review, just some coverage tests that are needed.

@SajidAlamQB SajidAlamQB marked this pull request as ready for review March 21, 2024 07:51
@SajidAlamQB SajidAlamQB requested a review from noklam March 21, 2024 13:48
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments:

  • I would try to stick with _load as much as possible to avoid re-implementation and edge case that run into difference load data. There may be some special case if load and preview expect different data, we discussed this in tech design before but that's another topic.
  • Performance - the preview should be a lightweight solution and should never read the full data.

kedro-datasets/kedro_datasets/pandas/parquet_dataset.py Outdated Show resolved Hide resolved
kedro-datasets/kedro_datasets/pandas/sql_dataset.py Outdated Show resolved Hide resolved
@merelcht merelcht mentioned this pull request Apr 2, 2024
2 tasks
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks @SajidAlamQB <3

@SajidAlamQB SajidAlamQB requested a review from noklam April 3, 2024 14:14
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all the comments, this is a great addition to the previews feature:sparkles:

@SajidAlamQB SajidAlamQB merged commit 1a91787 into main Apr 3, 2024
14 checks passed
@SajidAlamQB SajidAlamQB deleted the extend-preview-mechanism branch April 3, 2024 14:22
tgoelles pushed a commit to tgoelles/kedro-plugins that referenced this pull request Jun 6, 2024
* Extend preview to Parquet

* Update sql_dataset.py

* Update sql_dataset.py

* update preview method for parquetdataset

* Update sql_dataset.py

* extend preview to JSONDataset

* add json preview

* add preview for pickledataset

* Update json_dataset.py

* lint

* add tests for parquet and json

* lint

* rem pickle fix docstring

* Fix parquet test

* fix pandas.json tests

* add coverage for sqldataset

* lint

* coverage for sanitisation of sql

* changes based on review

* use pyarrow for parquet preview

* align jsondataset with spike

* Update json_dataset.py

* Update json_dataset.py

* pass lines=true and nrows

* update docstring

* Update test_json_dataset.py

* revert change

* use sqlalchemy instead of query

* fix sql tests

Signed-off-by: tgoelles <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants