Refresh Python usage documentation #539

Merged
merged 5 commits into from
Jan 10, 2022

wjones127
Collaborator

@wjones127 wjones127 commented Jan 9, 2022

Description

I rewrote most of the Python usage documentation.

  • Reflowed the usage documentation to start with loading, then look at log introspection, and finally querying tables. At the end I created a placeholder for writing, noting that it's not yet supported.
  • Added info about supported backends and data catalogs.
  • Added more examples and guidance on how to query Delta tables. Note: I'd like to include DataFusion here, but from what I've seen that's not yet possible, right?

By rewriting this I broke any links to sections within the page, but not to the page itself.

Related Issue(s)

None

Documentation

 * Reflowed the usage documentation to start with loading, then look at
   log introspection, and finally querying tables.
 * Added info about supported backends and data catalogs.
 * Added more examples and guidance on how to query Delta tables.
@matthewmturner
Contributor

Hi @wjones127, I was interested in this from a DataFusion angle as well and added an example of querying to the Rust documentation in #519.

I had some issues with it, though, and had to point to specific git commits to get it to work. I haven't tried since then, so I'm not sure whether the issue still persists. I'm less familiar with the Python bindings, so take this with a grain of salt, but I believe the general structures are in place to replicate how we query in Rust using the DataFusion Python bindings.

On a separate but somewhat related note, I am working on adding S3 support to DataFusion (https://github.com/datafusion-contrib/datafusion-objectstore-s3). My loose understanding of Delta Lake is that it's often cloud-based, so adding S3 support to DataFusion should make querying it easier. I had previously tried querying Delta Lake on S3, which is how I found out that DataFusion didn't support it and what started me on that path.

Hope this helps!

@wjones127 wjones127 marked this pull request as ready for review January 10, 2022 00:43
houqp
houqp previously approved these changes Jan 10, 2022
Member

@houqp houqp left a comment

LGTM, thanks for the detailed write-up, @wjones127!

@houqp houqp requested a review from fvaleye January 10, 2022 02:47
@houqp
Member

houqp commented Jan 10, 2022

FYI @fvaleye @zijie0

fvaleye
fvaleye previously approved these changes Jan 10, 2022
Collaborator

@fvaleye fvaleye left a comment

Thanks @wjones127

LGTM! There is a minor CI error to fix before merging the PR.

Alternatively, if you have a data catalog you can load it by reference to a
database and table name. Currently only AWS Glue is supported.

.. TODO: auth to data catalog?
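
For readers of this thread, the catalog-based loading described in the snippet above might look roughly like the following. This is a minimal sketch, not the text of the PR: it assumes the `deltalake` package's `DeltaTable.from_data_catalog` and `DataCatalog.AWS`, and the helper name `load_glue_table` is hypothetical. It also assumes AWS credentials and a region are already configured in the environment; the actual requirements are the subject of #522.

```python
def load_glue_table(database_name: str, table_name: str):
    """Load a Delta table registered in AWS Glue by database and table name.

    Hypothetical helper for illustration; not part of deltalake itself.
    """
    # Imported lazily so that defining this helper does not require
    # deltalake to be installed.
    from deltalake import DataCatalog, DeltaTable

    # Assumption: AWS credentials and region are available in the
    # environment so the Glue lookup can succeed.
    return DeltaTable.from_data_catalog(
        data_catalog=DataCatalog.AWS,
        database_name=database_name,
        table_name=table_name,
    )
```

Calling `load_glue_table("simple_database", "simple_table")` (both names are placeholders) would return a `DeltaTable`, which can then be inspected with methods such as `version()` or `files()`.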
Collaborator

FYI #522 explains the requirements of the Data Catalog integration.

Collaborator Author

Thanks!

@wjones127 wjones127 dismissed stale reviews from fvaleye and houqp via fda4884 January 10, 2022 04:52
@fvaleye fvaleye merged commit 25ef5d9 into delta-io:main Jan 10, 2022
@wjones127 wjones127 deleted the python-docs-enhancements branch January 10, 2022 05:14
4 participants