Adds partial Delta Lake support #328

Draft: wants to merge 5 commits into base: develop

erp12 (Collaborator) commented Apr 22, 2021

Adds support for Delta Lake storage format. This PR is best reviewed 1 commit at a time.

Currently, some core Delta functionality is disabled due to a known incompatibility between Spark 3.1 and Delta 0.7+. Once these issues are addressed in the next Delta release, the commented-out code in this PR should be uncommented. All commented-out code has been tested on Spark 3.0.2 + Delta 0.8.

I chose to implement the Delta API using with-dynamic-import, but I am not confident in this decision and would love feedback. My editor fails to recognize any symbol defined within with-dynamic-import, which hinders navigation and refactoring.

Some things I would like to do before calling this PR done:

dakra commented Aug 9, 2021

@erp12 Can you update the code for Delta 1.0.0?

I use this PR sporadically for ad-hoc analysis and it works well. Thanks for it :)

dakra commented Dec 8, 2021

Just want to mention that there is already a Delta 1.1 release, which supports Spark 3.2.

dakra commented Oct 28, 2022

@erp12 @anthony-khong Delta is now at version 2.1.1,
which supports Spark 3.3.

It's obviously difficult to keep everything up to date and
compatible with each other, especially when even minor version
increases apparently break things.
FWIW, at least Delta and AWS EMR support Spark 3.3, which was
released in June.

Are there plans to upgrade the geni Spark version to 3.3?
If that's done, we could consider merging this PR as well.
I'm using it pretty regularly for personal queries and debugging,
and it works pretty well.

Thanks,
Daniel
