Allow metadata for write_deltalake #587

Merged
merged 4 commits into from
Apr 17, 2022

Conversation

PadenZach
Contributor

Description

  • Add name, description, and configuration arguments to write_deltalake (Python) and write_new_deltalake (python-rust bindings)
  • Add round trip metadata writing test
  • Suppress the clippy "too many arguments" lint on the write function (the default max seems to be 7; the function has 8).
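
For orientation, the three new arguments correspond to the `name`, `description`, and `configuration` fields of the `metaData` action in the Delta transaction log. The following is a minimal standard-library sketch of that mapping and of a round-trip check, not the code added in this PR. The helper names `build_metadata_action` and `round_trip` are hypothetical, and the configuration key is only an example.

```python
import json
import time
import uuid
from typing import Dict, List, Optional


def build_metadata_action(
    schema_string: str,
    partition_columns: Optional[List[str]] = None,
    name: Optional[str] = None,
    description: Optional[str] = None,
    configuration: Optional[Dict[str, str]] = None,
) -> dict:
    """Hypothetical helper: build a Delta-log-style `metaData` action.

    Field names follow the Delta protocol; everything else here is
    an illustrative assumption, not the delta-rs implementation.
    """
    return {
        "metaData": {
            "id": str(uuid.uuid4()),
            "name": name,
            "description": description,
            "format": {"provider": "parquet", "options": {}},
            "schemaString": schema_string,
            "partitionColumns": partition_columns or [],
            "createdTime": int(time.time() * 1000),
            # Configuration is a string-to-string map in the protocol.
            "configuration": configuration or {},
        }
    }


def round_trip(action: dict) -> dict:
    """Serialize the action to a JSON line and parse it back."""
    return json.loads(json.dumps(action))


if __name__ == "__main__":
    action = build_metadata_action(
        schema_string='{"type":"struct","fields":[]}',
        name="test_name",
        description="test_desc",
        configuration={"delta.appendOnly": "false"},
    )
    print(round_trip(action)["metaData"]["name"])
```

A round-trip test in this spirit writes the metadata, reads it back, and compares the three fields against the values that were passed in.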

Related Issue(s)

Should resolve #576

Documentation

@houqp houqp requested a review from wjones127 April 16, 2022 22:48
houqp
houqp previously approved these changes Apr 16, 2022
Member

@houqp houqp left a comment

LGTM, will leave it to @fvaleye and @wjones127 to do the final review and merge.

@houqp
Member

houqp commented Apr 16, 2022

Looks like the Python linter is also complaining about too many arguments.

@PadenZach
Contributor Author

Not sure why the lambda_checkpoint_build failed, doesn't seem to be related to changes made here. LMK if I need to change anything else here, but doesn't seem to be required so I'll be leaving it as-is for now.

Collaborator

@wjones127 wjones127 left a comment

Thanks! Code changes look pretty clean, just needs a few updates to the documentation. 👍

Also this made me realize we'll also probably want AWS Glue and Hive metastore integration; otherwise giving the table a name isn't that useful 😕 . But I think it's worth doing that in a different PR.

python/deltalake/writer.py
@@ -44,6 +44,9 @@ def write_deltalake(
     partition_by: Optional[List[str]] = None,
     filesystem: Optional[pa_fs.FileSystem] = None,
     mode: Literal["error", "append", "overwrite", "ignore"] = "error",
+    name: Optional[str] = None,
Collaborator

Could we also add a note in the docstring that this function does not register this table in your data catalog? Users will have to either use the filesystem path to the table or manually register this table in the catalog. I'll create a follow-up issue for us to handle that, though.

Contributor Author

Added a brief note. Let me know if more explanation re: manual add is needed.
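
As a rough illustration of the kind of docstring note discussed in this thread, here is a hedged sketch. The function `write_table_stub` is a hypothetical stand-in that performs no I/O. The wording of the note and the parameter docs are assumptions, not the text that was actually merged.

```python
from typing import Dict, Optional


def write_table_stub(
    table_uri: str,
    name: Optional[str] = None,
    description: Optional[str] = None,
    configuration: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Hypothetical stand-in for a table writer; it performs no I/O.

    Note that this function does NOT register the table in a data
    catalog. To read the table later, use its filesystem path or
    register it in the catalog manually.

    :param table_uri: URI of the table to write.
    :param name: User-provided identifier stored in the table metadata.
    :param description: User-provided description stored in the metadata.
    :param configuration: String key/value options stored in the metadata.
    """
    # Return the metadata that would be recorded, for illustration only.
    return {
        "table_uri": table_uri,
        "name": name,
        "description": description,
        "configuration": configuration or {},
    }
```

Placing the caveat in the function-level docstring, rather than only under the `name` parameter, keeps it visible to anyone reading `help()` output for the function.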

@wjones127
Collaborator

And don't worry about the checkpoint build, we've seen it be flaky recently and it doesn't seem like it could be related to these changes.

Collaborator

@fvaleye fvaleye left a comment

Thank you @PadenZach

LGTM 👍

@wjones127
Collaborator

Thank you @PadenZach!

@wjones127 wjones127 merged commit 54da787 into delta-io:main Apr 17, 2022
Development

Successfully merging this pull request may close these issues.

Python: allow metadata for write_deltalake
4 participants