WIP: apply upstream commit for "enable parallelized writes with ParquetSink" #13

Closed

@appletreeisyellow appletreeisyellow commented Apr 24, 2024

⚠️ This will not be merged. ⚠️

This PR is based on apache@671cef8 which was merged to DataFusion on April 13, 2024.

All the patches from the previous temporary dependency branch (WIP(iox-10577): patched df upgrade 202-04-14) are already covered as of the April 13 commit (apache@671cef8). This PR therefore creates a new branch for @wiedld's "enable parallelized writes with ParquetSink" work, with the upstream PR (apache#10224 / apache@9c8873a) cherry-picked onto it.

  1. Cherry-picked "Allow adding user defined metadata to ParquetSink" (apache/datafusion#10224 / apache@9c8873a):
commit e5d618f89ca5300748a164582265f05d4072a097
Author: wiedld <[email protected]>
Date:   Fri Apr 26 03:42:16 2024 -0700

    Allow adding user defined metadata to `ParquetSink` (#10224)


appletreeisyellow commented Apr 24, 2024

This is what I did on this branch:

  1. On the influxdata/arrow-datafusion repo, start on the main branch:
git checkout main
  2. Check out a new branch based on the April 13 commit apache@671cef8 to catch up with all the fixes:
git checkout -b chunchun/denise-df-11 671cef85c550969ab2c86d644968a048cb181c0c
  3. Cherry-pick the upstream commit for "Allow adding user defined metadata to ParquetSink" (apache/datafusion#10224), in place of the commits from "WIP: changes to upstream DF, in order to enable parallelized writes with ParquetSink" #11:
git cherry-pick 9c8873af12826e47f5743991859790df7a3b6400

* chore: make explicit what ParquetWriterOptions are created from a subset of TableParquetOptions

* refactor: restore the ability to add kv metadata into the generated file sink

* test: demonstrate API contract for metadata TableParquetOptions

* chore: update code docs

* fix: parse on proper delimiter, and improve tests

* fix: enable any character in the metadata string value, by having any key parsing be a part of the format.metadata::key
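
The branch setup described in the steps above could be wrapped in a small helper, sketched below. The function name `create_patched_branch` is hypothetical and not part of this PR; the sketch assumes the base commit and the commit to pick have already been fetched into the local clone, and it stops before creating the branch if either is missing.

```shell
# Hypothetical helper (not from this PR): create a branch at a pinned base
# commit and cherry-pick a single commit onto it.
# Arguments: <new-branch> <base-commit> <commit-to-pick>
create_patched_branch() {
  local branch="$1" base="$2" pick="$3"

  # Stop early if either commit is not present locally
  # (for example, the upstream remote has not been fetched yet).
  git cat-file -e "${base}^{commit}" || return 1
  git cat-file -e "${pick}^{commit}" || return 1

  # Create and switch to the new branch at the base commit.
  git checkout -b "$branch" "$base" || return 1

  # Apply the picked commit on top; on conflict this returns non-zero
  # and leaves the cherry-pick in progress for manual resolution.
  git cherry-pick "$pick"
}

# Usage with the values from this PR (run inside the repository clone):
#   create_patched_branch chunchun/denise-df-11 \
#     671cef85c550969ab2c86d644968a048cb181c0c \
#     9c8873af12826e47f5743991859790df7a3b6400
```

Checking for both commits first means a missing upstream fetch fails before any branch is created, rather than leaving a new branch without the cherry-pick applied.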

wiedld commented Apr 26, 2024

Current main is up to this April 26th commit: c9bd291

The current main includes the merged upstream PR (for the parquet sink metadata change). So no cherry-pick is needed! 🎉
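
One way to confirm a claim like this before dropping a cherry-pick is to ask git whether the commit is already reachable from the branch. The sketch below is illustrative only: the function name `already_merged` is hypothetical, and it assumes the fork's main carries the upstream history unchanged, so the upstream commit hash also exists there.

```shell
# Hypothetical helper (not from this PR): exit status 0 if <commit> is
# already reachable from <branch>, non-zero otherwise.
# Arguments: <commit> <branch>
already_merged() {
  local commit="$1" branch="$2"
  git merge-base --is-ancestor "$commit" "$branch"
}

# Usage with the values from this PR (run inside the repository clone):
#   if already_merged 9c8873af12826e47f5743991859790df7a3b6400 main; then
#     echo "upstream commit already in main; no cherry-pick needed"
#   fi
```

If the history had been rewritten (for example, the change landed under a different hash), this check would report the commit as missing even though its content is present, so a negative result is worth double-checking by hand.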


@appletreeisyellow appletreeisyellow changed the title WIP: apply patched version of "enable parallelized writes with ParquetSink" WIP: apply upstream commit for "enable parallelized writes with ParquetSink" Apr 26, 2024

appletreeisyellow commented Apr 30, 2024

Closing as the update was done without issue

@appletreeisyellow appletreeisyellow deleted the chunchun/denise-df-11 branch April 30, 2024 16:17