Add parquet output using parquet2 via Rust #1240
This is ready for review. What's been done so far:
The code size when compiled is about 3.5 MiB, which can probably be reduced further with future work. (Most of the .text size is compression libraries, and brotli is one of the larger ones.) This compares favorably to 10-30 MiB for C++ Arrow (depending on how you build it and how unmaintainable you're willing to make things for the sake of shrinking the build).

The build time for the whole thing is about 5 seconds. Again, this compares favorably to Arrow (in Pedro, that build takes about 30-60 seconds).

As I mentioned to Pete, this PR is quite large, but I think anything smaller wouldn't be reviewable, because it couldn't work end-to-end. I'm happy to jump on a video call and walk people through it. There is also quite detailed commit history, if you'd like to step through it. (Though a word of warning: the first draft of the Rust code got rewritten.)

Future work:
Sorry for the timeline saying I added 19 commits. I just rebased the branch onto main and GH got confused. (I also added a few more comments.)
Still looking into internal build stuff with this.
Apologies for the long delay here. There were some internal things that needed to happen before we could resume looking at this.
load("@build_bazel_rules_apple//apple:macos.bzl", "macos_unit_test")
load("@build_bazel_rules_apple//apple:resources.bzl", "apple_resource_group")
load("@rules_cc//cc:defs.bzl", "cc_library")
Minor nit, but it complicates the build internally: this load can be removed and native.cc_library used below.
@the80srobot Can you make this a static library as it simplifies the internal builds dramatically?
Moving back to draft for now. This PR has become a bit stale and will require some effort to get back to being merge-ready.
This patch series adds support for building the parquet2 Rust crate and using it, from C++, to write a parquet file.
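For readers unfamiliar with calling Rust from C++, here is a minimal sketch of the general pattern: Rust owns an opaque object and exposes C-ABI functions that C++ can declare as `extern "C"` and link against. It is an illustration under assumptions, not the bridge this PR uses (which may well rely on a binding generator instead). `ColumnWriter` and every `column_writer_*` function are hypothetical names, and the struct only buffers values in memory as a stand-in for a real Parquet writer; no parquet2 API is called.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for a Parquet column writer: it only buffers
/// values in memory. A real implementation would encode and write pages.
pub struct ColumnWriter {
    values: Vec<i64>,
}

/// Allocates a writer and hands ownership to the caller as an opaque pointer.
/// (`#[unsafe(no_mangle)]` needs Rust 1.82+; older toolchains spell it `#[no_mangle]`.)
#[unsafe(no_mangle)]
pub extern "C" fn column_writer_new() -> *mut ColumnWriter {
    Box::into_raw(Box::new(ColumnWriter { values: Vec::new() }))
}

/// Appends one value. Returns 0 on success, -1 if `w` is null.
/// Safety: `w` must be null or a pointer returned by `column_writer_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn column_writer_push(w: *mut ColumnWriter, v: i64) -> c_int {
    match unsafe { w.as_mut() } {
        Some(writer) => {
            writer.values.push(v);
            0
        }
        None => -1,
    }
}

/// Returns the number of buffered values, or 0 if `w` is null.
/// Safety: `w` must be null or a live pointer from `column_writer_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn column_writer_len(w: *const ColumnWriter) -> usize {
    unsafe { w.as_ref() }.map_or(0, |writer| writer.values.len())
}

/// Releases the writer. Passing null is a no-op.
/// Safety: `w` must not be used again after this call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn column_writer_free(w: *mut ColumnWriter) {
    if !w.is_null() {
        drop(unsafe { Box::from_raw(w) });
    }
}
```

On the C++ side, the caller would forward-declare `struct ColumnWriter;`, declare the four functions inside an `extern "C"` block, and link the compiled Rust library. Keeping the type opaque means C++ never depends on the Rust struct's layout.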