Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python FFI bridge for Schema, Field and DataType #439

Merged
merged 12 commits into from
Jul 2, 2021

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Jun 9, 2021

TODOs:

  • cover the updated with rust tests
  • cover the updates with python tests
  • create follow-up tickets for supporting dictionaries and custom metadata

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@kszucs
Copy link
Member Author

kszucs commented Jun 9, 2021

@jorgecarleitao what is the rational behind wrapping (FFI_ArrowArray, FFI_ArrowSchema) with ArrowArray and the children with ArrowArrayChild ?

@codecov-commenter
Copy link

codecov-commenter commented Jun 9, 2021

Codecov Report

Merging #439 (7acd3a5) into master (ef88876) will decrease coverage by 0.05%.
The diff coverage is 63.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
- Coverage   82.76%   82.71%   -0.06%     
==========================================
  Files         165      166       +1     
  Lines       45724    45853     +129     
==========================================
+ Hits        37844    37927      +83     
- Misses       7880     7926      +46     
Impacted Files Coverage Δ
arrow-pyarrow-integration-testing/src/lib.rs 0.00% <0.00%> (ø)
arrow/src/datatypes/mod.rs 100.00% <ø> (ø)
arrow/src/datatypes/ffi.rs 72.86% <72.86%> (ø)
arrow/src/ffi.rs 87.07% <92.50%> (+8.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef88876...7acd3a5. Read the comment docs.

@jorgecarleitao
Copy link
Member

At the time I had limited knowledge of what I was doing. I think that we could refactor this so that the arrays are shared and the datatypes are copied, since we do not really need to share metadata.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks a lot, @kszucs !

arrow/src/ffi.rs Outdated Show resolved Hide resolved
@kszucs
Copy link
Member Author

kszucs commented Jun 14, 2021

Thanks for the review, I'm going to add more tests.

@kszucs kszucs changed the title [WIP] FFI bridge for Schema, Field and DataType Python FFI bridge for Schema, Field and DataType Jun 17, 2021
arrow/src/ffi.rs Outdated Show resolved Hide resolved
@kszucs
Copy link
Member Author

kszucs commented Jun 17, 2021

While the conversions do copy between the C schema and rust DataType/Field/Schema structs cc @pitrou to confirm that the lifetimes are properly handled. This PR shouldn't affect how the arrays get exchanged.

@kszucs
Copy link
Member Author

kszucs commented Jun 17, 2021

@jorgecarleitao what do you think about implementing arrow <-> pyarrow interoperability directly in the arrow-rs repository and use that from the datafusion bindings?

@kszucs
Copy link
Member Author

kszucs commented Jun 17, 2021

The clippy errors should be unrelated.

arrow/src/datatypes/ffi.rs Show resolved Hide resolved
arrow/src/ffi.rs Outdated Show resolved Hide resolved
arrow/src/ffi.rs Show resolved Hide resolved
@alippai
Copy link
Contributor

alippai commented Jun 19, 2021

@jorgecarleitao what do you think about implementing arrow <-> pyarrow interoperability directly in the arrow-rs repository and use that from the datafusion bindings?

@ritchie46 you might be interested in this, as I saw https://github.com/pola-rs/polars/tree/master/py-polars/src/arrow_interop

@jorgecarleitao
Copy link
Member

@jorgecarleitao what do you think about implementing arrow <-> pyarrow interoperability directly in the arrow-rs repository and use that from the datafusion bindings?

I think that it is a good idea.

I am not sure how to execute on it though, as I never built a Rust library out of pyo3. E.g. DataFusion requires some conversions that are currently called from Rust (not Python). Is it enough for it to depend on "arrow-rs-python"?

So, I would say let's try it and see how this goes in terms of packaging and dependency management.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 21, 2021
@kszucs kszucs added the python label Jun 21, 2021
@kszucs
Copy link
Member Author

kszucs commented Jun 21, 2021

@jorgecarleitao created the relevant follow-up tickets and cleaned up the PR.
@pitrou didn't notice any outstanding problems, so I think this should be ready to go now.

@kszucs
Copy link
Member Author

kszucs commented Jun 23, 2021

@jorgecarleitao @alamb resolved conflicts with 8672274

@kszucs
Copy link
Member Author

kszucs commented Jul 1, 2021

@jorgecarleitao may I go ahead and merge it?

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this slipped under my radar :(

I went through this carefully and it is an amazing improvement imo. Really neat stuff,
@kszucs . Thank you very much for this.

I do not have comments; it is ready imo

@alamb
Copy link
Contributor

alamb commented Jul 1, 2021

@kszucs I took the liberty of rebasing this PR and fixing clippy -- it should be good to go now once the CI passes

@kszucs
Copy link
Member Author

kszucs commented Jul 2, 2021

Thanks everyone! Merging it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants