Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyO3 bridge for pyarrow interoperability / fix arrow integration test #691

Merged
merged 8 commits into from
Sep 1, 2021

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Aug 13, 2021

Which issue does this PR close?

Closes #.

Rationale for this change

Motivation comes from apache/datafusion#856 (comment)

PyO3 provides conversion traits between rust and python types. Using the arrow-rs types in the datafusion python bindings required a lot of boilerplate though we could just simply annotate the right type in the function signature and let PyO3 to do the majority of the work.

The error handling should be improved.

What changes are included in this PR?

Are there any user-facing changes?

}
}

impl PyArrowConvert for ArrayData {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing the conversion traits either on ArrayRef or T: Array is not possible, so I sticked with ArrayData but implemented the PyArrowConvert trait on the array types and ArrayRef for convenience.

@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2021

Codecov Report

Merging #691 (c2e4be0) into master (fa5acd9) will decrease coverage by 0.04%.
The diff coverage is 24.36%.

❗ Current head c2e4be0 differs from pull request most recent head a656da2. Consider uploading reports for the commit a656da2 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #691      +/-   ##
==========================================
- Coverage   82.43%   82.39%   -0.05%     
==========================================
  Files         168      168              
  Lines       47340    47396      +56     
==========================================
+ Hits        39027    39050      +23     
- Misses       8313     8346      +33     
Impacted Files Coverage Δ
arrow/src/array/array.rs 81.23% <0.00%> (-0.51%) ⬇️
arrow/src/pyarrow.rs 0.00% <0.00%> (ø)
arrow/src/array/array_string.rs 97.88% <100.00%> (+0.06%) ⬆️
arrow/src/array/builder.rs 86.13% <100.00%> (+0.15%) ⬆️
arrow/src/ffi.rs 86.54% <0.00%> (-0.27%) ⬇️
parquet/src/encodings/encoding.rs 94.48% <0.00%> (-0.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa5acd9...a656da2. Read the comment docs.

@kszucs kszucs changed the title PyO3 bridge for pyarrow interoperability [WIP] PyO3 bridge for pyarrow interoperability Aug 16, 2021
@kszucs kszucs requested a review from jorgecarleitao August 16, 2021 16:02

impl<'a> IntoPy<PyObject> for $typ {
fn into_py(self, py: Python) -> PyObject {
self.to_pyarrow(py).unwrap()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to unwrap here because IntoPy is infallible, see PyO3/pyo3#1813


pub trait PyArrowConvert: Sized {
fn from_pyarrow(value: &PyAny) -> PyResult<Self>;
fn to_pyarrow(&self, py: Python) -> PyResult<PyObject>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should split it into two conversion traits: TryFromPyArrow and TryIntoPyArrow ?

@kszucs kszucs requested review from cpcloud and alamb August 18, 2021 15:03
@alamb
Copy link
Contributor

alamb commented Aug 19, 2021

I feel I could review the Rust part of this code for basic hygiene (and what I looked at looks good) but I don't understand the nuances with pyo3 or pyarrow so I am not sure I can give this code a thorough review

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks @kszucs !

Note that the integration tests are currently not running, see #690. Maybe we should fix them first?

arrow/Cargo.toml Outdated Show resolved Hide resolved
@kszucs
Copy link
Member Author

kszucs commented Aug 31, 2021

@jorgecarleitao addressed the review comments and fixed the python integration test.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Thanks @kszucs !

@kszucs
Copy link
Member Author

kszucs commented Sep 1, 2021

Thanks Jorge! Merging it then.

@kszucs kszucs merged commit 34df13a into apache:master Sep 1, 2021
@@ -50,6 +50,8 @@ chrono = "0.4"
flatbuffers = { version = "=2.0.0", optional = true }
hex = "0.4"
comfy-table = { version = "4.0", optional = true, default-features = false }
prettytable-rs = { version = "0.8.0", optional = true }
Copy link
Contributor

@roee88 roee88 Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kszucs Was this (re)added by mistake or is it intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it? prettytable-rs should have been replaced by prettyprint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mistake, sorry about that! Created a PR to remove it #737

@alamb alamb changed the title PyO3 bridge for pyarrow interoperability PyO3 bridge for pyarrow interoperability / fix arrow integration test Sep 9, 2021
@@ -138,7 +138,7 @@ pub(super) fn list_equal<T: OffsetSizeTrait>(
child_rhs_nulls.as_ref(),
lhs_offsets[lhs_start].to_usize().unwrap(),
rhs_offsets[rhs_start].to_usize().unwrap(),
(lhs_offsets[len] - lhs_offsets[lhs_start])
(lhs_offsets[lhs_start + len] - lhs_offsets[lhs_start])
Copy link
Contributor

@alamb alamb Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this line fixed #690 from what I can tell

Edit: Actually, it fixes the integration test itself, #690 was technically that the integration test was broken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code/bug in question appears to have been introduced in apache/arrow#8541 (and thus is also in arrow 5.0, and thus does not need to be backported directly into arrow 5.x) -- I was worried I had backported a bug into arrow 5.3 which needed a fix backported anyways

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants