Fixed error in passing sliced arrays via FFI #564

jorgecarleitao · 2021-11-01T18:05:53Z

This PR closes #538 by re-aligning buffers prior to exporting to the C data interface.

Background

Given two arrays, a and b, on which a is sliced and b has no validity, the computation of equal(a, b) is optimally done by equating the values (bitmap) and cloning the validity of a.

The c data interface allows arrays to be sliced at zero cost via a constant "offset", denoting by how much an array is offsetted by, but it does not allow individual bitmaps to be sliced. Consequently, implementations of the arrow format must "de-offset" the bitmaps to align them.

In the example above, this means that the validity of a cannot be combined with the resulting values bitmap because it has a different offset. Instead, a new validity must be computed that has a zero offset (and thus aligned with the offset of the resulting values bitmap).

This crate currently supports the former compute model because it tracks individual bitmap and buffer offsets. However, doing so forbids us from passing the bitmaps as is via the c data interface, as the c data interface only accepts a single offset per array (and not per buffer).

Solution

This PR de-offsets the bitmaps when they are to be passed to the C data interface, thereby delaying when to de-offset happens: at the c-data interface boundary as opposed to at every compute operation.

Note that this is a relatively small set of use-cases because most of the times the arrays are not sliced.

Alternatives

The alternative is to fallback to what other implementations do: de-offset the buffers on every operation. However, this resulted in a 15% performance drop in some of the kernels and a much more complex implementation of kernels since they all have to remember to de-offset bitmaps.

codecov · 2021-11-01T19:25:57Z

Codecov Report

Merging #564 (e51c918) into main (5fc843d) will decrease coverage by 0.11%.
The diff coverage is 43.09%.

@@            Coverage Diff             @@
##             main     #564      +/-   ##
==========================================
- Coverage   78.92%   78.81%   -0.12%     
==========================================
  Files         382      395      +13     
  Lines       24276    25075     +799     
==========================================
+ Hits        19160    19763     +603     
- Misses       5116     5312     +196

Impacted Files	Coverage Δ
src/array/binary/mod.rs	`81.01% <ø> (+2.26%)`	⬆️
src/array/boolean/ffi.rs	`0.00% <0.00%> (ø)`
src/array/boolean/mod.rs	`69.38% <ø> (-0.62%)`	⬇️
src/array/dictionary/mod.rs	`56.89% <ø> (+10.22%)`	⬆️
src/array/fixed_size_binary/ffi.rs	`0.00% <0.00%> (ø)`
src/array/fixed_size_binary/mod.rs	`53.73% <ø> (+3.73%)`	⬆️
src/array/fixed_size_list/ffi.rs	`0.00% <0.00%> (ø)`
src/array/fixed_size_list/mod.rs	`65.00% <ø> (+5.29%)`	⬆️
src/array/map/ffi.rs	`0.00% <0.00%> (ø)`
src/array/mod.rs	`58.77% <ø> (+2.63%)`	⬆️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fc843d...e51c918. Read the comment docs.

jorgecarleitao added 4 commits November 1, 2021 06:11

Fixing ffi

c48b594

Testing.

ff930e4

Save

d420bda

Fixed

e51c918

jorgecarleitao added the bug Something isn't working label Nov 1, 2021

jorgecarleitao merged commit e4c37f9 into main Nov 1, 2021

jorgecarleitao deleted the fix_ffi2 branch November 1, 2021 19:42

jorgecarleitao changed the title ~~Fix FFI of sliced arrays~~ Fixed error in passing sliced arrays via FFI Nov 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed error in passing sliced arrays via FFI #564

Fixed error in passing sliced arrays via FFI #564

jorgecarleitao commented Nov 1, 2021

codecov bot commented Nov 1, 2021

Fixed error in passing sliced arrays via FFI #564

Fixed error in passing sliced arrays via FFI #564

Conversation

jorgecarleitao commented Nov 1, 2021

Background

Solution

Alternatives

codecov bot commented Nov 1, 2021

Codecov Report