Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support for IPC 2.0 #168

Merged
merged 2 commits into from
Jun 26, 2021
Merged

Added support for IPC 2.0 #168

merged 2 commits into from
Jun 26, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Jun 24, 2021

Closes #163

This adds support for IPC format 2.0, which includes compression. The implementation is not the fastest, because it currently requires an extra allocation per buffer, but the roundtrip over the golden files passes, so we can live with it for now. ^_^

Note that this adds a new feature io_ipc_compression, to activate the corresponding dependencies. These overlap with io_parquet, but it is still good to be aware of.

@codecov
Copy link

codecov bot commented Jun 24, 2021

Codecov Report

Merging #168 (b9fc9cf) into main (7f2c3c1) will increase coverage by 0.09%.
The diff coverage is 94.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #168      +/-   ##
==========================================
+ Coverage   75.63%   75.73%   +0.09%     
==========================================
  Files         212      213       +1     
  Lines       18509    18603      +94     
==========================================
+ Hits        14000    14089      +89     
- Misses       4509     4514       +5     
Impacted Files Coverage Δ
src/io/ipc/read/deserialize.rs 95.88% <93.38%> (-0.35%) ⬇️
src/io/ipc/compression.rs 100.00% <100.00%> (ø)
src/io/ipc/read/common.rs 83.72% <100.00%> (+0.38%) ⬆️
src/io/ipc/read/reader.rs 85.83% <100.00%> (+0.11%) ⬆️
src/io/ipc/read/stream.rs 86.13% <100.00%> (+0.28%) ⬆️
src/compute/nullif.rs 0.00% <0.00%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f2c3c1...b9fc9cf. Read the comment docs.

buffer.as_mut_ptr() as *mut u8,
length * std::mem::size_of::<T>(),
);
compression::decompress_lz4(&slice[8..], out_slice)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a writer tries to compress a buffer, and then it finds that the compressed buffer ratio isn't good enough, it can still write uncompressed buffers. In that case, we have to check if the first 8 bytes = -1i64. It's some of the details that I struggled with

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am following the c++ implementation here, but I can't find that case there :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also can't find it now, but I'll look in more detail; as I recall a review comment being raised in the Java implementation for this detal.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be fine for now to use the same as the c++ implementation does, and improve that behavior over a future PR? None of this results in UB; the worse case is a panic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it definitely would :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does arrow2 support codecs when reading IPC files?
2 participants