Split out arrow-ipc #3022

Merged
merged 7 commits into from
Nov 6, 2022

Conversation

tustvold
Contributor

@tustvold tustvold commented Nov 5, 2022

Which issue does this PR close?

Part of #2594

Rationale for this change

This will allow arrow-flight, parquet and potentially more to no longer need to depend on the full arrow crate

What changes are included in this PR?

Splits out the arrow-ipc crate

Are there any user-facing changes?

No breaking changes. There is one slight change: CompressionCodec now contains all variants even when no features are enabled.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 5, 2022
@@ -22,6 +22,24 @@ use crate::{new_empty_array, Array, ArrayRef, StructArray};
use arrow_schema::{ArrowError, DataType, Field, Schema, SchemaRef};
use std::sync::Arc;

/// Trait for types that can read `RecordBatch`'s.
pub trait RecordBatchReader: Iterator<Item = Result<RecordBatch, ArrowError>> {
Contributor Author

This is moved from the arrow crate to allow using it in arrow-ipc and a future arrow-csv
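As a rough illustration of why this trait is convenient to share across crates, here is a minimal, dependency-free sketch. `Schema`, `RecordBatch` and `ArrowError` below are hypothetical stand-ins for the real arrow types, `VecReader` and `total_rows` are invented for the example, and the `schema` method is an assumption since the diff only shows the trait header.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real arrow types (not the actual definitions).
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}
pub type SchemaRef = Arc<Schema>;

#[derive(Debug, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

#[derive(Debug)]
pub struct ArrowError(pub String);

/// Same shape as the trait header in the diff: an iterator of fallible batches.
/// The `schema` method is an assumption for this sketch.
pub trait RecordBatchReader: Iterator<Item = Result<RecordBatch, ArrowError>> {
    fn schema(&self) -> SchemaRef;
}

/// Hypothetical in-memory reader that yields pre-built batches.
pub struct VecReader {
    schema: SchemaRef,
    batches: std::vec::IntoIter<RecordBatch>,
}

impl VecReader {
    pub fn new(schema: SchemaRef, batches: Vec<RecordBatch>) -> Self {
        Self { schema, batches: batches.into_iter() }
    }
}

impl Iterator for VecReader {
    type Item = Result<RecordBatch, ArrowError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Every stored batch is yielded as a success.
        self.batches.next().map(Ok)
    }
}

impl RecordBatchReader for VecReader {
    fn schema(&self) -> SchemaRef {
        self.schema.clone()
    }
}

/// A consumer written only against the trait, so it works for any reader
/// (IPC, CSV, ...) without depending on the crate that implements it.
pub fn total_rows(reader: impl RecordBatchReader) -> Result<usize, ArrowError> {
    let mut total = 0;
    for batch in reader {
        total += batch?.num_rows;
    }
    Ok(total)
}
```

Because `total_rows` only names the trait, a format-specific crate can implement the reader while downstream code stays independent of it.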

}
}

#[cfg(feature = "lz4")]
Contributor Author

We now have separate features for the different compression codecs

@@ -1503,18 +1501,18 @@ mod tests {
],
)
.unwrap();
let file_name = format!("target/debug/testdata/nulls_{}.arrow_file", suffix);
let mut file = tempfile::tempfile().unwrap();
Contributor Author

This was failing because the files did not exist; I suspect an issue caused by moving to a different crate. Fortunately, switching to tempfile is cleaner and avoids the problem.
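The write-then-read-back pattern this enables can be sketched roughly as follows. The helper `write_then_read_back` is hypothetical and not taken from the PR; it is generic over any `Read + Write + Seek` handle, so in a test it could be given the `File` returned by `tempfile::tempfile()`, while an in-memory `std::io::Cursor` works for a dependency-free check.

```rust
use std::io::{self, Read, Seek, Write};

/// Hypothetical helper: write `payload` to a handle, rewind, and read it back.
/// No file name is involved, so nothing depends on paths under `target/`.
pub fn write_then_read_back<F>(mut handle: F, payload: &[u8]) -> io::Result<Vec<u8>>
where
    F: Read + Write + Seek,
{
    handle.write_all(payload)?;
    // Without the rewind, reading would start at the end of the written data.
    handle.rewind()?;
    let mut out = Vec::new();
    handle.read_to_end(&mut out)?;
    Ok(out)
}
```

The rewind step is presumably why the moved test imports `std::io::Seek`: the same handle is written first and then read from the start.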

use std::io::Seek;

#[test]
fn read_union_017() {
Contributor Author

This test needed to be moved because test_util is still part of the top-level crate. It makes more sense as an integration test anyway, so I think this is fine

@@ -1626,45 +1610,6 @@ mod tests {
assert!(dict_tracker.written.contains_key(&2));
}

#[test]
Contributor Author

Moved to an integration test

use crate::ipc::CompressionType;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionCodec {}
Contributor Author

We no longer use a stub CompressionCodec, and instead generate runtime errors if the necessary features aren't enabled. This is consistent with how we handle this in parquet and how we handle other optional features such as timezone support
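A minimal sketch of that approach, under stated assumptions: the variant names, the `zstd` feature name, the `CodecError` type and the error messages are illustrative rather than copied from the PR (only the `lz4` feature name appears in the diff above), and the feature-enabled branches are placeholders that do not compress anything.

```rust
use std::fmt;

/// All variants exist regardless of enabled features (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionCodec {
    Lz4Frame,
    Zstd,
}

/// Hypothetical error type standing in for `ArrowError`.
#[derive(Debug, PartialEq, Eq)]
pub struct CodecError(pub String);

impl fmt::Display for CodecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl CompressionCodec {
    /// Dispatches on the variant; the feature check happens in the helpers.
    pub fn compress(&self, input: &[u8]) -> Result<Vec<u8>, CodecError> {
        match self {
            CompressionCodec::Lz4Frame => compress_lz4(input),
            CompressionCodec::Zstd => compress_zstd(input),
        }
    }
}

#[cfg(feature = "lz4")]
fn compress_lz4(input: &[u8]) -> Result<Vec<u8>, CodecError> {
    // Placeholder only: a real build would call an lz4 library here.
    Ok(input.to_vec())
}

#[cfg(not(feature = "lz4"))]
fn compress_lz4(_input: &[u8]) -> Result<Vec<u8>, CodecError> {
    // Feature disabled: the variant still exists, but using it fails at runtime.
    Err(CodecError("lz4 IPC compression requires the lz4 feature".to_string()))
}

#[cfg(feature = "zstd")]
fn compress_zstd(input: &[u8]) -> Result<Vec<u8>, CodecError> {
    // Placeholder only: a real build would call a zstd library here.
    Ok(input.to_vec())
}

#[cfg(not(feature = "zstd"))]
fn compress_zstd(_input: &[u8]) -> Result<Vec<u8>, CodecError> {
    Err(CodecError("zstd IPC compression requires the zstd feature".to_string()))
}
```

Compared with an empty stub enum, the public type has the same shape in every feature configuration, so downstream code can match on it unconditionally and only sees a failure when it actually tries to use a codec that was not compiled in.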

@tustvold tustvold requested a review from viirya November 6, 2022 05:48
Member

@viirya viirya left a comment

This looks good to me. It appears to move code around with only the necessary changes, e.g. module names.

@tustvold tustvold merged commit deb6455 into apache:master Nov 6, 2022
@ursabot

ursabot commented Nov 6, 2022

Benchmark runs are scheduled for baseline = 108e7d2 and contender = deb6455. deb6455 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

tustvold added a commit to tustvold/arrow-rs that referenced this pull request Nov 7, 2022
tustvold added a commit that referenced this pull request Nov 7, 2022
* Move reader_parser to arrow-cast (#3022)

* Format