add bench: decimal with byte array and fixed length byte array #2529
parquet/src/arrow/mod.rs
@@ -122,7 +122,7 @@
 experimental!(mod array_reader);
 pub mod arrow_reader;
 pub mod arrow_writer;
-mod buffer;
+experimental!(mod buffer);
Can we avoid this? Even as `experimental`, I really don't want `buffer` exposed externally.
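To illustrate the concern: a wrapper in the spirit of `experimental!` makes a module `pub` when an opt-in feature is enabled, so everything inside it becomes reachable by downstream crates under that feature. The sketch below is a hypothetical macro, not the crate's actual `experimental!`; the macro name `experimental_inline`, the `experimental` feature name and the `capacity_hint` helper are assumptions for illustration, and inline modules are used so the example is self-contained.

```rust
// Hypothetical macro: the module is `pub` only when the (assumed)
// "experimental" feature is enabled; otherwise it keeps the visibility
// written at the call site (private here).
macro_rules! experimental_inline {
    ($vis:vis mod $name:ident { $($body:tt)* }) => {
        #[cfg(feature = "experimental")]
        pub mod $name { $($body)* }

        #[cfg(not(feature = "experimental"))]
        $vis mod $name { $($body)* }
    };
}

experimental_inline!(mod buffer {
    /// Hypothetical stand-in for an internal helper.
    pub fn capacity_hint(rows: usize) -> usize {
        rows * 2
    }
});
```

With the feature off, `buffer` stays private to the crate; with it on, `buffer::capacity_hint` becomes part of the public surface, which is the exposure the review comment wants to avoid.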
After your refactor #2528, I think we can use a public API such as `read_array_reader` to create the reader in the benchmark.
After making the removal you suggested in #2529 (comment), we no longer need the `buffer` module in the benchmark code.
parquet/benches/arrow_reader.rs
match data_type {
    DataType::Decimal128(precision, scale) => {
        // read decimal data from parquet binary physical type
        let convert = DecimalByteArrayConvert::new(
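For context on what "read decimal data from parquet binary physical type" involves: the Parquet format stores a DECIMAL in BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY as the unscaled value in big-endian two's complement, while an Arrow `Decimal128` array stores the unscaled value as an `i128`, with precision and scale carried by the data type. A minimal standalone sketch of that decoding step follows; `decode_decimal_be` is a hypothetical name, and this is not the crate's implementation.

```rust
/// Decode a Parquet DECIMAL stored as a big-endian two's-complement byte
/// string (1 to 16 bytes) into the unscaled i128 value that an Arrow
/// Decimal128 array would hold.
pub fn decode_decimal_be(bytes: &[u8]) -> i128 {
    assert!(
        !bytes.is_empty() && bytes.len() <= 16,
        "expected between 1 and 16 bytes"
    );
    // Sign-extend: pad with 0xFF when the most significant bit is set,
    // otherwise with 0x00.
    let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    // Right-align the input so the least significant byte ends up last.
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    i128::from_be_bytes(buf)
}
```

For example, the two bytes `0x30 0x39` decode to the unscaled value 12345, which with `scale = 2` represents 123.45.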
FYI, these functions are being removed in #2528.
I need reviews on the PRs it builds on; that's the only thing holding it back from being ready for review.
parquet/src/data_type.rs
@@ -1235,7 +1241,7 @@ impl FromBytes for ByteArray {
 }

 impl FromBytes for FixedLenByteArray {
-    type Buffer = [u8; 8];
+    type Buffer = Vec<u8>;

     fn from_le_bytes(bs: Self::Buffer) -> Self {
         Self(ByteArray::from(bs.to_vec()))
This now performs an unnecessary clone: `bs` is already an owned `Vec<u8>`, so `bs.to_vec()` allocates and copies it a second time.
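A minimal sketch of the fix, using simplified stand-ins rather than the crate's real types: the `ByteArray`, `FixedLenByteArray` and `FromBytes` below are hypothetical reductions that mirror only the shape shown in the diff. Once `Buffer` is an owned `Vec<u8>`, the buffer can be moved into the wrapper instead of copied.

```rust
/// Simplified stand-in for a byte-array value (assumption: just a Vec<u8>).
#[derive(Debug, PartialEq)]
pub struct ByteArray(pub Vec<u8>);

impl From<Vec<u8>> for ByteArray {
    fn from(v: Vec<u8>) -> Self {
        ByteArray(v)
    }
}

/// Simplified stand-in for a fixed-length byte-array value.
#[derive(Debug, PartialEq)]
pub struct FixedLenByteArray(pub ByteArray);

/// Reduced version of the trait shape shown in the diff.
pub trait FromBytes: Sized {
    type Buffer;
    fn from_le_bytes(bs: Self::Buffer) -> Self;
}

impl FromBytes for FixedLenByteArray {
    type Buffer = Vec<u8>;

    fn from_le_bytes(bs: Self::Buffer) -> Self {
        // `bs` is already owned, so move it; `bs.to_vec()` would allocate
        // a new buffer and copy every byte.
        Self(ByteArray::from(bs))
    }
}
```

Moving a `Vec` transfers ownership of its heap allocation, so the wrapped value points at the same bytes the caller passed in.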
Good catch.
I think this will serve as a nice validation of #2528: the performance should improve significantly.
Hope so. I will review your refactor work tomorrow.
It's better to remove the
ComplexObjectArrayReader reads into the row format, i.e. a separately boxed ByteArray for each row. ByteArrayReader reads into a contiguous Arrow buffer, and it also has an optimised path for reading non-nested definition levels. It should therefore be significantly faster, not to mention that it handles nesting correctly. See #1082.
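The layout difference described above can be sketched with plain Rust types. This is a simplified illustration under stated assumptions, not either reader's implementation: `to_rows` stands in for the row format (one allocation per value), `Contiguous` for an Arrow-style binary layout (a single values buffer plus `i32` offsets), and `validity_from_def_levels` for the non-nested nullable case, where the maximum definition level is 1. All three names are hypothetical.

```rust
/// Row format: one separately allocated buffer per value.
pub fn to_rows(values: &[&[u8]]) -> Vec<Vec<u8>> {
    values.iter().map(|v| v.to_vec()).collect()
}

/// Contiguous layout: all bytes appended to one buffer, with value i
/// spanning values[offsets[i]..offsets[i + 1]].
pub struct Contiguous {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
}

impl Contiguous {
    pub fn from_values(values: &[&[u8]]) -> Self {
        let mut offsets = Vec::with_capacity(values.len() + 1);
        let mut buf = Vec::new();
        offsets.push(0);
        for v in values {
            // One append into a shared buffer instead of a new allocation.
            buf.extend_from_slice(v);
            offsets.push(buf.len() as i32);
        }
        Contiguous { offsets, values: buf }
    }

    pub fn value(&self, i: usize) -> &[u8] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Non-nested nullable column (max definition level 1): each definition
/// level maps directly to validity, 1 = present, 0 = null.
pub fn validity_from_def_levels(def_levels: &[i16]) -> Vec<bool> {
    def_levels.iter().map(|&d| d == 1).collect()
}
```

The row format pays one allocation per value, while the contiguous layout grows a single buffer, which is the main reason to expect the ByteArrayReader path to be faster.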
Great improvement for the decimal reader with fixed-length byte arrays after the refactor in #2528.
Branch updated from a038bba to 862cf28.
@tustvold PTAL
Benchmark runs are scheduled for baseline = c6e7680 and contender = 81f1f81. 81f1f81 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
BYTE_ARRAY
Which issue does this PR close?
Part of #2388.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?