Speed up Decimal256 validation based on bytes comparison and add benchmark test #2360
Conversation
arrow/src/datatypes/datatype.rs (outdated)
@@ -479,6 +563,76 @@ pub(crate) fn validate_decimal_precision(value: i128, precision: usize) -> Resul
    }
}

// duplicate code
Copied from the decimal.rs file.
arrow/src/datatypes/datatype.rs (outdated)
    Ordering::Equal
}

pub(crate) fn validate_decimal_precision_with_bytes(lt_value: &[u8], precision: usize) -> Result<i128> {
Will change this to `Result<()>`.
I just added a benchmark of validation for Decimal128Array and got the results below:
Compared with the current code, there is a 10~20% performance improvement.
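For reference, the core of a bytes-based check is a signed comparison over little-endian two's-complement bytes, so values never need to be materialized as `i128`/`i256`. A minimal sketch of such a comparison (the function name is illustrative, not the exact helper added in this PR):

```rust
use std::cmp::Ordering;

/// Compare two equal-width little-endian two's-complement integers by their
/// raw bytes. Illustrative sketch only; not the code in this PR.
fn cmp_le_signed(a: &[u8], b: &[u8]) -> Ordering {
    assert_eq!(a.len(), b.len());
    // The most significant byte is the last one in little-endian order.
    let a_negative = a[a.len() - 1] & 0x80 != 0;
    let b_negative = b[b.len() - 1] & 0x80 != 0;
    if a_negative != b_negative {
        // A negative value is always smaller than a non-negative one.
        return if a_negative { Ordering::Less } else { Ordering::Greater };
    }
    // Same sign: two's complement keeps its ordering under an unsigned,
    // most-significant-byte-first comparison.
    for i in (0..a.len()).rev() {
        match a[i].cmp(&b[i]) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}
```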
arrow/src/array/array_decimal.rs (outdated)
@@ -336,6 +336,31 @@ impl BasicDecimalArray<Decimal128, Decimal128Array> for Decimal128Array {
        }
        Ok(())
    }

    fn validate_decimal_with_bytes(&self, precision: usize) -> Result<()> {
Unless I'm missing something, the performance benefit here comes from avoiding the additional work performed by the iterator, in particular `BasicDecimal::new`. Perhaps we should fix that instead, as it would have benefits all over the codebase? For one thing, `BasicDecimal` could easily take a slice with a known size.
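For illustration of the fixed-size point: when the length is part of the type, the compiler needs no runtime length check, whereas a `&[u8]` parameter forces one. The function names below are hypothetical, not the actual `BasicDecimal` API.

```rust
use std::convert::TryInto;

// Hypothetical sketch: [u8; 16] carries its length in the type ...
fn decimal128_from_array(bytes: [u8; 16]) -> i128 {
    // No length check needed: the size is encoded in the type.
    i128::from_le_bytes(bytes)
}

// ... while &[u8] must be checked at runtime before conversion.
fn decimal128_from_slice(bytes: &[u8]) -> i128 {
    i128::from_le_bytes(bytes.try_into().expect("expected 16 bytes"))
}
```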
Yes, I think the overhead comes from `xxDecimal::new()` and the conversion to `i128`.
With the method in this PR, we improve performance by removing that overhead.
Currently, when we `iter` or index a decimal array, each value comes back as a `Decimal128` or `Decimal256`. In many cases we don't need the decimal struct, just the little-endian bytes.
Can we add methods for indexing or iterating a decimal array that return each value as `&[u8]`?
Do you mean this?
@tustvold
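For illustration, such an accessor could look roughly like the following; this is a hypothetical sketch over a flat value buffer, not the real array API.

```rust
// Hypothetical: iterate the raw little-endian bytes of a fixed-width decimal
// buffer without constructing Decimal128/Decimal256 values. `values` stands
// in for the array's value buffer.
fn iter_raw_decimal128<'a>(values: &'a [u8]) -> impl Iterator<Item = &'a [u8]> + 'a {
    // Each Decimal128 value occupies 16 little-endian bytes.
    values.chunks_exact(16)
}
```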
> Yes, I think the overhead comes from `xxDecimal::new()` and the conversion to `i128`.

LLVM should be smart enough to elide this. It likely isn't because there are bounds checks on the byte array length (which are completely unnecessary) that it isn't smart enough to elide, and that prevents eliding the rest.
It should definitely be possible to make these methods perform the same; it just might require a bit of hand-holding to help LLVM. This would then benefit all call sites that use the iterator.
> Yes, I think the overhead comes from `xxDecimal::new()` and the conversion to `i128`.
>
> LLVM should be smart enough to elide this. It likely isn't because there are bounds checks on the byte array length (which are completely unnecessary) that it isn't smart enough to elide, and that prevents eliding the rest.
> It should definitely be possible to make these methods perform the same; it just might require a bit of hand-holding to help LLVM. This would then benefit all call sites that use the iterator.

Do you have any suggestions or a plan for that?
I can add it to my list of things to do this week
> I can add it to my list of things to do this week

Do you think this PR is on the right path? If so, I will continue this work: refactor the Decimal256Array validation and add byte-slice iteration for decimal arrays.
@tustvold
If you don't mind, I would like to take some time at some point in the next couple of days to take a holistic look at what is going on with decimals, and then get back to you
> It should definitely be possible to make these methods perform the same; it just might require a bit of hand-holding to help LLVM. This would then benefit all call sites that use the iterator.

This sounds promising.
@tustvold I removed the changes to Decimal128 validation and only refactored the validation logic for Decimal256.
PTAL
I guess that we can get a bigger improvement for the Decimal256Array type.
Hi @liukun4515 -- I don't fully understand this approach for validating Decimal values, though I have not studied it as carefully as I would like to. Most other arrays have two paths:
Since this PR seems to do something a bit different, I need to think about it some more, but I don't think I'll have time until later this week or this weekend (I got caught up in #2369 for longer than I expected and am now behind in some other areas). It sounds like @tustvold has some thoughts about the "big picture" of Decimal in general, so perhaps that will help.
Possibly also related: #2362
Force-pushed from 6ebf1da to f03830b
Refactored the validation for Decimal256. @viirya
Almost 50x faster than before.
What was the weird thing you ran into? @liukun4515
Different sizes of Decimal128Array give different benchmark results. I'm working on finding out the reason, but I don't want it to block this work from moving forward.
This PR looks like a nice improvement to me -- I realize it doesn't affect but the improvement of validating min/max values of Decimals using precomputed tables of binary representations rather than `&str` seems like a very reasonable performance improvement to me.
I realize there is still an outstanding question on #2387 of whether we should be doing decimal value validation at all, but I think we can improve the current implementation even as we refine the overall semantics.
@tustvold @HaoYang670 @viirya any concern with merging this PR as is?
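For illustration, the "precomputed tables of binary representations" approach amounts to constants like the following. This is a tiny, hypothetical excerpt; the real tables would cover every supported precision and include minimum bounds as well.

```rust
// Illustrative only: maximum allowed value per precision, precomputed as
// little-endian bytes so a candidate can be range-checked byte-wise (e.g.
// with a signed little-endian comparison) instead of via string formatting.
const MAX_BYTES_FOR_PRECISION: [[u8; 16]; 3] = [
    9_i128.to_le_bytes(),   // precision 1
    99_i128.to_le_bytes(),  // precision 2
    999_i128.to_le_bytes(), // precision 3
];
```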
A few typos.
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
if precision < self.precision {
    for v in self.iter().flatten() {
        validate_decimal256_precision(&v.to_big_int(), precision)?;
fn validate_decimal_precision(&self, precision: usize) -> Result<()> {
Personally, I prefer this function to be an independent `fn`, not a method of `Decimal256Array`, because:
- DecimalArray has its own precision, so providing another `precision` to its method is somewhat weird.
- This method is just for `Decimal256Array`, not the generic decimal array. Moving it out of the `impl Decimal256Array` can make the code cleaner.
> Personally, I prefer this function to be an independent `fn`, not a method of `Decimal256Array`, because:

If this method is only for `Decimal256Array`, why should we move it out of the `impl`? I am confused about this suggestion.
Besides, it's a private method. If we move it out of the `impl Decimal256Array`, how do we get the `data` of `Self`?

> - DecimalArray has its own precision, so providing another `precision` to its method is somewhat weird.

`with_precision_and_scale` also takes a `precision` and a `scale`; is that also weird?

> - This method is just for `Decimal256Array`, not the generic decimal array. Moving it out of the `impl Decimal256Array` can make the code cleaner.

I feel confused about this reason. The method is only used by `Decimal256Array`, so why not treat it as a private method?
if precision < self.precision {
    for v in self.iter().flatten() {
        validate_decimal256_precision(&v.to_big_int(), precision)?;
fn validate_decimal_precision(&self, precision: usize) -> Result<()> {
Do we need this check if `precision >= self.precision`?
good catch
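For illustration, the early exit being discussed boils down to the following (hypothetical helper name): per-value validation is only needed when narrowing the precision.

```rust
// Hypothetical helper: when the requested precision is at least the array's
// declared precision, every stored value already fits, so the per-value
// check can be skipped entirely.
fn needs_per_value_check(array_precision: usize, requested_precision: usize) -> bool {
    requested_precision < array_precision
}
```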
} else {
    let offset = current + data.offset();
    current += 1;
    let raw_val = unsafe {
This could be extracted as a separate method, `DecimalArray::raw_value`.
`value_unchecked` also uses it.
Maybe you can extract them in a follow-up PR.
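For illustration, the shared helper being suggested could look roughly like this; a hypothetical free-standing version over a flat value buffer, whereas the real method would live on the array itself.

```rust
// Hypothetical sketch of a `raw_value`-style helper: return the fixed-width
// bytes of one logical element from a flat value buffer.
fn raw_value(values: &[u8], value_length: usize, index: usize) -> &[u8] {
    let start = index * value_length;
    &values[start..start + value_length]
}
```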
arrow/src/array/array_decimal.rs (outdated)
let current_end = self.data.len();
let mut current: usize = 0;
let data = &self.data;

while current != current_end {
    if self.is_null(current) {
        current += 1;
        continue;
    } else {
        let offset = current + data.offset();
        current += 1;
        let raw_val = unsafe {
            let pos = self.value_offset_at(offset);
            std::slice::from_raw_parts(
                self.raw_value_data_ptr().offset(pos as isize),
                Self::VALUE_LENGTH as usize,
            )
        };
        validate_decimal256_precision_with_lt_bytes(raw_val, precision)?;
    }
}
Ok(())
Suggested change:

(0..self.len())
    .filter(|idx| self.is_valid(*idx))
    .try_for_each(|idx| {
        let raw_val = unsafe {
            let pos = self.value_offset(idx);
            std::slice::from_raw_parts(
                self.raw_value_data_ptr().offset(pos as isize),
                Self::VALUE_LENGTH as usize,
            )
        };
        validate_decimal256_precision_with_lt_bytes(raw_val, precision)
    })
I didn't apply the suggestion because of a performance regression.
The performance of your version:
validate_decimal256_array 20000
time: [393.73 us 402.59 us 412.64 us]
change: [+0.0864% +1.9579% +3.8640%] (p = 0.05 < 0.05)
Change within noise threshold.
My version:
validate_decimal256_array 20000
time: [282.54 us 289.68 us 297.22 us]
change: [-29.976% -27.993% -25.897%] (p = 0.00 < 0.05)
Performance has improved.
I guess the reason is that it loops twice and creates an intermediate `Iter`.
@HaoYang670 I think this is an interesting result.
Could you please try

(0..self.len()).try_for_each(|idx| {
    if self.is_valid(idx) { ... }
})

? I am afraid the `filter` function introduces some overhead.
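For reference, a self-contained sketch of that shape, with `is_valid` and `check` as stand-ins for the array's null check and the per-value validation:

```rust
// Skip null slots inside the closure instead of going through `filter`.
fn validate_all(
    len: usize,
    is_valid: impl Fn(usize) -> bool,
    check: impl Fn(usize) -> Result<(), String>,
) -> Result<(), String> {
    (0..len).try_for_each(|idx| {
        if is_valid(idx) {
            check(idx)?;
        }
        Ok(())
    })
}
```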
I think this change makes sense.
I will try it and post the benchmark result.
There is no performance regression without the `filter`, so I will apply the change to the codebase.
If the changes look good to you, please approve. @HaoYang670
Force-pushed from 2b353a3 to 3aab001
It failed, but the failure has nothing to do with this PR.
Benchmark runs are scheduled for baseline = e7dcfbc and contender = b235173. b235173 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Thanks everyone for getting this through. Good teamwork 👍
Which issue does this PR close?
Closes #2320
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?