-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally disable validate_decimal_precision
check in DecimalBuilder.append_value
for interop test
#1767
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1767 +/- ##
==========================================
+ Coverage 83.32% 83.49% +0.17%
==========================================
Files 196 196
Lines 55961 55954 -7
==========================================
+ Hits 46627 46719 +92
+ Misses 9334 9235 -99
Continue to review full report at Codecov.
|
I'm not sure how I feel about this, the promise of the builders is they won't let you construct an ArrayData that would fail validation, and can therefore elide this validation. I think this change violates that? Tbh this feels like a bug in the C++ implementation... Perhaps we should raise a ticket to at the very least get context? |
In C++, the similar check is done in validating ArrayData, not during appending values into the builder. I'm not sure which one is more correct under Arrow's context. I'm trying to remove the check and (not committed yet) put the precision check into the validation of ArrayData (this is what C++ Arrow does). In one way, as we don't do full validation when finishing the builder, it seems this check is necessary if it is the promise of the builder to always build a valid ArrayData. Then the check is needed and it seems C++ needs to add the check there. I will open a ticket there and try to get some feedbacks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting Rust to follow the C++ implementation seems like a good idea to me. The ticket does a good job of explaining this.
cc @liukun4515 who originally wrote these checks, I think
@@ -1497,38 +1497,6 @@ mod tests { | |||
assert_eq!(16, decimal_array.value_length()); | |||
} | |||
|
|||
#[test] | |||
fn test_decimal_append_error_value() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend we change this test to show that it is ok to store these values in a decimal rather than removing it completely (aka change the test to validate there is no error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the test. As I move the precision check to ArrayData full validation, I add another test for that too.
I think this is the right thing to do -- thank you @viirya |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after @alamb's suggestion
Thanks @alamb @nevi-me @tustvold. I've opened a ticket https://issues.apache.org/jira/browse/ARROW-16696 to hear some feedback there. |
b179054
to
3a221ce
Compare
@@ -999,6 +999,27 @@ impl ArrayData { | |||
|
|||
pub fn validate_dictionary_offset(&self) -> Result<()> { | |||
match &self.data_type { | |||
DataType::Decimal(p, _) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ ArrayData full validation performs the precision check for decimal type. I think this is necessary to add even we don't remove validate_decimal_precision
from DecimalBuilder.append_value
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this code is necessary in validate
👍
@@ -1486,7 +1486,7 @@ mod tests { | |||
192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 36, 75, 238, 253, | |||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, | |||
]; | |||
let array_data = ArrayData::builder(DataType::Decimal(23, 6)) | |||
let array_data = ArrayData::builder(DataType::Decimal(38, 6)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caught this invalid decimal value by decimal check in full validation. Increasing precision to pass it.
@@ -1130,6 +1131,7 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
#[cfg(not(feature = "force_validate"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests use the same 0.14.1 decimal file too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @viirya
After thinking about it for a while, I think:
- Adding validation checks to ArrayData::validate for decimal is ❤️
- I am a little worried about removing the validation from DecimalBuilder -- I think one of the philosophies we are shooting for is if you use
safe
APIs you will create valid Arrow arrays.
What would you think about making validation optional for the builder?
Something like:
struct DecimalBuilder {
...
/// Should i128 values be validated for compatibility with scale and precision?
/// defaults to true
value_validation: true
}
impl DecomalBuilder {
...
/// Disable validation
pub unsafe disable_value_validation(mut self) -> Self {
self.validate_values = false;
}
#[inline]
pub fn append_value(&mut self, value: i128) -> Result<()> {
let value = if sef.validate_values {
validate_decimal_precision(value, self.precision)?
} else {
value
}
}
...
}
I think this would avoid having to add #[cfg(not(feature = "force_validate"))]
and it would allow the interop test to bypass value validation only when needed?
@@ -999,6 +999,27 @@ impl ArrayData { | |||
|
|||
pub fn validate_dictionary_offset(&self) -> Result<()> { | |||
match &self.data_type { | |||
DataType::Decimal(p, _) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this code is necessary in validate
👍
arrow/src/array/data.rs
Outdated
}; | ||
let as_array = raw_val.try_into(); | ||
match as_array { | ||
Ok(v) if raw_val.len() == 16 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason this code seems overly complicated -- I wonder if you could call i128::from_le_bytes directly on a slice like
for pos in 0..values_buffer.len() {
let v = value_buffer.data[pos..pos+16];
let value = i128::from_le_bytes(v)
}
or something 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data
is private. 😞
I simplified it as:
for pos in 0..values_buffer.len() {
let raw_val = unsafe {
std::slice::from_raw_parts(
values_buffer.as_ptr().add(pos),
16_usize,
)
};
let value = i128::from_le_bytes(raw_val.try_into().unwrap());
validate_decimal_precision(value, *p)?;
}
Hmm, seems at C++ implementation, the builder is unsafe API, in Rust parlance. This is based on the reply I got in the JIRA ticket. This seems the difference between Rust and C++ implementations. |
Generally it sounds good to me. But in the test, we are not using the builder directly. The builder is called by more higher API when reading JSON/Arrow files into Arrays. So seems we need more than adding an API like Let me try to play it around and see if an optional validation of the builder is possible to solve the interop test issue. |
Hmm, I think it is not directly related. C++ also does this check but it does in full validation, not in the builder. Actually C++ Arrow avoids the full validation too in the interop to avoid similar issue. |
Oh, BTW, adding These tests build ArrayData with incompatible decimal values. As this adds the check when full validation, these tests will fail during that. |
I'm wrong. Fortunately, for interop test, we directly invoke DecimalBuilder to read JSON file to Array. So adding |
validate_decimal_precision
check in DecimalBuilder.append_value
validate_decimal_precision
check in DecimalBuilder.append_value
for interop test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya -- this looks wonderful. Thank you for sticking with it
|
||
#[test] | ||
#[cfg(not(feature = "force_validate"))] | ||
fn test_decimal_full_validation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
Closes #1766.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?