The constructor DataType::Decimal(usize, usize) is unvalidated #2362
Comments
I want to go through the C++ or Java implementation first, and then comment on this issue. |
Because this change will cause some API changes. |
The C++ Decimal types have such a check, e.g. Decimal128Type: https://github.com/apache/arrow/blob/6cc37cf2d1ba72c46b64fbc7ac499bd0d7296d20/cpp/src/arrow/type.cc#L870-L874 But I am not sure we need the check. For unsafe paths, we run value validation, which includes such a check. For safe paths, I think it assumes the inputs are all valid (including the decimal type). You can mess up the array with either a wrong type or wrong input in safe paths. It also seems a bit verbose to me to construct a decimal type with a struct, or to match on a decimal type. So I personally feel that it has a small advantage but more disadvantages. I don't feel strongly against it, though. |
Here the problem is that we may do redundant validation.
That's the concern. |
For unsafe paths, I think we always do validation, no? I am not sure we can remove validation on unsafe paths. That is why they are "unsafe". I think the point of removing redundant validation applies to the paths where we think it is safe to do so.
I think you can also do a similar thing with safe paths, not only with this decimal type constructor. As we don't validate values in safe paths (not just for decimal arrays), you can put some unreasonable input there and cause an unexpected result. As these are safe paths, I think the responsibility is in the users' hands. |
Sorry, I can't 100% agree with you. For the safe (soundness) path, the responsibility is on the software side. Software should make sure that everything it allows users to do is valid; otherwise, it is unsound. For the unsafe (completeness) path, however, the responsibility is in the users' hands. That is why we often see code like this in our crate:
// Soundness: users should make sure ...
unsafe {fn xxx_unchecked()} |
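To make the safe/unsafe split described above concrete, here is a minimal sketch of the checked/unchecked constructor pattern. `Utf8Bytes` and its methods are hypothetical names chosen for illustration and are not part of the arrow crate; only the `std::str` functions are real APIs.

```rust
/// Hypothetical wrapper used only for illustration; not an arrow-rs type.
/// Invariant: the inner bytes are always valid UTF-8.
pub struct Utf8Bytes(Vec<u8>);

impl Utf8Bytes {
    /// Safe path: the library validates, so no caller can break the invariant.
    pub fn try_new(bytes: Vec<u8>) -> Result<Self, std::str::Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(Self(bytes))
    }

    /// Unsafe path: validation is skipped.
    ///
    /// # Safety
    /// The caller must guarantee that `bytes` is valid UTF-8.
    pub unsafe fn new_unchecked(bytes: Vec<u8>) -> Self {
        Self(bytes)
    }

    /// Relies on the invariant instead of validating again.
    pub fn as_str(&self) -> &str {
        // SAFETY: `try_new` checked the bytes, and `new_unchecked`
        // made the caller promise that they are valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.0) }
    }
}
```

With this shape, validation happens once on the safe constructor, and every later read can skip it without becoming unsound.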
I guess @tustvold is more concerned about the soundness of the code. |
Ah, yeah, that's right. We do validation on safe paths, not on unsafe paths. I had "safe" and "unsafe" reversed. Please swap "safe" and "unsafe" in my previous comments. (Updated)
|
Ha, we reach a consensus! 😁 |
I agree that the current decimal type constructors have the mentioned issue. They can be used to construct invalid decimal types. My point above is that we have certain checks for this while performing value validation, in the places where we want to make sure the data is valid (safe paths, 😄 ). So it is caught today. For the unsafe paths, there are many ways to mess up the inputs, not just the type constructor, so I don't think it is an issue there. Users should know what they're doing with such APIs. The only holes, I guess, are the places where we don't do validation, so an invalid decimal type is not caught. But we also don't want to run full decimal value validation where we think it is computationally heavy. Alternatively, we can do only a precision check in such places, instead of full value validation. |
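The difference between a precision-only check and full value validation can be sketched roughly as follows. The function names are hypothetical, and the bounds (precision in 1..=38 for a 128-bit decimal, scale no larger than precision) are assumptions for this example rather than the crate's actual rules.

```rust
/// Assumed upper bound for a 128-bit decimal (38 decimal digits).
const MAX_PRECISION_128: usize = 38;

/// Cheap O(1) check that looks only at the type parameters.
fn check_precision_scale(precision: usize, scale: usize) -> Result<(), String> {
    if precision == 0 || precision > MAX_PRECISION_128 {
        return Err(format!(
            "precision {} is outside 1..={}",
            precision, MAX_PRECISION_128
        ));
    }
    if scale > precision {
        return Err(format!("scale {} exceeds precision {}", scale, precision));
    }
    Ok(())
}

/// Full O(n) validation: every value must fit in `precision` digits.
fn validate_values(values: &[i128], precision: usize, scale: usize) -> Result<(), String> {
    check_precision_scale(precision, scale)?;
    // 10^38 still fits in an i128, so this cannot overflow after the check above.
    let limit = 10_i128.pow(precision as u32) as u128;
    for (i, v) in values.iter().enumerate() {
        if v.unsigned_abs() >= limit {
            return Err(format!("value {} at index {} needs more than {} digits", v, i, precision));
        }
    }
    Ok(())
}
```

The first function costs the same regardless of array length, which is why it could be run in places where the second is considered too expensive.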
BTW, as mentioned above, I'm okay with the proposed change if most of you think it is good to go. |
Ok. I will file a PR and then we can have more discussion. |
I agree with @viirya. I think perhaps we were using different definitions of "unsound". I was erroneously thinking it meant "unsafe" in the classic Rust sense, where doing so would allow reading/writing uninitialized memory, reading/writing out of bounds of the allocated memory, or data races. It doesn't mean that users can't provide bad inputs that result in I think it is a good idea to make the arrow library better about error checking in general, as long as it isn't too cumbersome to use. Thank you @HaoYang670 for starting this conversation |
Title changed from "DataType::Decimal(usize, usize) is unsound." to "DataType::Decimal(usize, usize) is unvalidated"
Describe the bug
This is also related to Decimal. Currently, the constructor DataType::Decimal128/256(precision: usize, scale: usize) is unsound, because users can put any value in it. This also leads to two kinds of bugs in the code:
1. forgetting to check the values of precision and scale
2. redundant checking.
Expected behavior
I’d like to eliminate this unsoundness by using a stronger type, which means creating a new type for precision and scale:
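As a rough sketch of what such a stronger type could look like: `Precision` and `DecimalParams` are hypothetical names, and the bounds used (precision in 1..=38, scale no larger than precision) are assumptions for this example, not the design that was actually proposed or merged.

```rust
use std::convert::TryFrom;

/// Assumed upper bound for a 128-bit decimal; the real crate may differ.
const MAX_PRECISION: usize = 38;

/// Hypothetical newtype: a precision known to be in 1..=MAX_PRECISION.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Precision(usize);

impl TryFrom<usize> for Precision {
    type Error = String;

    fn try_from(value: usize) -> Result<Self, Self::Error> {
        if (1..=MAX_PRECISION).contains(&value) {
            Ok(Precision(value))
        } else {
            Err(format!("precision {} is outside 1..={}", value, MAX_PRECISION))
        }
    }
}

/// Hypothetical validated pair. The fields are private, so outside this
/// module the only way to build one is through `try_new`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecimalParams {
    precision: Precision,
    scale: usize,
}

impl DecimalParams {
    pub fn try_new(precision: usize, scale: usize) -> Result<Self, String> {
        let precision = Precision::try_from(precision)?;
        if scale > precision.0 {
            return Err(format!("scale {} exceeds precision {}", scale, precision.0));
        }
        Ok(Self { precision, scale })
    }

    pub fn precision(&self) -> usize {
        self.precision.0
    }

    pub fn scale(&self) -> usize {
        self.scale
    }
}
```

Because an invalid pair cannot be constructed, code that receives a `DecimalParams` neither has to remember the check nor repeat it, which addresses both kinds of bug listed above.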
Additional context