-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26566 Optimize encodeNumeric in OrderedBytes #3940
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Is there any test or unit test for verifying the behaviour of before-change and after-change is equal? |
There is a UT named TestOrderedNumeric. The changes passed this UT. |
I feel it is better to give more details, even math theoretics, to make me or other reviewers more clear and understand your changes... |
Have added details in the corresponding jira ticket. |
I think we'd better have some JMH tests to show the performance impact. |
Did a JMH test and attached the result in the jira ticket. |
Found the function could be optimized one step forward. Has update the code in the latest commit and upload the jmh test results in the jira ticket. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
The result seems be very impressive.
|
@Reidddddd After checking the current UT and some configurations. I found that, although the method encodes BigDecimal, but current MAX_PRECISION is really limited (31). It can only fit the domain of Long. I added some test cases beyond this domain. The old implementation will also fail. The new change just follows the old logic. If we do not need to fit some really big decimal, we do not need new UT I think. On the contrary if we want to support them, the function logic should be reconstructed and more large decimal cases should be added in current UTs. What do you think? |
e = 351; | ||
lengthToMoveLeft = 702; | ||
} | ||
abs = abs.movePointLeft(lengthToMoveLeft); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This old 350 limit is problematic, for value > 100^350, the abs will > 1, which will make the following steps have an incorrect encoding for M . And for value < 100^350, this limit is useless. This limit should be removed in my opinion.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Improvements to this code path are welcome! Thank you for putting together the jmh benchmark. For context, know that the original implementation was ported more-or-less directly from sqlite. At this point, the code is rather opaque to me. Is it the case that the new encoder is forward and backward compatible with the previous implementation? That is, can a byte[] written by the new encoder be read correctly by the old implementation? Can a byte[] written by the old implementation be read correctly by the new? Thanks. |
Yes. The new one is compatible with the previous implementation. The new encoder just optimized the performance but not changed the encoding logic and this commit only made changes at the encoding part. The old read function can correctly read the bytes encoded by the new encoding in our UTs. Besides, I also randomly generated 200 examples to check the correctness. The results of the both encoding are matched. |
Signed-off-by: Reid Chan <[email protected]>
Signed-off-by: Reid Chan <[email protected]>
https://issues.apache.org/jira/browse/HBASE-26566