-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decode functions for range field binary encoded doc values #41206
Decode functions for range field binary encoded doc values #41206
Conversation
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a drive-by comment. Please do not consider this a full review
static List<RangeFieldMapper.Range> decodeRanges(BytesRef encodedRanges, RangeFieldMapper.RangeType rangeType, | ||
TriFunction<byte[], Integer, Integer, Object> decodeBytes) { | ||
|
||
BinaryDocValuesRangeQuery.LengthType lengthType = rangeType.lengthType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It seems a bit weird for a utility class to be using something from a query class rather than the other way around. Maybe we should consider either moving BinaryDocValuesRangeQuery.LengthType
to be an inner class of this BinaryRangeUtil
class or make it a standalone class on its own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even make it an inner class of RangeFieldMapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of feel like we already have too many inner classes crammed into RangeFieldMapper
, but moving LengthType
to BinaryRangeUtil
makes a lot of sense to me. Pushed a change set doing that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good @not-napoleon . I left some comments
server/src/test/java/org/elasticsearch/index/mapper/BinaryRangeUtilTests.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/BinaryRangeUtilTests.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/BinaryRangeUtilTests.java
Show resolved
Hide resolved
import java.util.Comparator; | ||
import java.util.List; | ||
import java.util.Set; | ||
|
||
enum BinaryRangeUtil { | ||
public enum BinaryRangeUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract LengthType in his own file and leave this class package protected ? The encoding should remain internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I feel really bad about making LengthType
a top level enum; it's very much an implementation detail of the range encoding. The more I thought about it, the more I came to feel it really should be part of RangeType
, and my only objection to putting it there in the first place was that RangeFieldMapper
is already 1000 lines and defines half a dozen classes. So I made RangeType
a top level enum and put LengthType
under that. RangeType
needs to be public anyway, so there's no increased API surface with this arrangement.
This seems like the most natural refactoring to me, since LengthType
is a direct function of RangeType
, but I'm open to rolling that back and just making LengthType
a top level if you feel strongly that's the right way to do this. Thanks for the feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fine with me, thanks for explaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR is part of the range aggregation work from #34644. Decoding functions will be necessary for aggregations to operate on the numeric values of the ranges.
The decode logic for IP, Float, and Double values is straightforward as the classes we use for encoding already provide a decoder. Longs are a custom encoding, and I hand-rolled the bit manipulation to do the decoding. Reviewer, please pay the most attention to this. I believe my logic to be sound and my test cases are passing, but it is the part of this work I am least confidant in. Thanks.
In addition to the decode logic, I did some small refactoring:
RangeType
enum; I moved it toBinaryRangeUtil
, where all the other encoding logic is.equals
(andhashcode
) function forRange
, to make life easier for writing tests.LengthType#readLength
method public, and madeLengthType
a field ofRangeType
since those two concepts are fundamentally linked.Also note, I expect a small conflict with #41160. I'll fix once that merges.