Decode functions for range field binary encoded doc values #41206

not-napoleon
Member

This PR is part of the range aggregation work from #34644. Decoding functions will be necessary for aggregations to operate on the numeric values of the ranges.

The decode logic for IP, Float, and Double values is straightforward because the classes we use for encoding already provide decoders. Longs use a custom encoding, and I hand-rolled the bit manipulation to decode them. Reviewers, please pay the most attention to this part. I believe my logic is sound and my test cases are passing, but it is the part of this work I am least confident in. Thanks.
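To illustrate the kind of round-trip property a hand-rolled long decoder has to satisfy, here is a minimal sketch. It assumes a fixed-width, sign-flipped big-endian scheme as a stand-in; it is not the custom long encoding used in this PR, and `SortableLongCodec` is a hypothetical name.

```java
// Hypothetical stand-in codec, NOT the encoding in BinaryRangeUtil.
final class SortableLongCodec {

    // Encode as 8 big-endian bytes with the sign bit flipped, so that
    // unsigned byte-wise comparison matches signed long ordering.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (byte) (flipped >>> (8 * (Long.BYTES - 1 - i)));
        }
        return out;
    }

    // Reassemble the 8 bytes starting at offset, then undo the sign flip.
    static long decode(byte[] bytes, int offset) {
        long flipped = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            flipped = (flipped << 8) | (bytes[offset + i] & 0xFFL);
        }
        return flipped ^ Long.MIN_VALUE;
    }
}
```

Whatever the real encoding looks like, the properties worth testing are the same: `decode(encode(x)) == x` for boundary values such as `Long.MIN_VALUE`, `-1`, `0`, and `Long.MAX_VALUE`, and byte order agreeing with numeric order.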

In addition to the decode logic, I did some small refactoring:

  • The encoding logic for IP ranges was in the RangeType enum; I moved it to BinaryRangeUtil, where all the other encoding logic is.
  • Defined equals and hashCode methods for Range, to make writing tests easier.
  • Made the LengthType#readLength method public, and made LengthType a field of RangeType since those two concepts are fundamentally linked.
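As a rough illustration of why value-based equality helps in tests, here is a sketch using a simplified stand-in class. `SimpleRange` and its fields are hypothetical and do not mirror the actual `RangeFieldMapper.Range`.

```java
import java.util.Objects;

// Hypothetical simplified range; the real Range class has a different shape.
final class SimpleRange {
    final Object from;
    final Object to;
    final boolean includeFrom;
    final boolean includeTo;

    SimpleRange(Object from, Object to, boolean includeFrom, boolean includeTo) {
        this.from = from;
        this.to = to;
        this.includeFrom = includeFrom;
        this.includeTo = includeTo;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        SimpleRange that = (SimpleRange) o;
        return includeFrom == that.includeFrom
            && includeTo == that.includeTo
            && Objects.equals(from, that.from)
            && Objects.equals(to, that.to);
    }

    @Override
    public int hashCode() {
        return Objects.hash(from, to, includeFrom, includeTo);
    }
}
```

With this in place, an encode/decode test can compare the decoded list against the expected list directly instead of checking each bound by hand.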

Also note, I expect a small conflict with #41160. I'll fix once that merges.

@not-napoleon not-napoleon added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types 7x labels Apr 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

Contributor

@colings86 colings86 left a comment

Left a drive-by comment. Please do not consider this a full review.

static List<RangeFieldMapper.Range> decodeRanges(BytesRef encodedRanges, RangeFieldMapper.RangeType rangeType,
        TriFunction<byte[], Integer, Integer, Object> decodeBytes) {

    BinaryDocValuesRangeQuery.LengthType lengthType = rangeType.lengthType;
Contributor

Nit: It seems a bit weird for a utility class to use something from a query class rather than the other way around. Maybe we should consider either moving BinaryDocValuesRangeQuery.LengthType into this BinaryRangeUtil class as an inner class or making it a standalone class?

Contributor

Or even make it an inner class of RangeFieldMapper?

Member Author

I kind of feel like we already have too many inner classes crammed into RangeFieldMapper, but moving LengthType to BinaryRangeUtil makes a lot of sense to me. Pushed a change set doing that :)

@not-napoleon not-napoleon changed the base branch from master to feature-range-aggregations April 25, 2019 19:06
@not-napoleon not-napoleon requested a review from jimczi April 25, 2019 19:41
Contributor

@jimczi jimczi left a comment

It looks good, @not-napoleon. I left some comments.

import java.util.Comparator;
import java.util.List;
import java.util.Set;

-enum BinaryRangeUtil {
+public enum BinaryRangeUtil {
Contributor

Can we extract LengthType into its own file and leave this class package-private? The encoding should remain internal.

Member Author

So, I feel really bad about making LengthType a top-level enum; it's very much an implementation detail of the range encoding. The more I thought about it, the more I came to feel it really should be part of RangeType, and my only objection to putting it there in the first place was that RangeFieldMapper is already 1000 lines and defines half a dozen classes. So I made RangeType a top-level enum and put LengthType under that. RangeType needs to be public anyway, so there's no increased API surface with this arrangement.

This seems like the most natural refactoring to me, since LengthType is a direct function of RangeType, but I'm open to rolling that back and just making LengthType a top-level enum if you feel strongly that's the right way to do this. Thanks for the feedback!
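Roughly, the arrangement described above could look like the sketch below. The enum name, its constants, and the fixed lengths are illustrative assumptions, not the actual definitions in this PR; only the idea of nesting LengthType under RangeType and exposing it as a field comes from the discussion.

```java
// Hypothetical sketch of a top-level range type that owns its length type.
enum SketchRangeType {
    IP(LengthType.FIXED_16),
    FLOAT(LengthType.FIXED_4),
    DOUBLE(LengthType.FIXED_8);

    // Each range type carries exactly one length type, fixed at construction.
    final LengthType lengthType;

    SketchRangeType(LengthType lengthType) {
        this.lengthType = lengthType;
    }

    // Nested so it stays an implementation detail of the range encoding.
    enum LengthType {
        FIXED_4(4),
        FIXED_8(8),
        FIXED_16(16);

        private final int length;

        LengthType(int length) {
            this.length = length;
        }

        // Fixed-width types ignore the buffer; a variable-width type
        // (omitted here) would inspect bytes[offset] to find the length.
        int readLength(byte[] bytes, int offset) {
            return length;
        }
    }
}
```

A decoder can then ask `rangeType.lengthType.readLength(bytes, offset)` without reaching into a query class.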

Contributor

OK, fine with me. Thanks for explaining.

Contributor

@jimczi jimczi left a comment

LGTM

@not-napoleon not-napoleon merged commit aebb974 into elastic:feature-range-aggregations May 7, 2019
@not-napoleon not-napoleon deleted the feature/binary-range-decoder branch May 7, 2019 18:31