Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient encoding of range fields. #26470

Merged
merged 1 commit into from
Sep 13, 2017

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Sep 1, 2017

This PR removes the vInt that precedes every value in order to know how long
they are. Instead the query takes an enum that tells how to compute the length
of values: for fixed-length data (ip addresses, double, float) the length is a
constant while longs and integers use a variable-length representation that
allows the length to be computed from the encoded values.

Also the encoding of ints/longs was made a bit more efficient in order not to
waste 3 bits in the header. As a consequence, values between -8 and 7 can now
be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2
bytes or less.

Closes #26443

@jpountz jpountz added :Search Foundations/Mapping Index mappings, including merging and defining field types >enhancement v6.0.0 labels Sep 1, 2017
@jpountz jpountz requested a review from martijnvg September 1, 2017 11:48
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement!

@@ -161,4 +170,42 @@ boolean matches(BytesRef from, BytesRef to, BytesRef otherFrom, BytesRef otherTo

}

public enum LengthType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better than having the length logic on several places!

@jpountz
Copy link
Contributor Author

jpountz commented Sep 11, 2017

@elasticmachine test it please

This PR removes the vInt that precedes every value in order to know how long
they are. Instead the query takes an enum that tells how to compute the length
of values: for fixed-length data (ip addresses, double, float) the length is a
constant while longs and integers use a variable-length representation that
allows the length to be computed from the encoded values.

Also the encoding of ints/longs was made a bit more efficient in order not to
waste 3 bits in the header. As a consequence, values between -8 and 7 can now
be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2
bytes or less.

Closes elastic#26443
@jpountz jpountz force-pushed the enhancement/range_docvalue_encoding branch from 0669b4e to 175426a Compare September 13, 2017 11:10
@jpountz jpountz merged commit 454cfc2 into elastic:master Sep 13, 2017
@jpountz jpountz deleted the enhancement/range_docvalue_encoding branch September 13, 2017 13:26
jpountz added a commit that referenced this pull request Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long
they are. Instead the query takes an enum that tells how to compute the length
of values: for fixed-length data (ip addresses, double, float) the length is a
constant while longs and integers use a variable-length representation that
allows the length to be computed from the encoded values.

Also the encoding of ints/longs was made a bit more efficient in order not to
waste 3 bits in the header. As a consequence, values between -8 and 7 can now
be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2
bytes or less.

Closes #26443
jpountz added a commit that referenced this pull request Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long
they are. Instead the query takes an enum that tells how to compute the length
of values: for fixed-length data (ip addresses, double, float) the length is a
constant while longs and integers use a variable-length representation that
allows the length to be computed from the encoded values.

Also the encoding of ints/longs was made a bit more efficient in order not to
waste 3 bits in the header. As a consequence, values between -8 and 7 can now
be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2
bytes or less.

Closes #26443
jasontedor added a commit to synhershko/elasticsearch that referenced this pull request Sep 14, 2017
* master: (39 commits)
  [Docs] Correct typo in removal_of_types.asciidoc (elastic#26646)
  [Docs] "The the" is a great band, but ... (elastic#26644)
  Move all repository-azure classes under one single package (elastic#26624)
  [Docs] Update link in removal_of_types.asciidoc (elastic#26614)
  Fix percolator highlight sub fetch phase to not highlight query twice (elastic#26622)
  Refactor bootstrap check results and error messages
  Add BootstrapContext to expose settings and recovered state to bootstrap checks (elastic#26628)
  [Tests] Removing skipping tests in search rest tests
  Initialize checkpoint tracker with allocation ID
  Move non-core mappers to a module. (elastic#26549)
  [Docs] Clarify size parameter in Completion Suggester doc (elastic#26617)
  Add soft limit on allowed number of script fields in request (elastic#26598)
  Remove MapperService#dynamic. (elastic#26603)
  Fix incomplete sentences in parent-join docs (elastic#26623)
  More efficient encoding of range fields. (elastic#26470)
  Ensure module is bundled before installing in tests
  Add boolean similarity to built in similarity types (elastic#26613)
  [Tests] Remove skip tests in search/30_limits.yml
  Let search phases override max concurrent requests
  Add a soft limit for the number of requested doc-value fields (elastic#26574)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v6.0.0-rc1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Doc values encoding of range fields is inefficient
3 participants