-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc values encoding of range fields is inefficient #26443
Labels
blocker
:Search Foundations/Mapping
Index mappings, including merging and defining field types
Team:Search Foundations
Meta label for the Search Foundations team in Elasticsearch
v6.0.0-rc1
Comments
colings86
added
the
:Search Foundations/Mapping
Index mappings, including merging and defining field types
label
Sep 13, 2017
jpountz
added a commit
to jpountz/elasticsearch
that referenced
this issue
Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long they are. Instead the query takes an enum that tells how to compute the length of values: for fixed-length data (ip addresses, double, float) the length is a constant while longs and integers use a variable-length representation that allows the length to be computed from the encoded values. Also the encoding of ints/longs was made a bit more efficient in order not to waste 3 bits in the header. As a consequence, values between -8 and 7 can now be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2 bytes or less. Closes elastic#26443
jpountz
added a commit
that referenced
this issue
Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long they are. Instead the query takes an enum that tells how to compute the length of values: for fixed-length data (ip addresses, double, float) the length is a constant while longs and integers use a variable-length representation that allows the length to be computed from the encoded values. Also the encoding of ints/longs was made a bit more efficient in order not to waste 3 bits in the header. As a consequence, values between -8 and 7 can now be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2 bytes or less. Closes #26443
jpountz
added a commit
that referenced
this issue
Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long they are. Instead the query takes an enum that tells how to compute the length of values: for fixed-length data (ip addresses, double, float) the length is a constant while longs and integers use a variable-length representation that allows the length to be computed from the encoded values. Also the encoding of ints/longs was made a bit more efficient in order not to waste 3 bits in the header. As a consequence, values between -8 and 7 can now be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2 bytes or less. Closes #26443
jpountz
added a commit
that referenced
this issue
Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long they are. Instead the query takes an enum that tells how to compute the length of values: for fixed-length data (ip addresses, double, float) the length is a constant while longs and integers use a variable-length representation that allows the length to be computed from the encoded values. Also the encoding of ints/longs was made a bit more efficient in order not to waste 3 bits in the header. As a consequence, values between -8 and 7 can now be encoded on 1 byte and values between -2048 and 2047 can now be encoded on 2 bytes or less. Closes #26443
javanna
added
the
Team:Search Foundations
Meta label for the Search Foundations team in Elasticsearch
label
Jul 16, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
blocker
:Search Foundations/Mapping
Index mappings, including merging and defining field types
Team:Search Foundations
Meta label for the Search Foundations team in Elasticsearch
v6.0.0-rc1
I was just looking again at how we encode doc values for range fields and I think we should make them more efficient. For instance floats are encoded like doubles and doubles use a varbyte encoding of their long bits even though they most likely use all bytes so the continuation bits are wasted.
Marking this as a 6.0 blocker since this change is much easier to do if not released yet.
The text was updated successfully, but these errors were encountered: