Doc values encoding of range fields is inefficient #26443

Closed
jpountz opened this issue Aug 30, 2017 · 0 comments · Fixed by #26470
Labels
blocker, :Search Foundations/Mapping (Index mappings, including merging and defining field types), Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch), v6.0.0-rc1

jpountz commented Aug 30, 2017

I was just looking again at how we encode doc values for range fields, and I think we should make them more efficient. For instance, floats are encoded like doubles, and doubles use a varbyte encoding of their long bits even though they most likely use all 8 bytes, so the continuation bits are wasted.

Marking this as a 6.0 blocker since this change is much easier to do if not released yet.
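
To illustrate the point about wasted continuation bits, here is a minimal sketch that measures how many bytes a generic 7-bits-per-byte varint needs for the raw IEEE 754 bits of a double. The helpers `varint_len` and `double_bits` are hypothetical names for this illustration and are not taken from the Elasticsearch code base; the exact varbyte variant used there is not shown in this issue.

```python
import struct


def double_bits(x: float) -> int:
    """Return the 64 raw IEEE 754 bits of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def varint_len(value: int) -> int:
    """Bytes needed by a varint that stores 7 payload bits per byte.

    Every byte spends 1 bit on the continuation flag, so a value that
    uses all 64 bits needs 10 bytes instead of 8.
    """
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


# Typical doubles have their high (exponent) bits set, so the varint
# ends up longer than the fixed 8-byte representation.
for x in (1.0, 3.14, -2.5):
    print(x, varint_len(double_bits(x)), "bytes as varint vs 8 fixed")
```

Under these assumptions, a fixed 8-byte encoding is never larger than the varint for doubles whose exponent bits are set, which covers most real-world values.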

@colings86 colings86 added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Sep 13, 2017
jpountz added a commit to jpountz/elasticsearch that referenced this issue Sep 13, 2017
This PR removes the vInt that precedes every value in order to know how long
they are. Instead the query takes an enum that tells how to compute the length
of values: for fixed-length data (ip addresses, double, float) the length is a
constant while longs and integers use a variable-length representation that
allows the length to be computed from the encoded values.

Also, the encoding of ints/longs was made a bit more efficient so as not to
waste 3 bits in the header. As a consequence, values between -8 and 7 can now
be encoded in 1 byte and values between -2048 and 2047 can now be encoded in
at most 2 bytes.

Closes elastic#26443
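
The scheme described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the pull request: the names `LengthType`, `encode_long`, `decode_long`, `encoded_length` and `split_values` are hypothetical, as is the exact header layout (1 sign bit, 4 bits for the number of extra bytes, 3 value bits) and the fixed sizes chosen for float, double and IP values. Storing negative values bit-inverted, so that encoded bytes sort like the numbers, is a design choice of this sketch.

```python
from enum import Enum


def encode_long(value: int) -> bytes:
    """Encode a signed 64-bit value; the header byte carries 3 value bits."""
    assert -(1 << 63) <= value < (1 << 63)
    positive = value >= 0
    magnitude = value if positive else -1 - value
    extra = (magnitude.bit_length() + 4) // 8  # bytes beyond the header
    body = bytearray(magnitude.to_bytes(1 + extra, "big"))
    if positive:
        body[0] |= 0x80 | (extra << 3)
    else:
        # Invert so that more negative values compare as smaller bytes.
        body = bytearray(b ^ 0xFF for b in body)
        body[0] = (body[0] & 0x07) | ((15 - extra) << 3)
    return bytes(body)


def encoded_length(first_byte: int) -> int:
    """Compute the total encoded length from the header byte alone."""
    field = (first_byte >> 3) & 0x0F
    return 1 + (field if first_byte & 0x80 else 15 - field)


def decode_long(data: bytes) -> int:
    positive = bool(data[0] & 0x80)
    raw = bytearray(data[:encoded_length(data[0])])
    if not positive:
        raw = bytearray(b ^ 0xFF for b in raw)
    raw[0] &= 0x07  # drop the sign and length bits
    magnitude = int.from_bytes(raw, "big")
    return magnitude if positive else -1 - magnitude


class LengthType(Enum):
    """How a reader finds the length of the next value (assumed sizes)."""
    FIXED_4 = 4    # e.g. float
    FIXED_8 = 8    # e.g. double
    FIXED_16 = 16  # e.g. IP address stored as 16 bytes
    VARIABLE = 0   # ints/longs: length is derived from the header byte

    def length_at(self, data: bytes, offset: int) -> int:
        if self is LengthType.VARIABLE:
            return encoded_length(data[offset])
        return self.value


def split_values(data: bytes, length_type: LengthType) -> list:
    """Split concatenated values without any per-value length prefix."""
    out, offset = [], 0
    while offset < len(data):
        n = length_type.length_at(data, offset)
        out.append(data[offset:offset + n])
        offset += n
    return out
```

With this layout, `encode_long(7)` and `encode_long(-8)` take 1 byte, while `encode_long(2047)` and `encode_long(-2048)` take 2 bytes, matching the ranges quoted above.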
@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024