Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better sizing BytesRef for Strings in Queries #115655

Merged
merged 11 commits into from
Nov 7, 2024

Conversation

piergm
Copy link
Member

@piergm piergm commented Oct 25, 2024

When creating BytesRef with the standard constructor we end up over estimating the size of the byte array (UTF8Size = UTF16Size * 3) in order to avoid parsing the input string to properly calculate UTF8Size from UTF16.
We now instead precisely calculate the length and therefore correctly size the byte array with the result of being slightly slower when parsing but more memory efficient.

@piergm piergm added >enhancement Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations v9.0.0 labels Oct 25, 2024
@piergm piergm self-assigned this Oct 25, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Copy link
Collaborator

Hi @piergm, I've created a changelog YAML for you.

@javanna
Copy link
Member

javanna commented Oct 25, 2024

is this fixing some existing issue?

@piergm
Copy link
Member Author

piergm commented Oct 28, 2024

@javanna This should delay/avoid OOMs by using less memory when creating BytesRef.
I set this as enhancement because is not really a bug but one optimisation we could do.

} else if (obj instanceof CharBuffer v) {
return BytesRefs.checkIndexableLength(new BytesRef(v));
} else if (obj instanceof BigInteger v) {
return BytesRefs.toBytesRef(v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test the change?

Copy link
Member Author

@piergm piergm Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though we could not because of Java, but actually we can and I implemented it here.
Before my change the BytesRef could have length != bytes.length now it's the same and consistent and always <= than previous length that was String#length*3

if (obj instanceof String v) {
byte[] b = new byte[UnicodeUtil.calcUTF16toUTF8Length(v, 0, v.length())];
UnicodeUtil.UTF16toUTF8(v, 0, v.length(), b);
return BytesRefs.checkIndexableLength(new BytesRef(b, 0, b.length));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we extract these 3 lines to a separate utility method (maybe on a more appropriate class)? :) This would be very useful for saving non-trivial amounts of heap in other places!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, moved to BytesRefs and added Java Docs 😄

@piergm
Copy link
Member Author

piergm commented Nov 6, 2024

@elasticmachine update branch

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

* @return a BytesRef object representing the input string
*/
public static BytesRef toExactSizedBytesRef(String s) {
byte[] b = new byte[UnicodeUtil.calcUTF16toUTF8Length(s, 0, s.length())];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: could cache s.length() to a variable for a tiny speedup :P

@piergm
Copy link
Member Author

piergm commented Nov 7, 2024

@elasticmachine update branch

@piergm piergm added v8.17.0 auto-backport Automatically create backport pull requests when merged labels Nov 7, 2024
@piergm piergm merged commit 9ebe95a into elastic:main Nov 7, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 115655

piergm added a commit to piergm/elasticsearch that referenced this pull request Nov 7, 2024
* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 9ebe95a)
@piergm
Copy link
Member Author

piergm commented Nov 7, 2024

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

piergm added a commit that referenced this pull request Nov 7, 2024
* Better sizing BytesRef for Strings in Queries (#115655)

* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 9ebe95a)

* iter
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Nov 7, 2024
* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
jozala pushed a commit that referenced this pull request Nov 13, 2024
* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.17.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants