Better sizing BytesRef for Strings in Queries #115655

piergm · 2024-10-25T12:27:31Z

When a BytesRef is created with the standard constructor, the byte array is overestimated (UTF-8 size = UTF-16 length * 3) to avoid scanning the input string to compute the exact UTF-8 length from its UTF-16 representation.
We now compute the exact length first and size the byte array accordingly. This is slightly slower, since the string is scanned one extra time, but more memory-efficient.
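To make the sizing difference concrete, here is a small JDK-only sketch (not Elasticsearch or Lucene code) that compares the "UTF-16 length * 3" upper bound with the exact UTF-8 byte count for a few sample strings. The class and method names are hypothetical, and String#getBytes is used only as a convenient way to obtain the exact size; it is not what the PR calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the upper-bound sizing with the exact UTF-8 size.
final class Utf8SizingDemo {

    // Upper bound used when the exact size is not computed: at most 3 UTF-8 bytes
    // per UTF-16 code unit (a surrogate pair is 2 code units but only 4 bytes).
    static int upperBoundUtf8Length(String s) {
        return s.length() * 3;
    }

    // Exact UTF-8 byte count, obtained here via the JDK encoder for simplicity.
    static int exactUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 accent, two CJK characters, one emoji (a surrogate pair).
        String[] samples = { "hello", "h\u00e9llo", "\u65e5\u672c", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("utf16=%d upperBound=%d exact=%d%n",
                s.length(), upperBoundUtf8Length(s), exactUtf8Length(s));
        }
    }
}
```

For pure ASCII the exact array is a third of the upper bound, while for text made mostly of 3-byte characters (like the CJK sample) the two coincide, so the saving depends on the data being indexed or queried.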

elasticsearchmachine · 2024-10-25T12:27:57Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine · 2024-10-25T12:27:58Z

Hi @piergm, I've created a changelog YAML for you.

javanna · 2024-10-25T14:57:50Z

Is this fixing an existing issue?

piergm · 2024-10-28T08:48:56Z

@javanna This should delay or avoid OOMs by using less memory when creating a BytesRef.
I labeled this as an enhancement because it's not really a bug, just an optimisation we could make.
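As a rough illustration of why the sizing matters for heap usage, the sketch below sums backing-array sizes for a batch of query terms under both strategies. The workload (100,000 identical 36-character ASCII terms) is invented for the example rather than measured, array object headers are ignored, and HeapSavingsEstimate is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

// Hypothetical back-of-the-envelope estimate; not a measurement from the PR.
final class HeapSavingsEstimate {

    // Total bytes of backing arrays when every term is sized as UTF-16 length * 3.
    static long upperBoundBytes(List<String> terms) {
        long total = 0;
        for (String t : terms) {
            total += 3L * t.length();
        }
        return total;
    }

    // Total bytes of backing arrays when every term is sized to its exact UTF-8 length.
    static long exactBytes(List<String> terms) {
        long total = 0;
        for (String t : terms) {
            total += t.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed workload: 100,000 UUID-shaped ASCII terms of 36 characters each.
        List<String> terms = Collections.nCopies(100_000, "123e4567-e89b-12d3-a456-426614174000");
        long before = upperBoundBytes(terms);
        long after = exactBytes(terms);
        System.out.printf("before=%d after=%d saved=%d%n", before, after, before - after);
    }
}
```

Under these assumptions the old sizing allocates 10,800,000 bytes of arrays versus 3,600,000 bytes with exact sizing.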

javanna · 2024-10-28T16:39:01Z

server/src/main/java/org/elasticsearch/index/query/AbstractQueryBuilder.java

+        } else if (obj instanceof CharBuffer v) {
+            return BytesRefs.checkIndexableLength(new BytesRef(v));
+        } else if (obj instanceof BigInteger v) {
+            return BytesRefs.toBytesRef(v);


Can we test the change?

piergm · 2024-10-29

I thought we couldn't because of Java, but it turns out we can, and I implemented it here.
Before my change, a BytesRef could have length != bytes.length; now the two are the same, and the array length is always <= the previous one of String#length * 3.
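A property-style check of that invariant can be sketched with the JDK alone. SimpleBytesRef is a hypothetical stand-in for Lucene's BytesRef (a byte array plus offset and length); oversized mimics the old upper-bound allocation and exactSized the new one. This is an illustration under those assumptions, not the test added in the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

final class ExactSizeCheck {

    // Hypothetical stand-in for BytesRef: the valid bytes are [offset, offset + length).
    record SimpleBytesRef(byte[] bytes, int offset, int length) {}

    // Old behaviour (mimicked): array sized to UTF-16 length * 3, only a prefix is used.
    static SimpleBytesRef oversized(String s) {
        byte[] exact = s.getBytes(StandardCharsets.UTF_8);
        byte[] buf = Arrays.copyOf(exact, s.length() * 3); // zero-padded tail
        return new SimpleBytesRef(buf, 0, exact.length);
    }

    // New behaviour (mimicked): array sized to the exact UTF-8 length.
    static SimpleBytesRef exactSized(String s) {
        byte[] exact = s.getBytes(StandardCharsets.UTF_8);
        return new SimpleBytesRef(exact, 0, exact.length);
    }

    // Random string built from whole code points, skipping the surrogate range.
    static String randomString(Random r, int codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints; i++) {
            int cp;
            do {
                cp = r.nextInt(Character.MAX_CODE_POINT + 1);
            } while (Character.getType(cp) == Character.SURROGATE);
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 1_000; i++) {
            String s = randomString(r, r.nextInt(20));
            SimpleBytesRef before = oversized(s);
            SimpleBytesRef after = exactSized(s);
            if (after.length() != after.bytes().length) {
                throw new AssertionError("array is not exactly sized for: " + s);
            }
            if (after.bytes().length > before.bytes().length) {
                throw new AssertionError("exact array larger than upper bound for: " + s);
            }
            if (!Arrays.equals(after.bytes(), Arrays.copyOf(before.bytes(), before.length()))) {
                throw new AssertionError("encoded content differs for: " + s);
            }
        }
        System.out.println("ok");
    }
}
```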

original-brownbear · 2024-10-29T09:55:42Z

server/src/main/java/org/elasticsearch/index/query/AbstractQueryBuilder.java

+        if (obj instanceof String v) {
+            byte[] b = new byte[UnicodeUtil.calcUTF16toUTF8Length(v, 0, v.length())];
+            UnicodeUtil.UTF16toUTF8(v, 0, v.length(), b);
+            return BytesRefs.checkIndexableLength(new BytesRef(b, 0, b.length));


Could we extract these 3 lines to a separate utility method (maybe on a more appropriate class)? :) This would be very useful for saving non-trivial amounts of heap in other places!

piergm · 2024-10-29

Done, moved to BytesRefs and added Javadoc 😄
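The extracted helper follows a two-pass pattern: compute the exact UTF-8 length, allocate once, then encode into that array. The sketch below hand-rolls both passes in plain Java to show the idea. It plays the role of the UnicodeUtil.calcUTF16toUTF8Length and UnicodeUtil.UTF16toUTF8 calls in the diff but is not Lucene's implementation; ExactUtf8 and toExactSizedBytes are hypothetical names (the real helper, BytesRefs.toExactSizedBytesRef, returns a BytesRef). Replacing an unpaired surrogate with U+FFFD is a choice made in this sketch, and the string length is read once into a local, in the spirit of the caching nit raised later in the review.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ExactUtf8 {

    // Pass 1: count UTF-8 bytes without allocating. An unpaired surrogate is counted
    // as U+FFFD (3 bytes), matching what encode() writes for it.
    static int utf8Length(CharSequence s, int offset, int len) {
        final int end = offset + len;
        int bytes = 0;
        for (int i = offset; i < end; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < end && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;
                i++; // the low surrogate is consumed together with the high one
            } else {
                bytes += 3; // other BMP char, or an unpaired surrogate
            }
        }
        return bytes;
    }

    // Pass 2: encode into a caller-provided array that is exactly large enough.
    static int encode(CharSequence s, int offset, int len, byte[] out) {
        final int end = offset + len;
        int pos = 0;
        for (int i = offset; i < end; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out[pos++] = (byte) c;
            } else if (c < 0x800) {
                out[pos++] = (byte) (0xC0 | (c >> 6));
                out[pos++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < end && Character.isLowSurrogate(s.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, s.charAt(++i));
                out[pos++] = (byte) (0xF0 | (cp >> 18));
                out[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                int cp = Character.isSurrogate(c) ? 0xFFFD : c; // replace unpaired surrogates
                out[pos++] = (byte) (0xE0 | (cp >> 12));
                out[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[pos++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return pos;
    }

    // Hypothetical analogue of the extracted helper: two passes, one exact allocation.
    static byte[] toExactSizedBytes(String s) {
        final int len = s.length(); // read once
        byte[] b = new byte[utf8Length(s, 0, len)];
        encode(s, 0, len, b);
        return b;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \u65e5\u672c \uD83D\uDE00";
        byte[] b = toExactSizedBytes(s);
        System.out.println(b.length + " " + Arrays.equals(b, s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The trade-off is the one described in the PR: the string is walked twice, but only one array of exactly the right size is allocated.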

piergm · 2024-11-06T09:20:19Z

@elasticmachine update branch

original-brownbear

LGTM :)

original-brownbear · 2024-11-06T11:02:13Z

server/src/main/java/org/elasticsearch/common/lucene/BytesRefs.java

+     * @return a BytesRef object representing the input string
+     */
+    public static BytesRef toExactSizedBytesRef(String s) {
+        byte[] b = new byte[UnicodeUtil.calcUTF16toUTF8Length(s, 0, s.length())];


NIT: could cache s.length() to a variable for a tiny speedup :P

piergm · 2024-11-07T07:23:50Z

@elasticmachine update branch

elasticsearchmachine · 2024-11-07T08:35:02Z

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 115655

* Better sizing BytesRefs for Strings in Queries
* Update docs/changelog/115655.yaml
* iter
* added test
* iter
* extracted method
* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 9ebe95a)

piergm · 2024-11-07T08:36:12Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions ?

Please refer to the Backport tool documentation

* Better sizing BytesRef for Strings in Queries (#115655)

  * Better sizing BytesRefs for Strings in Queries
  * Update docs/changelog/115655.yaml
  * iter
  * added test
  * iter
  * extracted method
  * iter

  ---------

  Co-authored-by: Elastic Machine <[email protected]>
  (cherry picked from commit 9ebe95a)

* iter

* Better sizing BytesRefs for Strings in Queries
* Update docs/changelog/115655.yaml
* iter
* added test
* iter
* extracted method
* iter

---------

Co-authored-by: Elastic Machine <[email protected]>

Better sizing BytesRefs for Strings in Queries

3a56144

piergm added >enhancement, Team:Search Foundations, :Search Foundations/Search, and v9.0.0 labels Oct 25, 2024

piergm requested a review from original-brownbear October 25, 2024 12:27

piergm self-assigned this Oct 25, 2024

Update docs/changelog/115655.yaml

008f3ad

iter

aa515da

javanna reviewed Oct 28, 2024

piergm added 2 commits October 29, 2024 09:06

added test

bd5c321

iter

81b2b19

original-brownbear reviewed Oct 29, 2024

piergm added 2 commits October 29, 2024 16:42

extracted method

4553d9f

Merge branch 'elastic:main' into correctly-size-BytesRefs-for-Strings

75ca070

piergm requested review from javanna and original-brownbear November 6, 2024 09:19

Merge branch 'main' into correctly-size-BytesRefs-for-Strings

72ee4c4

original-brownbear approved these changes Nov 6, 2024

piergm added 2 commits November 6, 2024 13:42

iter

581f777

Merge branch 'elastic:main' into correctly-size-BytesRefs-for-Strings

1de2d8f

Merge branch 'main' into correctly-size-BytesRefs-for-Strings

71b8279

piergm added v8.17.0 and auto-backport labels Nov 7, 2024

piergm merged commit 9ebe95a into elastic:main Nov 7, 2024
16 checks passed

elasticsearchmachine added the backport pending label Nov 7, 2024

piergm mentioned this pull request Nov 7, 2024

[8.x] Better sizing BytesRef for Strings in Queries (#115655) #116381

Merged

piergm mentioned this pull request Nov 7, 2024

fix testMaybeConvertToBytesRefStringCorrectSize #116386

Merged

piergm removed the backport pending label Nov 7, 2024
