Deduplicate BucketOrder when deserializing #112707

iverase · 2024-09-10T14:25:19Z

I was looking at a heap dump that contained millions of BucketOrder instances, all identical. They came from a nested string terms aggregation with a huge number of buckets. I am wondering whether we can deduplicate BucketOrder instances in a way similar to what we already do for strings. That is what this PR does, and I am looking for feedback on the approach.
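As a rough illustration of the idea (not the code in this PR), the sketch below keeps one canonical instance per distinct value, much like string interning. `SketchOrder` and `SketchDeduplicator` are hypothetical names invented for this example, and `SketchOrder` is only a stand-in for BucketOrder that relies on value-based `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BucketOrder: an immutable value with equals/hashCode.
record SketchOrder(String key, boolean ascending) {}

// Hypothetical deduplicator: returns one canonical instance per distinct value.
final class SketchDeduplicator<T> {
    private final Map<T, T> canonical = new HashMap<>();

    T deduplicate(T value) {
        // putIfAbsent returns the previously stored instance, or null if value is new.
        T existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    int size() {
        return canonical.size();
    }
}

final class DedupSketch {
    public static void main(String[] args) {
        SketchDeduplicator<SketchOrder> dedup = new SketchDeduplicator<>();
        List<SketchOrder> perBucket = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            // Each "deserialized" bucket builds a fresh but equal order object.
            perBucket.add(dedup.deduplicate(new SketchOrder("_count", false)));
        }
        // All buckets now share the same instance, and only one is retained.
        System.out.println(perBucket.get(0) == perBucket.get(999));
        System.out.println(dedup.size());
    }
}
```

With this arrangement the thousand equal objects collapse to a single retained instance; the duplicates become garbage as soon as they are created.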

elasticsearchmachine · 2024-09-10T14:25:44Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-09-10T14:25:44Z

Hi @iverase, I've created a changelog YAML for you.

nik9000 · 2024-09-10T18:52:40Z

server/src/main/java/org/elasticsearch/search/aggregations/InternalOrder.java

                case CompoundOrder.ID:
                    int size = in.readVInt();
                    List<BucketOrder> compoundOrder = new ArrayList<>(size);
                    for (int i = 0; i < size; i++) {
                        compoundOrder.add(Streams.readOrder(in));
                    }
-                    return new CompoundOrder(compoundOrder, false);
+                    return bucketOrderDeduplicator.deduplicate(new CompoundOrder(compoundOrder, false));


ESQL uses a wrapper around the StreamInput that keeps the cache in a regular old variable rather than a static. I'd prefer that if we can manage it.

I have moved it into a wrapper around StreamInput by (ab)using the fact that aggregations are deserialized using DelayableWritable. I had to introduce an interface so that we can deduplicate whenever the stream implements it.
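The exchange above can be sketched roughly as follows. This is a simplified illustration built on `java.io` streams rather than Elasticsearch's StreamInput, and every name in it (`WireOrder`, `OrderDeduplicating`, `DedupDataInput`, `WireOrderIO`) is hypothetical. It shows the two points discussed: the cache lives in an instance field of the stream wrapper rather than in a static, and the reader deduplicates only when the stream it is handed implements the interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BucketOrder.
record WireOrder(String key, boolean ascending) {}

// Hypothetical interface: a stream that can canonicalize the orders read from it.
interface OrderDeduplicating {
    WireOrder deduplicate(WireOrder order);
}

// Wrapper whose cache is an instance field, so it is released together with the stream.
final class DedupDataInput extends DataInputStream implements OrderDeduplicating {
    private final Map<WireOrder, WireOrder> seen = new HashMap<>();

    DedupDataInput(InputStream in) {
        super(in);
    }

    @Override
    public WireOrder deduplicate(WireOrder order) {
        WireOrder existing = seen.putIfAbsent(order, order);
        return existing == null ? order : existing;
    }
}

final class WireOrderIO {
    static void write(DataOutputStream out, WireOrder order) throws IOException {
        out.writeUTF(order.key());
        out.writeBoolean(order.ascending());
    }

    // Deduplicates only when the caller supplied a deduplicating stream.
    static WireOrder read(DataInputStream in) throws IOException {
        WireOrder order = new WireOrder(in.readUTF(), in.readBoolean());
        return in instanceof OrderDeduplicating d ? d.deduplicate(order) : order;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        write(out, new WireOrder("_key", true));
        write(out, new WireOrder("_key", true));

        DedupDataInput wrapped = new DedupDataInput(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(read(wrapped) == read(wrapped)); // same instance via the wrapper

        DataInputStream plain = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(read(plain) == read(plain)); // separate instances without it
    }
}
```

Scoping the cache to the wrapper means it cannot grow without bound across requests, which is the usual concern with a static cache.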


Deduplicate BucketOrder object by wrapping the StreamInput generated by DelayableWritable objects.

elasticsearchmachine · 2024-09-12T07:50:36Z

💚 Backport successful

Status	Branch	Result
✅	8.x


Deduplicate BucketOrder when deserializing

fb15fe8

iverase added >enhancement :Analytics/Aggregations Aggregations v8.16.0 labels Sep 10, 2024

iverase requested review from nik9000, not-napoleon and original-brownbear September 10, 2024 14:25

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 10, 2024

Update docs/changelog/112707.yaml

1c506aa

nik9000 reviewed Sep 10, 2024


iverase added 3 commits September 11, 2024 11:44

Merge branch 'main' into BucketOrderDeduplicator

f5f5b90

Move dedupe as a wrapper of StreamInput

11be96c

Merge branch 'BucketOrderDeduplicator' of github.com:iverase/elastics…

96d4ebb

…earch into BucketOrderDeduplicator

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

nik9000 approved these changes Sep 11, 2024


iverase added v8.16.0 auto-backport-and-merge labels Sep 12, 2024

iverase merged commit 0ab2afb into elastic:main Sep 12, 2024
15 checks passed

iverase deleted the BucketOrderDeduplicator branch September 12, 2024 07:49

iverase mentioned this pull request Sep 12, 2024

[8.x] Deduplicate BucketOrder when deserializing (#112707) #112789

Merged

iverase added a commit to iverase/elasticsearch that referenced this pull request Sep 12, 2024

Deduplicate BucketOrder when deserializing (elastic#112707)

3902460


elasticsearchmachine pushed a commit that referenced this pull request Sep 12, 2024

Deduplicate BucketOrder when deserializing (#112707) (#112789)

6067464


davidkyle pushed a commit that referenced this pull request Sep 12, 2024

Deduplicate BucketOrder when deserializing (#112707)

6ef94ac


This was referenced Nov 6, 2024

Deduplicate the list of names when deserializing InternalTopMetrics #116298

Open

Deduplicate the name of the aggregation when deserializing InternalAgregation #116307

Open
