Deduplicate BucketOrder when deserializing #112707
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @iverase, I've created a changelog YAML for you.
    case CompoundOrder.ID:
        int size = in.readVInt();
        List<BucketOrder> compoundOrder = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            compoundOrder.add(Streams.readOrder(in));
        }
    -   return new CompoundOrder(compoundOrder, false);
    +   return bucketOrderDeduplicator.deduplicate(new CompoundOrder(compoundOrder, false));
ESQL uses a wrapper around the StreamInput that keeps the cache in a regular old variable rather than a static. I'd prefer that if we can manage it.
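For illustration, a rough sketch of that shape (the class and method names below are assumptions, not the ESQL code); the point is that the cache is an instance field of the wrapper, so it is scoped to one deserialization instead of living in a process-wide static:

    import java.util.HashMap;
    import java.util.Map;

    import org.elasticsearch.common.io.stream.StreamInput;
    import org.elasticsearch.search.aggregations.BucketOrder;

    // Hypothetical wrapper: the dedup cache is an instance field, so its lifetime
    // is bounded by the wrapper rather than by the JVM. A real wrapper would itself
    // be a StreamInput that delegates every read; delegation is omitted to keep the
    // sketch short.
    class DeduplicatingStreamInput {
        private final StreamInput delegate;
        private final Map<BucketOrder, BucketOrder> seen = new HashMap<>();

        DeduplicatingStreamInput(StreamInput delegate) {
            this.delegate = delegate;
        }

        StreamInput delegate() {
            return delegate;
        }

        BucketOrder deduplicate(BucketOrder order) {
            // computeIfAbsent hands back the first-seen equal instance.
            return seen.computeIfAbsent(order, o -> o);
        }
    }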
I have moved it to a wrapper of StreamInput by (ab)using the fact that aggregations are deserialized using DelayableWritable. I had to introduce an interface so we can deduplicate when that wrapper is found.
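The interface could look roughly like the sketch below (the names and the instanceof dispatch are assumptions for illustration, not the merged code): the deduplicating wrapper implements it, and deserialization code deduplicates opportunistically when the StreamInput it was handed turns out to support it.

    import org.elasticsearch.common.io.stream.StreamInput;
    import org.elasticsearch.search.aggregations.BucketOrder;

    // Hypothetical interface implemented by the deduplicating StreamInput wrapper.
    interface BucketOrderDeduplicator {
        BucketOrder deduplicate(BucketOrder order);
    }

    final class BucketOrders {
        private BucketOrders() {}

        // Deduplicate when the wrapper is found, otherwise return the instance as-is.
        static BucketOrder maybeDeduplicate(StreamInput in, BucketOrder order) {
            return in instanceof BucketOrderDeduplicator deduplicator
                ? deduplicator.deduplicate(order)
                : order;
        }
    }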
💚 Backport successful
…tion-ironbank-ubi
* upstream/main: (302 commits)
  Deduplicate BucketOrder when deserializing (elastic#112707)
  Introduce test utils for ingest pipelines (elastic#112733)
  [Test] Account for auto-repairing for shard gen file (elastic#112778)
  Do not throw in task enqueued by CancellableRunner (elastic#112780)
  Mute org.elasticsearch.script.StatsSummaryTests testEqualsAndHashCode elastic#112439
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testTransportException elastic#112779
  Use a dedicated test executor in MockTransportService (elastic#112748)
  Estimate segment field usages (elastic#112760)
  (Doc+) Inference Pipeline ignores Mapping Analyzers (elastic#112522)
  Fix verifyVersions task (elastic#112765)
  (Doc+) Terminating Exit Codes (elastic#112530)
  (Doc+) CAT Nodes default columns (elastic#112715)
  [DOCS] Augment installation warnings (elastic#112756)
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testCorruption elastic#112769
  Bump Elasticsearch to a minimum of JDK 21 (elastic#112252)
  ESQL: Compute support for filtering ungrouped aggs (elastic#112717)
  Bump Elasticsearch version to 9.0.0 (elastic#112570)
  add CDR related data streams to kibana_system priviliges (elastic#112655)
  Support widening of numeric types in union-types (elastic#112610)
  Introduce data stream options and failure store configuration classes (elastic#109515)
  ...
Deduplicate BucketOrder objects by wrapping the StreamInput generated by DelayableWritable objects.
I was looking into a heap dump where we had millions of instances of BucketOrder, all identical. This was caused by a nested string terms aggregation with a huge number of buckets. I am wondering if we can use something similar to what we are doing with strings to deduplicate BucketOrder instances. That is what this PR does, and I am looking for feedback on what folks think.
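The deduplication idea is essentially an interning map: equal instances collapse onto one canonical object. A minimal sketch of that idea (the generic Deduplicator class below is illustrative, not the actual Elasticsearch implementation):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Interning-style deduplicator: equal values are collapsed onto a single
    // canonical instance, so millions of identical deserialized objects end up
    // sharing one allocation.
    final class Deduplicator<T> {
        private final Map<T, T> canonical = new ConcurrentHashMap<>();

        T deduplicate(T value) {
            // putIfAbsent returns the previously stored equal instance, or null if
            // this value is the first of its kind, in which case the argument
            // itself becomes the canonical instance.
            T existing = canonical.putIfAbsent(value, value);
            return existing != null ? existing : value;
        }
    }

Map-based deduplication like this relies on BucketOrder defining equals and hashCode by value, which is also what makes the duplicated instances interchangeable in the first place.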