Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate BucketOrder when deserializing #112707

Merged
merged 5 commits into from
Sep 12, 2024

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Sep 10, 2024

I was looking into a heap dump where we were having millions of instances of BucketOrder, all the same. This was due to a nested string terms and huge amount of buckets. I am wondering if we can use something similar to what we are doing with string to deduplicate BucketOrder instances. This is what this PR is doing so I am looking for feedback in what folks think.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @iverase, I've created a changelog YAML for you.

case CompoundOrder.ID:
int size = in.readVInt();
List<BucketOrder> compoundOrder = new ArrayList<>(size);
for (int i = 0; i < size; i++) {
compoundOrder.add(Streams.readOrder(in));
}
return new CompoundOrder(compoundOrder, false);
return bucketOrderDeduplicator.deduplicate(new CompoundOrder(compoundOrder, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ESQL uses a wrapper around the StreamInput that keeps the cache in a regular old variable rather than a static. I'd prefer that if we can manage it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved it as a wrapper of StreamInput by (ab)using the fact that aggregations are deserialize using DelayableWritable. I have to introduce an interface so we can deduplicate when it is found.

@iverase iverase merged commit 0ab2afb into elastic:main Sep 12, 2024
15 checks passed
@iverase iverase deleted the BucketOrderDeduplicator branch September 12, 2024 07:49
iverase added a commit to iverase/elasticsearch that referenced this pull request Sep 12, 2024
Deduplicate BucketOrder object by wrapping the StreamInput generated by DelayableWritable objects.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

v1v added a commit to v1v/elasticsearch that referenced this pull request Sep 12, 2024
…tion-ironbank-ubi

* upstream/main: (302 commits)
  Deduplicate BucketOrder when deserializing (elastic#112707)
  Introduce test utils for ingest pipelines (elastic#112733)
  [Test] Account for auto-repairing for shard gen file (elastic#112778)
  Do not throw in task enqueued by CancellableRunner (elastic#112780)
  Mute org.elasticsearch.script.StatsSummaryTests testEqualsAndHashCode elastic#112439
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testTransportException elastic#112779
  Use a dedicated test executor in MockTransportService (elastic#112748)
  Estimate segment field usages (elastic#112760)
  (Doc+) Inference Pipeline ignores Mapping Analyzers (elastic#112522)
  Fix verifyVersions task (elastic#112765)
  (Doc+) Terminating Exit Codes (elastic#112530)
  (Doc+) CAT Nodes default columns (elastic#112715)
  [DOCS] Augment installation warnings (elastic#112756)
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testCorruption elastic#112769
  Bump Elasticsearch to a minimum of JDK 21 (elastic#112252)
  ESQL: Compute support for filtering ungrouped aggs (elastic#112717)
  Bump Elasticsearch version to 9.0.0 (elastic#112570)
  add CDR related data streams to kibana_system priviliges (elastic#112655)
  Support widening of numeric types in union-types (elastic#112610)
  Introduce data stream options and failure store configuration classes (elastic#109515)
  ...
elasticsearchmachine pushed a commit that referenced this pull request Sep 12, 2024
Deduplicate BucketOrder object by wrapping the StreamInput generated by DelayableWritable objects.
davidkyle pushed a commit that referenced this pull request Sep 12, 2024
Deduplicate BucketOrder object by wrapping the StreamInput generated by DelayableWritable objects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants