Estimate segment field usages #112760

dnhatn · 2024-09-11T18:36:42Z

We have introduced a new memory estimation method in serverless, based on the number of segments and the fields within them. This approach works well overall, but it still falls short when most fields are used more than once, for example in both doc_values and postings, or in both doc_values and points. This change exposes the total usage of fields in segments, allowing us to adjust the memory estimate for these cases.
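Here is a minimal sketch of the gap between counting fields and counting field usages. `FieldStructures` is a hypothetical, simplified stand-in for Lucene's `FieldInfo` that models only postings, doc values and points. `SegmentFieldCounts` is an illustrative helper, not the code added in this PR:

```java
import java.util.List;

// Hypothetical stand-in for one field in one segment: which data
// structures the segment holds for this field.
record FieldStructures(boolean postings, boolean docValues, boolean points) {
    // Each structure held for the field counts as one usage.
    int usages() {
        int usages = 0;
        if (postings) usages++;
        if (docValues) usages++;
        if (points) usages++;
        return usages;
    }
}

final class SegmentFieldCounts {
    private SegmentFieldCounts() {}

    // Number of (segment, field) pairs: the input to a per-field estimate.
    static int totalFields(List<List<FieldStructures>> segments) {
        int total = 0;
        for (List<FieldStructures> fields : segments) {
            total += fields.size();
        }
        return total;
    }

    // Sum of usages over all fields of all segments: a field stored in both
    // postings and doc values contributes 2 instead of 1.
    static int totalUsages(List<List<FieldStructures>> segments) {
        int total = 0;
        for (List<FieldStructures> fields : segments) {
            for (FieldStructures field : fields) {
                total += field.usages();
            }
        }
        return total;
    }
}
```

Consider two segments that each hold a keyword-like field (postings and doc values) and a numeric field (points and doc values). For these, `totalFields` returns 4 and `totalUsages` returns 8. That difference is what a purely per-field estimate misses when most fields are used twice.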

elasticsearchmachine · 2024-09-11T20:04:27Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

jpountz · 2024-09-11T21:13:06Z

server/src/main/java/org/elasticsearch/index/codec/FieldInfosWithUsages.java

+            if (fi.hasNorms()) {
+                usages++;
+            }
+            if (fi.hasVectors()) {


We can probably skip term vectors: their memory usage does not scale with the number of fields (like stored fields).

dnhatn (reply): Yes, I pushed 5cc4f35 to remove this.
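The following sketch shows the per-field count after term vectors were dropped. It follows the shape of the reviewed diff (`if (fi.hasNorms()) { usages++; }`), but it works on a hypothetical `FieldFlags` record rather than Lucene's `FieldInfo`. The exact set of structures counted by the merged `FieldInfosWithUsages` is an assumption here:

```java
// Hypothetical, simplified view of one field's per-segment data structures.
record FieldFlags(boolean postings, boolean docValues, boolean points,
                  boolean norms, boolean termVectors) {

    // Counts the structures assumed to scale with the number of fields.
    // Term vectors are deliberately ignored: like stored fields, their
    // memory usage does not grow with the number of fields.
    int usages() {
        int usages = 0;
        if (postings) usages++;
        if (docValues) usages++;
        if (points) usages++;
        if (norms) usages++;
        return usages;
    }
}
```

Under this sketch, a text field with postings, norms and term vectors counts 2 usages. That is the same as a keyword field with postings and doc values.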

jpountz · 2024-09-11T21:16:18Z

The change looks fine. I wonder if we need to go this granular though, or if we should assume that all fields that exist have an index (either terms, points or vectors) and doc values. Some fields may only have doc values, but then it's fine if we overestimate a bit?
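The coarser alternative can be sketched as follows. It assumes every field has one index structure (terms, points or vectors) plus doc values, so each field counts as exactly 2 usages. `FieldKind` and `CoarseUsageEstimate` are hypothetical names, not code from this PR. In this simplified model a field has at most one index structure and at most one doc values entry, so the coarse count is never lower than the actual count. It overestimates only fields that lack one of the two, such as doc-values-only fields:

```java
import java.util.List;

// Hypothetical per-field summary: does the field have an index structure
// (terms, points or vectors) and/or doc values?
record FieldKind(boolean indexed, boolean docValues) {}

final class CoarseUsageEstimate {
    private CoarseUsageEstimate() {}

    // Coarse model: every field is assumed to have an index and doc values.
    static int assumedUsages(List<FieldKind> fields) {
        return 2 * fields.size();
    }

    // Actual count for the same fields under this simplified model.
    static int actualUsages(List<FieldKind> fields) {
        int total = 0;
        for (FieldKind field : fields) {
            if (field.indexed()) total++;
            if (field.docValues()) total++;
        }
        return total;
    }
}
```

Take one field that is both indexed and has doc values, plus one doc-values-only field. The coarse model counts 4 usages against an actual 3.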

dnhatn · 2024-09-11T23:18:04Z

Thanks, Adrien. I considered this option, but it would inflate the current estimate by 20% in all cases, which might prevent us from running 2GB instances. I'll merge this PR and discuss the follow-up changes for serverless. If we find a different solution, I'll revert this change.


elasticsearchmachine · 2024-09-11T23:19:53Z

💚 Backport successful

Status	Branch	Result
✅	8.x


elasticsearchmachine added the v9.0.0 label Sep 11, 2024

Estimate segment field usages (481f542)

dnhatn force-pushed the field-infos-usage branch from e3e5128 to 481f542 September 11, 2024 18:52

dnhatn added the v8.16.0, auto-backport-and-merge, :StorageEngine/Mapping, and >non-issue labels Sep 11, 2024

dnhatn requested review from pxsalehi, henningandersen, jpountz and martijnvg September 11, 2024 20:03

dnhatn marked this pull request as ready for review September 11, 2024 20:04

elasticsearchmachine added the Team:StorageEngine label Sep 11, 2024

jpountz approved these changes Sep 11, 2024


remove term vectors (5cc4f35)

dnhatn merged commit ed41445 into elastic:main Sep 11, 2024
15 checks passed

dnhatn deleted the field-infos-usage branch September 11, 2024 23:18

dnhatn mentioned this pull request Sep 11, 2024

[8.x] Estimate segment field usages (#112760) #112777

Merged
