Fetching many fields takes much more time than retrieving _source #96349

Closed
mayya-sharipova opened this issue May 25, 2023 · 5 comments
Labels: >bug, priority:normal, :Search Foundations/Search, Team:Search Foundations


mayya-sharipova (Contributor) commented May 25, 2023

In a search request, asking to retrieve many fields can take substantially more time than retrieving the whole source.

For example, in a test on ES 8.6, retrieving 200 fields takes more than 6 s, whereas on the same shard a request that fetches no fields and returns only the whole source takes around 139 ms. The fields part of the request looked like this:

"fields": [
    {
      "field": "*",
      "include_unmapped": "true"
    },
    {
      "field": "@timestamp",
      "format": "strict_date_optional_time"
    },
    {
      "field": "field1",
      "format": "strict_date_optional_time"
    },
    {
      "field": "field2",
      "format": "strict_date_optional_time"
    },
    {
      "field": "field3",
      "format": "strict_date_optional_time"
    },
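To reproduce the comparison described above, the two request bodies can be built programmatically. This is a minimal Python sketch under stated assumptions: the helper names fields_request and source_request, the size value, and the generated names beyond field1..field3 are hypothetical; only the shape of each fields entry is taken from the request above. Either body could then be sent to an index's _search endpoint and the response times compared.

```python
import json


# Hypothetical helpers: they only build request bodies and do not talk to Elasticsearch.
def fields_request(date_fields, size=100):
    """Body that asks for every field ("*") plus an explicit date format per date field."""
    fields = [{"field": "*", "include_unmapped": "true"}]
    fields += [{"field": name, "format": "strict_date_optional_time"} for name in date_fields]
    return {"size": size, "fields": fields}


def source_request(size=100):
    """Baseline body: no "fields" section, just the whole _source for each hit."""
    return {"size": size, "_source": True}


# "@timestamp" plus field1..field199 gives 200 date fields, roughly matching the report.
date_fields = ["@timestamp"] + [f"field{i}" for i in range(1, 200)]
body = fields_request(date_fields)
print(json.dumps(body)[:120])  # start of the JSON body for the fields request
```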
mayya-sharipova added the >bug and :Search/Search labels May 25, 2023
elasticsearchmachine added the Team:Search label May 25, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

mayya-sharipova self-assigned this May 26, 2023
cpmoore commented Jun 28, 2023

Hi @mayya-sharipova
I was the one who initially reported this through a support case. I was wondering whether you had any updates, since I haven’t received any through the support case.
I’m happy to help in any way I can.

andreidan (Contributor) commented Jun 26, 2024

I've done some testing to try to reproduce this issue and ruled out some potential scenarios, such as re-parsing the source for each requested field (this is NOT happening: we parse the source only once per document and cache it) or reloading the source from stored fields for each document.
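The parse-once behaviour described above can be illustrated with a small model. This is a hypothetical Python sketch, not Elasticsearch's Java implementation: CachedSource, get, and parse_count are invented names, and the lookup only follows nested objects (not literal dotted keys such as "host.name"). It shows that requesting many fields from one document still triggers a single JSON parse.

```python
import json


class CachedSource:
    """Hypothetical per-document source holder: the raw JSON is parsed lazily
    on first access and the parsed tree is reused for every requested field."""

    def __init__(self, raw: str):
        self._raw = raw
        self._parsed = None
        self.parse_count = 0  # how many times json.loads actually ran

    def _source(self) -> dict:
        if self._parsed is None:
            self._parsed = json.loads(self._raw)
            self.parse_count += 1
        return self._parsed

    def get(self, path: str):
        """Follow a dotted path such as 'host.name' through nested objects."""
        node = self._source()
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node


doc = CachedSource('{"@timestamp": "2023-05-25T10:00:00Z", "host": {"name": "n1"}}')
values = [doc.get(f) for f in ["@timestamp", "host.name", "missing"]]
# Three lookups, but the source was parsed only once.
```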

I've run an extensive set of microbenchmarks for parsing fields and extracting values from the source, with various source configurations (100k fields; fewer but larger fields of 4 MB each; 10k fields with large arrays; and a mix of all of these), and was not able to reproduce the slowdown.

Finally, I turned my attention to benchmarking how values are fetched from the source (https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/mapper/SourceValueFetcher.java#L56), as I suspected that parsing some field types might be heavier than others. This way, I believe I've found a case where retrieving 100 timestamp fields from 8k documents takes 2-2.5 seconds on a single Elasticsearch node (whereas retrieving the whole source takes 0.2 seconds).

This is due to the date parsing we perform when retrieving the fields.
Date parsing performance is something the wider Elasticsearch team works on periodically (e.g. #106486), but I don't think we can do much else here in the meantime.
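A simplified model of why many date fields are expensive: assuming each requested date value is parsed from its source string and then formatted again, the work grows with documents × requested fields, while handing back the raw source does no per-field work. This is a hypothetical Python sketch: fetch_source and fetch_date_fields are invented names, datetime.fromisoformat stands in for Elasticsearch's own date parser, the output only approximates strict_date_optional_time, and the printed timings will not match Elasticsearch's numbers.

```python
import time
from datetime import datetime, timezone


def fetch_source(docs):
    """Baseline: return the documents' sources unchanged (no per-field parsing)."""
    return list(docs)


def fetch_date_fields(docs, field_names):
    """For every document and every requested field, parse the date string and
    format it again. Cost is roughly len(docs) * len(field_names) parse+format steps."""
    hits = []
    for doc in docs:
        hit = {}
        for name in field_names:
            raw = doc.get(name)
            if raw is None:
                continue  # field absent from this document: nothing to parse
            # Stand-in parser (assumption); Elasticsearch uses its own date parsing.
            utc = datetime.fromisoformat(raw).astimezone(timezone.utc)
            # Approximates strict_date_optional_time output, e.g. 2023-05-25T10:00:00.000Z
            hit[name] = [f"{utc:%Y-%m-%dT%H:%M:%S}.{utc.microsecond // 1000:03d}Z"]
        hits.append(hit)
    return hits


# Hypothetical data, scaled down from the 100 fields x 8k documents mentioned above.
names = [f"ts{i}" for i in range(100)]
docs = [{name: "2023-05-25T12:00:00+02:00" for name in names} for _ in range(1000)]

t0 = time.perf_counter()
fetch_source(docs)
t1 = time.perf_counter()
fetch_date_fields(docs, names)
t2 = time.perf_counter()
print(f"whole source: {t1 - t0:.3f}s, 100 date fields: {t2 - t1:.3f}s")
```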

I'll keep this issue open for a few more days to collect any further comments; otherwise, I believe we should rely on the improvements we make to date/time parsing to speed up the retrieval of many date fields.

benwtrent added the priority:normal label Jul 9, 2024
javanna added the :Search Foundations/Search label and removed the :Search/Search label Jul 17, 2024
elasticsearchmachine added the Team:Search Foundations label and removed the Team:Search label Jul 17, 2024
elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-foundations (Team:Search Foundations)

andreidan (Contributor)

Closing this for now, but please reopen if more details become available.
