Should we limit the number of fields that can be retrieved by the “fields” API #69983
Comments
Pinging @elastic/es-search (Team:Search)
Currently we use a CharacterRunAutomaton built from all field patterns that have the "include_unmapped" option set. This can lead to unnecessarily large automata when the user needlessly lists all known fields and adds the "include_unmapped" option. We only really need the automaton for patterns that contain wildcards. Any other field path can be looked up directly, and we only need to do so if the field path is in fact unmapped. Relates to elastic#69983
Just want to mention that if we introduce a limit similar to docvalue fields, it will cause a regression in Kibana (specifically elastic/kibana#22897 will become an issue for Discover again). As @jtibshirani explained in elastic/kibana#75813, Kibana switched from using
I'm not suggesting that there should be no limit whatsoever, but wanted to add this to your list of considerations as the team weighs the best path forward.
Edit: Just saw this comment from the linked PR -- I guess this means no limit is planned for the time being?
Correct, I think we are leaning toward not introducing a soft or hard limit at the moment; however, this issue is here to discuss exactly that. I'm aware that, should we introduce anything along these lines, it is important to consider Kibana and other users of the API.
Closed by #69984
Currently we don’t limit the number of fields that can be retrieved using the “fields” API.
The original reasoning was that field values are retrieved from an already loaded “source”, so the actual lookup from the source map should come with relatively small cost.
To add the ability to include unmapped fields, we use automata to match field patterns that have the “include_unmapped” option set. We do this because we don’t know which unmapped leaf values the source contains, and we want to match wildcard field paths efficiently while traversing the source. These automata come with a limit on the number of states they can have (10000 by default) in order to prevent unexpected memory consumption. This is quite sufficient when we have a small number of “fields” patterns with “include_unmapped” set, as should normally be the case.
However, it is possible to exceed this size limit when using the “fields” API with hundreds or thousands of field patterns that all have the “include_unmapped” option turned on, in which case the request will fail.
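For illustration, a request of the kind described above could be built as in the following minimal sketch. The field names and the count of 2000 are made up, and the helper `build_fields_request` is hypothetical. The body only assumes the documented shape of the “fields” option, where each entry carries a `field` and an `include_unmapped` key. The sketch does not claim any particular number of patterns at which the state limit is hit.

```python
import json


def build_fields_request(field_names, include_unmapped=True):
    """Build a search body that requests every given name via the "fields" option.

    Hypothetical helper for illustration only; not part of any client library.
    """
    return {
        "query": {"match_all": {}},
        "_source": False,
        "fields": [
            {"field": name, "include_unmapped": include_unmapped}
            for name in field_names
        ],
    }


# Made-up field names, standing in for a client that enumerates every known
# field and switches "include_unmapped" on for each of them.
names = [f"field_{i}" for i in range(2000)]
body = build_fields_request(names)

print(len(body["fields"]))
print(json.dumps(body["fields"][0]))
```

Note that none of these entries contains a wildcard, which is the case the rest of this issue is concerned with.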
This led us to think about whether we should put a limit on the number of fields (or field patterns?) that the API can retrieve. This could be a dynamic index setting like the one we have for doc value fields (index.max_docvalue_fields_search).
There are some questions here:
If the main motivation for introducing any limit here is the potential danger of reaching the size limit of the automaton used for “include_unmapped” fields, I think we can lower that risk even more by limiting its use to field patterns using wildcards. Concrete field paths (as in the case when enumerating known field names) can be directly looked up from source without using the automaton. I wonder if this leaves many non-esoteric use cases in which we would run into a size limit.
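The split suggested above can be sketched roughly as follows. This is not the Elasticsearch implementation: a compiled regular expression stands in for the automaton, the source is modelled as nested dicts with dot-free keys, and all function names (`fetch_unmapped`, `lookup`, `leaf_paths`, `wildcard_to_regex`) and the `mapped_fields` set are assumptions made for the example.

```python
import re

_MISSING = object()  # sentinel so that a stored None still counts as a value


def wildcard_to_regex(pattern):
    """Translate a '*' wildcard pattern into a regular expression string."""
    return ".*".join(re.escape(part) for part in pattern.split("*"))


def lookup(source, path):
    """Walk a dotted path through nested dicts; return _MISSING if absent."""
    node = source
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def leaf_paths(source, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from leaf_paths(value, path + ".")
        else:
            yield path, value


def fetch_unmapped(source, patterns, mapped_fields):
    """Collect unmapped leaf values requested by the given patterns."""
    wildcard = [p for p in patterns if "*" in p]
    concrete = [p for p in patterns if "*" not in p]
    result = {}

    # Concrete paths: direct lookup, no matcher involved.
    for path in concrete:
        if path in mapped_fields:
            continue
        value = lookup(source, path)
        if value is not _MISSING and not isinstance(value, dict):
            result[path] = value

    # Wildcard patterns: the matcher is built from these only, so its size
    # no longer grows with the number of concrete paths in the request.
    if wildcard:
        matcher = re.compile(
            "|".join(f"(?:{wildcard_to_regex(p)})" for p in wildcard)
        )
        for path, value in leaf_paths(source):
            if path not in mapped_fields and matcher.fullmatch(path):
                result[path] = value
    return result


source = {"user": {"id": 7, "nick": "kim"}, "status": "ok", "extra": 1}
mapped = {"user.id"}
result = fetch_unmapped(source, ["user.*", "status", "user.id"], mapped)
print(result)
```

In this sketch a request that enumerates thousands of concrete field names builds no matcher at all, so only requests with many wildcard patterns could still approach a size limit.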
Relates to #60985