-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve FieldFetcher retrieval of fields #66160
Conversation
Currently FieldFetcher stores all the FieldContexts that are later used to retrieve the fields in a List. This has the disadvantage that the same field path can be retrieved several times (e.g. if multiple patterns match the same path or if similar paths are defined several times e.g. with different formats). Currently the last value to be retrieved "wins" and gets returned. We might as well de-duplicate the FieldContexts by using a map internally, keyed by the field path that is going to be retrieved, to avoid more work later.
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a nice simplification.
@@ -48,7 +48,7 @@ public static FieldFetcher create(QueryShardContext context, | |||
SearchLookup searchLookup, | |||
Collection<FieldAndFormat> fieldAndFormats) { | |||
|
|||
List<FieldContext> fieldContexts = new ArrayList<>(); | |||
Map<String, FieldContext> fieldContexts = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd make sense to use a LinkedHashMap
here -- then the fields would come back in the order requested. For example if you passed "fields": [ "name", "title"]
, then it'd be nice to return
"fields": {
"name": [ "christoph"],
"title": [ "engineer" ]
}
We wouldn't make a formal guarantee about order (since it's a JSON map), I think it helps readability for debugging, etc.
@@ -48,7 +48,7 @@ public static FieldFetcher create(QueryShardContext context, | |||
SearchLookup searchLookup, | |||
Collection<FieldAndFormat> fieldAndFormats) { | |||
|
|||
List<FieldContext> fieldContexts = new ArrayList<>(); | |||
Map<String, FieldContext> fieldContexts = new HashMap<>(); | |||
List<String> unmappedFetchPattern = new ArrayList<>(); | |||
Set<String> mappedToExclude = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this set, and instead rely on fieldContexts.containsKey(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, great unintended side effect
@jtibshirani thanks for the review, I pushed the changes you requested. |
@elasticmachine update branch |
@elasticmachine run elasticsearch-ci/1 |
Currently FieldFetcher stores all the FieldContexts that are later used to retrieve the fields in a List. This has the disadvantage that the same field path can be retrieved several times (e.g. if multiple patterns match the same path or if similar paths are defined several times e.g. with different formats). Currently the last value to be retrieved "wins" and gets returned. We might as well de-duplicate the FieldContexts by using a map internally, keyed by the field path that is going to be retrieved, to avoid more work later.
Currently FieldFetcher stores all the FieldContexts that are later used to retrieve the fields in a List. This has the disadvantage that the same field path can be retrieved several times (e.g. if multiple patterns match the same path or if similar paths are defined several times e.g. with different formats). Currently the last value to be retrieved "wins" and gets returned. We might as well de-duplicate the FieldContexts by using a map internally, keyed by the field path that is going to be retrieved, to avoid more work later.
* elastic/master: (33 commits) Add searchable snapshot cache folder to NodeEnvironment (elastic#66297) [DOCS] Add dynamic runtime fields to docs (elastic#66194) Add HDFS searchable snapshot integration (elastic#66185) Support canceling cross-clusters search requests (elastic#66206) Mute testCacheSurviveRestart (elastic#66289) Fix cat tasks api params in spec and handler (elastic#66272) Snapshot of a searchable snapshot should be empty (elastic#66162) [ML] DFA _explain API should not fail when none field is included (elastic#66281) Add action to decommission legacy monitoring cluster alerts (elastic#64373) move rollup_index param out of RollupActionConfig (elastic#66139) Improve FieldFetcher retrieval of fields (elastic#66160) Remove unsed fields in `RestAnalyzeAction` (elastic#66215) Simplify searchable snapshot CacheKey (elastic#66263) Autoscaling remove feature flags (elastic#65973) Improve searchable snapshot mount time (elastic#66198) [ML] Report cause when datafeed extraction encounters error (elastic#66167) Remove suggest reference in some API specs (elastic#66180) Fix warning when installing a plugin for different ESversion (elastic#66146) [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132) [DOCS] Add `require_alias` to Bulk API (elastic#66259) ...
Currently FieldFetcher stores all the FieldContexts that are later used to
retrieve the fields in a List. This has the disadvantage that the same field
path can be retrieved several times (e.g. if multiple patterns match the same
path or if similar paths are defined several times e.g. with different formats).
Currently the last value to be retrieved "wins" and gets returned. We might as
well de-duplicate the FieldContexts by using a map internally, keyed by the
field path that is going to be retrieved, to avoid more work later.