[FEATURE] [RFC] Query Shape Field Data Type #69
Thanks for the RFC!
Do we know what the coverage is? That is, how many builders have the data type available, and which ones (can we attach a more detailed list here [1])?
This option is not as intrusive as option 1, but there is a chance the mapping could be outdated. To keep the mappings up to date, we would need to fetch and refresh them constantly. Instead of adding field data type to all query types, we should probably identify the important ones and add type info only for them. For example, as you mentioned, we can always add types for range queries if the type information can be easily fetched, based on the research done in [1].
@ansjcy It looks like most builders don't already look up the data type:
[Table: Count of all Query/Agg/Sort Builders | Count of Builders which look up data type]
@dzane17 - Thank you for putting up this proposal and the different possible options. While I am leaning towards option 1, I am wondering about the following:
I am slightly less in favor of Option 2, as I see no good reason for
Query Builders look up field type during the
As mentioned previously, not all *Builder classes currently look up fieldType. In those instances, we will either not have the data type or will need to add a lookup.
Yes, we can add the class variable and getter method in some parent class, but we will still need to edit the child Builder classes to set the values.
New builder classes will not work by default. They will require onboarding in up to two places.
@dzane17 Thanks for the proposal. I too am leaning towards option 1, as we should avoid having this logic in the query-insights plugin. For option 1, I am concerned about the following:
Please let us know whether this would be an issue and how we plan to handle it.
@deshsidd Actually, this concern is related to option 2. There is a time interval between core search execution and generating the query shape in the Query Insights plugin, so we cannot guarantee that the index mapping for a particular field is present or accurate. This will affect only a tiny fraction of query shapes, but it is certainly a flaw. Even if we routinely fetch index mappings from Query Insights, I don't see a way to lock in the exact mappings at the time of execution.
Got it. In that case, approach 1 looks good to me. The only drawback I see is the following:
Is there any way we can force new builders to onboard by default (at least in the core code)?
Thanks @dzane17 for the response. I am wondering if we can modify the
A possible solution that's backwards compatible and won't break plugins would be adding a
@msfroh - While this works well if only the
We also considered an interface similar to
There is no trade-off. I don't know how I can make this any clearer: WE ARE NOT GOING TO STOP CURRENT PLUGINS FROM COMPILING AGAINST 2.X. Edit: also, don't try to piggy-back on the toQuery call. If the query insights plug-in needs mapping information, it should take a dependency on MapperService itself, via its plugin initialization.
@msfroh - There is no piggy-backing here. Currently,
Then what does
You said that Query Insights needs to get from the field name to the field type. That requires the index name (which it can get from the SearchRequest) and the MapperService. Note that, by design, QueryBuilder doesn't hold info about field types. QueryBuilder is essentially an AST for the query. It uses context (without holding onto it) to produce the appropriate Lucene query for the mapping.
QueryBuilder being an AST makes a strong case for it to contain the field information!? The leaf nodes in an AST are nothing but expressions, where one of the operands is generally the field name.
If you look at https://en.wikipedia.org/wiki/Compiler#Front_end, the
Can you please take the time to understand how
Are you still referring to the
What does the rewrite method take as its input? It takes the semantic context, which it uses to return a different QueryBuilder (which still does not hold the semantic context, but rather the query may be simpler/more efficient based on semantic context).
The point that I'm trying (and evidently failing) to communicate is that QueryBuilder models syntax not semantics. The mapped field types are part of the semantic context in which a query executes. This is why I googled "abstract syntax tree" to see where the confusion might lie. In my mind, syntax is distinct from semantics.
I'm referring to the code snippet in #69 (comment), which is also a terrible idea.
Chatted w/ @jainankitk offline. We realized that one of the biggest challenges here is the fact that Query Insights runs at the coordinator layer, not the shard layer. So, the
At the coordinator level, we don't have much information at all about the index mappings, which surprises me a little. (I had always assumed there would be some high-level awareness of index schema at the coordinator level.) The closest path that I see is
Here are more examples of OpenSearch plugins looking at
Thanks @msfroh for succinctly capturing our discussion offline!
This is a strong reason to not have
Yeah, this is a big part of the problem.
Thanks for these pointers. I am working with @dzane17 to see if there is a good way of getting the type information corresponding to a field name. That said, I am also wondering: if a search request targets multiple indices, the same field might be indexed as different types across those indices. For example, a field called
As far as I know, it's possible. In that case, on the one index, the query will run against an
This is where the idea of a coordinator-level "field type" is a bit messy. Coordinators just deal with the syntactic structure of the query. The semantic field type context is only resolved on the shards (and different shards may resolve different types).
Although possible, I would assume such cases are rare for the same workload type. Having different types associated with the same field name would be pretty confusing for the client as well, I guess!? If so, from the Query Insights perspective, we can check that the field type is the same across all the indices in the search request, and otherwise ignore the field type.
Ideally, coordinators should resolve the "field type" at the index level to do the necessary validations. Fanning out the request to hundreds of shards and duplicating validation seems wasteful to me.
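The rule proposed above (keep the field type only if it is the same across all indices in the search request, otherwise ignore it) can be sketched as follows. This is a minimal, self-contained sketch that assumes the per-index field-to-type maps are already available on the coordinator, which, as discussed above, is itself the hard part. ConsistentFieldType and the example index and field names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ConsistentFieldType {
    // Returns the field's type only if every index that maps the field agrees on it.
    static Optional<String> resolve(String field, List<Map<String, String>> perIndexTypes) {
        String agreed = null;
        for (Map<String, String> types : perIndexTypes) {
            String type = types.get(field);
            if (type == null) {
                continue; // assumption: an index that does not map the field is skipped
            }
            if (agreed == null) {
                agreed = type;
            } else if (!agreed.equals(type)) {
                return Optional.empty(); // conflicting types: leave the type out of the shape
            }
        }
        return Optional.ofNullable(agreed);
    }

    public static void main(String[] args) {
        Map<String, String> logs1 = Map.of("status", "integer");
        Map<String, String> logs2 = Map.of("status", "keyword");
        Map<String, String> logs3 = Map.of("status", "integer");
        System.out.println(resolve("status", List.of(logs1, logs3))); // Optional[integer]
        System.out.println(resolve("status", List.of(logs1, logs2))); // Optional.empty
    }
}
```

Skipping indices that do not map the field is a design choice, not something the thread settled; a stricter variant could drop the type in that case as well.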
Following up on the discussions and PRs, it looks like with @dzane17's changes we should be able to get the field name from the
Index mappings can be stored as part of the cluster state, which is held by the master nodes of the cluster, or in the metadata of each index on disk. Fetching the index mapping for every search request could affect performance.
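One way to limit that cost is to cache each index's flattened field types and reload them only when the mapping changes. The sketch below assumes a hypothetical MappingSource that exposes a per-index mapping version and a field-to-type map; it is not an OpenSearch interface, and how the plugin would obtain either value is left open. A version check narrows, but does not close, the staleness window raised earlier in the thread, since the mapping can still change between shard execution and shape generation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical source of mapping data (not an OpenSearch interface).
interface MappingSource {
    long mappingVersion(String index);            // assumed to change whenever the mapping changes
    Map<String, String> fieldTypes(String index); // field path -> data type
}

class FieldTypeCache {
    private static final class Entry {
        final long version;
        final Map<String, String> types;

        Entry(long version, Map<String, String> types) {
            this.version = version;
            this.types = types;
        }
    }

    private final MappingSource source;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger(); // full reloads, for illustration

    FieldTypeCache(MappingSource source) {
        this.source = source;
    }

    // Returns the cached type, reloading the index's field types only if its mapping version moved.
    String typeOf(String index, String field) {
        long current = source.mappingVersion(index);
        Entry entry = cache.compute(index, (name, old) -> {
            if (old != null && old.version == current) {
                return old; // mapping unchanged: reuse the cached types
            }
            loads.incrementAndGet();
            return new Entry(current, Map.copyOf(source.fieldTypes(name)));
        });
        return entry.types.get(field);
    }

    int loads() {
        return loads.get();
    }
}

class FieldTypeCacheDemo {
    public static void main(String[] args) {
        long[] version = {1};
        MappingSource source = new MappingSource() {
            public long mappingVersion(String index) {
                return version[0];
            }

            public Map<String, String> fieldTypes(String index) {
                // Version 2 simulates a mapping update that adds a new field.
                return version[0] == 1
                    ? Map.of("price", "long")
                    : Map.of("price", "long", "discount", "float");
            }
        };
        FieldTypeCache cache = new FieldTypeCache(source);
        cache.typeOf("products", "price"); // first call loads the mapping
        cache.typeOf("products", "price"); // same version: served from the cache
        version[0] = 2;
        System.out.println(cache.typeOf("products", "discount") + " " + cache.loads()); // prints "float 2"
    }
}
```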
Background
Query shape currently has the ability to append field names. In addition, we'd like the option to add the field data type (often called fieldType in code).

In this example, field1, field2, and field3 are field names, and text, date, and keyword are data types.

Problem
It is impossible to determine the data type from the search source alone.
In this match query, the field name is "title"; however, the exact data type is unknown. It could be text or keyword.

What solution would you like?
Three possible solutions:

Option 1: Store dataType in each *QueryBuilder, *AggregationBuilder, and *SortBuilder class during core search execution. Then, when records are processed in the query-insights plugin, we can simply call aggBuilder.getDataType(). I have seen builder classes get the data type mapping from shardContext like:

Cons: Not all builders look up the data type; the *Builder classes in core need to be edited.
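A minimal sketch of option 1, assuming a hypothetical parent class that holds the resolved type. FieldTypeLookup, TypedBuilder, and RangeBuilderSketch are illustrative names, not OpenSearch classes; in core, the lookup would come from the shard context that the builder already consults.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the shard-level mapping lookup (not an OpenSearch API).
interface FieldTypeLookup {
    Optional<String> typeOf(String fieldName);
}

// Hypothetical parent class holding the data type resolved during core search execution.
abstract class TypedBuilder {
    private String dataType; // stays null until a subclass sets it

    protected void setDataType(String dataType) {
        this.dataType = dataType;
    }

    // What the query-insights plugin would call when building the query shape.
    public Optional<String> getDataType() {
        return Optional.ofNullable(dataType);
    }
}

// Illustrative builder that records the type at the point where it consults the mapping.
class RangeBuilderSketch extends TypedBuilder {
    private final String fieldName;

    RangeBuilderSketch(String fieldName) {
        this.fieldName = fieldName;
    }

    // Stand-in for the step where a real builder resolves its field against the shard context.
    void resolve(FieldTypeLookup lookup) {
        lookup.typeOf(fieldName).ifPresent(this::setDataType);
    }
}

class Option1Demo {
    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("timestamp", "date");
        FieldTypeLookup lookup = name -> Optional.ofNullable(mapping.get(name));

        RangeBuilderSketch builder = new RangeBuilderSketch("timestamp");
        builder.resolve(lookup);
        System.out.println(builder.getDataType().orElse("unknown")); // prints "date"
    }
}
```

Until a builder calls setDataType, getDataType() stays empty, which is exactly the listed con: every builder that should report a type has to be edited.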
Option 2: Fetch mappings from the query-insights plugin. Then we can get the data type from the known field name.
Note: We need to find a way to fetch _mapping data from query-insights.
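For option 2, one way to go from a known field name to its type is to flatten the properties tree of a _mapping response into dotted field paths. This sketch assumes the mapping has already been fetched and parsed into nested Maps (fetching it from query-insights is the open question noted above). It handles nested properties and multi-fields only, and records object fields without an explicit type as "object"; MappingFlattener is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

class MappingFlattener {
    // Flattens {"title": {"type": "text", "fields": {"raw": {"type": "keyword"}}}, ...}
    // into {"title" -> "text", "title.raw" -> "keyword", ...}.
    static Map<String, String> flatten(Map<String, Object> properties) {
        Map<String, String> out = new HashMap<>();
        walk("", properties, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> properties, Map<String, String> out) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Map<String, Object> def = (Map<String, Object>) e.getValue();
            Object type = def.get("type");
            // Object fields usually have "properties" and no explicit "type".
            out.put(path, type != null ? type.toString() : "object");
            Object sub = def.get("properties");
            if (sub instanceof Map) {
                walk(path, (Map<String, Object>) sub, out); // nested object fields
            }
            Object multi = def.get("fields");
            if (multi instanceof Map) {
                walk(path, (Map<String, Object>) multi, out); // multi-fields such as title.raw
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> properties = Map.of(
            "title", Map.of("type", "text", "fields", Map.of("raw", Map.of("type", "keyword"))),
            "user", Map.of("properties", Map.of("name", Map.of("type", "keyword"))),
            "created", Map.of("type", "date"));
        Map<String, String> types = flatten(properties);
        System.out.println(types.get("title") + " " + types.get("title.raw") + " " + types.get("user.name"));
        // prints "text keyword keyword"
    }
}
```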
Option 3: Ignore field data type in query shape.
In many cases, the data type is known given the query/agg/sort type. For example, the data type for date histogram aggregations is always date, and a boolean query is always of boolean data type. In these cases, adding the data type adds no value:

On the other hand, range queries support a variety of data types, so we would lose information with this option:
a. Date
b. Keyword
c. Numeric (which consists of int, long, float, double, short, byte)
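The difference between option 3 and always adding types can be illustrated by rendering a shape that appends the type only where the query kind does not already imply it. ShapeNode, ShapeRenderer, and the bracketed output format are hypothetical simplifications, not the query-insights shape format, and the SELF_TYPED set is an assumption seeded only with the date histogram example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified shape node: a query kind, an optional field, and child nodes.
final class ShapeNode {
    final String kind;  // e.g. "bool", "range", "match"
    final String field; // null for compound queries
    final List<ShapeNode> children = new ArrayList<>();

    ShapeNode(String kind, String field) {
        this.kind = kind;
        this.field = field;
    }

    ShapeNode add(ShapeNode child) {
        children.add(child);
        return this;
    }
}

final class ShapeRenderer {
    // Assumed: kinds whose data type is implied by the kind itself, so a type adds nothing.
    private static final Set<String> SELF_TYPED = Set.of("date_histogram");

    static String render(ShapeNode node, Map<String, String> fieldTypes) {
        StringBuilder sb = new StringBuilder();
        render(node, fieldTypes, 0, sb);
        return sb.toString();
    }

    private static void render(ShapeNode node, Map<String, String> fieldTypes, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(node.kind);
        if (node.field != null) {
            sb.append(" [").append(node.field);
            String type = fieldTypes.get(node.field);
            // Append the type only when it is known and not already implied by the kind.
            if (type != null && !SELF_TYPED.contains(node.kind)) {
                sb.append(", ").append(type);
            }
            sb.append(']');
        }
        sb.append('\n');
        for (ShapeNode child : node.children) {
            render(child, fieldTypes, depth + 1, sb);
        }
    }
}

class ShapeSketchDemo {
    public static void main(String[] args) {
        Map<String, String> types = Map.of("timestamp", "date", "status", "keyword", "title", "text");
        ShapeNode query = new ShapeNode("bool", null)
            .add(new ShapeNode("range", "timestamp"))
            .add(new ShapeNode("range", "status"))
            .add(new ShapeNode("match", "title"));
        System.out.print(ShapeRenderer.render(query, types));
        System.out.print(ShapeRenderer.render(new ShapeNode("date_histogram", "timestamp"), types));
    }
}
```

In the demo output, the two range nodes keep their types (date and keyword), so they remain distinguishable, while the date_histogram node prints only its field name.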
Other Consideration