-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FieldCapabilities (_field_caps) API #23007
Conversation
Thanks @jimczi I have a couple of questions:
@spalger any thought about the above? |
For consistency yes. I'll fix this.
I thought that putting fields on top would simplify the navigation. I can switch back to a map keyed by index, not sure that it'd produce far fewer objects. It all depends on the number of indices vs number of fields. |
The typical case is requesting But before you change this back, I'd like to hear from @spalger about which access pattern makes more sense to him |
Looks nice @jimczi In regards to the {
"fields": {
"field1": {
"type": "string",
"searchable": true,
"aggregatable": true
},
"field2": {
"type": "conflict",
"type_indices": {
"keyword": ["t"],
"long": ["v"]
},
// should these still be true? Is a field that is both keyword and long really aggregatable?
"searchable": true,
"aggregatable": true
}
}
} re |
Also, I'm pretty sure we've decided not to go this route in the past, because consistency, but I'd like to reiterate my desire for for lists of objects as arrays rather than maps, like this: {
"fields": [
{
"name": "field1",
// ...
}
]
} |
I am on the fence regarding this. In most cases it's not but what if you use a script or whatever agg that can handle this ? For searching capability it's the same so I think it's more important to have good error messages when an agg or a query is conflicting rather than forbidding the capability completely.
The information is available so it could look like this:
What about map with consistent order ;) ? |
Yes consistency :) Can you explain why you would prefer an array? |
In this context it's mostly because the objects in the map are incomplete without the value they are keyed by, so our iteration logic of responses like this usually goes something like:
More generally though, my primary motivator is that the meaning of the keys in these maps is not described by the text alone. Only by knowing the API can know really know what the keys actually are. In the search/aggregation requests/responses for instance, some of the keys are for the type of query/aggregation, some are field names, some are user-generated id's, it's quite ambiguous. |
That's my preference. The request is executed with a specific list of indices and I think the merged value should represent the aggregatable-ness of that specific list (not a subset of that list).
A script agg isn't necessarily field dependent. In my opinion it's script dependent, and knowing whether a script is aggregatable is obviously a very different problem
However we present it, if we have the information I would prefer that it was included in the response |
If the above comment refers to the general case where there is no field conflict, then: If the comment refers only to the conflicting case, then I'm on the fence. Searches may still work but aggregations are much trickier and much more likely to fail. I like the syntax that @jimczi suggests in #23007 (comment) as it gives you all the information you need. The only downside is that there is no explicit |
I was talking about the conflict case: if I asked for the field capabilities of
Kibana is just going to treat conflicting fields with multiple types as not-aggregatable, regardless of what the API says, but I don't think the API should say that fields are aggregatable or searchable when that's the meaning behind it. |
I think this is problematic since multiple types are not necessary not-aggregatable. This is the problem with
Ok I can flip the logic to do that. For |
@spalger after some discussion in FixItFriday today, we've decided to go with the format suggested by @jimczi in #23007 (comment) as it is the most usable way we can present the required info. For The question arises when you have a field that has So the question is: should this field be marked as aggregatable or not? |
@sophiec20 Pinging you for input on this API as your team will be using it too |
@dimitris-athanasiou please review. |
I'm guessing that we all agree that the API should be correct as possible, but we don't seem to agree if we should treat I'm still leaning toward
|
OK, let's do that |
Looks good from ML side of things. |
@jimczi do you think it would be reasonable to include details like |
IMO no, the way we decide if the field is aggregatable or searchable depends on the field type. Some fields are searchable even though they have
... when |
@jimczi this looks great, do you think the response is summarized correctly below?
|
Only because of the field configuration in index "t". We have searchable and aggregatable for each type so the conflict is not taken into account. For Kibana you can consider every field with multiple types as conflicting but that should not be reflected in fieldcaps IMO |
Right, I'm suggesting that as how we could represent it to users in Kibana. Thanks! |
Another "capability" that we currently look at the mappings for is "sortable". Is aggregatable == sortable, or would these two attributes ever diverge in the future? If so would it be possible to also add a |
Oops, good catch @Bargs |
This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields. This field centric API relies solely on the mapping of the requested indices to extract the following infos: * `types`: one or many field types if the type is not the same across the requested indices * `searchable`: Whether this field is indexed for search on at least one requested indices. * `aggregatable`: Whether this field can be aggregated on at least one requested indices. Example: ```` GET t,v/_field_caps?fields=field1,field2,field3 ```` returns: ```` { "fields": { "field1": { "_all": { "types": "text", "searchable": true, "aggregatable": false } }, "field3": { "_all": { "types": "text", "searchable": true, "aggregatable": false } }, "field2": { "_all": { "types": [ "keyword", "long" ], "searchable": true, "aggregatable": true } } } } ```` In this example `field1` and `field3` have the same type `text` across the requested indices `t` and `v`. Conversely `field2` is defined with two conflicting types `keyword` and `long`. Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field. It is also possible to get a view of each index field capabilities with the `level` parameter: ```` GET t,v/_field_caps?fields=field2&level=indices ```` ```` { "fields": { "field2": { "t": { "types": "keyword", "searchable": true, "aggregatable": true }, "v": { "types": "long", "searchable": true, "aggregatable": true } } } } ````
I pushed another iteration that implements the format described in #23007 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great and clean. I left some questions about places where we use arrays that are logical sets, so maybe we should use sets directly? Also the documentation states clearly what null
values mean for the indices arrays, but I think it would help to reiterate it in the code to have this information in context.
|
||
private final String[] indices; | ||
private final String[] nonSearchableIndices; | ||
private final String[] nonAggregatableIndices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should those be sets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builder uses sets that are transformed in arrays when the final object is built.
this.isAggregatable = isAggregatable; | ||
this.indices = indices; | ||
this.nonSearchableIndices = nonSearchableIndices; | ||
this.nonAggregatableIndices = nonAggregatableIndices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see from the serialization code that thiese arrays can be null sometimes, is there any validation we should do, eg. I suspect that if one array is not null then other arrays should not be null either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. indices
is null if all indices have the same type for the field, nonSearchableIndices
is null only if all indices are either searchable or non-searchable and nonAggregatableIndices
is null if all indices are either aggregatable or non-aggregatable.
} | ||
|
||
/** | ||
* The types of the field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/types/type/
FieldCapabilitiesIndexRequest(FieldCapabilitiesRequest request, String index) { | ||
super(index); | ||
Set<String> fields = new HashSet<>(); | ||
fields.addAll(Arrays.asList(request.fields())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's pass the list as a constructor arg of the HashSet?
public class FieldCapabilitiesIndexRequest | ||
extends SingleShardRequest<FieldCapabilitiesIndexRequest> { | ||
|
||
private String[] fields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we store it as a set?
this.responseMap = responseMap; | ||
} | ||
|
||
FieldCapabilitiesResponse() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you leave a comment that this ctor should only be used for serialization?
String[] concreteIndices = | ||
indexNameExpressionResolver.concreteIndexNames(clusterState, request); | ||
final AtomicInteger indexCounter = new AtomicInteger(); | ||
final AtomicInteger completionCounter = new AtomicInteger(concreteIndices.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use a single atomic int for both purposes? eg. we could consider things are done when indexCounter.getAndIncrement returns concreteIndices.length-1
?
for (int i = 0; i < indexResponses.length(); i++) { | ||
Object element = indexResponses.get(i); | ||
if (element instanceof FieldCapabilitiesIndexResponse == false) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you assert it is an exception?
|
||
@Override | ||
protected FieldCapabilitiesIndexResponse | ||
shardOperation(final FieldCapabilitiesIndexRequest request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird to go to a new line before the method name?
shardOperation(final FieldCapabilitiesIndexRequest request, | ||
ShardId shardId) { | ||
MapperService mapperService = | ||
indicesService.indexService(shardId.getIndex()).mapperService(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it use indexServiceSafe?
Thanks @jpountz
@Bargs @spalger if the field is @clintongormley next step is to deprecate the FieldStats API in 5.x and remove it in 6? |
@jimczi deprecate it, but let's not remove it from 6.0 just yet |
Cherry picked from a8250b2 This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields. Example: ```` GET t,s,v,w/_field_caps?fields=field1,field2 ```` ... returns: ```` { "fields": { "field1": { "string": { "searchable": true, "aggregatable": true } }, "field2": { "keyword": { "searchable": false, "aggregatable": true, "non_searchable_indices": ["t"] "indices": ["t", "s"] }, "long": { "searchable": true, "aggregatable": false, "non_aggregatable_indices": ["v"] "indices": ["v", "w"] } } } } ```` In this example `field1` have the same type `text` across the requested indices `t`, `s`, `v`, `w`. Conversely `field2` is defined with two conflicting types `keyword` and `long`. Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field.
🎉 🎉 💃 🎉 🎉 |
I like the API feature-wise, but can we reconsider the name? What does "caps" in "field_caps" stands for, "capabilities"? I understand the need to make the name short, but this is rather opaque? If the API will replace the "Field Stats" API, what if we just use |
} | ||
}, | ||
"params": { | ||
"fields": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi, I think this parameter should be set as required? Getting illegal_argument_exception
, "specified fields can't be null or empty" without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not required because you can use the body of the request to set the fields to retrieve.
See the body section at the end of the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, but the question then is how to document this in the spec, when either the fields
attribute or the body
is required? /cc @clintongormley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's OK - elasticsearch will complain
This change introduces a new API called
_field_caps
that allows to retrieve the capabilities of specific fields.This field centric API relies solely on the mapping of the requested indices to extract the following infos:
types
: one or many field types if the type is not the same across the requested indicessearchable
: Whether this field is indexed for search on at least one requested indices.aggregatable
: Whether this field can be aggregated on at least one requested indices.Example:
returns:
In this example
field1
andfield3
have the same typetext
across the requested indicest
andv
.Conversely
field2
is defined with two conflicting typeskeyword
andlong
.Note that
_field_caps
does not treat this case as an error but rather return the list of unique types seen for this field.It is also possible to get a view of each index field capabilities with the
level
parameter:Closes #22438 (comment)