Formalize dual text/keyword mappings #53181
Comments
Pinging @elastic/es-search (:Search/Mapping) |
Relates to #53020 |
Would the solution put into place to address text/keyword multi-fields also handle wildcards? In discussions I've seen it assumed that it would, but I wanted to clarify. |
Hopefully ECS has some fields that are mapped as |
Are |
@epixa As we've been thinking about how to migrate from existing mappings to |
I'd like to point out that ECS went with the reverse convention, on how to index strings. Since ECS started around monitoring, rather than full text search, the default datatype is I'm pointing this out because here we're talking about potentially building a shorthand notation that encodes the Elasticsearch default. As the proposal stands, it couldn't be used by users who are trying to build ECS-compatible indices.
I'm not sure I understand the 3rd point in the body of the issue. "This field": are we talking about wildcard? |
@webmat The idea here would be that ECS would not need to define multi-fields at all. ECS would define the field type as |
This makes sense, and would indeed be a good simplification. But this would force all string fields defined this way to be indexed both ways? ECS followed the Beats convention of trying to do |
Instead of introducing a new I like the idea of having a new Another way to organize |
Only if you want to support both text search (provided by
We could.
The space saving idea is interesting. I wonder if that would cause problems. Preserving support for scoring and multi-term queries would be challenging but I believe it could work? A problem with the proposal of this issue that we identified when discussing the |
This is indeed a substantial problem, and it makes this proposal not worth it.
Speaking of these queries, if we go with a text field and a
It seems that most queries used in observability solutions are not concerned about textual scoring, but only filtering. Another idea to optimize space could be to have a text field and a |
This is the question that helped us discover this problem. :)
Agreed. |
We had a team discussion, and we are in favour of proceeding with a
Some things still left for the discussion:
|
The I'd be happy to see us formalise these patterns by making them only target their respective field types. |
I had a couple questions about the proposal:
|
I think @jtibshirani makes a good point. In ECS we added a One place I think indexing both as |
Another interesting point that @jtibshirani is bringing forth, and which I've been thinking about as well, is the possibly overwhelming, growing list of field types we offer. I think I understand the need for the new field types, and I've been loosely monitoring their addition more or less from a user's point of view (finding out a new field type is being added, understanding the need for it on the surface and then just looking at the docs). And there are a lot of "specialized" field types out there, the later ones being added more from an "internal" usage need imho. I'd be curious if anyone else is thinking the same ^ and if we could better handle the way users look at our growing list of field types in the future. An example would be the way we document the field types. Now, almost all non-core field types are under the "specialized" section. I would argue that the IP field, for example, shouldn't be in the "specialized" section, but maybe in the "core" one. It has a long history, it's fairly easy to understand and it doesn't require an edge case scenario to be used (as happens with most of the other specialized field types). I would even push this further and suggest a new field types section - "Advanced", maybe - where flattened, constant_keyword, histogram... should be moved. |
For me the biggest shift I've seen in requirements is the move away from traditional ideas of indexing human-authored text to indexing machine-generated text.
While useful on prose, none of the above is helpful when searching stacktraces, weblogs etc. This distinction between indexing for prose and indexing for exact-matching is perhaps the biggest change to reflect in our mapping choices. |
@jtibshirani Indeed,
|
Thanks @mayya-sharipova, this makes sense! If this is the main use case it would be nice to verify that we plan to make this change (starting to use a combined text/keyword for dynamic string mappings, instead of say switching to |
We had a discussion within the search team and have decided the following:
I am closing this issue because:
|
I am reopening this issue since we think that the feature request is still valid and could be beneficial for some use cases. |
We've re-discussed this with the team and concluded that this is something that we should put on our radar. While multi-fields are powerful and useful for a number of use cases, we would not use the text/keyword multi-field in our default dynamic mappings for strings if we could go back: it forces consumers to decide which of the two field variants to access, pushing onto users complexity that could be handled transparently, which only causes confusion. |
I'm coming to this discussion from trying to make it easy to enable ECS in data streams in Elasticsearch without having to add all the fields, basically having Elasticsearch know about ECS or set the right mappings by default. Because of this I looked at all the ECS fields, and what I stumbled over are the somewhat awkward keyword / text multi-fields. Let's take Thinking about this in the context of ECS is that Two things I wonder: how would this apply to synthetic source and runtime fields? @nik9000 I on purpose wrote |
Synthetic source already loads from the keyword sub-field if available when it encounters a text field. Also support for fields that are stored separately is coming. In the context of runtime fields, we have seen users stumble upon the need for picking the right sub-field in scripts. They commonly forget the Having a unified |
#87480 covers stored |
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Our default dynamic mappings rules create both a text and a keyword field whenever they hit a JSON string:
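As a minimal sketch, the rule described here amounts to the following mapping, written as a Python dict that serializes to the JSON mapping body. The `ignore_above` limit of 256 is what stock dynamic mapping applies to the `keyword` sub-field; the helper name `default_string_mapping` and the field name `message` are hypothetical and only serve the illustration.

```python
import json


def default_string_mapping(ignore_above=256):
    """Return a text field that carries a `keyword` sub-field (a multi-field)."""
    return {
        "type": "text",
        "fields": {
            # Sub-field addressed as "<field>.keyword" in queries and aggregations.
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


# "message" is a hypothetical field name used only for this example.
mapping = {"properties": {"message": default_string_mapping()}}
print(json.dumps(mapping, indent=2))
```

With such a mapping, full-text queries target `message` while exact queries and aggregations target `message.keyword`, which is the split that consumers currently have to know about.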
And over the years, many clients implemented similar logic:
- [...] `keyword` field,
- [...] `keyword` field,
- [...] `text` field.

Is it logic that we should embed in Elasticsearch? Maybe we can find better ideas, but here is a proposal to get the discussion started:
- [...] `exact_match` query, which tries to match against the whole string. It fails for `text` fields and has the same behavior as `match` on `keyword`, numbers, ...
- [...] `text_keyword` field, which is essentially a wrapper around a `text` and a `keyword` field. Running aggregations or an `exact_match` query against this field uses the sub `keyword` field, while `match`, `query_string`, `multi_match` and `simple_query_string` queries use the `text` field.
- [...] `text` + sub `keyword` mapping.
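The routing described in the proposal can be sketched as a small lookup. This is only an illustration of the proposed behavior, not Elasticsearch code: `resolve_subfield` and the two operation sets are hypothetical names, and `text_keyword` and `exact_match` exist only as proposed in this issue.

```python
# Hypothetical sketch of the proposed routing; none of these names are
# Elasticsearch APIs.
FULL_TEXT_QUERIES = {"match", "query_string", "multi_match", "simple_query_string"}
EXACT_OPERATIONS = {"aggregation", "exact_match"}


def resolve_subfield(field_type, operation):
    """Return the underlying field type an operation would run against."""
    if field_type == "text_keyword":
        if operation in EXACT_OPERATIONS:
            return "keyword"  # aggregations and exact_match use the sub keyword field
        if operation in FULL_TEXT_QUERIES:
            return "text"  # full-text queries use the text field
        raise ValueError(f"routing not specified for {operation!r}")
    if field_type == "text" and operation == "exact_match":
        # The proposal states that exact_match fails for plain text fields.
        raise ValueError("exact_match is not supported on text fields")
    # Any other field type (keyword, numbers, ...) is used as-is.
    return field_type


print(resolve_subfield("text_keyword", "aggregation"))
print(resolve_subfield("text_keyword", "multi_match"))
```

The point of the wrapper is that this decision moves from every client into the field type itself, so callers only ever name the one field.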