Inconsistent usage of .raw, and discussion about .raw vs .keyword naming #87
Comments
We should definitively should fix I'm a bit hesitant to use |
Personally, I use .keyword for text fields that rarely need to be processed as keyword, and .integer for other text fields that rarely need to be processed as integer, where possible. In my experience this is more understandable for our users than .raw. In cases like user_agent.raw I use user_agent.line, although with user_agent.raw you can also see that this is the unparsed user agent. For multi-fields, .raw is very confusing because nobody can quickly see what .raw is: is it keyword, text, or integer, and raw of what?
I have no objection to changing |
Even though My preference would actually be to go with the ES default naming of
I initially stated that I didn't have a strong opinion, but as I'm writing this I'm convincing myself more and more ;-)
We also discussed revisiting the current list of fields to see if more of them should be multi-field (#104).
The field that actually started this whole discussion had been forgotten from elastic#87 and elastic#103 :-)
This issue is about two distinct problems that I think are closely related, hence the creation of a single issue.
Inconsistent usage of .raw
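We currently have 7 fields ending in .raw in ECS. Here they are, listed with their type, and the "meaning" of .raw in their context:
event.raw: keyword; the original message (not part of a multi-field)
file.path.raw: keyword; multi-field of file.path
file.target_path.raw: keyword; multi-field of file.target_path
url.href.raw: keyword; multi-field of url.href
url.path.raw: keyword; multi-field of url.path
url.query.raw: keyword; multi-field of url.query
user_agent.raw: text; the full, unparsed user agent string (not part of a multi-field)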
In this list, the middle 5 are following one of the conventions for multi-field: text indexing for the top level field (e.g. file.path) and keyword indexing for the nested field (e.g. file.path.raw).
The other two are not following that convention:
user_agent.raw is the one breaking from the convention the most. It's not part of a multi-field, and user_agent.raw is actually of type text, not type keyword. This means a user could not use this field for aggregations, contrary to what the .raw convention establishes. It's named .raw because it's the full user agent string, prior to breaking it up into name, OS, version fields and so on.
event.raw happens to be of type keyword, which is good. But it's not actually part of a multi-field. It just means the original message is stored there.
Naming of the nested field for multi-field
I'm not 100% clear on the exact timeline. But my experience with Elasticsearch in monitoring pipelines, from time immemorial, has been to use the .raw nomenclature for the name of a sub-field of type keyword. Since about Stack v5 (or perhaps ES 2.x and Kibana 4.x), I've seen the naming shift to using .keyword for the nested field, instead of .raw.
Given the inconsistency I've outlined in the first part of this issue, I wonder if we shouldn't move to the new naming convention of using .keyword for multi-field, which would leave us free to use .raw for fields where we actually mean an original value.
If we decide to stick with the .raw naming convention for multi-field, however, I think we should address the event.raw and user_agent.raw inconsistencies. Perhaps rename them event.original and user_agent.original, or something to that effect.
I don't have a strong preference for .keyword or .raw, but I think we need to address the inconsistency. I was curious what folks think about this.
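To make the two options easier to compare, here is a rough Python sketch, not anything that exists in ECS today. The RAW_FIELDS table only restates the types described in this issue, and the helper names (follows_multi_field_convention, propose_name, multi_field_mapping) are made up for illustration. The "fields" block is the usual Elasticsearch multi-field mapping syntax.

```python
# Hypothetical snapshot of the seven .raw fields discussed in this issue.
# "type" is the field's own type; "parent_type" is the type of the field it
# hangs off as a multi-field, or None when it is not part of a multi-field.
RAW_FIELDS = {
    "event.raw": {"type": "keyword", "parent_type": None},
    "file.path.raw": {"type": "keyword", "parent_type": "text"},
    "file.target_path.raw": {"type": "keyword", "parent_type": "text"},
    "url.href.raw": {"type": "keyword", "parent_type": "text"},
    "url.path.raw": {"type": "keyword", "parent_type": "text"},
    "url.query.raw": {"type": "keyword", "parent_type": "text"},
    "user_agent.raw": {"type": "text", "parent_type": None},
}


def follows_multi_field_convention(spec):
    """True when the field is a keyword sub-field of a text field."""
    return spec["type"] == "keyword" and spec["parent_type"] == "text"


def propose_name(name, spec, multi_field_suffix):
    """Suggest a name for a .raw field under one of the two options.

    multi_field_suffix="keyword": multi-fields become .keyword, and .raw
    stays available for fields holding an original value.
    multi_field_suffix="raw": multi-fields keep .raw, and fields holding an
    original value are renamed to .original.
    """
    base = name[: -len(".raw")]
    if follows_multi_field_convention(spec):
        return f"{base}.{multi_field_suffix}"
    if multi_field_suffix == "raw":
        return f"{base}.original"
    return f"{base}.raw"


def multi_field_mapping(sub_field):
    """Mapping for a text field with a keyword sub-field named sub_field."""
    return {"type": "text", "fields": {sub_field: {"type": "keyword"}}}


if __name__ == "__main__":
    for option in ("keyword", "raw"):
        print(f"multi-field suffix: .{option}")
        for name, spec in RAW_FIELDS.items():
            print(f"  {name} -> {propose_name(name, spec, option)}")
```

Running it prints the proposed name of each field under both options, which makes it easy to see that only event.raw and user_agent.raw are treated differently from the other five.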