Added http request and response content-type field #554
The content-type field can be useful to identify what type of data is being transferred over HTTP. The value isn't always accurate, as a web server can set the content-type to any value.
Yes, we want to add support (or at least guidance) for HTTP headers in ECS. Rather than adding Elastic APM already does this (field defs, sample event). APM nests the captured headers with their original capitalization. This approach is beautiful in its simplicity, but it conflicts with ECS' principle of using only lowercase letters. There are a few factors at play in thinking about how we want to approach this:
- Capitalization
- Which headers to support
What do you think @ruflin @MikePaquette? Personally, I would be inclined to take the pragmatic route and make an exception on capitalization for HTTP headers. This makes sense to me because of the arbitrary and unknowable number of headers that can be important to people, and the simplicity of implementation. I would actually go with what APM does pretty much as is. |
@webmat I agree with the approach APM has taken, and think the exception to the capitalization guideline is warranted in protocol-specific "extended" fields like this. 👍 |
I'm also good with this approach, especially as we don't define these fields in ECS but only provide the "box" for it. |
Actually, the APM server canonicalizes the headers. If multiple headers are sent with the same header name but different casing, the values end up in the same header field. E.g., an event with the following headers: is stored in ES as
|
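The before/after example from the comment above was not captured in this thread, so the following is only a minimal sketch of the idea it describes: header names that differ only by case are folded into one canonical name, and their values are collected under it. The function names and sample headers are hypothetical, and this is not the APM server's actual code.

```python
def canonical_header_name(name: str) -> str:
    """Title-case each hyphen-separated part: 'content-TYPE' -> 'Content-Type'."""
    return "-".join(part.capitalize() for part in name.split("-"))


def canonicalize_headers(pairs):
    """Merge (name, value) pairs whose names differ only by case.

    Values for the same canonical name are collected in arrival order.
    """
    merged = {}
    for name, value in pairs:
        merged.setdefault(canonical_header_name(name), []).append(value)
    return merged


# Hypothetical input: two spellings of the same header plus one other header.
example = [
    ("content-type", "text/html"),
    ("Content-Type", "application/json"),
    ("ACCEPT", "*/*"),
]
canonical = canonicalize_headers(example)
```

With this approach the original casing is lost, which is the trade-off discussed in the rest of the thread.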
@simitt Thanks for adding this bit of context. This may affect the approach we take. I'm curious, do agents and servers typically respect non-canonical headers, when they correspond to a canonical header? In other words, if someone specifies "content-type" without the caps, would a web server respect this as if the agent had specified "Content-Type"? Actually the underlying question is: is this the reason to canonicalize the headers? If not, why do so? |
@webmat per HTTP/1.1 specification on header fields (RFC 7230):
Specifically from a security analyst perspective, seeing differences in casing is informative. Some malware will actually transform header fields with random capitalization to evade detection from simple text-matching signatures. Additionally, some malware will send two copies of the same header key with a different value for detection evasion. A naïve logging solution will overwrite the first value with the contents of the second value, while the application will likely follow the branch of the first value. From an APM perspective, I could see canonicalizing the headers as useful, since how your software responds to a given request should be irrelevant to the case in which it is formatted. All that said, having a standard "box" in which to place client and server headers makes the most sense to me. Allow for a list of key-value pairs that are indexed and searchable, but needn't be defined by the schema, in the same way that HTTP no longer defines what specific headers must be sent. Additionally, this "box" should allow for a list of non-unique key-value pairs, or keys without values. Could also just be a list of strings 🤷♂ One other item to note is on The Zeek detected mime types could be stored in a In general, I think |
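To illustrate the "box" idea from the comment above, here is a rough sketch that stores headers as an ordered list of key-value pairs, keeping the original casing, duplicate keys, and keys without values. The helper name and the sample header lines are hypothetical. The last statement shows how a naive mapping keyed on the lowercased name silently keeps only the last duplicate.

```python
def parse_header_lines(lines):
    """Return headers as an ordered list of (name, value) pairs.

    Original casing and duplicates are kept; a line with no colon is
    recorded as a name with value None.
    """
    pairs = []
    for line in lines:
        name, sep, value = line.partition(":")
        pairs.append((name.strip(), value.strip() if sep else None))
    return pairs


# Hypothetical request headers: odd casing plus a duplicated header.
raw = [
    "HoSt: example.com",
    "Content-Type: text/plain",
    "content-type: application/octet-stream",
    "X-Flag",
]
pairs = parse_header_lines(raw)

# A naive dict keyed on the lowercased name keeps only the last value.
naive = {name.lower(): value for name, value in pairs}
```

Here `pairs` still holds both `Content-Type` values in order, while `naive` has lost the first one.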
@webmat the main reason was to make it easier to search for the headers (if users make them searchable). It was also a side effect of using the standard go |
I wonder if we need to cover both cases with the same field. Could we follow APM here but also allow an option to store the raw header blob, which could then be looked at if there are odd things around capitalization etc.? |
Expanding a bit more on that: HTTP headers are case-insensitive, but ES object keys are case-sensitive. APM users expect them to be stored in the same field, so here we are.
Makes sense. As has been stated a few times, this can be left to the implementer. For APM, we'll canonicalize; for some logs, maybe that's not appropriate if they're a main source of security data. Note that intermediate proxies and the like are free to change the case of headers, e.g. linkerd does this. Back to the original question: there are some headers that are so useful that extracting the information into dedicated fields is handy. #232 has some examples of using existing ECS outside of |
I didn't realize that. Thanks for the added context. Found this informative discussion thanks to this input. This discussion led me to the HTTP/2 spec, which states that headers must be lowercased (see section 8.1.2)...
I think this is partially true. Not sure if this is an edge case, actually. A quick experiment on 7.3.1 shows that key names are case insensitive for aggs, but not for searches 🤔
The resulting mapping contains both the I think ECS could take the following stance:
This still leaves the question of how / whether to index these values. By default, both whole Given the point about mappings, is it possible to not index |
Or perhaps the simple answer is as @ruflin describes, and have a single raw text field for "all the headers", then a curated place for people to extract their most useful headers. This wouldn't directly solve @dcode's point about "what the headers say" v "what the payload contains". But I think this one can be solved by custom fields for now. I don't think it's a feature/capability that's widespread enough to warrant support directly in ECS. |
Plus 1 to content type header as mime_type as @dcode has mentioned.. it would be great if mime_type was a nested field.. as I believe mime type will be useful in additional schemas such as file, http, smtp, as well as data sources like AV/sandbox or anytime magic headers come into play. would then allow searching across all mime types |
@neu5ron Makes sense. Noted for later, as this isn't specifically about HTTP headers. |
Any news on the approach that will be used? |
This PR is stale because it has been open for 60 days with no activity. |
The content-type field can be useful to identify what type of data is being transferred over HTTP. The value isn't always accurate, as a web server can set the content-type to any value.
For example, security analysts might want to look at HTTP requests to uncommon domains with the following content-type:
application/x-www-form-urlencoded
Sometimes generic malware sets this content-type to application/x-www-form-urlencoded.
Or the request accepts text and the proxy detects an executable being returned.
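As a rough illustration of the last case, the sketch below flags events where the request only accepts text but the response content-type indicates an executable. The event keys (`request_accept`, `response_content_type`) and the set of executable MIME types are assumptions made for this example; they are not field names defined by this PR or by ECS.

```python
# Illustrative, non-exhaustive set of MIME types commonly seen for executables.
EXECUTABLE_TYPES = {
    "application/x-msdownload",
    "application/x-dosexec",
    "application/x-executable",
}


def media_type(value: str) -> str:
    """Strip parameters such as '; charset=utf-8' or ';q=0.9' and lowercase."""
    return value.split(";")[0].strip().lower()


def accepts_only_text(accept_header: str) -> bool:
    """True if every media range in the Accept header is text/*."""
    types = [media_type(t) for t in accept_header.split(",") if t.strip()]
    return bool(types) and all(t.startswith("text/") for t in types)


def is_suspicious(event: dict) -> bool:
    """Flag a text-only request that was answered with an executable type."""
    return (
        accepts_only_text(event.get("request_accept", ""))
        and media_type(event.get("response_content_type", "")) in EXECUTABLE_TYPES
    )
```

Because a server can set the content-type to any value, a check like this is only a hint for an analyst, not proof of what the payload contains.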