Added http request and response content-type field #554

mbudge · 2019-09-11T17:30:32Z

The content-type field can be useful to identify what type of data is being transferred over HTTP. The value isn't always accurate, as a web server can set the content-type to any value.

For example, security analysts might want to look at HTTP requests to uncommon domains with the following content-type:
application/x-www-form-urlencoded

Sometimes generic malware sets the request content-type to application/x-www-form-urlencoded.
Or the request accepts text, and the proxy detects an executable being returned.

webmat · 2019-09-23T20:53:02Z

Yes, we want to add support (or at least guidance) for HTTP headers in ECS.

Rather than adding content_type directly under request. and response., I think it would make sense to define a place where all headers go, such as http.request.headers.* and http.response.headers.*.

Elastic APM already does this (field defs, sample event). APM nests the captured headers with their original capitalization. This approach is beautiful in its simplicity, but it conflicts with ECS' principle of using only lowercase letters.

There are a few factors at play in thinking about how we want to approach this.

Capitalization

Insisting that HTTP headers be lowercased and use underscores (e.g. "Content-Type" becoming "content_type") will introduce unnecessary difficulties in mapping these header fields to ECS.
Related to this, trying to enforce any name change on headers may create situations where one legit header overrides another. It's a bit contrived, but an HTTP request/response could very well have both a custom "content_type" header as well as the standard "Content-Type" header...

Which headers to support

As a schema, ECS could simply provide guidance on how and where to properly store header fields, as there are too many possibilities to list them all.
- This would ensure users are able to capture any header they care about.
- There may of course be additional guidance on which ones users/integrations should try to capture, to establish a baseline of what's most useful / commonly seen.
Or perhaps we don't care about having support for arbitrary headers, and we'd prefer to define a specific list of a few supported headers in ECS?
- If that's the case, then the problem of headers overwriting one another goes away.

What do you think @ruflin @MikePaquette?

Personally, I would be inclined to take the pragmatic route and make an exception on capitalization for HTTP headers. This makes sense to me because of the arbitrary and unknowable number of headers that can be important to people, and the simplicity of implementation. I would actually go with what APM does pretty much as is.

cc @simitt @graphaelli

MikePaquette · 2019-09-24T02:49:58Z

@webmat I agree with the approach APM has taken, and think the exception to the capitalization guideline is warranted in protocol-specific "extended" fields like this.

👍

ruflin · 2019-09-24T08:53:17Z

I'm also good with this approach, especially as we don't define these fields in ECS but only provide the "box" for it.

simitt · 2019-09-24T09:04:11Z

Actually the APM server canonicalizes the headers. In case multiple headers are sent up with the same header name but different casing, the values would end up in the same header field.

E.g. an event with the following headers: { "response": { "headers": { "content-type": "foo", "Content-Type": "bar", "MyHeader": "abc", "myheader": "xyz" }}}

is stored in ES as

{ 
  "http" : {
    "response" : {
      "headers" : {
        "Myheader" : [
          "abc",
          "xyz"
        ],
        "Content-Type" : [
          "foo",
          "bar"
        ]
      }
    }
  }
}
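The merging behaviour described above can be sketched in a few lines. This is only an illustration, not the APM server's code: `canonical_name` and `merge_headers` are hypothetical helpers that approximate the canonical form seen in the stored document (first letter and each letter after a hyphen upper-cased, everything else lower-cased).

```python
def canonical_name(name):
    """Approximate the canonical header form: 'content-type' -> 'Content-Type'."""
    return "-".join(part[:1].upper() + part[1:].lower() for part in name.split("-"))


def merge_headers(pairs):
    """Group header values under their canonical name, keeping arrival order."""
    merged = {}
    for name, value in pairs:
        merged.setdefault(canonical_name(name), []).append(value)
    return merged


# Headers as received, with differing capitalization (from the example above).
received = [
    ("content-type", "foo"),
    ("Content-Type", "bar"),
    ("MyHeader", "abc"),
    ("myheader", "xyz"),
]
merged = merge_headers(received)
```

Because every spelling maps to one key, no value is overwritten, but the original capitalization is no longer recoverable from the result.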

webmat · 2019-09-24T15:39:34Z

@simitt Thanks for adding this bit of context. This may affect the approach we take.

I'm curious, do agents and servers typically respect non-canonical headers, when they correspond to a canonical header?

In other words, if someone specifies "content-type" without the caps, would a web server respect this as if the agent had specified "Content-Type"?

Actually the underlying question is: is this the reason to canonicalize the headers? If not, why do so?

dcode · 2019-09-24T20:38:55Z

@webmat per HTTP/1.1 specification on header fields (RFC 7230):

consists of a case-insensitive field name followed
by a colon (":"), optional leading whitespace, the field value, and
optional trailing whitespace.

Specifically from a security analyst perspective, seeing differences in casing is informative. Some malware will actually transform header fields with random capitalization to evade detection from simple text-matching signatures. Additionally, some malware will send two copies of the same header key with a different value for detection evasion. A naïve logging solution will overwrite the first value with the contents of the second value, while the application will likely follow the branch of the first value.

From an APM perspective, I could see canonicalizing the headers as useful, since how your software responds to a given request should not depend on the case in which the headers are written.

All that said, having a standard "box" in which to place client and server headers makes the most sense to me. Allow for a list of key-value pairs that are indexed and searchable, but needn't be defined by the schema, in the same way that HTTP no longer defines what specific headers must be sent. Additionally, this "box" should allow for a list of non-unique key-value pairs, or keys without values. Could also just be a list of strings 🤷‍♂
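As a rough sketch of why a list of non-unique key-value pairs is useful to an analyst, the snippet below keeps headers in wire order and flags the two evasion patterns mentioned above. The function name `header_anomalies` and the report keys are assumptions made for illustration only.

```python
from collections import defaultdict


def header_anomalies(raw_headers):
    """Inspect (name, value) pairs in wire order; a value may be None for a bare key."""
    spellings = defaultdict(set)   # lower-cased name -> spellings seen
    values = defaultdict(list)     # lower-cased name -> values seen
    for name, value in raw_headers:
        key = name.lower()
        spellings[key].add(name)
        values[key].append(value)
    return {
        # Same header sent with more than one capitalization.
        "mixed_case": sorted(k for k, s in spellings.items() if len(s) > 1),
        # Same header sent more than once with different values.
        "conflicting": sorted(k for k, v in values.items() if len(set(v)) > 1),
    }


raw = [
    ("Host", "example.com"),
    ("cOnTeNt-TyPe", "text/plain"),
    ("Content-Type", "application/octet-stream"),
    ("X-Empty", None),
]
report = header_anomalies(raw)
```

Neither signal survives if the headers are first collapsed into a single canonical key with one value.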

One other item to note is on content-type. Zeek, in particular, records orig_mime_types and resp_mime_types, which capture the detected MIME types of one or more files that are transferred by the originator and responder. This is distinct from what appears in the request or response headers, which could also be logged from a couple of common Zeek scripts circulating in the community.

The Zeek detected mime types could be stored in a file attribute, which follows the Zeek logic since that's also in file.mime_type. However, detecting a difference in the declared content type versus the actual file contents would not be directly possible with a query if it was nested in a list of headers, I think.

In general, I think http.request.headers and http.response.headers make sense, but content-type might be a special case like host that warrants a dedicated field.
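To illustrate why a dedicated field helps here, the sketch below compares a declared content type with a detected MIME type. It assumes the declared value lives in a hypothetical `http.response.content_type` field (the kind of field this PR proposes) and the detected value in `file.mime_type`; neither placement is settled by this thread.

```python
def media_type(content_type):
    """Return the lower-cased type/subtype, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def content_type_mismatch(event):
    """True when both values are present and the declared type differs from the detected one."""
    declared = event.get("http", {}).get("response", {}).get("content_type")
    detected = event.get("file", {}).get("mime_type")
    if declared is None or detected is None:
        return False
    return media_type(declared) != media_type(detected)


# Hypothetical event: the server claims text, the sensor detected an executable.
suspicious = {
    "http": {"response": {"content_type": "text/plain; charset=utf-8"}},
    "file": {"mime_type": "application/x-dosexec"},
}
benign = {
    "http": {"response": {"content_type": "Text/HTML; charset=utf-8"}},
    "file": {"mime_type": "text/html"},
}
```

With both values in fixed, top-level fields the comparison is a simple lookup; with the declared type buried in a list of headers it would first have to be found.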

simitt · 2019-09-25T08:19:35Z

@webmat the main reason was to make it easier to search for the headers (if users make them searchable). It was also a side effect of using the standard Go http.Header functionality, which was used to collect headers (server side) and ensure none of the values are overwritten.

ruflin · 2019-09-25T11:00:38Z

I wonder if we need to cover both cases with the same field. Could we follow APM here but also allow an option to store the raw header blob, which could then be looked at if there are odd things around capitalization etc.?

graphaelli · 2019-09-25T12:59:18Z

the main reason was to make it easier to search for the headers

Expanding a bit more on that, HTTP headers are case-insensitive, es object keys are case-sensitive. APM users expect them to be stored in the same field so here we are.

Specifically from a security analyst perspective, seeing differences in casing is informative.

Makes sense. As has been stated a few times, this can be left to the implementer. For APM, we'll canonicalize; for some logs that may not be appropriate if they're a main source of security data.

Note that intermediate proxies and the like are free to change the case of headers, eg linkerd does this.

Back to the original question, there are some headers that are so useful that extracting the information into dedicated fields is handy; #232 has some examples of using existing ECS fields outside of http for this.

webmat · 2019-09-25T14:37:36Z

intermediate proxies and the like are free to change the case of headers, eg linkerd does this

I didn't realize that. Thanks for the added context. Found this informative discussion thanks to this input. This discussion led me to the HTTP/2 spec, which states that header field names must be lowercase (see section 8.1.2)...

es object keys are case-sensitive

I think this is partially true. Not sure if this is an edge case, actually. A quick experiment on 7.3.1 shows that key names are case insensitive for aggs, but not for searches 🤔

PUT cap-diff/_doc/1
{ "method": "get" }
PUT cap-diff/_doc/2
{ "METHOD": "get" }

GET cap-diff/_search?q=method.keyword:*
# 1 hit

GET cap-diff/_search
{ "aggs" : { "methods" : { "terms" : { "field" : "method.keyword" } } } }
# 2 hits

The resulting mapping contains both the method and METHOD fields.

I think ECS could take the following stance:

- Headers are recorded under http.request.headers.* and http.response.headers.*
- Implementations are encouraged to lowercase the header names, but this is not mandatory. This is the direction the world is going :-)
- If a header is passed multiple times, an array of all its values is stored under the key name
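A minimal sketch of that stance, assuming headers arrive as (name, value) pairs; `normalize_headers` is a hypothetical helper, not part of ECS or any Elastic library. Names are lowercased, a single occurrence stays a scalar, and repeats become an array.

```python
def normalize_headers(pairs):
    """Lowercase header names; collect repeated headers into a list of values."""
    out = {}
    for name, value in pairs:
        key = name.lower()
        if key not in out:
            out[key] = value                 # first occurrence: keep as a scalar
        elif isinstance(out[key], list):
            out[key].append(value)           # third and later occurrences
        else:
            out[key] = [out[key], value]     # second occurrence: promote to a list
    return out


headers = normalize_headers([
    ("Content-Type", "text/html"),
    ("Set-Cookie", "a=1"),
    ("set-cookie", "b=2"),
])
```

The result would sit under http.response.headers, for example headers["set-cookie"] holding both cookie values.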

This still leaves the question of how / whether to index these values. By default, neither headers section should be indexed in full. Not only because of the performance hit, but also because it opens an attack vector, where any header passed by an agent becomes an entry in the mapping.

Given the point about mappings, is it possible to not index headers.*, but selectively override and allow only the most useful headers (e.g. headers.content-type) to be indexed as keyword?
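One way this might be approached, sketched under the assumption that Elasticsearch's `dynamic: false` object setting is used: unmapped sub-fields stay in `_source` but are neither indexed nor added to the mapping, while explicitly listed properties are indexed. The allow-list below is hypothetical and the mapping is built as a plain Python dict for illustration.

```python
import json

# Hypothetical allow-list of headers worth indexing.
INDEXED_HEADERS = ["content-type", "user-agent"]


def headers_mapping(indexed):
    """Object mapping that ignores unknown headers but indexes the listed ones as keyword."""
    return {
        "type": "object",
        "dynamic": False,  # unknown headers are kept in _source only
        "properties": {name: {"type": "keyword"} for name in indexed},
    }


mapping = {
    "mappings": {
        "properties": {
            "http": {
                "properties": {
                    "request": {"properties": {"headers": headers_mapping(INDEXED_HEADERS)}},
                    "response": {"properties": {"headers": headers_mapping(INDEXED_HEADERS)}},
                }
            }
        }
    }
}

# Serialized body that could be sent when creating an index.
body = json.dumps(mapping, indent=2)
```

With such a mapping, an agent sending arbitrary header names would not grow the mapping, which addresses the attack-vector concern raised above.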

webmat · 2019-09-25T14:42:35Z

Or perhaps the simple answer is what @ruflin describes: have a single raw text field for "all the headers", then a curated place for people to extract their most useful headers.

This wouldn't directly solve @dcode's point about "what the headers say" vs. "what the payload contains". But I think this one can be solved by custom fields for now. I don't think it's a feature/capability that's widespread enough to warrant support directly in ECS.

neu5ron · 2019-09-27T04:00:09Z

Plus 1 to the content type header as mime_type, as @dcode has mentioned. It would be great if mime_type was a nested field, as I believe MIME type will be useful in additional schemas such as file, http, smtp, as well as data sources like AV/sandbox or anytime magic headers come into play. That would then allow searching across all MIME types with *mime_type:$value

webmat · 2019-09-30T13:16:26Z

@neu5ron Makes sense. Noted for later, as this isn't specifically about HTTP headers.

coudenysj · 2020-04-08T08:29:29Z

Any news on the approach that will be used?

github-actions · 2022-02-24T00:18:26Z

This PR is stale because it has been open for 60 days with no activity.

Added http request and response content-type field

6d55cf7

webmat mentioned this pull request Oct 4, 2019

ECS - Squid Log mapping #549

Closed

neu5ron mentioned this pull request Feb 15, 2020

Mime Types #749

Closed

dcode mentioned this pull request Feb 24, 2020

Add initial stab at mime_type for file objects #760

Merged

ebeahan added the discuss label Jul 13, 2020

webmat mentioned this pull request Jul 20, 2020

added http.forwarded_for #874 #880

Closed

ebeahan mentioned this pull request Jul 31, 2020

Feat: make SQUID3 captures ecs compliant logstash-plugins/logstash-patterns-core#270

Merged

github-actions bot added the stale Stale issues and pull requests label Feb 24, 2022
