Allow searching and aggregations on the nginx.access.url field #6349
Comments
+1 on using multi_fields as a first improvement. This can be added in the … @karmi, for the ingest part, it would be nice if there were an ingest processor that can be used for URL fields so we don't have to create it ourselves. Or a field type in ES that supports URIs would also be nice ;-) We could also use this for file paths ...
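A minimal sketch of what the multi_fields mapping could look like, written as the Python dict you would send as an Elasticsearch mapping body. It assumes the sub-field is called `text` (matching the `nginx.access.url.text` name proposed in the issue) and uses the `standard` analyzer; the helper name `multi_field_mapping` is hypothetical and this is not the actual Filebeat template.

```python
import json


def multi_field_mapping(dotted_name: str) -> dict:
    """Build a mapping fragment for `dotted_name` (hypothetical helper).

    The field itself stays `keyword` (the current behaviour); a `text`
    sub-field analyzed with the `standard` analyzer is added, so the
    same value is also searchable as `<field>.text`.
    """
    leaf = {
        "type": "keyword",
        "fields": {
            "text": {"type": "text", "analyzer": "standard"},
        },
    }
    # Expand "nginx.access.url" into nested object "properties" blocks.
    *parents, name = dotted_name.split(".")
    mapping = {"properties": {name: leaf}}
    for parent in reversed(parents):
        mapping = {"properties": {parent: mapping}}
    return mapping


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping("nginx.access.url"), indent=2))
```

Because the sub-field lives under `fields`, more analyzers can be added later as sibling sub-fields without touching the `keyword` field that existing dashboards rely on.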
I'm a first-time user of Beats as of the past week (I've been using ELK for a while), and this issue jumped out at me immediately: my use case is to group by prefixes (i.e. compare counts of …
+1 on better field typing for nginx.access.url
There's the …
@jswidler I wonder if, for your use case, you could use the prefix query? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html
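A rough sketch of the prefix-query suggestion, expressed as Elasticsearch request bodies built in Python. The single query filters on one prefix; to compare counts across several prefixes, the sketch wraps one `prefix` query per bucket in a `filters` aggregation. The helper names and the example prefixes (`/api/`, `/static/`) are hypothetical.

```python
def url_prefix_query(prefix: str, field: str = "nginx.access.url") -> dict:
    """Query body using the `prefix` query (hypothetical helper).

    On a `keyword` field the prefix is matched against the whole,
    unanalyzed URL string, so "/api/" matches "/api/users?id=1".
    """
    return {"query": {"prefix": {field: {"value": prefix}}}}


def url_prefix_counts(prefixes, field: str = "nginx.access.url") -> dict:
    """Search body that counts documents per prefix, one named bucket each."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "by_prefix": {
                "filters": {
                    "filters": {
                        p: {"prefix": {field: {"value": p}}} for p in prefixes
                    }
                }
            }
        },
    }


body = url_prefix_counts(["/api/", "/static/"])
```

This works on the existing `keyword` mapping, but the prefixes have to be known in advance; it does not discover them the way a terms aggregation over a tokenized path would.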
This should be fixed with the ECS migration.
Having URL fields indexed as keyword will indeed happen with the migration to ECS. I've made a note to look into the path hierarchy tokenizer in the future and to introduce it to ECS. I think it's an excellent idea. After the sprint to get to ECS 1.0, we will also look into publishing reusable ingest pipelines (now that they're chainable) to enrich specific bits of information. I think extracting the file type from file names is an excellent idea as well, which may fit into these small ingest pipelines. I've added that to our idea list in elastic/ecs#181
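For reference, a sketch of how the path hierarchy tokenizer could be wired in: index analysis settings defining a custom analyzer on Elasticsearch's `path_hierarchy` tokenizer, plus a small pure-Python approximation of the tokens it emits for a simple absolute path. The analyzer and tokenizer names are hypothetical, and the local function ignores tokenizer options such as `reverse`, `skip`, and `replacement`.

```python
# Analysis settings with a custom analyzer built on `path_hierarchy`.
# "url_path_tokenizer" and "url_path_analyzer" are hypothetical names.
PATH_ANALYSIS_SETTINGS = {
    "analysis": {
        "tokenizer": {
            "url_path_tokenizer": {"type": "path_hierarchy", "delimiter": "/"}
        },
        "analyzer": {
            "url_path_analyzer": {
                "type": "custom",
                "tokenizer": "url_path_tokenizer",
            }
        },
    }
}


def path_hierarchy_tokens(path: str, delimiter: str = "/") -> list[str]:
    """Approximate path_hierarchy output for a simple absolute path.

    "/one/two/three" -> ["/one", "/one/two", "/one/two/three"]
    """
    tokens = []
    # Every delimiter after the first character ends one ancestor prefix.
    for i, ch in enumerate(path):
        if ch == delimiter and i > 0:
            tokens.append(path[:i])
    tokens.append(path)  # the full path is always the last token
    return tokens
```

Since every ancestor directory becomes its own term, a plain terms aggregation on a field using this analyzer groups requests by prefix. The query string would likely need to be stripped first, otherwise it ends up in the last token.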
I think we can close this.
The issue will not be fixed directly by ECS, but we will have advice for users on what to do about this. It will be the addition of … And if the URL should be split, ECS supports this: https://github.com/elastic/ecs#url Closing, also based on the comment from @webmat above and the follow-up issue.
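As an illustration of splitting a request URL into ECS-style fields, here is a sketch using only the Python standard library. It assumes the `url.original`, `url.path`, `url.query` and `url.fragment` names from the ECS `url` field set linked above, and fills only those. The helper `to_ecs_url` is hypothetical, not part of any Beats or ECS tooling.

```python
from urllib.parse import urlsplit


def to_ecs_url(original: str) -> dict:
    """Split a request URL into ECS-style `url.*` fields (sketch).

    Empty components are left out rather than stored as "".
    """
    parts = urlsplit(original)
    fields = {"url.original": original, "url.path": parts.path}
    if parts.query:
        fields["url.query"] = parts.query  # ECS keeps the query without the "?"
    if parts.fragment:
        fields["url.fragment"] = parts.fragment
    return fields
```

With the path and query in separate keyword fields, "all requests to `/index.php`" becomes an exact term match instead of a wildcard over the full URL.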
Currently, the `nginx.access.url` field in the Nginx module is mapped as `keyword` in `beats/filebeat/module/nginx/access/_meta/fields.yml` (lines 26 to 27 at commit e605495).
Therefore, the whole URL, e.g. `/application.css`, `application.js?t=12345`, `index.html`, `index.php?id=foo`, `index.php?id=bats`, and so on, is indexed as one big string. This makes it unnecessarily hard for users to answer questions like "what percentage of requests is for CSS assets", or to satisfy a requirement like "filter this dashboard only for *.html requests".
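To make the "percentage of CSS requests" question concrete, here is a sketch that computes it client-side over the example URLs above, by extracting the file extension from each URL path. This is roughly the work a dedicated extension field would let Elasticsearch do with a single terms aggregation. The helper names are hypothetical, and the extension is lowercased and returned without the leading dot.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Lowercase extension of the URL path, without the dot ("" if none).

    The query string is dropped first, so "application.js?t=12345" -> "js".
    """
    path = urlsplit(url).path
    _, ext = posixpath.splitext(path)
    return ext[1:].lower()


def extension_share(urls, extension: str) -> float:
    """Percentage of URLs whose path has the given extension."""
    counts = Counter(url_extension(u) for u in urls)
    total = sum(counts.values())
    return 100.0 * counts[extension] / total if total else 0.0


SAMPLE = ["/application.css", "application.js?t=12345", "index.html",
          "index.php?id=foo", "index.php?id=bats"]
```

For the sample list, `extension_share(SAMPLE, "css")` is 20.0 and `extension_share(SAMPLE, "php")` is 40.0. With only the `keyword` field, the closest server-side equivalent is a wildcard such as `*.css`, which is slow with a leading wildcard and misses URLs that carry a query string.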
I suggest, as a first step, that the Elasticsearch template be extended with the `fields` ("multi-fields") configuration, so the `nginx.access.url` field is indexed as `keyword` by default, keeping the current behaviour, but also with e.g. the standard analyzer, i.e. as `nginx.access.url.text`. (If this would be problematic, another field, `nginx.access.url_text`, can be added, but I'd rather use Elasticsearch's `fields` mapping configuration, because that would allow adding other analyzers in the future in a predictable and clean way.)

As a next step, the whole URL path and parameters can be made more friendly for search and aggregations: e.g. automatically extracting the `extension` in the ingest pipeline, making a search or aggregation for `extension:png` trivial and fast. Even the URL parameters could be extracted into key:value pairs, e.g. `key:id` and `value:foo`, making it possible to search for `key:id AND value:foo` without blowing up the mapping.

The general motivation is to make the URL field as rich as e.g. the `remote_ip` or `user_agent` fields, which automatically extract geographical information or operating system details.
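The key:value idea for URL parameters could be sketched as follows: parse the query string into a list of `{"key", "value"}` objects, and map that list with a fixed two-field schema. The field name `nginx.access.url_params` and the helper `url_params` are hypothetical. The sketch assumes a `nested` mapping, because with a plain object array Elasticsearch flattens the pairs, so `key:id AND value:foo` could match an `id` from one pair and a `foo` from another.

```python
from urllib.parse import parse_qsl, urlsplit


def url_params(url: str) -> list[dict]:
    """Turn a URL's query string into [{"key": ..., "value": ...}, ...].

    The mapping stays at two fields however many distinct parameter
    names appear, instead of growing by one field per parameter name.
    """
    query = urlsplit(url).query
    # keep_blank_values so "?debug=" still records the "debug" key
    pairs = parse_qsl(query, keep_blank_values=True)
    return [{"key": k, "value": v} for k, v in pairs]


# Mapping for the hypothetical `nginx.access.url_params` field; `nested`
# keeps each key tied to its own value at query time.
PARAMS_MAPPING = {
    "type": "nested",
    "properties": {
        "key": {"type": "keyword"},
        "value": {"type": "keyword"},
    },
}
```

For `index.php?id=foo` this yields `[{"key": "id", "value": "foo"}]`. A search for that pair would then use a `nested` query with a `bool` of two `term` clauses on `url_params.key` and `url_params.value`.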