Allow searching and aggregations on the nginx.access.url field #6349

Closed
karmi opened this issue Feb 11, 2018 · 8 comments
Labels
ecs, enhancement, Filebeat, Filebeat module, Team:Integrations

Comments

@karmi

karmi commented Feb 11, 2018

Currently, the nginx.access.url field in the Nginx module is set to be mapped as keyword:

- name: url
  type: keyword

Therefore, the whole URL, eg. /application.css, application.js?t=12345, index.html, index.php?id=foo, index.php?id=bats, and so on, is indexed as one big string.

This makes it unnecessarily hard for users to answer questions like "what percentage of requests is for CSS assets", or to satisfy a requirement like "filter this dashboard only for *.html requests".

I suggest, as a first step, that the Elasticsearch template is extended with the fields ("multi-fields") configuration, so the nginx.access.url field is indexed as keyword by default, keeping the current behaviour, but also eg. with the standard analyzer, ie. as nginx.access.url.text. (If this would be problematic, another field, nginx.access.url_text, can be added, but I'd rather use Elasticsearch's fields mapping configuration, because that would allow adding other analyzers in the future in a predictable and clean way.)
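
For illustration, the multi-field mapping proposed above might look roughly like the following, written as a Python dict that mirrors the Elasticsearch mapping JSON. The sub-field name text and the standard analyzer come from the proposal; the variable name URL_MAPPING is only a placeholder, and in Beats the actual change would presumably go into the module's fields.yml rather than hand-written JSON.

```python
import json

# Sketch of an Elasticsearch mapping fragment: `nginx.access.url` stays a
# keyword (the current behaviour) and gains an analyzed `text` sub-field,
# addressable as `nginx.access.url.text`.
URL_MAPPING = {
    "properties": {
        "nginx": {
            "properties": {
                "access": {
                    "properties": {
                        "url": {
                            "type": "keyword",
                            "fields": {
                                "text": {"type": "text", "analyzer": "standard"}
                            },
                        }
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    # Print the fragment as JSON, e.g. to paste into an index template.
    print(json.dumps(URL_MAPPING, indent=2))
```

Further analyzers could later be added as extra entries under fields without touching the keyword field itself.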

As a next step, the whole URL path and parameters can be made more friendly for search and aggregations: eg. automatically extracting the extension in the ingest pipeline, therefore making a search or aggregation for extension:png trivial and fast. Even the URL parameters could be extracted into key:value pairs, eg. key:id and value:foo, therefore making it possible to search for key:id AND value:foo without blowing up the mapping.
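
To make the extraction idea concrete, here is a rough sketch in plain Python of what such an enrichment step could produce. It is not an ingest pipeline definition; the function name enrich_url and the output keys path, extension and params are assumptions made up for this example.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def enrich_url(url):
    """Split a request URL into path, extension and key/value parameters."""
    parts = urlsplit(url)
    # splitext returns e.g. ("/application", ".css"); drop the leading dot.
    extension = posixpath.splitext(parts.path)[1].lstrip(".").lower()
    # Keep parameters as a list of pairs so that new parameter names do not
    # create new fields in the mapping.
    params = [
        {"key": key, "value": value}
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return {"path": parts.path, "extension": extension, "params": params}
```

With this sketch, enrich_url("index.php?id=foo") returns the path index.php, the extension php, and a single pair with key id and value foo. One caveat: for key:id AND value:foo to match only within the same pair, the pairs would probably need to be mapped with the nested field type, since Elasticsearch otherwise flattens arrays of objects.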

The general motivation is to make the URL field as rich as eg. the remote_ip or user_agent fields, which automatically extract geographical information, or operating system details.

@ruflin
Contributor

ruflin commented Feb 12, 2018

+1 on using multi_fields as a first improvement. This can be added in the fields.yml here: #6349 Here is an example of how multi_fields works in fields.yml: https://github.com/elastic/beats/blob/master/auditbeat/_meta/fields.common.yml#L26

@karmi For the ingest part, it would be nice if there were an ingest processor that can be used for url fields so we don't have to create it ourselves. Or a field type in ES that supports URI would also be nice ;-) We could also use this for file paths ...

@jswidler

jswidler commented Feb 22, 2018

I'm a first-time user of beats in the past week (been using ELK for a while), and this issue jumped out at me immediately - my use case is to group by prefixes (ie compare counts of api/v1/* to api/v2/*).

+1 on better field typing for nginx.access.url

@karmi
Author

karmi commented May 16, 2018

(...) Or a field type in ES that supports URI would also be nice ;-) We could also use this for file paths ...

There's the path_hierarchy tokenizer, which could be a good fit. But we would need to experiment with good support for both searching and aggregations.
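
As a rough illustration of why path_hierarchy could fit, the sketch below approximates in plain Python the tokens it emits with the default / delimiter, and shows how the tokenizer might be wired into a custom analyzer. The names path_hierarchy_tokens, URL_PATH_ANALYSIS and url_path are hypothetical, and the Python function only imitates the tokenizer for simple paths; it does not treat query strings specially.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Approximate the tokens the path_hierarchy tokenizer emits for a path."""
    parts = path.split(delimiter)
    tokens = []
    for end in range(1, len(parts) + 1):
        token = delimiter.join(parts[:end])
        if token:  # skip the empty prefix produced by a leading delimiter
            tokens.append(token)
    return tokens


# Hypothetical analysis settings that wire the tokenizer into a custom
# analyzer; "url_path" is a made-up name for this example.
URL_PATH_ANALYSIS = {
    "analysis": {
        "tokenizer": {"url_path": {"type": "path_hierarchy", "delimiter": "/"}},
        "analyzer": {"url_path": {"type": "custom", "tokenizer": "url_path"}},
    }
}
```

Because every ancestor path is indexed as its own term, a plain term match on /api/v1 would find everything below it. Aggregating on those tokens is a separate question, since text fields are not aggregatable by default, which is part of what would need experimenting.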

@ruflin
Contributor

ruflin commented May 16, 2018

@jswidler I wonder if for your use case you could use the prefix query? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html
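
A possible shape for that, sketched as a Python helper that builds the search body: it combines prefix queries with a filters aggregation to get one count per prefix on the existing keyword field. The helper name prefix_count_request, the aggregation name by_prefix, and the leading slash in the example prefixes are assumptions for illustration.

```python
def prefix_count_request(field, prefixes):
    """Build a search body that counts documents per URL prefix."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "by_prefix": {
                "filters": {
                    # One named bucket per prefix, each backed by a prefix query.
                    "filters": {
                        prefix: {"prefix": {field: {"value": prefix}}}
                        for prefix in prefixes
                    }
                }
            }
        },
    }


# Example: compare request counts for two API versions.
body = prefix_count_request("nginx.access.url", ["/api/v1/", "/api/v2/"])
```

The resulting body can be sent to the _search endpoint of the Filebeat indices with any client; each bucket's doc_count is then the number of requests under that prefix.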

ruflin added the Team:Integrations label Nov 27, 2018

@alvarolobato

This should be fixed with ECS migration.

@webmat
Contributor

webmat commented Dec 3, 2018

Having URL fields indexed as keyword will indeed happen with the migration to ECS. I've made a note to look into the path hierarchy tokenizer in the future, and introduce this to ECS. I think it's an excellent idea. After the sprint to get to ECS 1.0, we will also look into the idea of publishing reusable ingest pipelines (now that they're chainable), to enrich specific bits of information. I think extracting the file type of file names is an excellent idea as well, and it may fit into these small ingest pipelines. I've added that to our idea list in elastic/ecs#181

@webmat
Contributor

webmat commented Dec 3, 2018

I think we can close this

@ruflin
Contributor

ruflin commented Dec 4, 2018

The issue will not be directly fixed with ECS, but we will have advice for users on what to do about this. It will be the addition of .text as a multi-field, as proposed by karmi above.

And if the url should be split, we have support for this in ECS: https://github.com/elastic/ecs#url

Closing also based on the comment from @webmat above and the follow up issue.

@ruflin ruflin closed this as completed Dec 4, 2018