Convert Filebeat's traefik.access to ECS. #9005

Merged: 5 commits merged into elastic:master from ecs-traefik-access on Dec 21, 2018

webmat
Contributor

@webmat webmat commented Nov 9, 2018

Caveats:

  • traefik.access.remote_ip is not renamed. If it's an IP, it's copied to source.ip,
    otherwise copied to source.domain.
    • Note that the visualization on traffic source still depends on traefik.access.remote_ip.
  • I don't like the naïve approach I took to user_agent.version and user_agent.os.version 😂. Shall we postpone this? I'd like to get the bulk of ECS translations done and merged quickly. Then I'd like to figure out some of the trickier common adjustments later, and apply them en masse.
  • Not populating client/server, as the definitions are not yet imported
  • Blocked by Update the HTTP field set with ECS definitions as of beta 2 #9645 to migrate body_sent.bytes
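
For illustration, the first caveat (IP goes to source.ip, anything else to source.domain) can be sketched in Python as below. This is not the module's actual dissect or Ingest Node configuration; the helper name `route_remote_address` and the flat, dotted-key event dict are assumptions made for the example.

```python
import ipaddress


def route_remote_address(event):
    """Copy traefik.access.remote_ip to source.ip or source.domain.

    Hypothetical helper: the original field is left in place (not renamed),
    matching the caveat that visualizations still depend on it.
    """
    enriched = dict(event)  # shallow copy so the input event is not mutated
    remote = event.get("traefik.access.remote_ip")
    if remote is None:
        return enriched
    try:
        # ip_address() accepts IPv4 and IPv6 literals and raises ValueError otherwise
        ipaddress.ip_address(remote)
    except ValueError:
        enriched["source.domain"] = remote
    else:
        enriched["source.ip"] = remote
    return enriched


if __name__ == "__main__":
    print(route_remote_address({"traefik.access.remote_ip": "127.0.0.1"}))
    print(route_remote_address({"traefik.access.remote_ip": "example.com"}))
```

Keeping traefik.access.remote_ip untouched in the sketch mirrors the "copied, not renamed" behaviour described above.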

TODO:

  • Convert relevant fields to ECS in the processing & expected log
    • Beat dissect
    • Ingest Node pipeline
      • Output IN "user_agent" results to ECS user_agent field set
      • Everything else
  • Remove unneeded definitions in traefik/access/_meta/fields.yml
  • Create field aliases
  • Update ECS-migration.yml file
  • Changelog
  • Revert changes to Dashboards and ML
  • Coerce ints: bytes, request_count, status_code
  • Rebase
  • Get rid of url.original definition after rebase?
  • Redo the PR including as much of WIP Filebeat modules adjustments for ECS Beta 2 #9684 as possible
  • Review int coercions to use :long
  • Leverage the new .address fields for ambiguous address instead of traefik.access.remote_ip
  • user_agent.version

@webmat webmat self-assigned this Nov 9, 2018
@webmat webmat requested a review from ruflin November 9, 2018 05:11
@webmat webmat added in progress Pull request is currently in progress. module review Filebeat Filebeat ecs labels Nov 9, 2018
Contributor

@ruflin ruflin left a comment

I probably reviewed a bit early, but I'll still leave the comments in. No need to respond to the ones you're going to tackle anyway.

filebeat/_meta/fields.common.yml (outdated, resolved)
filebeat/module/traefik/access/config/traefik-access.yml (outdated, resolved)
filebeat/module/traefik/access/ingest/pipeline.json (outdated, resolved)
filebeat/module/traefik/access/ingest/pipeline.json (outdated, resolved)
filebeat/module/traefik/access/test/test.log (outdated, resolved)
@webmat webmat force-pushed the ecs-traefik-access branch from a5d09e1 to e17ba12 Compare November 26, 2018 21:20
@webmat webmat removed the in progress Pull request is currently in progress. label Nov 26, 2018
@webmat webmat changed the title [WIP] Convert Filebeat's traefik.access to ECS. Convert Filebeat's traefik.access to ECS. Nov 26, 2018
@webmat
Contributor Author

webmat commented Nov 26, 2018

@ruflin Just got back to this one and finished it. This will be the first decent test run it gets, but I don't expect anything weird this time. I've been testing it pretty thoroughly locally (now that I better understand the full test suite).

If you could review as well, that would be great. I'd like to merge this one tomorrow too, and pick up the next PRs with nothing left holding me back ;-)

Contributor

@ruflin ruflin left a comment

Field changes LGTM but changes to dashboards should be reverted.

@webmat
Contributor Author

webmat commented Nov 27, 2018

Indeed, that was my very first PR. I will revert this.

@webmat webmat force-pushed the ecs-traefik-access branch from fa384a1 to 3cfc74f Compare November 29, 2018 15:07
@webmat
Contributor Author

webmat commented Nov 29, 2018

@ruflin Renames in Kibana objects and ML are gone now. Should be good for a final review.

I also introduced a few unrelated improvements in ecs-migration.yml here: I had forgotten to list the renames from module.user_agent.* to user_agent.*, and I added comments between each module's section. This will really help with future rebases LOL

@webmat
Contributor Author

webmat commented Nov 29, 2018

Only failure is unrelated. Processor test for field rename, AFAICT: https://beats-ci.elastic.co/job/elastic+beats+pull-request+multijob-darwin/2020/

@webmat
Contributor Author

webmat commented Nov 29, 2018

jenkins test this

@webmat webmat changed the title Convert Filebeat's traefik.access to ECS. WIP Convert Filebeat's traefik.access to ECS. Dec 20, 2018
@webmat webmat added in progress Pull request is currently in progress. and removed review labels Dec 20, 2018
@webmat webmat force-pushed the ecs-traefik-access branch from 801a07f to 7145f97 Compare December 20, 2018 19:46
@webmat webmat requested a review from a team as a code owner December 20, 2018 19:46
@webmat
Contributor Author

webmat commented Dec 20, 2018

This was the very first ECS transition PR I wrote, and it had a lot of nasty history. So the latest push is a full rewrite of the PR on the latest master. This warrants a full review again.

This is also the first access log I touch after ECS Beta 2. As such, it goes a bit farther than the previous access log migrations in migrating to ECS.

Finally, there are 3 new caveats: things that are missing from this PR, but that I think should be done in follow-up PRs, as they are blocked by various dependencies.

@webmat webmat changed the title WIP Convert Filebeat's traefik.access to ECS. Convert Filebeat's traefik.access to ECS. Dec 20, 2018
@webmat webmat added review and removed in progress Pull request is currently in progress. labels Dec 20, 2018
@webmat
Contributor Author

webmat commented Dec 21, 2018

jenkins, test this

@webmat
Contributor Author

webmat commented Dec 21, 2018

Failure in Jenkins Filebeat build is caused by the script.max_compilations_rate error: link

Seems like the attempt to raise the limit (#9613) is not working. The error still shows the 75/5m default...

Issue to fix is #9587.

Full paste of the error, for posterity:

Error Message
not error expected but got: {u'http': {u'version': u'1.0', u'request': {u'method': u'GET'}, u'response': {u'status_code': 200}}, u'log': {u'file': {u'path': u'/go/src/github.com/elastic/beats/filebeat/module/traefik/access/test/test.log'}, u'offset': 1581}, u'url': {u'original': u'/apache_pb.gif'}, u'traefik': {u'access': {u'body_sent': {u'bytes': 2326}, u'user_identifier': u'-'}}, u'@timestamp': u'2000-10-10T20:55:36.000Z', u'read_timestamp': u'2018-12-21T01:39:33.544Z', u'agent': {u'hostname': u'5b3966f996f3', u'type': u'filebeat', u'version': u'7.0.0'}, u'source': {u'ip': u'127.0.0.1', u'address': u'127.0.0.1'}, u'host': {u'name': u'5b3966f996f3'}, u'user': {u'name': u'frank'}, u'error': {u'message': u'[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting'}, u'input': {u'type': u'log'}, u'event': {u'module': u'traefik', u'dataset': u'access'}}
-------------------- >> begin captured stdout << ---------------------
Using elasticsearch: http://elasticsearch:9200
Testing traefik/access on /go/src/github.com/elastic/beats/filebeat/tests/system/../../module/traefik/access/test/test.log

--------------------- >> end captured stdout << ----------------------
Stacktrace
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/go/src/github.com/elastic/beats/filebeat/build/python-env/local/lib/python2.7/site-packages/parameterized/parameterized.py", line 392, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/go/src/github.com/elastic/beats/filebeat/tests/system/test_modules.py", line 91, in test_fileset_file
    cfgfile=cfgfile)
  File "/go/src/github.com/elastic/beats/filebeat/tests/system/test_modules.py", line 140, in run_on_file
    obj)
not error expected but got: {u'http': {u'version': u'1.0', u'request': {u'method': u'GET'}, u'response': {u'status_code': 200}}, u'log': {u'file': {u'path': u'/go/src/github.com/elastic/beats/filebeat/module/traefik/access/test/test.log'}, u'offset': 1581}, u'url': {u'original': u'/apache_pb.gif'}, u'traefik': {u'access': {u'body_sent': {u'bytes': 2326}, u'user_identifier': u'-'}}, u'@timestamp': u'2000-10-10T20:55:36.000Z', u'read_timestamp': u'2018-12-21T01:39:33.544Z', u'agent': {u'hostname': u'5b3966f996f3', u'type': u'filebeat', u'version': u'7.0.0'}, u'source': {u'ip': u'127.0.0.1', u'address': u'127.0.0.1'}, u'host': {u'name': u'5b3966f996f3'}, u'user': {u'name': u'frank'}, u'error': {u'message': u'[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting'}, u'input': {u'type': u'log'}, u'event': {u'module': u'traefik', u'dataset': u'access'}}
-------------------- >> begin captured stdout << ---------------------
Using elasticsearch: http://elasticsearch:9200
Testing traefik/access on /go/src/github.com/elastic/beats/filebeat/tests/system/../../module/traefik/access/test/test.log

--------------------- >> end captured stdout << ----------------------
Standard Output
Using elasticsearch: http://elasticsearch:9200
Testing traefik/access on /go/src/github.com/elastic/beats/filebeat/tests/system/../../module/traefik/access/test/test.log

@webmat
Contributor Author

webmat commented Dec 21, 2018

jenkins, test this

"user_agent.original": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
"user_agent.os.full_name": "Linux",
"user_agent.os.name": "Linux",
"user_agent.os.version": "..",
Contributor

?

Contributor Author

Yes, this is what the second caveat is about. Right now the UA parser gives us major/minor/patch most of the time, and doesn't give us a full version string.

So I'm setting user_agent.version and user_agent.os.version (here and here respectively) with a trivial interpolation. To actually do this right, ideally IN would give this back to us as a full version string (which may even be better wrt test versions like betas). The second-best approach would be to put together a small reusable pipeline that rebuilds the version string based on these 3-4 fields that may or may not be there, and leverage that everywhere at once.

And the worst but simplest approach was simply to interpolate directly there. Even if I guard it with an if on the presence of .major, we'll have broken-looking version numbers whenever another one of the numbers is missing (like here). So I didn't want to waste a bunch of time on this.

Another alternative for this PR is simply to address this later, and leave major/minor/patch in place for now, like all of the other access log modules.
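
To make the problem concrete, here is a minimal Python sketch of the "rebuild the version string from whichever parts exist" idea. It is not the pipeline code from this PR; the helper name `build_version` and the choice to stop at the first missing component are assumptions made for the example.

```python
def build_version(major=None, minor=None, patch=None):
    """Join the available version components with dots.

    Hypothetical helper: stops at the first missing component so that a
    missing minor never yields something like "61..3163". Returns None
    when even the major version is absent.
    """
    parts = []
    for part in (major, minor, patch):
        if part is None or part == "":
            break
        parts.append(str(part))
    return ".".join(parts) or None


if __name__ == "__main__":
    # Naive interpolation with all three parts missing reproduces the ".." above.
    print("{}.{}.{}".format("", "", ""))
    print(build_version("61", "0", "3163"))
    print(build_version("10", "13"))
    print(build_version())
```

With this approach, an event where the parser returned no OS version would simply omit user_agent.os.version instead of storing "..".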

Contributor

@ruflin ruflin Dec 21, 2018

SGTM to delay/skip this for now.

Contributor

@ruflin ruflin left a comment

LGTM, one minor comment.

++ for the caveats and skipping them for now.

@webmat webmat merged commit 119e5e5 into elastic:master Dec 21, 2018
@webmat webmat deleted the ecs-traefik-access branch December 21, 2018 17:26
@webmat
Contributor Author

webmat commented Dec 21, 2018

GitHub lost my merge commit and went with the issue definition again 🤦‍♂️
