[Metricbeat] Simplify testing http Metricbeat modules #10648

Merged

Conversation

@ruflin ruflin (Contributor) commented Feb 8, 2019

Currently most modules are tested against a Docker container. This leads to long setup times and potential flakiness. It also requires additional setup to test actual changes on a module without running CI. The goal of this PR is to reduce this overhead and make it possible to easily test new data sets without having to write Go code. Expected files were added to verify that changes have no effect on the generated data. The tests with the environment are still needed but should become less critical during development.

The structure and logic are inspired by the testing of the Filebeat modules. So far 3 metricsets were converted to test the implementation. It's all based on conventions (a rough layout sketch follows the list):

  • Test outputs from a JSON endpoint must go into `_meta/testdata`.
  • A `testdata/config.yml` file must exist to specify the URL under which the testdata should be served.
  • A golden file is generated by adding `-expected.json` to the input file name.
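Put together, the layout for a converted metricset might look roughly like the sketch below. Everything beyond the names taken from this description (`_meta/testdata`, `config.yml`, `docs.json`, the `-expected.json` suffix) is an assumption for illustration, not something this PR prescribes:

```
module/<module>/<metricset>/_meta/testdata/
├── config.yml                # assumed to hold e.g. `url: "/stats"`, the path under which docs.json is served
├── docs.json                 # JSON output captured from a real service endpoint
└── docs.json-expected.json   # golden file: input file name plus "-expected.json"
```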

For a metricset to be converted, it must use the reporter interface, be HTTP- and JSON-based, and only request one endpoint at a time. All metricsets should be converted to the reporter interface.

As there is now a more global view on the testing of a metricset, this code can potentially also take over the check that all fields are documented, or at least the generated files can be used for these checks.

To support metricsets which generate one or multiple events, the output is always an array of JSON objects. These arrays can also contain errors, meaning invalid data can be tested as well.

The `data.json` we had so far was hard to update and changed every time it was regenerated because it was pulled from a live instance. For the metricsets that are switched over to this testing, this is no longer the case. The `data.json` is generated from the first event in the `docs.json`. This is by convention and allows having a `docs.json` with an especially interesting event. This should also make the condition checks for which event should be shown partially obsolete.
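As a rough sketch of that convention (not the code in this PR, and assuming the generated events end up in the `-expected.json` golden file under the paths used below), deriving `data.json` could look like this:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	// Read the generated golden file; the path is an assumption for this sketch.
	raw, err := os.ReadFile("_meta/testdata/docs.json-expected.json")
	if err != nil {
		log.Fatal(err)
	}

	// Per the description above, the golden file is an array of JSON events.
	var events []map[string]interface{}
	if err := json.Unmarshal(raw, &events); err != nil {
		log.Fatal(err)
	}
	if len(events) == 0 {
		log.Fatal("no events in golden file")
	}

	// data.json is, by convention, just the first (specially chosen) event.
	out, err := json.MarshalIndent(events[0], "", "    ")
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("_meta/data.json", out, 0644); err != nil {
		log.Fatal(err)
	}
}
```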

Future work:

  • Support multiple endpoints: Elasticsearch metricsets do not work with the above model yet as they need multiple endpoints to be available at the same time. Config options for this could be introduced.
  • Support more than .json: Currently only `.json` is supported. More config options could be added to support other data formats, for example for the apache module.
  • Support other protocols than HTTP: Not all modules are HTTP-based; 2-3 other common protocols could be added.
  • Extend with additional config options: Some metricsets need additional config options to be set for testing. It should be possible to pass these as part of the `config.yml` file.
  • Generate the includes automatically: Currently, if a new directory with testdata is added to a metricset, it will be discovered by the tests but then throws an error because the metricset is not registered. The metricset then has to be manually added to the `data_test.go` file. This works for now but should potentially be automated.

The overall goal of all the above is to make Metricbeat modules more and more config-based instead of Go-code-based.

@ruflin ruflin requested review from a team as code owners February 8, 2019 13:30
@jsoriano jsoriano (Member) left a comment

I like the overall idea, it looks like a quick win for some http modules. I also like the idea of making things similar between filebeat and metricbeat modules.

But I think that it is going to be complicated to extend it for the future points you mention: if we want to mock some services, we are going to need to add some logic for different endpoints, or different responses on the same endpoint, or different protocols. If we do all this in a single generic place, it is going to grow in complexity.
It works reasonably well in Filebeat because there we read from files most of the time, but it starts to be tricky even there when we want to test with other inputs (#8140) or configs (#10182).

Another possible approach could be to mimic httptest and have some kind of helper to create mock servers, like a generalization of this for HTTP, and similar helpers for other protocols. We'd still need code to define tests, but I think we are still going to need Go code for most Metricbeat modules anyway.
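For illustration, a minimal sketch of such a helper could look like the following. The package and function names are made up here and are not part of this PR or of the existing rabbitmq `mtest` helper:

```go
package mtest

import (
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
)

// Server starts an HTTP test server that answers every request with the
// contents of the given testdata file. The caller is expected to close the
// returned server when the test is done.
func Server(t *testing.T, file string) *httptest.Server {
	body, err := os.ReadFile(file)
	if err != nil {
		t.Fatalf("reading testdata file %s: %v", file, err)
	}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}))
}
```

A metricset test could then point its `hosts` setting at `server.URL` and compare the reported events against the golden file.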

(And btw, +1000 to the idea of bringing to go the check for documented fields).

@cachedout (Contributor) commented

> The tests with the environment are still needed but should become less critical during development.

I take this to mean that the idea here is that the Docker environments would still be used in the CI as I (think?) they are now and that this new technique would just be something lighter and faster that a developer could use directly?

@ruflin ruflin (Contributor, Author) commented Feb 12, 2019

@cachedout Exactly. To test most cases, a .json output from, for example, Elasticsearch 6.5.2 can be put into the directory and it's directly tested. This also allows us to test outputs from lots of different versions. The Docker setup for development purposes would be used more to get these example JSON docs out.

Taking the Elasticsearch module as an example here, I expect us, long term, to run the tests against many different versions to detect edge cases which might not be covered in the JSON files. These tests would not run as part of each PR but only on master, or even in a separate testing environment, and report in case they find some edge cases. Each module would then provide a test matrix config specifying against which versions things should be tested.

@ruflin ruflin (Contributor, Author) commented Feb 12, 2019

@jsoriano The changes here were definitely heavily influenced by https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/mtest/server.go and I expect us to end up with something similar to that, but in a more generic way. I initially started this PR with a more complex config for the tests but then threw it all away as I realised that we can probably cover 80% of the http modules with a much simpler approach. What about the other 20%? Either we can extend the framework to fit, build something specific for these, or keep it in Go code. All of this is fine as long as all test frameworks stay reasonably simple.

In general I agree with all the abstraction requests you made above, and I'm pretty sure at some stage we will need to implement them. But I prefer to implement them when we get to the point that we need them and have a test that requires it. So instead of having one massive PR, I'd rather have 20 small ones improving the tests and get them in quickly. I hope that works for you.

ruflin added 4 commits March 4, 2019 14:46
@ruflin ruflin force-pushed the metricbeat-module-data-json-generation branch from 66cb518 to 976e12c on March 4, 2019 13:46
@ruflin ruflin (Contributor, Author) commented Mar 5, 2019

jenkins, test this

@ruflin ruflin merged commit cca36b2 into elastic:master Mar 5, 2019
@ruflin ruflin deleted the metricbeat-module-data-json-generation branch March 5, 2019 13:12
ruflin added a commit that referenced this pull request Mar 6, 2019
This change is based on #10648 to migrate to golden files instead of the dynamically generated data files.

This also adds support for query params to the testing framework.