[Metricbeat] Simplify testing http Metricbeat modules #10648

Merged

Conversation

@ruflin ruflin (Contributor) commented Feb 8, 2019

Currently most modules are tested against a Docker container. This leads to long setup times and potential flakiness. It also requires additional setup to test actual changes on a module without running CI. The goal of this PR is to reduce this overhead and make it possible to easily test new data sets without having to write Go code. Expected files were added to verify that changes have no effect on the generated data. The tests with the environment are still needed but should become less critical during development.

The structure and logic are inspired by the testing of the Filebeat modules. So far 3 metricsets were converted to test the implementation. It's all based on conventions (a rough layout sketch follows the list):

  • Test outputs from a JSON endpoint must go into `_meta/testdata`.
  • A `testdata/config.yml` file must exist to specify the URL under which the testdata should be served.
  • A golden file is generated by adding `-expected.json` to the input file name.
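Put together, the layout for a converted metricset might look roughly like the sketch below. Everything beyond the names taken from this description (`_meta/testdata`, `config.yml`, `docs.json`, the `-expected.json` suffix) is an assumption for illustration, not something this PR prescribes:

```
module/<module>/<metricset>/_meta/testdata/
├── config.yml                # assumed to hold e.g. `url: "/stats"`, the path under which docs.json is served
├── docs.json                 # JSON output captured from a real service endpoint
└── docs.json-expected.json   # golden file: input file name plus "-expected.json"
```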

For a metricset to be converted, it must use the reporter interface, be HTTP- and JSON-based, and only request one endpoint at a time. All metricsets should be converted to the reporter interface.

As there is now a more global view on the testing of a metricset, this code can potentially also take over the check that all fields are documented, or at least the generated files can be used for these checks.

To support metricsets which generate one or multiple events, the output is always an array of JSON objects. These arrays can also contain errors, meaning invalid data can be tested as well.

The `data.json` we had so far was hard to update and changed every time it was regenerated because it was pulled from a live instance. For the metricsets that are switched over to this testing, this is no longer the case. The `data.json` is generated from the first event in the `docs.json`. This is by convention and allows having a `docs.json` with an especially interesting event. This should also make the condition checks for which event should be shown partially obsolete.
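As a rough sketch of that convention (not the code in this PR, and assuming the generated events end up in the `-expected.json` golden file under the paths used below), deriving `data.json` could look like this:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	// Read the generated golden file; the path is an assumption for this sketch.
	raw, err := os.ReadFile("_meta/testdata/docs.json-expected.json")
	if err != nil {
		log.Fatal(err)
	}

	// Per the description above, the golden file is an array of JSON events.
	var events []map[string]interface{}
	if err := json.Unmarshal(raw, &events); err != nil {
		log.Fatal(err)
	}
	if len(events) == 0 {
		log.Fatal("no events in golden file")
	}

	// data.json is, by convention, just the first (specially chosen) event.
	out, err := json.MarshalIndent(events[0], "", "    ")
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("_meta/data.json", out, 0644); err != nil {
		log.Fatal(err)
	}
}
```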

Future work:

  • Support multiple endpoints: Elasticsearch metricsets do not work with the above model yet as they need multiple endpoints to be available at the same time. Config options for this could be introduced.
  • Support more than .json: Currently only `.json` is supported. More config options could be added to support other data formats, for example for the apache module.
  • Support other protocols than HTTP: Not all modules are HTTP-based; 2-3 other common protocols could be added.
  • Extend with additional config options: Some metricsets need additional config options to be set for testing. It should be possible to pass these as part of the `config.yml` file.
  • Generate the includes automatically: Currently, if a new directory with testdata is added to a metricset, it will be discovered by the tests but then throws an error because the metricset is not registered. The metricset then has to be manually added to the `data_test.go` file. This works for now but should potentially be automated.

The overall goal of all the above is to make Metricbeat modules more and more config-based instead of Go-code-based.

@ruflin ruflin requested review from a team as code owners February 8, 2019 13:30
@jsoriano jsoriano (Member) left a comment

I like the overall idea, it looks like a quick win for some http modules. I also like the idea of making things similar between filebeat and metricbeat modules.

But I think that it is going to be complicated to extend it for the future points you mention: if we want to mock some services, we are going to need to add some logic for different endpoints, or different responses on the same endpoint, or different protocols. If we do all this in a single generic place, it is going to grow in complexity.
It works reasonably well in Filebeat because there we read from files most of the time, but it starts to be tricky even there when we want to test with other inputs (#8140) or configs (#10182).

Another possible approach could be to mimic httptest and have some kind of helper to create mock servers, like a generalization of this for HTTP, and similar helpers for other protocols. We'd still need code to define tests, but I think we are still going to need Go code for most Metricbeat modules anyway.
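For illustration, a minimal sketch of such a helper could look like the following. The package and function names are made up here and are not part of this PR or of the existing rabbitmq `mtest` helper:

```go
package mtest

import (
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
)

// Server starts an HTTP test server that answers every request with the
// contents of the given testdata file. The caller is expected to close the
// returned server when the test is done.
func Server(t *testing.T, file string) *httptest.Server {
	body, err := os.ReadFile(file)
	if err != nil {
		t.Fatalf("reading testdata file %s: %v", file, err)
	}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}))
}
```

A metricset test could then point its `hosts` setting at `server.URL` and compare the reported events against the golden file.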

(And btw, +1000 to the idea of bringing to go the check for documented fields).

@cachedout (Contributor) commented

> The tests with the environment are still needed but should become less critical during development.

I take this to mean that the idea here is that the Docker environments would still be used in the CI as I (think?) they are now and that this new technique would just be something lighter and faster that a developer could use directly?

@ruflin ruflin (Contributor, Author) commented Feb 12, 2019

@cachedout Exactly. To test most cases, a .json output from, for example, Elasticsearch 6.5.2 can be put into the directory and it's directly tested. This also allows us to test outputs from lots of different versions. The Docker setup for development purposes would be used more to get these example JSON docs out.

Taking the Elasticsearch module as an example here, I expect us, long term, to run the tests against many different versions to detect edge cases which might not be covered in the JSON files. These tests would not run as part of each PR but only on master, or even in a separate testing environment, and report in case they find some edge cases. Each module would then provide a test matrix config specifying against which versions things should be tested.

@ruflin ruflin (Contributor, Author) commented Feb 12, 2019

@jsoriano The changes here were definitely heavily influenced by https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/mtest/server.go and I expect us to end up with something similar to that, but in a more generic way. I initially started this PR with a more complex config for the tests but then threw it all away as I realised that we can probably cover 80% of the http modules with a much simpler approach. What about the other 20%? Either we can extend the framework to fit, build something specific for these, or keep it in Go code. All of this is fine as long as all test frameworks stay reasonably simple.

In general I agree with all the abstraction requests you made above, and I'm pretty sure at some stage we will need to implement them. But I prefer to implement them when we get to the point that we need them and have a test that requires it. So instead of having one massive PR, I'd rather have 20 small ones improving the tests and get them in quickly. I hope that works for you.

ruflin added 4 commits March 4, 2019 14:46
@ruflin ruflin force-pushed the metricbeat-module-data-json-generation branch from 66cb518 to 976e12c on March 4, 2019 13:46
@ruflin ruflin (Contributor, Author) commented Mar 5, 2019

jenkins, test this

@ruflin ruflin merged commit cca36b2 into elastic:master Mar 5, 2019
@ruflin ruflin deleted the metricbeat-module-data-json-generation branch March 5, 2019 13:12
ruflin added a commit that referenced this pull request Mar 6, 2019
This change is based on #10648 to migrate to golden files instead of the dynamically generated data files.

This also adds support for query params to the testing framework.