Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring] Metricbeat migration status logic #34871

Merged
merged 18 commits into from
Apr 17, 2019

Conversation

chrisronline
Copy link
Contributor

@chrisronline chrisronline commented Apr 10, 2019

Relates to #39010

This is step 1 from the ticket linked above. It contains the server logic and endpoint necessary to determine the migration state of each product based on the existence of certain indices.

We perform a single query across all monitoring indices, using a terms agg on the _index field. Then, we perform multiple terms subaggs for each stack product, determining from which product instance uuid the index exists for. Then, we collect the response and determine the state for each product: fully migrated, partially migrated, or internally collected.

As a way to provide support for a net new user experience (where the user has no monitoring setup), we are also attempting to detect the existence of certain products (mainly beats and logstash) based on the presence of other indices. This is a bit limiting as it will only look at the dedicated monitoring cluster for these indices (maybe we should do a CCS search here?) but it should provide some help.

This PR contains both mocha server tests and api integration tests (which is the majority of this diff).

Testing

Testing is limited to just Kibana and ES for the main part, but Logstash and Beats should be used for detection testing.

Main Testing - Part 1

  • Setup internal collection monitoring for all stack products (Kibana, ES, Logstash, Beats, APM)
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
    • We should see all as internal collectors

Main Testing - Part 2

  • Setup internal collection monitoring for Logstash and Beats
  • Use Metricbeat for collection of ES and Kibana but do not disable internal collection.
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
    • We should see Logstash and Beats as internal collectors, and Kibana and ES are partially migrated

Main Testing - Part 3

  • Setup internal collection monitoring for Logstash and Beats
  • Use Metricbeat for collection of ES and Kibana and disable internal collection.
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
    • We should see Logstash and Beats as internal collectors, and Kibana and ES are fully migrated

Detect Testing

  • Configure a beat to send data to the monitoring cluster, but do not enable monitoring
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
    • We should see that beats might be present in the api response

Multi-Cluster Testing

  • Configure multiple clusters in the above setups (probably just one as the behavior should be the same).
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
  • POST the cluster-specific API and verify the response makes sense (/api/monitoring/v1/setup/collection/{clusterUuid})

APM Testing

  • Only run Kibana from your local computer
  • Clone and setup this repo: https://github.com/elastic/apm-integration-testing
  • Start the apm testing environment: ./scripts/compose.py start --with-opbeans-node --no-kibana master
  • POST the API and verify the response makes sense (/api/monitoring/v1/setup/collection)
    • We should see that apm is showing up

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

Pinging @elastic/stack-monitoring

@chrisronline chrisronline marked this pull request as ready for review April 10, 2019 16:36
@ycombinator
Copy link
Contributor

POST the API...

Any reason these APIs are POSTs and not GETs? AFAICT none of them take or need to take request bodies.

@chrisronline
Copy link
Contributor Author

Any reason these APIs are POSTs and not GETs? AFAICT none of them take or need to take request bodies.

They both accept optional parameters (which aren't actually documented in the route defintion, so I need to do that) which are currently only used for testing. Maybe this is a bad idea? I needed someway to make the api integration tests work since the data for those tests isn't live

@ycombinator
Copy link
Contributor

With the cleanup you'll do for #34871 (comment), there's a non-test-only reason for the APIs to be POSTs instead of GETs. So I'm +1 to keep them as POSTs.

@ycombinator
Copy link
Contributor

I've gotten through all 3 Main Testing items so far and it's looking great! I'll finish off the remaining two test cases shortly.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@ycombinator
Copy link
Contributor

ycombinator commented Apr 10, 2019

I called the cluster-specific API with a non-existent cluster UUID like so:

curl -H 'kbn-xsrf: foobar' -X POST -v 'http://localhost:5601/api/monitoring/v1/setup/collection/chris_sucks' | jq .

I still got a 200 OK response. I think we should return a 404?

@ycombinator
Copy link
Contributor

I've gone through all test cases. There are a couple of unresolved comments, but otherwise this is looking very good!

@chrisronline
Copy link
Contributor Author

I updated the description to include testing details for APM which was just recently fixed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@ycombinator
Copy link
Contributor

@chrisronline I see you added a change for APM and addressed my mightExists comment. But it doesn't look like you've addressed my 404 comment yet. Let me know when you've addressed all the feedback from the previous round and I'll re-review this PR.

@chrisronline
Copy link
Contributor Author

@ycombinator Thanks for the reminder. Our current queries (at least I don't think so) make it possible to detect a cluster existing or not, so assuming that's true, we'd have to add a query to just check that. Does that seem accurate to you?

@ycombinator
Copy link
Contributor

Our current queries (at least I don't think so) make it possible to detect a cluster existing or not, so assuming that's true, we'd have to add a query to just check that. Does that seem accurate to you?

I'm not sure, I'll have to dig into the code to figure this out. I'll do that now and get back to you with what I find.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

return false;
}

return get(bucket, 'beat_type.buckets[0].key') === 'apm-server';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you're using _.get here, do you need the guard for bucket.beat_type above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear (or maybe what I'm proposing won't work) but I was thinking this function could just become:

function isBeatFromAPM(bucket) {
  return get(bucket, 'beat_type.buckets[0].key') === 'apm-server';
}

If bucket does not have a beat_type key, _.get should return a falsy value. Comparing that with apm-server would return false. So you wouldn't need the guard check.

* detected: {
* doesExist: boolean, // This product definitely exists but does not have any monitoring documents (kibana and ES)
* mightExist: boolean, // This product might exist based on the presence of some other indices
* }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this comment.

loadTestFile(require.resolve('./kibana_exclusive_mb'));
loadTestFile(require.resolve('./es_and_kibana_mb'));
loadTestFile(require.resolve('./es_and_kibana_exclusive_mb'));
loadTestFile(require.resolve('./detect_beats'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've got ES, Kibana, and Beats covered here. Do you think we should add coverage for APM and Logstash too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup definitely should. Thanks for the reminder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this would just cover the detection logic. I'm not sure we can test between -mb- indices and non -mb- indices for beats, logstash or APM? Am I understanding that correctly? Because beats and APM won't be generating those indices and logstash doesn't support that yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is accurate.

At some point (hopefully soon) we'll have -mb- indices for Logstash. At that time we'd want to come back and add coverage for those here. For a future PR, of course!

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed code and left comments.

@chrisronline
Copy link
Contributor Author

Thanks @ycombinator. I addressed most of your comments. Will work on adding those additional tests

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@chrisronline
Copy link
Contributor Author

@ycombinator I added those tests and this is ready for another round, thanks!

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work! 👏👏👏

Looking forward to the the rest of the steps!

@chrisronline chrisronline merged commit 33f1123 into elastic:master Apr 17, 2019
@chrisronline chrisronline deleted the monitoring/mb_ux_2 branch April 17, 2019 13:17
chrisronline added a commit to chrisronline/kibana that referenced this pull request Apr 17, 2019
* Initial logic and tests for this

* Improve test

* Some cleanup and adding api integration tests

* Add comments

* Account for no documents returned

* Add detection logic and tests

* Ensure these fields are optional

* Update detection logic

* Beats, logstash and apm can only possibly exist

* Fix some issues with APM

* Fix tests

* Small updates based on needing to also include the filebeat index

* Fix issue with the reduce

* PR feedback

* More PR feedback

* Add additional tests

* Add additional tests
chrisronline added a commit that referenced this pull request Apr 17, 2019
* Initial logic and tests for this

* Improve test

* Some cleanup and adding api integration tests

* Add comments

* Account for no documents returned

* Add detection logic and tests

* Ensure these fields are optional

* Update detection logic

* Beats, logstash and apm can only possibly exist

* Fix some issues with APM

* Fix tests

* Small updates based on needing to also include the filebeat index

* Fix issue with the reduce

* PR feedback

* More PR feedback

* Add additional tests

* Add additional tests
@chrisronline
Copy link
Contributor Author

Backport:

7.x: bf9cf47

@elasticmachine
Copy link
Contributor

💔 Build Failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants