
[Monitoring] Parity between usage data collection #34940

Closed
chrisronline opened this issue Apr 11, 2019 · 7 comments

@chrisronline
Contributor

Currently, we have two separate pieces of code that handle collecting usage data. This is because these pieces of code do something different with the data: one returns it from an API endpoint and the other ships it off to Elasticsearch through monitoring documents.

However, this isn't scalable: with Metricbeat now collecting usage data (via the API endpoint used by the first piece of code) and shipping it to monitoring documents (like the second piece of code does), we need to ensure parity between the two, or bugs start to crop up.

It will be hard to maintain this parity if the two pieces of code remain separate; we should unify them so it's not possible for them to deviate.

cc @tsullivan

@elasticmachine
Contributor

Pinging @elastic/stack-monitoring

@ycombinator
Contributor

I don't know the details/complexities of the implementation, but conceptually it would be nice if there were a common piece of code responsible for collecting monitoring data, including formatting it correctly. Then the API endpoint code and the Elasticsearch bulk shipping code could both call this common collection code. That would ensure that these parity bugs go away.
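
To make the idea concrete, here is a minimal sketch of what such a shared collection function could look like. This is illustrative TypeScript, not the actual Kibana implementation; the `Collector` interface and `collectAllStats` name are assumptions.

```ts
// Illustrative sketch only; `Collector` and `collectAllStats` are assumed names,
// not the real Kibana APIs.
interface Collector {
  type: string;
  fetch(): Promise<Record<string, unknown>>;
}

async function collectAllStats(collectors: Collector[]): Promise<Record<string, unknown>> {
  const stats: Record<string, unknown> = {};
  for (const collector of collectors) {
    // Each collector contributes its own section; the formatting lives here,
    // so the API endpoint and the bulk uploader cannot drift apart.
    stats[collector.type] = await collector.fetch();
  }
  return stats;
}

// Both consumers would call the same function:
// - the GET api/stats handler returns collectAllStats(collectors) as the response body;
// - the bulk uploader wraps the same output into _monitoring/bulk documents.
```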

@ycombinator
Contributor

ycombinator commented Apr 12, 2019

Another approach would be to decouple when stats are collected (by the various collectors within Kibana) from when the collected stats are used (either pulled via the GET api/stats endpoint or pushed via the bulk uploader to POST _monitoring/bulk).

To make this work, the Kibana server would keep collected stats in memory. The collectors would run whenever they are configured to and update their section of the in-memory collected stats. The GET api/stats code would read the stats from memory and serve them over HTTP, whenever requested. Likewise, the bulk uploader would run at its own frequency, read the stats from memory and push them to ES.

The nice thing about this decoupling is that the collectors can each run at whatever frequency makes sense to them. This might be especially beneficial when it comes to Kibana telemetry collection, which we might want to run rather infrequently.

Similarly, the bulk uploader could run at whatever frequency it wants to or be entirely disabled w/o affecting collection in any way. This could be useful when we want users to migrate to using Metricbeat for collection.
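
A rough sketch of this decoupled model (purely illustrative; the class and function names are assumptions, not Kibana APIs):

```ts
// Illustrative sketch of the decoupled model; class/function names are assumptions.
type StatsSection = Record<string, unknown>;

class InMemoryStats {
  private sections = new Map<string, StatsSection>();

  update(type: string, data: StatsSection): void {
    this.sections.set(type, data);
  }

  snapshot(): Record<string, StatsSection> {
    return Object.fromEntries(this.sections);
  }
}

const store = new InMemoryStats();

// Each collector runs on its own schedule and only rewrites its own section.
function scheduleCollector(
  type: string,
  collect: () => Promise<StatsSection>,
  intervalMs: number
): NodeJS.Timeout {
  return setInterval(async () => {
    store.update(type, await collect());
  }, intervalMs);
}

// Readers never trigger collection; they only read the latest snapshot:
// - the GET api/stats handler responds with store.snapshot();
// - the bulk uploader (which can run at any frequency, or be disabled entirely)
//   pushes store.snapshot() to POST _monitoring/bulk.
```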

@chrisronline
Contributor Author

chrisronline commented Apr 12, 2019

To add more information, here is a summary of how the two code paths differ in how they poll data from the collectors.

GET /api/stats

This is an endpoint used by telemetry and by Metricbeat (MB) monitoring collection. By default, it returns the result of this collector set. If you provide the optional extended=true query parameter (which MB monitoring collection does), it will also merge in the data fetched from all usage collectors (code path is here to here to here). This results in the following usage collectors fetching data (a sketch of the handler follows the list):

[
  'sample-data',
  'kql',
  'localization',
  'kibana',
  'spaces',
  'ml',
  'apm',
  'maps',
  'canvas',
  'cloud',
  'infraops',
  'rollups',
  'upgrade-assistant-telemetry',
  'visualization_types',
  'ui_metric',
  'reporting'
]
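
A minimal sketch of how the extended=true flag changes the response; `fetchBaseStats` and `fetchAllUsageCollectors` are hypothetical stand-ins for the real collector-set calls, and the merged usage data is shown under a `usage` key purely for illustration:

```ts
// Illustrative sketch of the extended=true behaviour; fetchBaseStats and
// fetchAllUsageCollectors are hypothetical stand-ins for the real collector-set calls.
async function fetchBaseStats(): Promise<Record<string, unknown>> {
  return { kibana_stats: {} }; // placeholder data
}

async function fetchAllUsageCollectors(): Promise<Record<string, unknown>> {
  return { kql: {}, spaces: {} }; // placeholder data
}

async function handleStatsRequest(query: { extended?: string }) {
  const stats = await fetchBaseStats(); // default: only the collector-set result

  if (query.extended === 'true') {
    // MB monitoring collection passes extended=true, so the usage collectors listed
    // above are also fetched and merged into the response (shown under a `usage`
    // key here purely for illustration).
    return { ...stats, usage: await fetchAllUsageCollectors() };
  }

  return stats;
}
```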

Monitoring Polling

This is how internal monitoring works within Kibana. At the configured interval (default is 10s), we fetch all collectors (except for the duplicate ops collector from OSS); a sketch of this polling loop follows the list. That list is:

[
  'sample-data',
  'kql',
  'localization',
  'kibana_stats',
  'kibana',
  'kibana_settings',
  'spaces',
  'ml',
  'apm',
  'maps',
  'canvas',
  'cloud',
  'infraops',
  'rollups',
  'upgrade-assistant-telemetry',
  'visualization_types',
  'ui_metric',
  'reporting'
]
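
For illustration, the polling loop amounts to something like the following (the collector shape and the bulkUpload callback are assumptions, not the real Kibana code):

```ts
// Illustrative sketch of the polling loop; the collector shape and bulkUpload callback
// are assumptions, not the real Kibana implementation.
const DEFAULT_INTERVAL_MS = 10_000; // configured interval, default 10s

function startMonitoringPolling(
  collectors: Array<{ type: string; fetch(): Promise<unknown> }>,
  bulkUpload: (docs: Array<{ type: string; payload: unknown }>) => Promise<void>
): NodeJS.Timeout {
  return setInterval(async () => {
    // Fetch every registered collector (the duplicate OSS ops collector is excluded
    // upstream), then ship the results to ES as monitoring documents.
    const docs = await Promise.all(
      collectors.map(async (c) => ({ type: c.type, payload: await c.fetch() }))
    );
    await bulkUpload(docs);
  }, DEFAULT_INTERVAL_MS);
}
```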

They both utilize methods from the OSS collector set class. It makes sense to put the consolidated logic there, as both already have access to it and are currently using it.

@ycombinator
Contributor

ycombinator commented Apr 12, 2019

> They both utilize methods from the OSS collector set class. It makes sense to put the consolidated logic there, as both already have access to it and are currently using it.

@chrisronline As far as achieving parity goes, what you're proposing above will work, as long as all collection happens synchronously with either the GET /api/stats request or the Monitoring Polling run.

However, we will still need to address the issue of separating the telemetry collection interval from the rest-of-Kibana-stats collection interval, and making this separation work while keeping parity between GET /api/stats and Monitoring Polling. I'm not sure putting the consolidated logic in the OSS collector set class is sufficient to address this issue. That's what led me to this alternate proposal, but perhaps I'm missing something?

@afharo
Member

afharo commented Jan 25, 2021

@chrisronline, with the latest split between Telemetry and Monitoring, do you think this issue is still valid?

@chrisronline
Contributor Author

Yes, this is all set. Thanks @afharo!
