
[Monitoring] Parity between usage data collection #34940

Closed
chrisronline opened this issue Apr 11, 2019 · 7 comments

@chrisronline
Contributor

Currently, we have two separate pieces of code that handle collecting usage data. This is because these pieces of code do something different with the data: one returns it from an API endpoint and the other ships it off to Elasticsearch through monitoring documents.

However, this isn't scalable: with Metricbeat now collecting usage data (via the API endpoint used by the first piece of code) and shipping it to monitoring documents (like the second piece of code does), we need to ensure parity between the two, or bugs start to crop up.

It will be hard to maintain this parity if the two pieces of code remain separate; we should unify them so it's not possible for them to deviate.

cc @tsullivan

@elasticmachine
Contributor

Pinging @elastic/stack-monitoring

@ycombinator
Contributor

I don't know the details/complexities of the implementation, but conceptually it would be nice if there were a common piece of code responsible for collecting monitoring data, including formatting it correctly. Then the API endpoint code and the Elasticsearch bulk shipping code could both call this common collection code. That would ensure that these parity bugs go away.
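
To make the idea concrete, here is a minimal sketch of what such a shared collection function could look like. This is illustrative TypeScript, not the actual Kibana implementation; the `Collector` interface and `collectAllStats` name are assumptions.

```ts
// Illustrative sketch only; `Collector` and `collectAllStats` are assumed names,
// not the real Kibana APIs.
interface Collector {
  type: string;
  fetch(): Promise<Record<string, unknown>>;
}

async function collectAllStats(collectors: Collector[]): Promise<Record<string, unknown>> {
  const stats: Record<string, unknown> = {};
  for (const collector of collectors) {
    // Each collector contributes its own section; the formatting lives here,
    // so the API endpoint and the bulk uploader cannot drift apart.
    stats[collector.type] = await collector.fetch();
  }
  return stats;
}

// Both consumers would call the same function:
// - the GET api/stats handler returns collectAllStats(collectors) as the response body;
// - the bulk uploader wraps the same output into _monitoring/bulk documents.
```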

@ycombinator
Contributor

ycombinator commented Apr 12, 2019

Another approach would be to decouple when stats are collected (by the various collectors within Kibana) from when the collected stats are used (either pulled via the GET api/stats endpoint or pushed via the bulk uploader to POST _monitoring/bulk).

To make this work, the Kibana server would keep collected stats in memory. The collectors would run whenever they are configured to and update their section of the in-memory collected stats. The GET api/stats code would read the stats from memory and serve them over HTTP, whenever requested. Likewise, the bulk uploader would run at its own frequency, read the stats from memory and push them to ES.

The nice thing about this decoupling is that the collectors can each run at whatever frequency makes sense to them. This might be especially beneficial when it comes to Kibana telemetry collection, which we might want to run rather infrequently.

Similarly, the bulk uploader could run at whatever frequency it wants to or be entirely disabled w/o affecting collection in any way. This could be useful when we want users to migrate to using Metricbeat for collection.
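
A rough sketch of this decoupled model (purely illustrative; the class and function names are assumptions, not Kibana APIs):

```ts
// Illustrative sketch of the decoupled model; class/function names are assumptions.
type StatsSection = Record<string, unknown>;

class InMemoryStats {
  private sections = new Map<string, StatsSection>();

  update(type: string, data: StatsSection): void {
    this.sections.set(type, data);
  }

  snapshot(): Record<string, StatsSection> {
    return Object.fromEntries(this.sections);
  }
}

const store = new InMemoryStats();

// Each collector runs on its own schedule and only rewrites its own section.
function scheduleCollector(
  type: string,
  collect: () => Promise<StatsSection>,
  intervalMs: number
): NodeJS.Timeout {
  return setInterval(async () => {
    store.update(type, await collect());
  }, intervalMs);
}

// Readers never trigger collection; they only read the latest snapshot:
// - the GET api/stats handler responds with store.snapshot();
// - the bulk uploader (which can run at any frequency, or be disabled entirely)
//   pushes store.snapshot() to POST _monitoring/bulk.
```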

@chrisronline
Contributor Author

chrisronline commented Apr 12, 2019

To add more information, here is a summary of how the two code paths differ in how they poll data from the collectors.

GET /api/stats

This is an endpoint used by telemetry and by Metricbeat (MB) monitoring collection. By default, it returns the result of this collector set. If you provide the optional extended=true query parameter (which MB monitoring collection does), it will also merge in the data fetched from all usage collectors (code path is here to here to here). This results in the following usage collectors fetching data (a sketch of the handler follows the list):

[
  'sample-data',
  'kql',
  'localization',
  'kibana',
  'spaces',
  'ml',
  'apm',
  'maps',
  'canvas',
  'cloud',
  'infraops',
  'rollups',
  'upgrade-assistant-telemetry',
  'visualization_types',
  'ui_metric',
  'reporting'
]
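
A minimal sketch of how the extended=true flag changes the response; `fetchBaseStats` and `fetchAllUsageCollectors` are hypothetical stand-ins for the real collector-set calls, and the merged usage data is shown under a `usage` key purely for illustration:

```ts
// Illustrative sketch of the extended=true behaviour; fetchBaseStats and
// fetchAllUsageCollectors are hypothetical stand-ins for the real collector-set calls.
async function fetchBaseStats(): Promise<Record<string, unknown>> {
  return { kibana_stats: {} }; // placeholder data
}

async function fetchAllUsageCollectors(): Promise<Record<string, unknown>> {
  return { kql: {}, spaces: {} }; // placeholder data
}

async function handleStatsRequest(query: { extended?: string }) {
  const stats = await fetchBaseStats(); // default: only the collector-set result

  if (query.extended === 'true') {
    // MB monitoring collection passes extended=true, so the usage collectors listed
    // above are also fetched and merged into the response (shown under a `usage`
    // key here purely for illustration).
    return { ...stats, usage: await fetchAllUsageCollectors() };
  }

  return stats;
}
```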

Monitoring Polling

This is how internal monitoring works within Kibana. At the configured interval (default is 10s), we fetch all collectors (except for the duplicate ops collector from OSS); a sketch of this polling loop follows the list. That list is:

[
  'sample-data',
  'kql',
  'localization',
  'kibana_stats',
  'kibana',
  'kibana_settings',
  'spaces',
  'ml',
  'apm',
  'maps',
  'canvas',
  'cloud',
  'infraops',
  'rollups',
  'upgrade-assistant-telemetry',
  'visualization_types',
  'ui_metric',
  'reporting'
]
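
For illustration, the polling loop amounts to something like the following (the collector shape and the bulkUpload callback are assumptions, not the real Kibana code):

```ts
// Illustrative sketch of the polling loop; the collector shape and bulkUpload callback
// are assumptions, not the real Kibana implementation.
const DEFAULT_INTERVAL_MS = 10_000; // configured interval, default 10s

function startMonitoringPolling(
  collectors: Array<{ type: string; fetch(): Promise<unknown> }>,
  bulkUpload: (docs: Array<{ type: string; payload: unknown }>) => Promise<void>
): NodeJS.Timeout {
  return setInterval(async () => {
    // Fetch every registered collector (the duplicate OSS ops collector is excluded
    // upstream), then ship the results to ES as monitoring documents.
    const docs = await Promise.all(
      collectors.map(async (c) => ({ type: c.type, payload: await c.fetch() }))
    );
    await bulkUpload(docs);
  }, DEFAULT_INTERVAL_MS);
}
```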

They both utilize methods from the OSS collector set class. It makes sense to put the consolidated logic there, as both already have access to it and are currently using it.

@ycombinator
Contributor

ycombinator commented Apr 12, 2019

> They both utilize methods from the OSS collector set class. It makes sense to put the consolidated logic there, as both already have access to it and are currently using it.

@chrisronline As far as achieving parity goes, what you're proposing above will work, as long as all collection happens synchronously with either the GET /api/stats request or the Monitoring Polling run.

However, we will still need to address the issue of separating the telemetry collection interval from the rest-of-Kibana-stats collection interval, and making this separation work while keeping parity between GET /api/stats and Monitoring Polling. I'm not sure putting the consolidated logic in the OSS collector set class is sufficient to address this issue. That's what led me to this alternate proposal, but perhaps I'm missing something?

@afharo
Member

afharo commented Jan 25, 2021

@chrisronline, with the latest split between Telemetry and Monitoring, do you think this issue is still valid?

@chrisronline
Contributor Author

Yes, this is all set. Thanks @afharo!
