[Telemetry] [Monitoring] Only retry fetching usage once monitoring bulk upload is successful #54309
…itoring_bulk_upload_fix
Pinging @elastic/pulse (Team:Pulse)
x-pack/legacy/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js
…_uploader.js Co-Authored-By: Christiane (Tina) Heiligers <[email protected]>
Code looks fine for what it's doing.
Offline, @Bamieh and I discussed the fix and agreed it's fine as a work-around for not easily being able to detect programmatically if monitoring in Kibana is enabled or not. Optimal solutions will be handled elsewhere.
I ran the code locally with the `--verbose` flag both before and after enabling Monitoring, and verified that it works. Before enabling monitoring, we get the debug log `Resetting lastFetchWithUsage because uploading to the cluster was not successful.` After enabling monitoring, we get `Uploaded bulk stats payload to the local cluster`.
LGTM.
💚 Build Succeeded
Tested this fix locally merged into #54055 and appears to work as expected. Telemetry fetches once initially and then as needed. lgtm!
…tic#54309) * fix interval and add tests * Update x-pack/legacy/plugins/monitoring/server/kibana_monitoring/bulk_uploader.js Co-Authored-By: Christiane (Tina) Heiligers <[email protected]> Co-authored-by: Christiane (Tina) Heiligers <[email protected]>
* master: (23 commits) [Vis: Default editor] Reactify the timelion editor (elastic#52990) [Discover] fix histogram min interval (elastic#53979) [Telemetry] [Monitoring] Only retry fetching usage once monito… (elastic#54309) [docs][APM] Add runtime index config documentation (elastic#53907) [SIEM] Detection engine timeline (elastic#53783) Filter scripted fields preview field list to source fields (elastic#53826) Management - New platform api (elastic#52579) Reset region and Account when switching inventory (elastic#54287) [SIEM] [Case] Case workflow api schema (elastic#51535) Code coverage setup on CI (elastic#49003) [ML] DF Analytics Results: adds link to docs (elastic#54189) Update schemas boolean, byteSize, and duration to coerce strings (elastic#54177) [Metrics UI] Pass relevant shouldAllowEdit capabilities into SettingsPage (elastic#49781) [Canvas] Fixes bugs with autoplay and refresh (elastic#53149) [ML] DF Analytics Classification: ensure confusion matrix can be fetched (elastic#53629) Fix Vega react eslint errors (elastic#54259) Remove non existing codeowners (elastic#54274) use correct type (elastic#54244) [Dashboard] Removing 100% as dshDashboardViewport height (elastic#54263) add `examples/` to no-restricted-path config (elastic#54252) ...
Pinging @elastic/kibana-core (Team:Core)
The bulk uploader in monitoring attempts to bulk insert data into Elasticsearch every 10 seconds (defined by the flag xpack.monitoring.kibana.collection.interval).
To avoid performance issues, we have throttled fetching telemetry usage data to once every 24 hours in the bulk uploader when monitoring is enabled.
The current behavior is to keep fetching usage data and trying to insert it until the bulk insert into ES succeeds. Once it succeeds, we start fetching usage every 24 hours.
When monitoring is not enabled, the bulk uploader keeps retrying: ES returns `ignored: true` (the index does not exist), which marks the operation as unsuccessful, so usage is fetched again on every tick.
This happens on all 7.x branches and on master. It was discovered when running a backport against the 7.5 branch (#54055).
To improve performance when monitoring is not enabled we can start fetching usage data once the bulk uploader gets a success on the bulk insert from ES.
The tiny downside to this approach is that we will not get usage data on the first successful insert after enabling monitoring. We will get this data on the second tick (in less than 20 seconds).
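The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in `bulk_uploader.js`: the class `UsageFetchGate`, its methods, and the `simulate` helper are hypothetical names. It assumes a bulk result counts as unsuccessful when it carries `ignored` or `errors`.

```javascript
// Hypothetical sketch of the gating logic; names are illustrative only.
const USAGE_INTERVAL_MS = 24 * 60 * 60 * 1000; // usage is throttled to once every 24 hours

class UsageFetchGate {
  constructor(usageIntervalMs = USAGE_INTERVAL_MS) {
    this.usageIntervalMs = usageIntervalMs;
    this.lastUsageFetchTime = null;
    this.holdUsage = true; // hold usage until a bulk insert has succeeded
  }

  // Decide whether this tick should include (expensive) usage data.
  shouldFetchUsage(now) {
    if (this.holdUsage) return false;
    return (
      this.lastUsageFetchTime === null ||
      now - this.lastUsageFetchTime >= this.usageIntervalMs
    );
  }

  markUsageFetched(now) {
    this.lastUsageFetchTime = now;
  }

  // Assumption: ES signals a skipped insert with `ignored: true`.
  recordUploadResult(result) {
    const successful = !result.ignored && !result.errors;
    if (successful) {
      this.holdUsage = false;
    } else {
      // Failed upload: hold usage again and forget the last fetch time,
      // so usage is re-fetched after the next successful upload.
      this.holdUsage = true;
      this.lastUsageFetchTime = null;
    }
    return successful;
  }
}

// Replays a sequence of bulk results, one per tick, and reports
// on which ticks usage data would have been fetched.
function simulate(gate, results, tickMs = 10000) {
  return results.map((result, i) => {
    const now = i * tickMs;
    const withUsage = gate.shouldFetchUsage(now);
    if (withUsage) gate.markUsageFetched(now);
    gate.recordUploadResult(result);
    return withUsage;
  });
}
```

With two ignored inserts followed by three successful ones, `simulate(new UsageFetchGate(), [{ ignored: true }, { ignored: true }, {}, {}, {}])` fetches usage only on the tick after the first success, which is the "second tick" trade-off mentioned above.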
CC @aaronjcaldwell
Closes #54294