Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.15] [Synthetics] Fix heatmap on monitor detail/history page for very large doc counts (#184177) #196157

Closed

Conversation

justinkambic
Copy link
Contributor

Backport

This will backport the following commits from main to 8.15:

Questions ?

Please refer to the Backport tool documentation

…e doc counts (elastic#184177)

## Summary

Resolves elastic#180076.

### User-facing updates

#### We can now display the heatmap for "very large" document counts

##### This PR

_note: The first bucket in this image is empty because that's when my
monitor actually started._

<img width="2083" alt="image"
src="https://github.com/elastic/kibana/assets/18429259/72daf378-586e-4c7d-a1b0-a05bf95fb09a">

##### Previously

<img width="1420" alt="image"
src="https://github.com/elastic/kibana/assets/18429259/94f46913-debc-4bce-9794-600241fc4d6f">

#### We display appropriate bucketed data

This is a weird behavior I only noticed once I had made the new query.
For some reason, the chart didn't display all the data it should. This
may be related to the initial issue, but even in ranges less than 10k
docs I was seeing this behavior. I manually checked this by querying for
the bucket spans and ensuring the right number of docs were showing up,
which is what I'd expect, because it's highly unlikely there would be a
bug like this in the date histogram agg.

##### This PR

<img width="1420" alt="image"
src="https://github.com/elastic/kibana/assets/18429259/8620f709-dd4f-4a52-b39d-a3ff778ff7b4">

##### Previously

<img width="1420" alt="image"
src="https://github.com/elastic/kibana/assets/18429259/745fa36d-db35-46a4-be1d-330bed799884">

### Why are you doing this?

We have noticed in a few SDH that in cases where users are querying very
large document tallies, like a month's worth of data for a monitor that
runs every minute, they're not seeing their status heatmap fill up.

~This patch would introduce a multi-stage query mechanism to the pings
query endpoint. The PR is in draft state, so the procedure performs the
query in this manner all the time, but we may want to add a new param to
the endpoint that allows the client to opt into this behavior, or simply
cap at the default 10k documents like it does today.~

~I intend to do a good amount of benchmark testing on this fix locally
and in Cloud before I try to merge it. We may also need to explore
imposing a hard upper limit on the number of stages we are willing to
execute. Right now the query is unbounded and a user could try to query
datasets that would result in Kibana retrieving and trying to respond
with hundreds of MB of data.~

### What changed

This PR modifies the backend query to rely on a Date Histogram agg,
rather than querying the documents directly for this purpose. This agg
uses the same interval that the client calculates based on the width of
the page. I have kept the existing time bin logic on the client side,
and modified the function that builds the data structure to pour the
buckets from the histogram into the bins that we serve the chart that
the user sees.

One drawback is because the buckets are computed by Elasticsearch, we
need to make API calls to retrieve more data, whereas on resizes
previously, all the data was already present and would get re-bucketed
on the client. I think this is acceptable because of the ability to
handle larger datasets and given that the query should be very fast
still.

Additionally:

- I have made additional modifications to the frontend logic to prevent
unnecessary API calls. Before, the component would always make at least
two calls on an initial load. I have also switched from `lodash`
`throttle` to `useDebounce` from `react-use`, as this seems to more
properly drop intermediate resize events where we would want to skip
hitting the API.
- I have also changed the render logic. We used to use a default value
to build out an initial set of time bins. Unfortunately, with all these
changes in place, this results in a weird behavior where we show an
empty scale while waiting for data, and the bins then snap to the proper
size once the element's width has been determined and the bins get
scaled out properly. Instead of this, I skip rendering the chart until
the initial width is available via a `ref` applied to the wrapper
element. At that point, we send the init width to the hook and use that
value for our initial query and bin calculations, so we only ever show
bins on the charts that will correspond to the bucketed data we
eventually display.
- I added a progress indicator to the wrapper element so the user
receives an indication that the data is being refreshed.
- I added a quiet fetch similar to how the overview page works, so that
after an initial render the chart will not have its data dropped until
we have new data to replace it with. Along those lines, the hook now
watches the page location and if it changes (i.e. to from History to
Overview), state management will destroy the heatmap data so we don't
have a flicker of other data during a new fetch.

##  Testing this PR

Unfortunately, to truly test this PR you'd need a monitor with over 10k
documents, as that's the criteria specified in the initial issue. I did
this by running a monitor over about 10 days on a 1-run-per-minute
schedule. You could create a cloud deployment from this PR, or create
dummy documents in some way (this can be hard to verify though). I am
pretty confident this works and fixes the issue based on my real-world
testing.

---------

Co-authored-by: shahzad31 <[email protected]>
(cherry picked from commit 50f8bd6)

# Conflicts:
#	x-pack/plugins/observability_solution/synthetics/common/constants/synthetics/rest_api.ts
#	x-pack/plugins/observability_solution/synthetics/public/apps/synthetics/state/root_effect.ts
#	x-pack/plugins/observability_solution/synthetics/server/routes/index.ts
#	x-pack/plugins/observability_solution/synthetics/server/routes/pings/get_ping_statuses.ts
@justinkambic justinkambic enabled auto-merge (squash) October 14, 2024 14:44
@botelastic botelastic bot added ci:project-deploy-observability Create an Observability project Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team labels Oct 14, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@justinkambic
Copy link
Contributor Author

Canceling this backport, we will just target 8.16 and the necessary commits are already in that branch.

auto-merge was automatically disabled October 14, 2024 14:45

Pull request was closed

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 14, 2024

⏳ Build in-progress, with failures

Failed CI Steps

Test Failures

  • [job] [logs] Jest Integration Tests #2 / checking migration metadata changes on all registered SO types detecting all the settings that have opted-in for dynamic in-memory updates
  • [job] [logs] Jest Integration Tests #2 / checking migration metadata changes on all registered SO types detecting all the settings that have opted-in for dynamic in-memory updates
  • [job] [logs] Jest Integration Tests #5 / checking migration metadata changes on all registered SO types detecting migration related changes in registered types
  • [job] [logs] Jest Integration Tests #5 / checking migration metadata changes on all registered SO types detecting migration related changes in registered types
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Create a .fleet-policies document with the APM package policy
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Create a .fleet-policies document with the APM package policy
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Create correct package policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Create correct package policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding APM to a preconfigured agent policy after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding package policy id to a preconfigured package policy after first setup Create correct package policies and use the name of package policies instead of id
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding package policy id to a preconfigured package policy after first setup Create correct package policies and use the name of package policies instead of id
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding package policy id to a preconfigured package policy after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Adding package policy id to a preconfigured package policy after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Support removing a field from output after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy Support removing a field from output after first setup Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Create correct .fleet-policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Create correct .fleet-policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Create correct package policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Create correct package policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet cloud preconfiguration Preconfigured cloud policy With a full preconfigured cloud policy Works and preconfigure correctly agent policies
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset all policy Works and reset all preconfigured policies
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset all policy Works and reset all preconfigured policies
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works and reset one preconfigured policies if the policy is already deleted (with a ghost package policy)
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works and reset one preconfigured policies if the policy is already deleted (with a ghost package policy)
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works and reset one preconfigured policies if the policy was deleted with a preconfiguration deletion record
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works and reset one preconfigured policies if the policy was deleted with a preconfiguration deletion record
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works if the preconfigured policies already exists with a missing package policy
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfiguration reset Reset one preconfigured policy Works if the preconfigured policies already exists with a missing package policy
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfigured outputs Preconfigured outputs With a preconfigured monitoring output Should create a default output and the default preconfigured output
  • [job] [logs] Jest Integration Tests #1 / Fleet preconfigured outputs Preconfigured outputs With a preconfigured monitoring output Should create a default output and the default preconfigured output
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly when multiple Kibana instaces are started in serial
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly when multiple Kibana instaces are started in serial
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly when multiple Kibana instances are started at the same time
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly when multiple Kibana instances are started at the same time
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly with single Kibana instance
  • [job] [logs] Jest Integration Tests #1 / Fleet setup preconfiguration with multiple instances Kibana preconfiguration setup sets up Fleet correctly with single Kibana instance
  • [job] [logs] Jest Integration Tests #1 / fleet usage telemetry should fetch usage telemetry
  • [job] [logs] Jest Integration Tests #1 / fleet usage telemetry should fetch usage telemetry
  • [job] [logs] FTR Configs #61 / Interactive setup APIs - Enrollment flow should be able to enroll with valid authentication code
  • [job] [logs] FTR Configs #61 / Interactive setup APIs - Enrollment flow should be able to enroll with valid authentication code
  • [job] [logs] FTR Configs #72 / Interactive setup APIs - Manual configuration flow should be able to configure with valid authentication code
  • [job] [logs] FTR Configs #72 / Interactive setup APIs - Manual configuration flow should be able to configure with valid authentication code
  • [job] [logs] FTR Configs #38 / Interactive setup APIs - Manual configuration flow without TLS should be able to configure with valid authentication code
  • [job] [logs] FTR Configs #38 / Interactive setup APIs - Manual configuration flow without TLS should be able to configure with valid authentication code
  • [job] [logs] FTR Configs #74 / Interactive Setup Functional Tests (Enrollment token) should configure Kibana successfully
  • [job] [logs] FTR Configs #74 / Interactive Setup Functional Tests (Enrollment token) should configure Kibana successfully
  • [job] [logs] FTR Configs #8 / Interactive Setup Functional Tests (Manual configuration without Security) should configure Kibana successfully
  • [job] [logs] FTR Configs #8 / Interactive Setup Functional Tests (Manual configuration without Security) should configure Kibana successfully
  • [job] [logs] FTR Configs #3 / Interactive Setup Functional Tests (Manual configuration without TLS) should configure Kibana successfully
  • [job] [logs] FTR Configs #3 / Interactive Setup Functional Tests (Manual configuration without TLS) should configure Kibana successfully
  • [job] [logs] FTR Configs #7 / Interactive Setup Functional Tests (Manual configuration) should configure Kibana successfully
  • [job] [logs] FTR Configs #7 / Interactive Setup Functional Tests (Manual configuration) should configure Kibana successfully
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the CLONE_TEMP_TO_TARGET state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the CLONE_TEMP_TO_TARGET state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the REINDEX_SOURCE_TO_TEMP_INDEX_BULK state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the REINDEX_SOURCE_TO_TEMP_INDEX_BULK state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the UPDATE_TARGET_MAPPINGS_PROPERTIES state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana migrator fails on the UPDATE_TARGET_MAPPINGS_PROPERTIES state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_analytics migrator fails on the CLONE_TEMP_TO_TARGET state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_analytics migrator fails on the CLONE_TEMP_TO_TARGET state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_analytics migrator fails on the UPDATE_TARGET_MAPPINGS_PROPERTIES state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_analytics migrator fails on the UPDATE_TARGET_MAPPINGS_PROPERTIES state is capable of successfully performing the split migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_task_manager migrator fails on the TRANSFORMED_DOCUMENTS_BULK_INDEX state, after the other ones have finished is capable of completing the .kibana_task_manager migration in subsequent restart
  • [job] [logs] Jest Integration Tests #2 / split .kibana index into multiple system indices failure cases when the .kibana_task_manager migrator fails on the TRANSFORMED_DOCUMENTS_BULK_INDEX state, after the other ones have finished is capable of completing the .kibana_task_manager migration in subsequent restart
  • [job] [logs] Jest Integration Tests #5 / split .kibana index into multiple system indices when migrating from a legacy version performs v1 migration and then relocates saved objects into different indices, depending on their types
  • [job] [logs] Jest Integration Tests #5 / split .kibana index into multiple system indices when migrating from a legacy version performs v1 migration and then relocates saved objects into different indices, depending on their types
  • [job] [logs] Jest Integration Tests #5 / split .kibana index into multiple system indices when multiple Kibana migrators run in parallel correctly migrates 7.7.2_xpack_100k_obj.zip archive
  • [job] [logs] Jest Integration Tests #5 / split .kibana index into multiple system indices when multiple Kibana migrators run in parallel correctly migrates 7.7.2_xpack_100k_obj.zip archive
  • [job] [logs] Jest Integration Tests #1 / upgrade agent policy schema version with package installed with outdated schema version should correctly upgrade schema version
  • [job] [logs] Jest Integration Tests #1 / upgrade agent policy schema version with package installed with outdated schema version should correctly upgrade schema version
  • [job] [logs] Jest Integration Tests #1 / Uprade package install version with package installed with a previous format install version should upgrade package install version for outdated packages
  • [job] [logs] Jest Integration Tests #1 / Uprade package install version with package installed with a previous format install version should upgrade package install version for outdated packages

History

@justinkambic justinkambic deleted the backport/8.15/pr-184177 branch October 14, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport ci:project-deploy-observability Create an Observability project Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants