[Ingest Manager] Improve performance of fetching unacknowledged actions #75892

Closed


Member

@nchaulet nchaulet commented Aug 25, 2020

Description

Currently, fetching unacknowledged actions for an agent is a performance bottleneck because of KQL parsing (see #75646).

This PR changes our data model to avoid that KQL query.

Changes made in this PR:

  • Denormalize our model and add a new property not_acknowledged_actions on the agent, so we no longer need a search to find the unacknowledged actions for an agent.
  • Rename sent_at => acknowledged_at in the AgentAction schema, as it's more accurate.
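The denormalization described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: only not_acknowledged_actions, sent_at and acknowledged_at come from the description, while AgentStore, its methods, the other field names, and the choice to store action IDs (rather than whole actions) on the agent are assumptions made for the example.

```typescript
// Hypothetical shapes: only not_acknowledged_actions, sent_at and
// acknowledged_at are named in the PR description.
interface AgentAction {
  id: string;
  agent_id: string;
  type: string;
  acknowledged_at?: string; // previously called sent_at
}

interface Agent {
  id: string;
  // Assumption: the denormalized property stores action IDs.
  not_acknowledged_actions: string[];
}

// In-memory stand-in for the saved objects store (illustration only).
class AgentStore {
  private agents = new Map<string, Agent>();
  private actions = new Map<string, AgentAction>();

  addAgent(id: string): Agent {
    const agent: Agent = { id, not_acknowledged_actions: [] };
    this.agents.set(id, agent);
    return agent;
  }

  private getAgent(agentId: string): Agent {
    const agent = this.agents.get(agentId);
    if (agent === undefined) {
      throw new Error(`Unknown agent: ${agentId}`);
    }
    return agent;
  }

  // Creating an action also records its ID on the agent document.
  createAction(agentId: string, actionId: string, type: string): AgentAction {
    const agent = this.getAgent(agentId);
    const action: AgentAction = { id: actionId, agent_id: agentId, type };
    this.actions.set(actionId, action);
    agent.not_acknowledged_actions.push(actionId);
    return action;
  }

  // No search or query parsing: read the agent, then look up each ID directly.
  getUnacknowledgedActions(agentId: string): AgentAction[] {
    const pending: AgentAction[] = [];
    for (const id of this.getAgent(agentId).not_acknowledged_actions) {
      const action = this.actions.get(id);
      if (action !== undefined) pending.push(action);
    }
    return pending;
  }

  // Acknowledging stamps the action and removes its ID from the agent.
  acknowledge(agentId: string, actionId: string, now: Date = new Date()): void {
    const agent = this.getAgent(agentId);
    const action = this.actions.get(actionId);
    if (action === undefined || action.agent_id !== agentId) {
      throw new Error(`Unknown action ${actionId} for agent ${agentId}`);
    }
    action.acknowledged_at = now.toISOString();
    agent.not_acknowledged_actions = agent.not_acknowledged_actions.filter(
      (id) => id !== actionId
    );
  }
}
```

The trade-off in a design like this is that creating and acknowledging an action each have to update the agent as well as the action, so the two writes must be kept consistent.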

Load test

2000 agents

Before

2020/08/25 10:35:17 timer requests.healthcheck.latency
2020/08/25 10:35:17   count:            1052
2020/08/25 10:35:17   min:                46.28ms
2020/08/25 10:35:17   max:              2816.12ms
2020/08/25 10:35:17   mean:              297.97ms
2020/08/25 10:35:17   stddev:            320.69ms
2020/08/25 10:35:17   median:            217.33ms
2020/08/25 10:35:17   75%:               280.85ms
2020/08/25 10:35:17   95%:               723.66ms
2020/08/25 10:35:17   99%:              2140.05ms
2020/08/25 10:35:17   99.9%:            2801.34ms
2020/08/25 10:35:17   1-min rate:          1.29
2020/08/25 10:35:17   5-min rate:          1.28
2020/08/25 10:35:17   15-min rate:         1.32
2020/08/25 10:35:17   mean rate:           1.25
2020/08/25 10:35:17 counter requests.healthcheck.concurrent_count
2020/08/25 10:35:17   count:               1
2020/08/25 10:35:17 meter requests.healthcheck.success
2020/08/25 10:35:17   count:            1052
2020/08/25 10:35:17   1-min rate:          1.29
2020/08/25 10:35:17   5-min rate:          1.28
2020/08/25 10:35:17   15-min rate:         1.32
2020/08/25 10:35:17   mean rate:           1.25
2020/08/25 10:35:17 Policy revision summary
2020/08/25 10:35:17   revision  1:   2000 agents

After

2020/08/25 10:06:44 timer requests.healthcheck.latency
2020/08/25 10:06:44   count:            1061
2020/08/25 10:06:44   min:                44.09ms
2020/08/25 10:06:44   max:              2202.55ms
2020/08/25 10:06:44   mean:              254.06ms
2020/08/25 10:06:44   stddev:            273.90ms
2020/08/25 10:06:44   median:            205.27ms
2020/08/25 10:06:44   75%:               232.60ms
2020/08/25 10:06:44   95%:               500.66ms
2020/08/25 10:06:44   99%:              1689.15ms
2020/08/25 10:06:44   99.9%:            2197.51ms
2020/08/25 10:06:44   1-min rate:          1.32
2020/08/25 10:06:44   5-min rate:          1.32
2020/08/25 10:06:44   15-min rate:         1.35
2020/08/25 10:06:44   mean rate:           1.33
2020/08/25 10:06:44 counter requests.healthcheck.concurrent_count
2020/08/25 10:06:44   count:               0
2020/08/25 10:06:44 meter requests.healthcheck.success
2020/08/25 10:06:44   count:            1061
2020/08/25 10:06:44   1-min rate:          1.32
2020/08/25 10:06:44   5-min rate:          1.32
2020/08/25 10:06:44   15-min rate:         1.35
2020/08/25 10:06:44   mean rate:           1.33
2020/08/25 10:06:44 Policy revision summary
2020/08/25 10:06:44   revision  1:   2000 agents

4000 agents

Before

2020/08/25 11:06:34 counter requests.healthcheck.concurrent_count
2020/08/25 11:06:34   count:               1
2020/08/25 11:06:34 meter requests.healthcheck.success
2020/08/25 11:06:34   count:            1575
2020/08/25 11:06:34   1-min rate:          1.02
2020/08/25 11:06:34   5-min rate:          0.96
2020/08/25 11:06:34   15-min rate:         1.03
2020/08/25 11:06:34   mean rate:           0.96
2020/08/25 11:06:34 timer requests.healthcheck.latency
2020/08/25 11:06:34   count:            1575
2020/08/25 11:06:34   min:                49.46ms
2020/08/25 11:06:34   max:              4281.60ms
2020/08/25 11:06:34   mean:              562.76ms
2020/08/25 11:06:34   stddev:            800.22ms
2020/08/25 11:06:34   median:            313.69ms
2020/08/25 11:06:34   75%:               477.01ms
2020/08/25 11:06:34   95%:              3045.95ms
2020/08/25 11:06:34   99%:              3669.76ms
2020/08/25 11:06:34   99.9%:            4279.77ms
2020/08/25 11:06:34   1-min rate:          1.02
2020/08/25 11:06:34   5-min rate:          0.96
2020/08/25 11:06:34   15-min rate:         1.03
2020/08/25 11:06:34   mean rate:           0.96
2020/08/25 11:06:34 Agent rollout
2020/08/25 11:06:34   agents:  4000
2020/08/25 11:06:34 Policy revision summary
2020/08/25 11:06:34   revision  1:   4000 agents

After

2020/08/25 11:42:32 meter requests.healthcheck.success
2020/08/25 11:42:32   count:            1831
2020/08/25 11:42:32   1-min rate:          1.16
2020/08/25 11:42:32   5-min rate:          1.10
2020/08/25 11:42:32   15-min rate:         1.06
2020/08/25 11:42:32   mean rate:           1.07
2020/08/25 11:42:32 counter requests.healthcheck.concurrent_count
2020/08/25 11:42:32   count:               1
2020/08/25 11:42:32 timer requests.healthcheck.latency
2020/08/25 11:42:32   count:            1831
2020/08/25 11:42:32   min:                44.90ms
2020/08/25 11:42:32   max:              3568.79ms
2020/08/25 11:42:32   mean:              430.63ms
2020/08/25 11:42:32   stddev:            665.81ms
2020/08/25 11:42:32   median:            229.46ms
2020/08/25 11:42:32   75%:               310.05ms
2020/08/25 11:42:32   95%:              2580.73ms
2020/08/25 11:42:32   99%:              3182.45ms
2020/08/25 11:42:32   99.9%:            3566.68ms
2020/08/25 11:42:32   1-min rate:          1.16
2020/08/25 11:42:32   5-min rate:          1.10
2020/08/25 11:42:32   15-min rate:         1.06
2020/08/25 11:42:32   mean rate:           1.07
2020/08/25 11:42:32 Agent rollout
2020/08/25 11:42:32   agents:  4000
2020/08/25 11:42:32 Policy revision summary
2020/08/25 11:42:32   revision  1:   4000 agents
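
To make the before/after comparison easier to read, the latency figures from the logs above can be turned into relative reductions with a small script. This is only a convenience sketch: the numbers are copied from the healthcheck latency timers in this description, the names (LatencyStats, percentReduction, runs) are made up for the example, and each figure comes from a single run, so part of the difference may be noise.

```typescript
// Latency figures in milliseconds, copied from the load test logs above.
interface LatencyStats {
  mean: number;
  median: number;
  p95: number;
  p99: number;
}

interface Run {
  label: string;
  before: LatencyStats;
  after: LatencyStats;
}

const runs: Run[] = [
  {
    label: "2000 agents",
    before: { mean: 297.97, median: 217.33, p95: 723.66, p99: 2140.05 },
    after: { mean: 254.06, median: 205.27, p95: 500.66, p99: 1689.15 },
  },
  {
    label: "4000 agents",
    before: { mean: 562.76, median: 313.69, p95: 3045.95, p99: 3669.76 },
    after: { mean: 430.63, median: 229.46, p95: 2580.73, p99: 3182.45 },
  },
];

// Relative reduction in percent; positive means "after" is faster.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

const metrics: (keyof LatencyStats)[] = ["mean", "median", "p95", "p99"];

for (const run of runs) {
  console.log(run.label);
  for (const metric of metrics) {
    const reduction = percentReduction(run.before[metric], run.after[metric]);
    console.log(`  ${metric}: ${reduction.toFixed(1)}% lower`);
  }
}
```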

TODO

  • Add saved object (SO) migrations for the sent_at field

@nchaulet nchaulet added the v8.0.0, release_note:skip, v7.10.0 and Team:Fleet labels Aug 25, 2020
@nchaulet nchaulet self-assigned this Aug 25, 2020
@nchaulet nchaulet marked this pull request as ready for review August 25, 2020 15:57
@nchaulet nchaulet requested a review from a team August 25, 2020 15:57
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@jen-huang
Contributor

How will this affect agent actions created in 7.9? I see saved object mapping changes. Do migrations need to be added if there are documents from 7.9?

@kibanamachine
Contributor

⏳ Build in-progress, with failures

Failed CI Steps

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@nchaulet
Member Author

@jen-huang It will affect actions from 7.9. I am planning to write the migration, but we should have a happy path for migrating the rename of sent_at to acknowledged_at.

@nchaulet
Member Author

nchaulet commented Sep 3, 2020

Better fix #75693 #76589

@nchaulet nchaulet closed this Sep 3, 2020
@nchaulet nchaulet deleted the feature-refacto-action-sent-at branch September 3, 2020 01:36