Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually building KueryNode for Fleet's routes #75693

Merged
merged 19 commits into from
Sep 1, 2020
Merged

Conversation

kobelb
Copy link
Contributor

@kobelb kobelb commented Aug 21, 2020

Parsing a string to create the KQL query is computationally expensive. It accounted for nearly 17% of the CPU time when enrolling 1,000 agents using fleet-loadtest over 5 minutes.

Screen Shot 2020-08-25 at 9 30 51 AM

This approach manually builds the KueryNode, so we no longer have to do the string parsing, which was the most expensive part.

Impact

Micro benchmark

6396334 nanoseconds

esKuery.fromKueryExpression(
  'not fleet-agent-actions.attributes.sent_at: * and fleet-agent-actions.attributes.agent_id:1234567'
);

766089 nanoseconds

esKuery.nodeTypes.function.buildNode('and', [
  esKuery.nodeTypes.function.buildNode(
    'not',
    esKuery.nodeTypes.function.buildNode('is', 'fleet-agent-actions.attributes.sent_at', '*')
  ),
  esKuery.nodeTypes.function.buildNode(
    'is',
    'fleet-agent-actions.attributes.agent_id',
    '1234567'
  ),
]);

Shortest test

Running https://github.com/nchaulet/fleet-load-test-scenarios before these changes

checkin_first_time_duration....: avg=2.53s    min=2.02s   med=2.03s    max=7.03s    p(90)=2.54s    p(95)=7.03s
checkin_second_time_duration...: avg=98.41ms  min=55.51ms med=93.41ms  max=138.32ms p(90)=122.72ms p(95)=130.55ms

And after

checkin_first_time_duration....: avg=2.7s     min=2.01s   med=2.03s    max=7.05s    p(90)=3.33s    p(95)=7.05s
checkin_second_time_duration...: avg=93.21ms  min=27.05ms med=92.64ms  max=177.27ms p(90)=133.85ms p(95)=155.01ms

Short-test

Running fleet-loadtest with RATE=15 AGENTS=4000 go run main.go until 200 agents checked-in and got their policy

2020/08/21 19:01:09 counter requests.healthcheck.concurrent_count
2020/08/21 19:01:09   count:               0
2020/08/21 19:01:09 timer requests.healthcheck.latency
2020/08/21 19:01:09   count:             315
2020/08/21 19:01:09   min:                 6.57ms
2020/08/21 19:01:09   max:              6209.91ms
2020/08/21 19:01:09   mean:              458.72ms
2020/08/21 19:01:09   stddev:           1030.04ms
2020/08/21 19:01:09   median:             24.94ms
2020/08/21 19:01:09   75%:               371.94ms
2020/08/21 19:01:09   95%:              3082.55ms
2020/08/21 19:01:09   99%:              5453.22ms
2020/08/21 19:01:09   99.9%:            6209.91ms
2020/08/21 19:01:09   1-min rate:          0.43
2020/08/21 19:01:09   5-min rate:          1.21
2020/08/21 19:01:09   15-min rate:         1.57
2020/08/21 19:01:09   mean rate:           1.04
2020/08/21 19:01:09 meter requests.healthcheck.success
2020/08/21 19:01:09   count:             314
2020/08/21 19:01:09   1-min rate:          0.43
2020/08/21 19:01:09   5-min rate:          1.21
2020/08/21 19:01:09   15-min rate:         1.57
2020/08/21 19:01:09   mean rate:           1.05
2020/08/21 19:01:09 Policy revision summary
2020/08/21 19:01:09   revision  1:    210 agents

After these changes

2020/08/21 20:01:47 timer requests.healthcheck.latency
2020/08/21 20:01:47   count:             323
2020/08/21 20:01:47   min:                 6.32ms
2020/08/21 20:01:47   max:              5004.98ms
2020/08/21 20:01:47   mean:              257.06ms
2020/08/21 20:01:47   stddev:            687.97ms
2020/08/21 20:01:47   median:             13.74ms
2020/08/21 20:01:47   75%:               120.73ms
2020/08/21 20:01:47   95%:              1875.24ms
2020/08/21 20:01:47   99%:              3985.15ms
2020/08/21 20:01:47   99.9%:            5004.98ms
2020/08/21 20:01:47   1-min rate:          0.81
2020/08/21 20:01:47   5-min rate:          1.48
2020/08/21 20:01:47   15-min rate:         1.68
2020/08/21 20:01:47   mean rate:           1.32
2020/08/21 20:01:47 counter requests.healthcheck.concurrent_count
2020/08/21 20:01:47   count:               0
2020/08/21 20:01:47 meter requests.healthcheck.success
2020/08/21 20:01:47   count:             323
2020/08/21 20:01:47   1-min rate:          0.81
2020/08/21 20:01:47   5-min rate:          1.48
2020/08/21 20:01:47   15-min rate:         1.68
2020/08/21 20:01:47   mean rate:           1.32
2020/08/21 20:01:47 Policy revision summary
2020/08/21 20:01:47   revision  1:    216 agents

This reduces the mean health check latency from 458.72ms to 257.06ms

Longer test

Running fleet-loadtest with RATE=15 AGENTS=2000 go run main.go until completion

2020/08/25 17:00:34 timer requests.healthcheck.latency
2020/08/25 17:00:34   count:            3266
2020/08/25 17:00:34   min:                 6.48ms
2020/08/25 17:00:34   max:              3143.09ms
2020/08/25 17:00:34   mean:              139.80ms
2020/08/25 17:00:34   stddev:            383.02ms
2020/08/25 17:00:34   median:             10.40ms
2020/08/25 17:00:34   75%:                30.99ms
2020/08/25 17:00:34   95%:               926.28ms
2020/08/25 17:00:34   99%:              2008.38ms
2020/08/25 17:00:34   99.9%:            3122.97ms
2020/08/25 17:00:34   1-min rate:          1.55
2020/08/25 17:00:34   5-min rate:          1.56
2020/08/25 17:00:34   15-min rate:         1.59
2020/08/25 17:00:34   mean rate:           1.59
2020/08/25 17:00:34 counter requests.healthcheck.concurrent_count
2020/08/25 17:00:34   count:               0
2020/08/25 17:00:34 meter requests.healthcheck.success
2020/08/25 17:00:34   count:            3266
2020/08/25 17:00:34   1-min rate:          1.55
2020/08/25 17:00:34   5-min rate:          1.56
2020/08/25 17:00:34   15-min rate:         1.59
2020/08/25 17:00:34   mean rate:           1.59
2020/08/25 17:00:34 Policy revision summary
2020/08/25 17:00:34   revision  1:   2000 agents

@kobelb kobelb requested a review from nchaulet August 21, 2020 20:03
@nchaulet
Copy link
Member

Nice the perf improvements seems significant 👍

@@ -37,6 +37,9 @@ import { SavedObjectUnsanitizedDoc } from './serialization';
import { SavedObjectsMigrationLogger } from './migrations/core/migration_logger';
import { SavedObject } from '../../types';

// eslint-disable-next-line @kbn/eslint/no-restricted-paths
import { KueryNode } from '../../../plugins/data/common';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring this code to remove the circular reference became quite time-consuming. Since a circular reference already exists between core and the data plugin, I'd like to postpone fixing the circular reference to another PR to minimize the likelihood of this change not making 7.10.

@kobelb kobelb marked this pull request as ready for review August 26, 2020 23:49
@kobelb kobelb requested a review from a team August 26, 2020 23:49
@kobelb kobelb requested a review from a team as a code owner August 26, 2020 23:49
@kobelb kobelb added v8.0.0 v7.10.0 release_note:skip Skip the PR/issue when compiling release notes labels Aug 26, 2020
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Aug 27, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines -34 to +36
if (filter && filter.length > 0 && indexMapping) {
const filterKueryNode = esKuery.fromKueryExpression(filter);
if (filter && indexMapping) {
const filterKueryNode =
typeof filter === 'string' ? esKuery.fromKueryExpression(filter) : filter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: we are no longer ignoring empty string filters, but I guess this has no impact?

Copy link
Contributor Author

@kobelb kobelb Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty strings are still ignored, I added 195a5a1#diff-c5ab35755205b8adbf6cfc858d69ea3bR86-R88 to double-check this. JavaScript treats an empty-string as being falsy, so checking filter and filter.length > 0 is redundant, we only need to check filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait what? Kibana is not Java code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no... Maybe in the future though! New-new platform??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to find a more suitable codename if we switch to java. NewPlatformBijectiveAdapterFactory maybe.

@kobelb
Copy link
Contributor Author

kobelb commented Sep 1, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@kobelb kobelb merged commit b5faf41 into elastic:master Sep 1, 2020
@kobelb kobelb deleted the kuery-time branch September 1, 2020 21:00
kobelb added a commit that referenced this pull request Sep 1, 2020
* Simple benchmark tests for kuery

* Building manually is "better" still not free

* Building the KueryNode manually

* Removing benchmark tests

* Another query is building the KueryNode manually

* Empty strings are inherently falsy

* No longer reaching into the data plugin, import from the "root" indexes

* Using AGENT_ACTION_SAVED_OBJECT_TYPE everywhere

* Adding SavedObjectsRepository#find unit test for KueryNode

* Adding KQL KueryNode test for validateConvertFilterToKueryNode

* /s/KQL string/KQL expression

* Updating API docs

* Adding micro benchmark

* Revert "Adding micro benchmark"

This reverts commit 97e19c0.

* Adding an empty string filters test

Co-authored-by: Elastic Machine <[email protected]>

Co-authored-by: Elastic Machine <[email protected]>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Sep 2, 2020
* master: (223 commits)
  skip flaky suite (elastic#75724)
  [Reporting] Add functional test for Reports in non-default spaces (elastic#76053)
  [Enterprise Search] Fix various icons in dark mode (elastic#76430)
  skip flaky suite (elastic#76245)
  Add `auto` interval to histogram AggConfig (elastic#76001)
  [Resolver] generator uses setup_node_env (elastic#76422)
  [Ingest Manager] Support both zip & tar archives from Registry (elastic#76197)
  [Ingest Manager] Improve agent vs kibana version checks (elastic#76238)
  Manually building `KueryNode` for Fleet's routes (elastic#75693)
  remove dupe tinymath section (elastic#76093)
  Create APM issue template (elastic#76362)
  Delete unused file. (elastic#76386)
  [SECURITY_SOLUTION][ENDPOINT] Trusted Apps Create API (elastic#76178)
  [Detections Engine] Add Alert actions to the Timeline (elastic#73228)
  [Dashboard First] Library Notification (elastic#76122)
  [Maps] Add mvt support for ES doc sources  (elastic#75698)
  Add setHeaderActionMenu API to AppMountParameters (elastic#75422)
  [ML] Remove "Are you sure" from data frame analytics jobs (elastic#76214)
  [yarn] remove typings-tester, use @ts-expect-error (elastic#76341)
  [Reporting/CSV] Do not fail the job if scroll ID can not be cleared (elastic#76014)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants