-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task Manager] Batches the update operations in Task Manager #71470
[Task Manager] Batches the update operations in Task Manager #71470
Conversation
* master: (314 commits) [APM] Use status_code field to calculate error rate (elastic#71109) [Observability] Change appLink passing the date range (elastic#71259) [Security] Add Timeline improvements (elastic#71506) adjust vislib bar opacity (elastic#71421) Fix ScopedHistory mock and adapt usages (elastic#71404) [Security Solution] Add hook for reading/writing resolver query params (elastic#70809) [APM] Bug fixes from ML integration testing (elastic#71564) [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404) [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275) [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321) Change signal.rule.risk score mapping from keyword to float (elastic#71126) Added help text where needed on connectors and alert actions UI (elastic#69601) [SIEM][Detections] Value Lists Management Modal (elastic#67068) [test] Skips test preventing promotion of ES snapshot elastic#71582 [test] Skips test preventing promotion of ES snapshot elastic#71555 [ILM] Fix alignment of the timing field (elastic#71273) [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540) initial telemetry setup (elastic#69330) [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027) Search across spaces (elastic#67644) ...
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
* master: (214 commits) replacing hard coded links for ela.st (elastic#72240) skip flaky suite (elastic#60865) chore(NA): teardown dynamic dll plugin (elastic#72096) Register navLink actions for declared applications (elastic#72109) Fix value for process.hash.sha256 draggable (elastic#72142) Call setupIngest before fleet_install tests (elastic#72214) [Security Solution][Detections] Better toast errors (elastic#72205) skip flaky suite (elastic#64696) [Security Solution][Detections] Disable exceptions for Threshold and ML rules (elastic#72137) [Security Solution][Detections,Lists] Miscellaneous post-FF fixes (elastic#71990) [baseline/capture] use high-memory nodes with ramDisks (elastic#71894) skip flaky suite (elastic#77207) [Maps] Fix issue preventing TMS from rendering correctly (elastic#71946) using test_user with minimum privs (elastic#71988) Fixed Webhook connector doesn't retain added HTTP header settings (elastic#71924) [Ingest Manager] Do not show enrolling and unenrolling agents as online in agent counters (elastic#71921) [Maps] fix 'New Map' from getting added to recently accessed (elastic#72125) [Visualizations] Pass 'aggs' parameter to custom request handlers (elastic#71423) [Monitoring] Out of the box alert tweaks (elastic#71942) [ML] Fix datafeed start time is incorrect when the job has trailing empty buckets (elastic#71976) ...
x-pack/plugins/task_manager/server/lib/bulk_operation_buffer.ts
Outdated
Show resolved
Hide resolved
Just tried this PR out with my 100 index threshold alerts scheduling a server log for 4 instances every second (TM poll interval) w/20 workers. Earlier this week, it backed up the TM queue pretty quickly, 1000's of entries in minutes. After about 5 minutes of running, queue not growing at all! Queue size guestimated via CPU usage seems up a little bit (130% or so before, now more like 150%), but I'd say that's expected when you reduce other latencies like this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM and seems to work as expected (more throughput with lots of active alerts)
@@ -47,6 +53,25 @@ export async function promiseResult<T, E>(future: Promise<T>): Promise<Result<T, | |||
} | |||
} | |||
|
|||
export async function unwrapPromise<T, E>(future: Promise<Result<T, E>>): Promise<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to wrap my head around what this is doing, thought I'd look at the tests, and there aren't any!
Would probably be useful to have tests for result_type.ts
, not a blocker for this PR obviously ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair- this used to be just typing, it now has some logic.
I'll add some tests.
In any case:
Basically unwrapPromise
expects a Promise to resolve with a Result<T,E>
and if it's an ok
it resolves, if it's an err
it rejects.
If the future
rejects, we want to unwrap the error from it's Result
container, and that's why we have the catch
.
We can't put the catch
after the then
because then it'll catch the rejection we meant to return in the then
.
It's annoying as hell and would be much clearer if Typescript just let us add typing to rejected Promises... the exceptions model in JS is... lacking... to say the least.
function isSavedObjectsUpdateResponse( | ||
result: SavedObjectsUpdateResponse | Error | ||
): result is SavedObjectsUpdateResponse { | ||
return result && typeof (result as SavedObjectsUpdateResponse).id === 'string'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems slightly icky - the idea is that an error won't have an id
property? Which seems pretty safe, but you'd think there would be a cleaner way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed, but sadly, this is the shape returned by SavedObjectsRepository :/
Can't think of a better way without changing how the SavedObjectsRepository works.
Any ideas? 🤷
x-pack/plugins/task_manager/server/lib/bulk_operation_buffer.ts
Outdated
Show resolved
Hide resolved
* master: (60 commits) [SIEM][Detection Engine][Lists] Adds list permissions (elastic#72335) [SIEM][Detection Engine][Lists] Adds conflict versioning and io-ts improvements to lists (elastic#72337) [Resolver] no longer pass related event stats to process node component (elastic#72435) Revert "skip flaky suite (elastic#72146)" [Security Solution] Cleanup endpoint telemetry (elastic#71950) Unskip dashboard embeddable rendering tests (elastic#71824) [ENDPOINT] Added unerolling status for host. (elastic#72303) [Alerting][Connectors] Increase the size of the logos (elastic#72419) [SECURITY] [Timeline] Raw events not displayed (elastic#72387) [ML] Fixes display of regression stop stats if one is NaN (elastic#72412) [Ingest Pipelines] Processor Editor Move Tooltip (elastic#72239) Fix match phrase and not match phrase comparators (elastic#71850) [Plugin Generator] Generate tsconfig and useDefaultBehaviors (elastic#72040) [Security Solution][Timeline] Fix timeline styling and createFrom beh… (elastic#72152) [Resolver] Selector performance (elastic#72380) [Ingest Manager] Set `_meta` in the index.mappings (elastic#72026) [Ingest Manager] Do not bumb config revision during config creation (elastic#72270) [ML] Adding missing index pattern name to new job wizards (elastic#72400) [ML] improve annotation flyout performance (elastic#72299) [APM] Testing error rate API and restructuring folders (elastic#72257) ...
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just one question
import { unwrapPromise } from './lib/result_type'; | ||
|
||
// by default allow updates to be buffered for up to 50ms | ||
const DEFAULT_BUFFER_MAX_DURATION = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there would be value to have this as a kibana.yml config with a default of that number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it, but it feel very internal and I think for now we can keep it as a constant.
We can expose it later if we feel it's worth it, but I think the poll_interval and max_workers is enough cognitive load for our users. :)
…#71470) This PR attempts to batch update tasks in Task Manager in order to avoid overloading the Elasticsearch queue. This is the 1st PR addressing elastic#65551 Under the hood we now use a Reactive buffer accumulates all calls to the `update` api in the TaskStore and flushes after 50ms or when as many operations as there are workers have been buffered (whichever comes first).
…#72626) This PR attempts to batch update tasks in Task Manager in order to avoid overloading the Elasticsearch queue. This is the 1st PR addressing #65551 Under the hood we now use a Reactive buffer accumulates all calls to the `update` api in the TaskStore and flushes after 50ms or when as many operations as there are workers have been buffered (whichever comes first).
* master: [Security Solution][Timeline] Add Empty view to the Timelines page (elastic#72576) [Security Solution][Resolver] Show process detail panel when clicking a process node (elastic#72563) Move manifest packageConfig mocks into security_solution plugin (elastic#72527) [QA][Code Coverage] Fixup Team Assignment (elastic#72467) [docs] remove references to tile map visualization in supported aggregations (elastic#72493) [ci][apm-ui] fix argument name for disabling pr comments (elastic#72633) Only check that the event ids are the same in arrays (elastic#72624) Add doc titles to ES UI apps (elastic#71045) Add Upgrade Assistant API integration test to ensure the reindex operation saved object can handle immense error messages (elastic#72347) [APM] Disable flaky rum e2e’s (elastic#72614) Applying tiny fix from 72532 to main branch (elastic#72533) [APM] Update script with new roles/users (elastic#72599) [Security Solution] Add margin (elastic#72542) Migrated fixed_scroll karma tests to jest (elastic#72258) [ML] Handling data recognizer saved object errors (elastic#72447) [Monitoring] Fix the messaging around needing TLS enabled (elastic#72310) [Task Manager] Batches the update operations in Task Manager (elastic#71470)
…#71470) This PR attempts to batch update tasks in Task Manager in order to avoid overloading the Elasticsearch queue. This is the 1st PR addressing elastic#65551 Under the hood we now use a Reactive buffer accumulates all calls to the `update` api in the TaskStore and flushes after 50ms or when as many operations as there are workers have been buffered (whichever comes first).
…71470) (#72724) * [Task Manager] Batches the update operations in Task Manager (#71470) This PR attempts to batch update tasks in Task Manager in order to avoid overloading the Elasticsearch queue. This is the 1st PR addressing #65551 Under the hood we now use a Reactive buffer accumulates all calls to the `update` api in the TaskStore and flushes after 50ms or when as many operations as there are workers have been buffered (whichever comes first). * removed next tick scheduling as we dont use it and its test was too flaky
…feature-privileges * alerting/consumer-based-rbac: [Security Solution][Timeline] Add Empty view to the Timelines page (elastic#72576) [Security Solution][Resolver] Show process detail panel when clicking a process node (elastic#72563) renamed variable to make it clear the SO client is unsecured Move manifest packageConfig mocks into security_solution plugin (elastic#72527) [QA][Code Coverage] Fixup Team Assignment (elastic#72467) [docs] remove references to tile map visualization in supported aggregations (elastic#72493) [ci][apm-ui] fix argument name for disabling pr comments (elastic#72633) Only check that the event ids are the same in arrays (elastic#72624) includes hidden params type in SO client Add doc titles to ES UI apps (elastic#71045) Add Upgrade Assistant API integration test to ensure the reindex operation saved object can handle immense error messages (elastic#72347) [APM] Disable flaky rum e2e’s (elastic#72614) Applying tiny fix from 72532 to main branch (elastic#72533) [APM] Update script with new roles/users (elastic#72599) [Security Solution] Add margin (elastic#72542) Migrated fixed_scroll karma tests to jest (elastic#72258) [ML] Handling data recognizer saved object errors (elastic#72447) [Monitoring] Fix the messaging around needing TLS enabled (elastic#72310) [Task Manager] Batches the update operations in Task Manager (elastic#71470)
Summary
This PR attempts to batch update tasks in Task Manager in order to avoid overloading the Elasticsearch queue.
This is the 1st PR addressing #65551
Under the hood we now use a Reactive buffer accumulates all calls to the
update
api in the TaskStore and flushes after 50ms or when as many operations as there are workers have been buffered (whichever comes first).Checklist
Delete any items that are not applicable to this PR.
For maintainers