-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply back pressure in Task Manager whenever Elasticsearch responds with a 429 #75666
Conversation
* WIP step 1 * WIP step 2 * Cleanup * Make maxWorkers an observable for the task pool * Cleanup * Fix test failures * Use BehaviorSubject * Add some tests
* Add errors$ observable to the task store * Add unit tests
…29 errors occur (#77096) * WIP * Cleanup * Add error count to message * Reset observable values on stop * Add comments * Fix issues when changing configurations * Cleanup code * Cleanup pt2 * Some renames * Fix typecheck * Use observables to manage throughput * Rename class * Switch to createManagedConfiguration * Add some comments * Start unit tests * Add logs * Fix log level * Attempt at adding integration tests * Fix test failures * Fix timer * Revert "Fix timer" This reverts commit 0817e5e. * Use Symbol * Fix merge scan
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
@elasticmachine merge upstream |
…into feature/task_manager_429 * 'feature/task_manager_429' of github.com:elastic/kibana: (158 commits) Add license check to direct package upload handler. (#79653) [Ingest Manager] Rename API /api/ingest_manager => /api/fleet (#79193) [Security Solution][Resolver] Simplify CopyableField styling and add comments (#79594) Fine-tunes ML related text on Metrics UI (#79425) [ML] DF Analytics creation wizard: ensure job creation possible when model memory lower than estimate (#79229) Add new "Add Data" tutorials (#77237) Update APM telemetry docs (#79583) Revert "Add support for runtime field types to mappings editor. (#77420)" (#79611) Kibana request headers (#79218) ensure missing indexPattern error is bubbled up to error callout (#79378) Missing space fix (#79585) remove duplicate tab states (#79501) [data.ui] Lazy load UI components in data plugin. (#78889) Add generic type params to search dependency. (#79608) [Ingest Manager] Internal action for policy reassign (#78493) [ILM] Add index_codec to forcemerge action in hot and warm phases (#78175) [Ingest Manager] Update open API spec and add condition to agent upgrade endpoint (#79579) [ML] Hide Data Grid column options when histogram charts are enabled. (#79459) [Telemetry] Synchronous `setup` and `start` methods (#79457) [Observability] Persist time range across apps (#79258) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Curious if there would be a way to simulate a heavy ES load and test this out locally?
@elasticmachine merge upstream |
Thanks for bringing this up, good point! I added a |
…ith a 429 (elastic#75666) * Make task manager maxWorkers and pollInterval observables (elastic#75293) * WIP step 1 * WIP step 2 * Cleanup * Make maxWorkers an observable for the task pool * Cleanup * Fix test failures * Use BehaviorSubject * Add some tests * Make the task manager store emit error events (elastic#75679) * Add errors$ observable to the task store * Add unit tests * Temporarily apply back pressure to maxWorkers and pollInterval when 429 errors occur (elastic#77096) * WIP * Cleanup * Add error count to message * Reset observable values on stop * Add comments * Fix issues when changing configurations * Cleanup code * Cleanup pt2 * Some renames * Fix typecheck * Use observables to manage throughput * Rename class * Switch to createManagedConfiguration * Add some comments * Start unit tests * Add logs * Fix log level * Attempt at adding integration tests * Fix test failures * Fix timer * Revert "Fix timer" This reverts commit 0817e5e. * Use Symbol * Fix merge scan * replace startsWith with a timer that is scheduled to 0 * typo Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Gidi Meir Morris <[email protected]>
…otphase-to-formlib * 'master' of github.com:elastic/kibana: (59 commits) [Security Solution][Resolver] Replace copy-to-clipboard with native navigator.clipboard (elastic#80193) [Security Solution] Reduce initial bundle size (elastic#78992) [Security Solution][Resolver] Fix Resize node box-shadow bug (elastic#80223) Move observability content (elastic#79978) skip flaky suite (elastic#79389) removing kibana_datatable` in favor of `datatable` (elastic#75184) [ML] Fixes for anomaly swim lane (elastic#80299) [Lens] Smokescreen lens test unskip (elastic#80190) Improved AlertsClient tests structure by splitting a huge alerts_client.tests.ts file into a specific files defined by its responsibility. (elastic#80088) [APM] React key warning when opening popover with external resources (elastic#80328) [Step 1] use Observables on server search API (elastic#79874) Apply back pressure in Task Manager whenever Elasticsearch responds with a 429 (elastic#75666) [Lens] Leverage original http request error (elastic#79831) [Security Solution][Case] Improve ServiceConnectorCaseParams type (elastic#80109) [SECURITY_SOLUTION] Fix query on alert histogram (elastic#80219) [DOCS] Update ingest node pipelines doc (elastic#79187) [Ingest Manager] Split up OpenAPI spec file (elastic#80107) [SECURITY_SOLUTION][ENDPOINT] Fix label on Trusted App create name field (elastic#80001) [Ingest Manager] Fix agent policy bump revision to create only one POLICY_CHANGE action (elastic#80081) Grid layout fixes (elastic#80305) ... # Conflicts: # x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/components/phases/shared/data_tier_allocation_field.tsx # x-pack/plugins/index_lifecycle_management/public/shared_imports.ts
…ith a 429 (#75666) (#80355) * Make task manager maxWorkers and pollInterval observables (#75293) * WIP step 1 * WIP step 2 * Cleanup * Make maxWorkers an observable for the task pool * Cleanup * Fix test failures * Use BehaviorSubject * Add some tests * Make the task manager store emit error events (#75679) * Add errors$ observable to the task store * Add unit tests * Temporarily apply back pressure to maxWorkers and pollInterval when 429 errors occur (#77096) * WIP * Cleanup * Add error count to message * Reset observable values on stop * Add comments * Fix issues when changing configurations * Cleanup code * Cleanup pt2 * Some renames * Fix typecheck * Use observables to manage throughput * Rename class * Switch to createManagedConfiguration * Add some comments * Start unit tests * Add logs * Fix log level * Attempt at adding integration tests * Fix test failures * Fix timer * Revert "Fix timer" This reverts commit 0817e5e. * Use Symbol * Fix merge scan * replace startsWith with a timer that is scheduled to 0 * typo Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Gidi Meir Morris <[email protected]> Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Gidi Meir Morris <[email protected]>
💔 Build Failed
Failed CI Steps
Test FailuresChrome X-Pack UI Functional Tests.x-pack/test/functional/apps/maps.maps app "before all" hook in "maps app"Standard Out
Stack Trace
X-Pack API Integration Tests (Security Basic).x-pack/test/api_integration/apis/security/index_fields·ts.security (basic license) Index Fields "before all" hook in "Index Fields"Standard Out
Stack Trace
Chrome UI Functional Tests.test/functional/apps/getting_started/_shakespeare·js.Getting Started Shakespeare should configure Terms aggregation on play_nameStandard Out
Stack Trace
and 10 more failures, only showing the first 3. Metrics [docs]
History
To update your PR or re-run it, just comment with: |
Resolves #65553.
In this PR, I'm implementing the proposal discussed here.
This PR represents a feature branch where pieces of work are getting merged:
How to test this locally?
yarn es snapshot --ssl -E path.data=/my/path
yarn start --ssl
)yarn es snapshot --ssl -E path.data=/my/path -E thread_pool.write.queue_size=3
yarn start --ssl
)Note for docs
This is worth documenting somewhere that Task Manager will reduce max workers and increase poll interval whenever Elasticsearch responds with 429 errors on task manager CRUD operations.