-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task Manager] Correctly handle running
tasks when calling RunNow and reduce flakiness in related tests
#73244
[Task Manager] Correctly handle running
tasks when calling RunNow and reduce flakiness in related tests
#73244
Conversation
await delay(100); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional delay caused by the buffering of update
s allows it to fail every once in a while as the runNow
sees the task as still running and doesn't try to run it concurrently.
So we delay here a few ms to side step that.
@@ -456,7 +456,7 @@ export async function awaitTaskRunResult( | |||
) | |||
); | |||
} else if (isTaskClaimEvent(taskEvent)) { | |||
reject( | |||
return reject( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addresses the Socket timeout - we weren't short circuiting the return here which caused some callbacks to fail weirdly at times.
* master: (126 commits) [ML] Disabling ML if license feature is disabled (elastic#73187) [ML] Fixing old _xpack style es endpoint paths (elastic#73667) [DOCS] [Lens] 7.9 docs refresh (elastic#72301) [ML] DF Analytics results: ensure `View` link is only enabled when job has successfully completed (elastic#73539) Set timeRange to default to trigger the error message (elastic#73629) [ML] Functional tests - stabilize DFA navigation and index pattern handling (elastic#73660) [ILM] Add links to "Snapshot and Restore" from ILM "wait for snapshot policy" (elastic#72473) [kbn-storybook] Update Storybook to 5.3.19 (elastic#73320) [Metrics UI] Fix hasData call to ensure it has data not just indices (elastic#72969) [Uptime] Use `service.name` to link from Uptime -> APM where available (elastic#73618) allow others to update `URL.revokeObjectURL` property if needed (elastic#73639) regen docs (elastic#73650) [Visualize] Fix inspector download filename issue when saving in-place (elastic#72605) [Data] Query Input String manager (elastic#72093) [Security Solutions] Add tooltips (elastic#73436) Do not render descriptionless actions within an EuiCard (elastic#73611) [Security Solution][Detections] Value Lists Modal supports multiple exports (elastic#73532) [Security Solution][Resolver] Handle disabled process collection (elastic#73592) [Security_Solution][Bug] Fix user name/domain to ECS structure (elastic#73530) [Security Solution][Exceptions] - Update rule.exceptions_list to include exception list list_id (elastic#73349) ...
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
@elasticmachine merge upstream |
* master: (38 commits) [Discover] Context unskip date nanos functional tests (elastic#73781) [ML] Migrate to React BrowserRouter and Kibana provided History. (elastic#71941) [Discover] Improve saveSearch functional test handling (elastic#73626) [Metrics UI] Fix all threshold alert conditions disappearing due to alert prefill (elastic#73708) [Metrics UI] Fix alert previews of ungrouped alerts (elastic#73735) [SIEM] Fixes "include building block button" to operate (elastic#73900) [Metrics UI] Fix alert management to open without refresh (elastic#73739) [Security Solution][Lists] - Tests cleanup and remove unnecessary import (elastic#73865) [Ingest Management] main branch uses epr-snapshot. Others production (elastic#73555) [Canvas][tech-debt] Fix SVG not shrinking vertically properly (elastic#73867) [Maps] upgrade turf (elastic#73816) [Security Solution][Telemetry] Concurrent telemetry requests (elastic#73558) [Security Solution][Exceptions] - Update how nested entries are displayed in exceptions viewer (elastic#73745) [Security Solution][Exceptions] Adds autocomplete workaround for .text fields (elastic#73761) [Metrics UI] Fix previewing of No Data results (elastic#73753) Closes elastic#72914 by hiding anomaly detection settings links when the ml plugin is disabled. (elastic#73638) [Ingest Manager] Fix config selection in enrollment flyout from config list page (elastic#73833) [DOCS] Fixes typo in Alerting actions (elastic#73756) [APM] fixes linking errors to ML and Discover (elastic#73758) Handle promise rejections when building artifacts (elastic#73831) ...
* master: (39 commits) [Canvas][tech-debt] Rename __examples__ to __stories__ (elastic#73853) [Canvas] Storybook Redux Addon (elastic#73227) Use "Apply_filter_trigger" in "explore underlying data" action (elastic#71445) [maps] convert top nav config to TS (elastic#73851) [maps] fix fit to bounds for ES document layers with joins (elastic#73985) [Canvas][tech-debt] Refactor Toolbar (completes Kill Recompose.pure) (elastic#73309) [CI] In-progress Slack notifications (elastic#74012) [SIEM][Detection Engine] Fixes tags to accept characters such as AND, OR, (, ), ", * (elastic#74003) [SECURITY_SOLUTION][ENDPOINT] Fix host list Configuration Status cell link loosing list page/size state (elastic#73989) Tweak injected metadata (elastic#73990) Closes elastic#73998 by using `canAccessML` in the ML capabilities API to (elastic#73999) [SIEM] Fixes toaster errors when siemDefault index is an empty or empty spaces (elastic#73991) [Security Solution] Fix timeline pin event callback (elastic#73981) [Security Solution] Fix unexpected redirect (elastic#73969) [Metrics UI] Fix Metrics Explorer TSVB link to use workaround pattern (elastic#73986) [APM] docs: Update machine learning integration (elastic#73597) [Ingest Manager] Fix limited concurrency helper (elastic#73976) [build/sysv] fix missing env variable rename (elastic#73977) Fix a typo. (elastic#73948) [Ingest Manager] Revert fleet config concurrency rollout to rate limit (elastic#73940) ...
running
tasks when calling RunNow and reduce flakiness in related tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: (74 commits) [Discover] Inline noWhiteSpace function (elastic#74331) [DOCS] Add Observability topic (elastic#73041) skip flaky suite (elastic#74327) [Security Solution][Detections] Fixes Severity Override not matching for Elastic Endpoint Security rule (elastic#74317) [Security Solution][Exceptions] - Fixes exceptions builder nested deletion issue and adds unit tests (elastic#74250) Fixed Alert details does not update page title and breadcrumb (elastic#74214) [src/dev/build] build Kibana Platform bundles from source (elastic#73591) [Reporting] Shorten asset path to help CLI FS Watcher (elastic#74185) Fix TMS not loaded in legacy maps (elastic#73570) [Security Solution] styling for notes' panel (elastic#74274) [Security Solution][Tech Debt] cleans up ts-ignore issues and some smaller linter issues (elastic#74268) Make the actions plugin support generics (elastic#71439) [Security Solution] Keep original note creator (elastic#74203) [CI] Fix xpack kibana build dir in xpack visual regression script [CI] Fix baseline_capture job by adding parallel process number back [Monitoring] Ensure setup mode works on cloud but only for alerts (elastic#73127) [Maps] Custom color ramps should show correctly on the map for mvt layers (elastic#74169) [kbn/optimizer] remove unused modules (elastic#74195) [CI] Add pipeline task queue framework and merge workers into one (elastic#71268) Using msearch for tree api endpoint (elastic#73813) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
}); | ||
|
||
await delay(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for a delay after a retry.try
? Seems like a future cause of flakiness. The logic of the code above could be changed to wait until the task finished running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that was a mistake - this was meant to be await ensureTasksIndexRefreshed();
too 👍
Good catch!
}); | ||
|
||
await delay(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
…nd reduce flakiness in related tests (elastic#73244) This PR addresses two issues which caused several tests to be flaky in TM. When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID. This query does not have the filter applied to it which causes task to return when they're already marked as `running` but we didn't address these correctly which caused flakyness in the tests. This didn't cause a broken beahviour, but it did cause beahviour that was hard to reason about - we now address them correctly. It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't user `refresh:true` on purpose), so adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
…nd reduce flakiness in related tests (elastic#73244) This PR addresses two issues which caused several tests to be flaky in TM. When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID. This query does not have the filter applied to it which causes task to return when they're already marked as `running` but we didn't address these correctly which caused flakyness in the tests. This didn't cause a broken beahviour, but it did cause beahviour that was hard to reason about - we now address them correctly. It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't user `refresh:true` on purpose), so adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
…nd reduce flakiness in related tests (#73244) (#74386) This PR addresses two issues which caused several tests to be flaky in TM. When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID. This query does not have the filter applied to it which causes task to return when they're already marked as `running` but we didn't address these correctly which caused flakyness in the tests. This didn't cause a broken beahviour, but it did cause beahviour that was hard to reason about - we now address them correctly. It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't user `refresh:true` on purpose), so adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
…nd reduce flakiness in related tests (#73244) (#74387) This PR addresses two issues which caused several tests to be flaky in TM. When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID. This query does not have the filter applied to it which causes task to return when they're already marked as `running` but we didn't address these correctly which caused flakyness in the tests. This didn't cause a broken beahviour, but it did cause beahviour that was hard to reason about - we now address them correctly. It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't user `refresh:true` on purpose), so adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
* master: (208 commits) Observability Overview fix extra basepath prepend for alerting fetch (elastic#74465) [Lens] Clean and inline disabling of react-hooks/exhaustive-deps eslint rule (elastic#70010) Skip "space with index pattern management disabled" functional test for cloud env (elastic#74073) Filter out non-security jobs when collecting Detections telemetry (elastic#74456) [Security Solution][Test] Enzyme test for related events button (elastic#74411) [SECURITY_SOLUTION] add z-index to get over nav bar (elastic#74427) Rename package configs SO to package policies (elastic#74422) [DOCS] Add Kibana alerts to Stack Monitoring (elastic#73762) skip flaky suite (elastic#71390) [ML] DF Analytics: adds functional tests for edit form (elastic#73885) Rename agent configs SO to agent policies (elastic#74397) [Jenkins] run CI when plugin readmes change (elastic#74388) [Metrics UI] Fix validating Metrics Explorer URL (elastic#74311) fixing encoding issue with \ for enroll command (elastic#74379) [Ingest Manager] Update package registry for testing to f6b01d (elastic#74341) Change experimental message for visualizations (elastic#74354) [Alerting] Reload the Alerts List when alerts are deleted (elastic#73715) [Enterprise Search] Fix/DRY out plugin i18n strings (elastic#74323) update empty prompt in analytics list (elastic#74174) [Task Manager] Correctly handle `running` tasks when calling RunNow and reduce flakiness in related tests (elastic#73244) ...
…chedule every polling interval (#74606) Fixes flaky tests in Task Manager and Alerting. The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over. This prevents that edge case which was effecting both TM in general and Alerting specifically.
…chedule every polling interval (elastic#74606) Fixes flaky tests in Task Manager and Alerting. The fix in elastic#73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over. This prevents that edge case which was effecting both TM in general and Alerting specifically.
…chedule every polling interval (elastic#74606) Fixes flaky tests in Task Manager and Alerting. The fix in elastic#73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over. This prevents that edge case which was effecting both TM in general and Alerting specifically.
…chedule every polling interval (#74606) (#74940) Fixes flaky tests in Task Manager and Alerting. The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over. This prevents that edge case which was effecting both TM in general and Alerting specifically.
…chedule every polling interval (#74606) (#74941) Fixes flaky tests in Task Manager and Alerting. The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over. This prevents that edge case which was effecting both TM in general and Alerting specifically.
Summary
This PR addresses two issues which caused several tests to be flaky in TM.
When
runNow
was introduced to TM we added a pinned query which returned specific tasks by ID.This query does not have the filter applied to it which causes task to return when they're marked as
running
but we didn't address these correctly which caused flakyness in the tests (rightfully) - we now address them correctly by messaging this correctly in the error.It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to be visible (we don't user
refresh:true
on purpose), so adding a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.closes #71390
Checklist
Delete any items that are not applicable to this PR.
For maintainers