Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task Manager] Add caching to the task partitioning logic #189562

Merged
merged 10 commits into from
Aug 5, 2024

Conversation

doakalexi
Copy link
Contributor

@doakalexi doakalexi commented Jul 30, 2024

Resolves #189119

Summary

This PR adds a mechanism to keep the node's calculated partitions in cache for 10 seconds before calling the discovery service again and recalculating them.

Checklist

To verify

  • Add the following to kibana.yml:
xpack.task_manager.claim_strategy: 'unsafe_mget'
  • Verify that the cache is being updated every 10 seconds, you could test this by adding a log statement with the timestamp in the TaskPartitioner code that was updated
  • Verify that on start the cache is updated on the initial claiming cycle

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi doakalexi changed the title Adding caching to the task partition logic [Task Manager] Add caching to the task partitioning logic Jul 30, 2024
@doakalexi doakalexi added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) release_note:skip Skip the PR/issue when compiling release notes v8.16.0 labels Jul 30, 2024
@doakalexi
Copy link
Contributor Author

/ci

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#6647

[✅] x-pack/test/task_manager_claimer_mget/config.ts: 25/25 tests passed.

see run history

@doakalexi doakalexi marked this pull request as ready for review July 30, 2024 22:20
@doakalexi doakalexi requested a review from a team as a code owner July 30, 2024 22:20
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@doakalexi doakalexi requested review from ymao1 and adcoelho July 30, 2024 22:20
Copy link
Contributor

@adcoelho adcoelho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and seems to be working as expected.

I added a console.log(this.podPartitionsLastUpdated) to x-pack/plugins/task_manager/server/lib/task_partitioner.ts line 54 and see the timestamps being updated.
Screenshot 2024-08-01 at 10 16 53

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments about error handling, otherwise LGTM


// update the pod partitions cache after 10 seconds
if (now - lastUpdated >= CACHE_INTERVAL) {
const allPodNames = await this.getAllPodNames();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an error retrieving this, should we return the cached partitions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds good, resolved in this commit e39827c

@@ -67,4 +76,19 @@ describe('getPartitions()', () => {
247, 249, 250, 252, 253, 255,
]);
});

test('correctly caches the partitions on 10 second interval ', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for handling errors from the discovery service?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, resolved in this commit e39827c

@doakalexi doakalexi enabled auto-merge (squash) August 5, 2024 14:57
@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #71 / aiops log rate analysis with 'kibana_sample_data_logstsdb' kibana sample data logs displays index details

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@doakalexi doakalexi merged commit d79bdfd into elastic:main Aug 5, 2024
40 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task Manager] Add caching to the task partitioning logic
6 participants