Refactor bulk update tags to fix issue with concurrently removing multiple tags #143543

Merged
merged 12 commits into from
Nov 8, 2022

Contributor

@juliaElastic juliaElastic commented Oct 18, 2022

Summary

Fixing #142330

Refactored the bulk update tags action to use updateByQuery with a script that adds and removes agent tags.

The request uses the conflicts: abort setting, so when two update tags actions run concurrently, the one that hits a version conflict fails and the retry task runs it again later.
This works as expected: removing 2 tags from 50k agents in quick succession eventually succeeds.
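
As a rough illustration of the request shape described above, here is a minimal sketch that only builds the request body and does not call Elasticsearch. The function name, the interface, the `.fleet-agents` index name, and the `match_all` query are assumptions for the example; the two script statements are the ones quoted in the review thread, and the script in the PR contains more than this.

```typescript
// Sketch only: builds a request body for an update-by-query call.
// Names and the query shape are illustrative assumptions, not the PR's code.
interface UpdateTagsByQueryRequest {
  index: string;
  conflicts: 'abort' | 'proceed';
  query: Record<string, unknown>;
  script: {
    lang: 'painless';
    source: string;
    params: { tagsToAdd: string[]; tagsToRemove: string[] };
  };
}

function buildUpdateTagsRequest(
  index: string,
  query: Record<string, unknown>,
  tagsToAdd: string[],
  tagsToRemove: string[]
): UpdateTagsByQueryRequest {
  return {
    index,
    // 'abort' stops the request when a version conflict is hit,
    // which lets a separate retry task run the action again later.
    conflicts: 'abort',
    query,
    script: {
      lang: 'painless',
      source: [
        'ctx._source.tags.removeAll(params.tagsToRemove);',
        'ctx._source.tags.addAll(params.tagsToAdd);',
      ].join('\n'),
      params: { tagsToAdd, tagsToRemove },
    },
  };
}

// Example: remove tag1 from every matching agent document.
const request = buildUpdateTagsRequest('.fleet-agents', { match_all: {} }, [], ['tag1']);
console.log(JSON.stringify(request, null, 2));
```

Passing the tags as script params rather than interpolating them into the script source keeps the script text constant across requests.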

Pending:

  • Hosted agents are filtered out, so they never get a corresponding action result, which leaves the bulk action in progress forever. This has to be remedied by querying hosted agents separately and saving an error action result for each of them.
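
One possible shape for that pending remediation, as a hedged sketch: split the selection into hosted and regular agents, then produce one error result per hosted agent. All names here (`Agent`, `ActionResult`, `splitByHosted`, `errorResultsForHosted`) and the error message are hypothetical and are not taken from the Fleet code.

```typescript
// Sketch only: hypothetical types and helpers, not the Fleet implementation.
interface Agent {
  id: string;
  policyId: string;
}

interface ActionResult {
  actionId: string;
  agentId: string;
  error: string;
}

// Hosted agents are those assigned to a managed agent policy.
function splitByHosted(
  agents: Agent[],
  managedPolicyIds: string[]
): { hosted: Agent[]; regular: Agent[] } {
  const hosted: Agent[] = [];
  const regular: Agent[] = [];
  for (const agent of agents) {
    const isHosted = managedPolicyIds.indexOf(agent.policyId) !== -1;
    (isHosted ? hosted : regular).push(agent);
  }
  return { hosted, regular };
}

// One error result per hosted agent, so the action's counts can add up.
function errorResultsForHosted(actionId: string, hosted: Agent[]): ActionResult[] {
  return hosted.map((agent) => ({
    actionId,
    agentId: agent.id,
    error: 'Cannot modify tags on a hosted agent', // hypothetical message
  }));
}

const selection: Agent[] = [
  { id: 'agent-1', policyId: 'policy-managed' },
  { id: 'agent-2', policyId: 'policy-regular' },
];
const split = splitByHosted(selection, ['policy-managed']);
const hostedErrors = errorResultsForHosted('action-1', split.hosted);
console.log(split.regular.length, hostedErrors.length); // prints "1 1"
```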

To verify:

  • Create 50k agent documents with the create_agents.ts script.
  • Select all agents in the UI and add two tags: tag1, tag2. Wait for the change to propagate to the agent list (the list can take up to 30 seconds to refresh).
  • Remove the two tags in quick succession. Wait for the actions to propagate. The two tags should be successfully removed from all agents and deleted.
  • Agent activity should show the actions as succeeded for 50k agents.

Existing scenarios should not be impacted by the refactor:

  • When the bulk selection contains hosted agents (agents assigned to managed agent policies), the hosted agents should not be updated with tags, and the action will fail (fail count equal to hosted agents).
  • When selecting all agents on the current page and adding/removing tags, the action should succeed as before, and agent activity should show the right info (e.g. 20 agents succeeded).

Attached video: remove_2_tags_at_once_50k.mov

Kibana log entries showing the concurrent action execution:

[2022-11-07T13:17:52.551+01:00][INFO ][plugins.fleet] Running action asynchronously, actionId: 2046beaf-8de8-45f4-b689-6000bbb86d3e, total agents: 50000
[2022-11-07T13:17:52.595+01:00][INFO ][plugins.fleet] Scheduling task fleet:update_agent_tags:retry:check:2046beaf-8de8-45f4-b689-6000bbb86d3e
[2022-11-07T13:17:53.737+01:00][INFO ][plugins.fleet] Running action asynchronously, actionId: 7e779438-f1a4-430c-8c45-c040961bb079, total agents: 50000
[2022-11-07T13:17:53.803+01:00][INFO ][plugins.fleet] Scheduling task fleet:update_agent_tags:retry:check:7e779438-f1a4-430c-8c45-c040961bb079
[2022-11-07T13:17:53.859+01:00][ERROR][plugins.fleet] Action failed: Caught error: {"name":"ResponseError","meta":{"body":{"took":39,"timed_out":false,"total":50000,"updated":0,"deleted":0,"batches":1,"version_conflicts":1000,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1,"throttled_until_millis":0,"failures":[{"index":".fleet-agents-7","id":"v789UYQByzusJdx86FTK","cause":{"type":"version_conflict_engine_exception","reason":"[v789UYQByzusJdx86FTK]: version conflict, required seqNo [2996365], primary term [1]. current document has seqNo [3040365] and primary term [1]","index_uuid":"FL5v_6dUSSqvSeZq8lN1hQ","shard":"0","index":".fleet-agents-7"},"status":409},{"index":".fleet-agents-7","id":"wL89UYQByzusJdx86FTK","cause":{"type":"version_conflict_engine_exception","reason":"[wL89UYQByzusJdx86FTK]: version conflict, required seqNo [2996366], primary term [1]. current document has seqNo [3040366] and primary term [1]","index_uuid":"FL5v_6dUSSqvSeZq8lN1hQ","shard":"0","index":".fleet-agents-7"},"status":409},{"index":".fleet-agents-
[2022-11-07T13:17:53.944+01:00][INFO ][plugins.fleet] Scheduling task fleet:update_agent_tags:retry:7e779438-f1a4-430c-8c45-c040961bb079
[2022-11-07T13:17:53.944+01:00][INFO ][plugins.fleet] Retrying in task: fleet:update_agent_tags:retry:7e779438-f1a4-430c-8c45-c040961bb079
[2022-11-07T13:17:57.627+01:00][DEBUG][plugins.fleet] {"took":5019,"timed_out":false,"total":50000,"updated":50000,"deleted":0,"batches":50,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1,"throttled_until_millis":0,"failures":[]}
[2022-11-07T13:17:59.376+01:00][INFO ][plugins.fleet] Running bulk action retry task
[2022-11-07T13:17:59.377+01:00][DEBUG][plugins.fleet] Retry #1 of task fleet:update_agent_tags:retry:7e779438-f1a4-430c-8c45-c040961bb079
[2022-11-07T13:17:59.377+01:00][INFO ][plugins.fleet] Running action asynchronously, actionId: 7e779438-f1a4-430c-8c45-c040961bb079, total agents: 50000
[2022-11-07T13:17:59.377+01:00][INFO ][plugins.fleet] Completed bulk action retry task
[2022-11-07T13:17:59.381+01:00][INFO ][plugins.fleet] Scheduling task fleet:update_agent_tags:retry:check:7e779438-f1a4-430c-8c45-c040961bb079
[2022-11-07T13:18:02.026+01:00][INFO ][plugins.fleet] processed 50000 agents, took 5019ms
[2022-11-07T13:18:02.026+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:2046beaf-8de8-45f4-b689-6000bbb86d3e
[2022-11-07T13:18:04.369+01:00][DEBUG][plugins.fleet] {"took":4980,"timed_out":false,"total":50000,"updated":50000,"deleted":0,"batches":50,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1,"throttled_until_millis":0,"failures":[]}
[2022-11-07T13:18:07.159+01:00][INFO ][plugins.fleet] processed 50000 agents, took 4980ms
[2022-11-07T13:18:07.160+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:7e779438-f1a4-430c-8c45-c040961bb079
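
The log above shows the second action failing with version conflicts and then succeeding on retry. A minimal sketch of that decision, assuming the caller can inspect the update-by-query summary fields that appear in the log (`total`, `updated`, `version_conflicts`, `failures`): the function name and the 'retry' / 'done' labels are assumptions, and the actual runner in the PR reacts to a thrown ResponseError rather than calling a helper like this.

```typescript
// Sketch only: field names mirror the response bodies in the log;
// the function and outcome labels are illustrative assumptions.
interface UpdateByQuerySummary {
  total: number;
  updated: number;
  version_conflicts: number;
  failures: unknown[];
}

type Outcome = 'retry' | 'done';

function decideOutcome(summary: UpdateByQuerySummary): Outcome {
  // Any conflict or failure means some documents were not updated,
  // so the action should be attempted again later.
  const needsRetry = summary.version_conflicts > 0 || summary.failures.length > 0;
  return needsRetry ? 'retry' : 'done';
}

// Values taken from the failed attempt and the successful retry in the log.
const firstAttempt: UpdateByQuerySummary = {
  total: 50000,
  updated: 0,
  version_conflicts: 1000,
  failures: [{ status: 409 }],
};
const afterRetry: UpdateByQuerySummary = {
  total: 50000,
  updated: 50000,
  version_conflicts: 0,
  failures: [],
};

console.log(decideOutcome(firstAttempt)); // prints "retry"
console.log(decideOutcome(afterRetry)); // prints "done"
```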

Checklist

@juliaElastic juliaElastic self-assigned this Oct 18, 2022
@juliaElastic juliaElastic added the release_note:skip (Skip the PR/issue when compiling release notes) label Oct 18, 2022
@juliaElastic juliaElastic force-pushed the fix/update-tags-script branch from 14403f8 to 1f85820 Compare October 18, 2022 14:03
@juliaElastic juliaElastic added the Team:Fleet (Team label for Observability Data Collection Fleet team) label Oct 24, 2022
@juliaElastic juliaElastic changed the title from "[WIP] update tags refactor" to "Refactor bulk update tags to fix issue with concurrently removing multiple tags" Nov 7, 2022
@juliaElastic juliaElastic marked this pull request as ready for review November 7, 2022 10:43
@juliaElastic juliaElastic requested a review from a team as a code owner November 7, 2022 10:43
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

Comment on lines 129 to 130
ctx._source.tags.removeAll(params.tagsToRemove);
ctx._source.tags.addAll(params.tagsToAdd);
Member

Why not just have that code block, instead of the replaceAll before it?

Contributor Author

In the use case of renaming a tag, we don't want the order to change. This is covered by the replaceAll logic to replace the tag in place.
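
To make the ordering point concrete, here is a small sketch that models the two approaches on a plain array in TypeScript rather than in Painless. The function names and sample tags are made up for illustration.

```typescript
// Sketch only: models the two tag-update strategies on plain arrays.

// Rename by replacing the matching element where it stands (order preserved).
function renameInPlace(tags: string[], oldTag: string, newTag: string): string[] {
  return tags.map((tag) => (tag === oldTag ? newTag : tag));
}

// Remove the old tags, then append the new ones (renamed tag moves to the end).
function removeThenAdd(tags: string[], tagsToRemove: string[], tagsToAdd: string[]): string[] {
  return tags.filter((tag) => tagsToRemove.indexOf(tag) === -1).concat(tagsToAdd);
}

const agentTags = ['linux', 'staging', 'eu'];

console.log(renameInPlace(agentTags, 'staging', 'prod')); // [ 'linux', 'prod', 'eu' ]
console.log(removeThenAdd(agentTags, ['staging'], ['prod'])); // [ 'linux', 'eu', 'prod' ]
```

With remove-then-add alone, a rename moves the tag to the end of the list, which is the reordering the in-place replacement avoids.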

Member

@nchaulet nchaulet left a comment

code looks good to me 🚀 (not tested locally)

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id       before  after  diff
osquery  1       2      +1

ESLint disabled line counts

id                before  after  diff
enterpriseSearch  19      21     +2
fleet             58      64     +6
osquery           108     113    +5
securitySolution  440     446    +6
total                            +19

Total ESLint disabled count

id                before  after  diff
enterpriseSearch  20      22     +2
fleet             66      72     +6
osquery           109     115    +6
securitySolution  517     523    +6
total                            +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@juliaElastic juliaElastic merged commit 0017a08 into elastic:main Nov 8, 2022
@kibanamachine kibanamachine added the backport:skip (This commit does not require backporting) label Nov 8, 2022
Labels
  • backport:skip: This commit does not require backporting
  • release_note:skip: Skip the PR/issue when compiling release notes
  • Team:Fleet: Team label for Observability Data Collection Fleet team
  • v8.6.0