
[Fleet] Add escape-hatch flag for skipping upgrade rate limiting #176823

Closed
kpollich opened this issue Feb 13, 2024 · 1 comment · Fixed by #176923
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kpollich

Today, Fleet enforces a 10-minute rate limit after an attempt to upgrade a given agent: another upgrade cannot be attempted until this 10-minute window expires. This caused problems in the 8.12.1 release, when a bug we introduced in Fleet Server (elastic/fleet-server#3263) caused agents to be rate-limited indefinitely.

We should introduce a flag that lets users opt out of rate limiting in extreme edge cases where it behaves unexpectedly. This flag should be documented alongside the existing `force` flags, which allow upgrades to be restarted regardless of their current state.
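A minimal sketch of how a time-window rate limit with an opt-out flag could work. The function and exception names (`check_upgrade_rate_limit`, `RateLimitedError`, `format_wait`) are hypothetical and are not Fleet's actual implementation; the only details taken from this issue are the 10-minute window and the `MMmSSs` format of the wait time in the error message.

```python
from datetime import datetime, timedelta
from typing import Optional

# Window taken from the issue: no new upgrade within 10 minutes of the last attempt.
UPGRADE_RATE_LIMIT = timedelta(minutes=10)


class RateLimitedError(Exception):
    """Hypothetical error type standing in for an HTTP 429 response."""


def format_wait(remaining: timedelta) -> str:
    # Render the remaining time as "MMmSSs", e.g. 422 seconds -> "07m02s".
    minutes, seconds = divmod(int(remaining.total_seconds()), 60)
    return f"{minutes:02d}m{seconds:02d}s"


def check_upgrade_rate_limit(
    agent_id: str,
    last_upgrade_attempt: Optional[datetime],
    now: datetime,
    skip_rate_limit_check: bool = False,
) -> None:
    """Raise RateLimitedError if the agent was upgraded too recently.

    skip_rate_limit_check is the escape hatch: when True, the check is bypassed
    entirely, so a stuck or buggy rate limit cannot block the upgrade.
    """
    if skip_rate_limit_check or last_upgrade_attempt is None:
        return
    elapsed = now - last_upgrade_attempt
    if elapsed < UPGRADE_RATE_LIMIT:
        remaining = UPGRADE_RATE_LIMIT - elapsed
        raise RateLimitedError(
            f"agent {agent_id} was upgraded less than 10 minutes ago. "
            f"Please wait {format_wait(remaining)} before trying again"
        )
```

Keeping the flag as a single early return keeps the bypass explicit and easy to audit, while the default path still enforces the window.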

@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Feb 13, 2024
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic self-assigned this Feb 14, 2024
juliaElastic added a commit that referenced this issue Feb 19, 2024
## Summary

Closes #176823

Added a `skipRateLimitCheck` flag to the `upgrade` and `bulk_upgrade` APIs
as an escape hatch for skipping rate limiting.

To verify:
- Enroll an 8.11.4 agent and upgrade it to 8.12.0.
- Within 10 minutes, try upgrading it again with the API; the upgrade should fail.
- Verify that the upgrade succeeds when the `skipRateLimitCheck` flag is set.
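The verification steps rely on the `action_status` API to confirm whether an upgrade action failed or completed. A minimal sketch of pulling the status and error messages of `UPGRADE` actions out of that response, assuming the response wraps actions in an `items` array (the examples in this PR show only individual action objects); `summarize_upgrade_actions` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def summarize_upgrade_actions(action_status: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Return (status, error messages) for each UPGRADE action.

    Assumes the action_status response has the shape {"items": [action, ...]},
    where each action carries "type", "status" and "latestErrors" fields.
    """
    summaries: List[Tuple[str, List[str]]] = []
    for action in action_status.get("items", []):
        if action.get("type") != "UPGRADE":
            continue  # ignore unrelated actions such as unenroll or policy reassign
        errors = [err.get("error", "") for err in action.get("latestErrors", [])]
        summaries.append((action.get("status", ""), errors))
    return summaries
```

With the responses shown in the example, the rate-limited bulk upgrade would summarize as `("FAILED", [...])` and the retry with `skipRateLimitCheck` as `("COMPLETE", [])`.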

Example:

```
POST kbn:/api/fleet/agents/8b3c4f46-aedb-447f-8a9e-13fe313a3463/upgrade
{
  "version": "8.12.1"
}

// should return an error
{
  "statusCode": 429,
  "error": "Too Many Requests",
  "message": "agent 8b3c4f46-aedb-447f-8a9e-13fe313a3463 was upgraded less than 10 minutes ago. Please wait 07m02s before trying again to ensure the upgrade will not be rolled back."
}

POST kbn:/api/fleet/agents/8b3c4f46-aedb-447f-8a9e-13fe313a3463/upgrade
{
  "version": "8.12.1",
  "skipRateLimitCheck":true
}

// should return 200 and the upgrade action should succeed; check with the action_status API
GET kbn:/api/fleet/agents/action_status

// bulk API
POST kbn:/api/fleet/agents/bulk_upgrade
{
  "version":"8.12.0",
  "agents":["8b3c4f46-aedb-447f-8a9e-13fe313a3463"],
  "start_time":"2024-02-14T14:08:23.599Z"
}

// should return 200, and action_status should report a FAILED status
GET kbn:/api/fleet/agents/action_status

// response (excerpt):
{
  "type": "UPGRADE",
  "status": "FAILED",
  "latestErrors": [
    {
      "agentId": "8b3c4f46-aedb-447f-8a9e-13fe313a3463",
      "error": "Agent 8b3c4f46-aedb-447f-8a9e-13fe313a3463 is not upgradeable: agent is already being upgraded.",
      "timestamp": "2024-02-14T14:36:47.749Z",
      "hostname": "agent1"
    }
  ]
}

POST kbn:/api/fleet/agents/bulk_upgrade
{
  "version":"8.12.0",
  "agents":["8b3c4f46-aedb-447f-8a9e-13fe313a3463"],
  "start_time":"2024-02-14T14:08:23.599Z",
  "skipRateLimitCheck":true
}

// should return 200, and the action itself should complete
GET kbn:/api/fleet/agents/action_status

// response (excerpt):
{
  "type": "UPGRADE",
  "status": "COMPLETE",
  "latestErrors": []
}

```
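The console requests above can also be issued from a script. A minimal sketch that builds (but does not send) the single and bulk upgrade requests with Python's standard `urllib.request`, assuming Kibana is reachable at `http://localhost:5601` (the `kbn:` prefix is Dev Tools shorthand for the Kibana host). The function names are hypothetical, and authentication is left out; Kibana expects a `kbn-xsrf` header on non-GET API calls.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumption: local Kibana; replace with your deployment URL.
KIBANA_URL = "http://localhost:5601"


def _post(path: str, body: Dict[str, Any], base_url: str) -> urllib.request.Request:
    # Kibana rejects non-GET API requests that lack the kbn-xsrf header.
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "kbn-xsrf": "true"},
        method="POST",
    )


def build_upgrade_request(
    agent_id: str, version: str, skip_rate_limit_check: bool = False,
    base_url: str = KIBANA_URL,
) -> urllib.request.Request:
    body: Dict[str, Any] = {"version": version}
    if skip_rate_limit_check:
        body["skipRateLimitCheck"] = True  # escape hatch added by this PR
    return _post(f"/api/fleet/agents/{agent_id}/upgrade", body, base_url)


def build_bulk_upgrade_request(
    agent_ids: List[str], version: str, skip_rate_limit_check: bool = False,
    start_time: Optional[str] = None, base_url: str = KIBANA_URL,
) -> urllib.request.Request:
    body: Dict[str, Any] = {"version": version, "agents": agent_ids}
    if start_time is not None:
        body["start_time"] = start_time
    if skip_rate_limit_check:
        body["skipRateLimitCheck"] = True
    return _post("/api/fleet/agents/bulk_upgrade", body, base_url)
```

Only setting `skipRateLimitCheck` when it is explicitly requested keeps the default requests identical to the rate-limited examples, so the bypass stays an opt-in. Sending a request is then a matter of passing it to `urllib.request.urlopen` with suitable credentials.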

Covered with API integration tests.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Feb 19, 2024
(cherry picked from commit 31517ef)
kibanamachine referenced this issue Feb 21, 2024
…#177157)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[Fleet] added skipRateLimitCheck flag to upgrade API
(#176923)](#176923)

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Julia Bardi <[email protected]>
fkanout pushed a commit to fkanout/kibana that referenced this issue Mar 4, 2024