Allow multiple ES outputs as long as they are the same ES #1684

Merged on Sep 7, 2022 (101 commits)

AndersonQ
Member

@AndersonQ AndersonQ commented Jul 22, 2022

What is the problem this PR solves?

It fixes #1672. Currently, fleet-server does not correctly handle the multiple ES outputs that the Fleet UI might create.

How does this PR solve the problem?

It makes a non-breaking change to the agent documents so that ES API key information is stored per output.

How to test this PR locally

  • add a second ES output:
    • go to Fleet > Settings
    • outputs > add output
    • just copy the data from the default output and give it another name. Let's call it non-default
  • create/edit a policy to use the non-default output for integrations. Leave monitoring on the default output:
    • navigate to Fleet > Agent policies > your_policy
    • go to the settings "tab"
    • select non-default as the "Output for integrations". (You'll need a paid license or Elastic Cloud)
  • enroll an Elastic Agent in this policy with some integration

At first both outputs should have their own API keys. Use elastic-agent inspect to check that.

Now modify the policy in a way that one of the outputs has its permissions changed, but not the other. I usually add and remove the Endpoint Security. Then both outputs should have the same API key, again, use the inspect command to check.

Here you can see how I reproduced the issue and my investigation

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's checklist

  • Modify the agent model to support multiple ES outputs: add fields, don't change the existing ones, keep backward compatible
  • ensure new agent model is correctly saved and loaded (Marshal/UnMarshaled)
  • add migration process to internal/pkg/dl/migration.go
    • add doc explaining the migration
  • when preparing ES output:
    • set the default API key to empty -> force all API keys to be regenerated
    • set old fields to empty string/nil
    • use new fields to store output API key data
  • update painless script to use new agent model
  • ensure the deprecated model.Agent fields aren't used (except for tests checking they aren't populated anymore)
    • update handleAck to use the new model
    • update monitor to use the new model
  • on upgrade bump the coordinator index
  • ensure we invalidate the api keys from DefaultAPIKeyHistory (the migration should take care of it)
  • add output name to api key metadata
  • use only new agent model
    • adjust internal/pkg/coordinator/monitor_integration_test.go to use new Agent model
    • adjust internal/pkg/coordinator/monitor.go to use new Agent model
    • ensure there is no reference to the deprecated agent fields. As the migration takes care of migrating every agent document, it should be safe to remove the deprecated fields from the agent schema
  • try/nice to have: on agent checkin, after fleet-server upgrade to 8.4, try to search api key by metadata.agentId and invalidate the "old" ones. So far I don't think it's possible.
    • I think the most viable here would be to invalidate all the API keys during the migration
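
The "add fields, don't change the existing ones" idea from the checklist above can be sketched roughly as follows. This is only an illustration: the deprecated JSON field names come from this PR description, but the Go type names and the fields inside each output entry (api_key, api_key_id, permissions_hash) are assumptions, not the actual fleet-server schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PolicyOutput holds the API key data for a single ES output.
// The field names are hypothetical and chosen for illustration only.
type PolicyOutput struct {
	APIKey          string `json:"api_key,omitempty"`
	APIKeyID        string `json:"api_key_id,omitempty"`
	PermissionsHash string `json:"permissions_hash,omitempty"`
}

// Agent is a trimmed, hypothetical view of an agent document.
type Agent struct {
	// Deprecated single-output fields, kept so old documents still decode.
	DefaultAPIKey               string `json:"default_api_key,omitempty"`
	DefaultAPIKeyID             string `json:"default_api_key_id,omitempty"`
	PolicyOutputPermissionsHash string `json:"policy_output_permissions_hash,omitempty"`

	// New field: API key data keyed by output name.
	Outputs map[string]*PolicyOutput `json:"outputs,omitempty"`
}

// roundTrip marshals and unmarshals an agent, mimicking a save and load.
func roundTrip(a Agent) (Agent, error) {
	raw, err := json.Marshal(a)
	if err != nil {
		return Agent{}, err
	}
	var out Agent
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	a := Agent{Outputs: map[string]*PolicyOutput{
		"default":     {APIKeyID: "id-1", PermissionsHash: "h1"},
		"non-default": {APIKeyID: "id-2", PermissionsHash: "h2"},
	}}
	got, err := roundTrip(a)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got.Outputs), got.Outputs["non-default"].APIKeyID)
}
```

Keying the new data by output name leaves the old fields untouched, so documents written before the change still decode and the migration can empty the old fields afterwards.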

Things to remember to test

  • ensure all API keys (access + outputs) are invalidated on Unenroll
  • ensure that, for an agent whose default_api_key_history field isn't empty, these keys get retired on ack and unenroll
  • ensure the existing agent documents get migrated (i.e. the migration happens)
  • ensure the migration is idempotent. Start the new fleet-server a few times and ensure it does not try to update the already updated agent documents. This can also be checked in the logs: at the end of each migration, a log like the following is produced.
    log message: migration MIGRATION_NAME done
    full log:
{"_index":".ds-logs-elastic_agent.fleet_server-default-2022.08.11-000001","_id":"sft_kYIB00fSKmPF2k05","_version":1,"_score":0,"_source":{"fleet.migration.retries.search":0,"agent":{"name":"fleet-server-dev","id":"308052ab-bca4-4eaa-8787-904feea56fa0","type":"filebeat","ephemeral_id":"b21f3b26-c6a3-4da6-9edf-56fc01813ebf","version":"8.5.0"},"log":{"file":{"path":"/opt/Elastic/Agent/data/elastic-agent-6d830e/logs/default/fleet-server-20220812-15.ndjson"},"offset":6005},"fleet.migration.version_conflicts":0,"elastic_agent":{"id":"308052ab-bca4-4eaa-8787-904feea56fa0","version":"8.5.0","snapshot":true},"fleet.migration.es.timed_out":false,"fleet.migration.total":1,"ecs":{"version":"8.0.0"},"fleet.migration.total.duration":27.307903,"fleet.migration.deleted":0,"fleet.migration.updated":1,"host":{"hostname":"fleet-server-dev","os":{"kernel":"5.13.0-52-generic","codename":"impish","name":"Ubuntu","type":"linux","family":"debian","version":"21.10 (Impish Indri)","platform":"ubuntu"},"containerized":false,"ip":["10.0.2.15","fe80::6c:bfff:fe6f:c40","192.168.56.43","fe80::a00:27ff:fed8:3139"],"name":"fleet-server-dev","id":"c287e219ca4f4643aa206fc914242e26","mac":["02:6c:bf:6f:0c:40","08:00:27:d8:31:39"],"architecture":"x86_64"},"log.level":"info","event":{"agent_id_status":"verified","ingested":"2022-08-12T10:00:51Z","dataset":"elastic_agent.fleet_server"},"fleet.migration.es.took":26,"fleet.migration.retries.bulk":0,"fleet.migration.name":"AgentOutputs","service.name":"fleet-server","fleet.migration.noops":0,"message":"migration AgentOutputs 
done","input":{"type":"filestream"},"@timestamp":"2022-08-12T10:00:43.961Z","data_stream":{"namespace":"default","type":"logs","dataset":"elastic_agent.fleet_server"},"fleet.migration.batches":1},"fields":{"fleet.migration.retries.search":[0],"elastic_agent.version":["8.5.0"],"host.os.name.text":["Ubuntu"],"host.hostname":["fleet-server-dev"],"host.mac":["02:6c:bf:6f:0c:40","08:00:27:d8:31:39"],"fleet.migration.total":[1],"host.os.version":["21.10 (Impish Indri)"],"fleet.migration.total.duration":[27.307903],"host.os.name":["Ubuntu"],"log.level":["info"],"agent.name":["fleet-server-dev"],"host.name":["fleet-server-dev"],"event.agent_id_status":["verified"],"fleet.migration.retries.bulk":[0],"fleet.migration.name":["AgentOutputs"],"host.os.type":["linux"],"fleet.migration.noops":[0],"input.type":["filestream"],"log.offset":[6005],"data_stream.type":["logs"],"host.architecture":["x86_64"],"agent.id":["308052ab-bca4-4eaa-8787-904feea56fa0"],"ecs.version":["8.0.0"],"host.containerized":[false],"agent.version":["8.5.0"],"host.os.family":["debian"],"fleet.migration.version_conflicts":[0],"fleet.migration.es.timed_out":[false],"host.ip":["10.0.2.15","fe80::6c:bfff:fe6f:c40","192.168.56.43","fe80::a00:27ff:fed8:3139"],"agent.type":["filebeat"],"host.os.kernel":["5.13.0-52-generic"],"fleet.migration.deleted":[0],"fleet.migration.updated":[1],"elastic_agent.snapshot":[true],"host.id":["c287e219ca4f4643aa206fc914242e26"],"fleet.migration.es.took":[26],"service.name":["fleet-server"],"elastic_agent.id":["308052ab-bca4-4eaa-8787-904feea56fa0"],"data_stream.namespace":["default"],"host.os.codename":["impish"],"message":["migration AgentOutputs 
done"],"event.ingested":["2022-08-12T10:00:51.000Z"],"@timestamp":["2022-08-12T10:00:43.961Z"],"host.os.platform":["ubuntu"],"data_stream.dataset":["elastic_agent.fleet_server"],"log.file.path":["/opt/Elastic/Agent/data/elastic-agent-6d830e/logs/default/fleet-server-20220812-15.ndjson"],"fleet.migration.batches":[1],"agent.ephemeral_id":["b21f3b26-c6a3-4da6-9edf-56fc01813ebf"],"event.dataset":["elastic_agent.fleet_server"]}}
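
One way to automate the idempotency check is to compare the migration log of the first start with the log of a later start. The sketch below reads the flat, dotted keys visible in the _source object of the log above. It assumes that an idempotent rerun reports fleet.migration.updated as 0, which this PR does not state explicitly, and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// migrationLog maps the dotted keys found in the log's _source object.
type migrationLog struct {
	Name    string `json:"fleet.migration.name"`
	Updated int    `json:"fleet.migration.updated"`
	Message string `json:"message"`
}

// parseMigrationLog decodes one _source object.
func parseMigrationLog(source []byte) (migrationLog, error) {
	var l migrationLog
	err := json.Unmarshal(source, &l)
	return l, err
}

// rerunWasNoop reports whether a later run of the same migration updated
// no documents. Assumption: an idempotent rerun logs updated == 0.
func rerunWasNoop(first, rerun migrationLog) bool {
	return first.Name == rerun.Name && rerun.Updated == 0
}

func main() {
	// first is trimmed from the log above; rerun is a hypothetical second start.
	first, err := parseMigrationLog([]byte(
		`{"fleet.migration.name":"AgentOutputs","fleet.migration.updated":1}`))
	if err != nil {
		panic(err)
	}
	rerun, err := parseMigrationLog([]byte(
		`{"fleet.migration.name":"AgentOutputs","fleet.migration.updated":0}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(rerunWasNoop(first, rerun))
}
```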

Related issues

Acceptance criteria

Considering #1779 is merged, which changes the API key life cycle

for all scenarios

  • On start up, the 8.5 migrations run successfully.
    • there should be no error logs related to migrations. Such a log would contain one of the following.
      If either of them appears, please provide us the full log, with all its fields.
      • failed to run migrations for BC3+
      • failed to run subsystems for BC1 and BC2
    • an info log like migration [MIGRATION_NAME] done shows the migration is done, where [MIGRATION_NAME] is one of AgentMetadata, AgentOutputs or PolicyCoordinatorIdx.
      • an error log like failed applying [MIGRATION_NAME] migration might appear after the migration done log.
        • please report the full log, with all its fields, if that happens
  • Agents enroll successfully, and their integrations can send data to ES. That means the API keys have the necessary permissions
  • Start with a basic/default policy, add then remove Endpoint Security and check:
    • the integrations can always send data to ES (before Endpoint Security, with Endpoint Security, after removing Endpoint Security)
    • the API key for each output remains the same
    • fleet-server does not invalidate them.
      • to check whether fleet-server invalidated them, look for the log Invalidate old API keys. There will be a fleet.policy.apiKeyIDsToRetire field with the list of API keys invalidated

New cluster, no document on .fleet-agents index

  • nothing special to look for.

Upgrade scenario

  • .fleet-agents index with active agents
  • the following fields are populated on the agent documents:
    • default_api_key
    • default_api_key_id
    • policy_output_permissions_hash
  • upgrade fleet-server to 8.5
  • confirm the agents are healthy and their integrations are sending data
  • check that:
    • a new outputs field was created
    • the following fields are empty, null or not present
      • default_api_key
      • default_api_key_id
      • default_api_key_history
      • policy_output_permissions_hash
  • restart fleet-server and check the AC for all scenarios again.
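
The "check that" items of the upgrade scenario can be expressed as a small validation over one agent document. This sketch uses the field names listed above. It treats null, an empty string, an empty array, and an empty object as "empty", which is an interpretation of "empty, null or not present", and checkMigrated is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// deprecated lists the fields that should be empty after the migration.
var deprecated = []string{
	"default_api_key",
	"default_api_key_id",
	"default_api_key_history",
	"policy_output_permissions_hash",
}

// isEmpty treats null, "", [] and {} as empty values.
func isEmpty(raw json.RawMessage) bool {
	switch strings.TrimSpace(string(raw)) {
	case "", "null", `""`, "[]", "{}":
		return true
	}
	return false
}

// checkMigrated returns one message per violated expectation.
func checkMigrated(doc []byte) ([]string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(doc, &fields); err != nil {
		return nil, err
	}
	var problems []string
	for _, name := range deprecated {
		if raw, ok := fields[name]; ok && !isEmpty(raw) {
			problems = append(problems, name+" is still populated")
		}
	}
	if raw, ok := fields["outputs"]; !ok || isEmpty(raw) {
		problems = append(problems, "outputs is missing or empty")
	}
	return problems, nil
}

func main() {
	// Hypothetical pre-upgrade document with the old fields populated.
	old := []byte(`{"default_api_key":"id:secret","default_api_key_id":"id"}`)
	problems, err := checkMigrated(old)
	if err != nil {
		panic(err)
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```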

apmmachine and others added 30 commits June 1, 2022 05:32
…duration (elastic#1604)

(cherry picked from commit 3958820)

Co-authored-by: Sean Cunningham <[email protected]>
(cherry picked from commit bdd20a1)

Co-authored-by: Victor Martinez <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Contributor

@aleksmaus aleksmaus left a comment

small typo in the comments, otherwise LGTM

@@ -226,6 +226,8 @@ func (b *Bulker) flushBulk(ctx context.Context, queue queueT) error {
return fmt.Errorf("flushBulk failed, could not unmarshal ES response: %w", err)
}
if blk.HasErrors {
// We lack information to properly correlate this error with what has failed.
// Thus, for now it'd be mre noise tan information outside an investigation.
Contributor

typos: "mre noise tan" -> "more noise than"

Contributor

@michalpristas michalpristas left a comment

looks ok, let's make sure qa will test this once again with upgrade flow.
i'm not looking forward to conflicts in my PR :D

@AndersonQ AndersonQ merged commit 63fdcbf into elastic:main Sep 7, 2022
@AndersonQ AndersonQ deleted the 2148-api-key branch September 7, 2022 07:48
@AndersonQ
Member Author

@amolnater-qasource it's finally merged and ready for your tests. Please let me know if the necessary test scenarios described in the PR are clear or if you need more details. Thank you!

@amolnater-qasource
Collaborator

Hi @AndersonQ
Thank you for the update.
We will revalidate this issue as soon as #1836 is fixed on latest 8.5 SNAPSHOT.

Thanks

@cmacknz cmacknz added v8.5.0 and removed v8.4.0 labels Sep 14, 2022
AndersonQ added a commit that referenced this pull request Sep 19, 2022
* Revert "Fix v8.5.0 migration painless script (#1839)"

This reverts commit de5d74b.

* Revert "Allow multiple ES outputs as long as they are the same ES (#1684)"

This reverts commit 63fdcbf.
AndersonQ added a commit that referenced this pull request Sep 20, 2022
…e same ES" (#1879)

* Revert "Revert "Fix v8.5.0 migration painless script" (#1878)"
  This reverts commit ef9ca2b.

* Revert "Revert "Allow multiple ES outputs as long as they are the same ES (#1684)""
  This reverts commit bb696ac.

* avoid new API keys being marked for invalidation

Co-authored-by: Michal Pristas <[email protected]>
  He fixed the merge conflicts after Bulk API Keys update (#1779), commit 46ac14b, got merged
@amolnater-qasource
Collaborator

Hi @AndersonQ

We have revalidated this PR on the latest 8.5 BC1 Kibana cloud environment and have the following observations:

Steps:

  1. Created a second Elasticsearch output with the same settings as the default, named non-default.
  2. Created a policy with the integrations output set to non-default.
  3. Installed an agent with this policy and ran the elastic-agent inspect command.
  4. Added Endpoint Security and ran the elastic-agent inspect command again.
  5. Removed Endpoint Security and ran the elastic-agent inspect command again.

Observations:

Scenario 01: On installing agent on fresh 8.5 BC1.

  • On initial installation we observed different api_key for both outputs.
  • On adding Endpoint security and removing Endpoint security we observed no change in api_key.

Scenario 02: On uninstalling and reinstalling agent.

  • On initial installation we observed different api_key for both outputs.
  • On adding Endpoint security and removing Endpoint security we observed no change in api_key.

Build details:
BUILD: 56595
COMMIT: 0d8de4df69f8084a94cdd9638d7de510813cb5ce

Could you please confirm if these are the required fixes?
Please let us know if we are missing anything here.

Thanks

@AndersonQ
Member Author

The PR that updates the API keys' permissions is also in 8.5, which changes how this needs to be validated. Besides, there was a problem with 8.5 BC1. So I'll need to update what the tests should check and, given the recent changes I made, it's better to validate again. I'll update the ACs and then ping you folks.

@amolnater-qasource
Collaborator

Hi @AndersonQ

Thank you for the update.
Please let us know once it is available for revalidation.

@AndersonQ
Member Author

@amolnater-qasource, the updated acceptance criteria are now in the PR description above, under "Acceptance criteria".

@amolnater-qasource
Collaborator

Hi @AndersonQ

Thank you for sharing the detailed testing guidelines.
We will revalidate this once the 8.5 BC3 build is available.

Thanks!

@amolnater-qasource
Collaborator

Hi @AndersonQ
We have revalidated this issue on the latest 8.5 BC6 Kibana cloud-production environment and have the following observations:

Non-Upgrade scenario:

  • Two different api_keys for different outputs
  • On any operation adding/removing endpoint defend, we didn't observe any change in api_keys

Upgrade scenario:

  • We checked data for fields available under .fleet-agents:
    default_api_key
    default_api_key_id
    default_api_key_history
    policy_output_permissions_hash

  • The same data is available on 8.5.0 BC6 post upgrade under .fleet-agents:
    default_api_key
    default_api_key_id
    default_api_key_history
    policy_output_permissions_hash

Screenshots (images omitted): 8.4.3, 8.5.0 BC6, and post Kibana and Agent upgrade.

Could you please confirm whether this is the expected fix?
Please let us know if we are missing any scenario to be tested.

Thanks

blakerouse added a commit that referenced this pull request Nov 8, 2022
* [Automation] Update elastic stack version to 8.5.0-6b9f92c0 for testing (#1756)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-0616acda for testing (#1760)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-dd6f2bb0 for testing (#1765)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-feb644de for testing (#1768)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-7783a03c for testing (#1776)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-17b8a62d for testing (#1780)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-9aed3b11 for testing (#1784)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-440e0896 for testing (#1788)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-fedc3e60 for testing (#1791)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-b5001a6d for testing (#1795)

Co-authored-by: apmmachine <[email protected]>

* ci: move to fleet-ci (#1199)

* Fic path to the packaging (#1806)

* Fix gcs credentials for packaging (#1807)

* [Automation] Update elastic stack version to 8.5.0-de69302b for testing (#1822)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-1bd77fc1 for testing (#1826)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-167dfc80 for testing (#1831)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-6b7dda2d for testing (#1835)

Co-authored-by: apmmachine <[email protected]>

* Allow multiple ES outputs as long as they are the same ES (#1684)

* add 'outputs' field to the ES agent schema to store the API key data and permission hash for each ES output

* add output name to API key metadata

* add v8.5 migration to migration.go

* add migration docs and improve logging

* group migration functions per version

* [Automation] Update elastic stack version to 8.5.0-4140365c for testing (#1837)

Co-authored-by: apmmachine <[email protected]>

* updating upgrade_status: completed (#1833)

* updating upgrade_status: completed

* updated schema.json and regenerated schema.go

* updated license headers

* Fix v8.5.0 migration painless script (#1839)

* fix v8.5.0 migration painless script

* [Automation] Update elastic stack version to 8.5.0-8e906f9f for testing (#1843)

Co-authored-by: apmmachine <[email protected]>

* ci: rename dra staging for release dra release staging (#1840)

* Remove events from agent checkin body. (#1842)

Remove the events attribute from the agent checkin body. Note that
removal of the attribute will not stop the server from issuing a 400 if
the response body is too long. The removal is so that the checkin code
on the fleet-server and agent remain comparable.

Co-authored-by: Blake Rouse <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-589a4a10 for testing (#1852)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-37418cf3 for testing (#1855)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-fcf3d4c2 for testing (#1862)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-c7913db3 for testing (#1868)

Co-authored-by: apmmachine <[email protected]>

* Add error detail to catch-all HTTP response (#1854)

* Make authc log debug and add cache hit field (#1870)

* Document Go 1.18 certificate change in changelog. (#1871)

* Revert "Fix v8.5.0 migration painless script" (#1878)

* Revert "Fix v8.5.0 migration painless script (#1839)"

This reverts commit de5d74b.

* Revert "Allow multiple ES outputs as long as they are the same ES (#1684)"

This reverts commit 63fdcbf.

* [Automation] Update elastic stack version to 8.5.0-56d2c52d for testing (#1880)

Co-authored-by: apmmachine <[email protected]>

* Bulk API Keys update (#1779)

Bulk API Keys update (#1779)

* Fix and reintroduce "Allow multiple ES outputs as long as they are the same ES" (#1879)

* Revert "Revert "Fix v8.5.0 migration painless script" (#1878)"
  This reverts commit ef9ca2b.

* Revert "Revert "Allow multiple ES outputs as long as they are the same ES (#1684)""
  This reverts commit bb696ac.

* avoid new API keys being marked for invalidation

Co-authored-by: Michal Pristas <[email protected]>
  He fixed the merge conflicts after Bulk API Keys update (#1779), commit 46ac14b, got merged

* [Automation] Update elastic stack version to 8.5.0-7dc445a0 for testing (#1888)

Co-authored-by: apmmachine <[email protected]>

* Update pre-sets limits to avoid overlap. (#1891)

Update file max limits and env_defaults_test.go running make defaults to generate the new one

* [Release] add-backport-next (#1892)

* Bump version to 8.6.0 (#1895)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places. (#1896)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places.

* Add changelog entry.

* Update CHANGELOG.next.asciidoc

Co-authored-by: Craig MacKenzie <[email protected]>

Co-authored-by: Craig MacKenzie <[email protected]>

* Update apikey.cache_hit log field name to match convention (#1900)

* [Automation] Update elastic stack version to 8.6.0-21651da3 for testing (#1908)

Co-authored-by: apmmachine <[email protected]>

* LoadLimits does not override existing values (#1912)

Fleet-server will use any specified cache or server limit values over
whatever is returned by the default/agent number loader. For example, if
A max body size is specifically set to a value such as 5MB, and the
default returned by the LoadLimits is 1MB, the 5MB value is used.

* [Automation] Update elastic stack version to 8.6.0-326f84b0 for testing (#1916)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-df00693f for testing (#1925)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-a2f4f140 for testing (#1928)

Co-authored-by: apmmachine <[email protected]>

* Revert "updating upgrade_status: completed (#1833)" (#1920)

* Revert "updating upgrade_status: completed (#1833)"

This reverts commit 23be42a.

* Leaving in upgrade_status field for retry functionality

* Storing checkin message in last_checkin_message (#1932)

* Storing checkin message in last_checkin_message

* added changelog

* fixed tests

* Unique limiters for each API listener (#1904)

* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <[email protected]>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <[email protected]>

* Cleanup cmd/fleet/main.go (#1886)

* Replace cache.Config with config.Cache

* Move server setup from cmd/fleet to new pkg/server

* Move constants

* Fix imports and integration tests

* fix linter

* [Automation] Update elastic stack version to 8.6.0-158a13db for testing (#1938)

Co-authored-by: apmmachine <[email protected]>

* [8.6](forwardport) Add extra protection against accessing null fields to 8.5 migration (#1921) (#1926)

* [Automation] Update elastic stack version to 8.6.0-aea1c645 for testing (#1942)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-0fca2953 for testing (#1948)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-e4c15f15 for testing (#1954)

Co-authored-by: apmmachine <[email protected]>

* Conditional log level for api key read (#1946)

Conditional log level for api key read (#1946)

* Updated migration query to match items with deprecated field present (#1959)

Co-authored-by: Anderson Queiroz <[email protected]>

* Fix fleet.migration.total log key overlap (#1951)

Co-authored-by: Anderson Queiroz <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-d939cfde for testing (#1964)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-7c9f25a9 for testing (#1969)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-c49fac70 for testing (#1976)

Co-authored-by: apmmachine <[email protected]>

* Update to Go 1.18.7. (#1978)

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#1981)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#1987)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-233dc5d4 for testing (#1990)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-54a302f0 for testing (#1995)

Co-authored-by: apmmachine <[email protected]>

* Don't send POLICY_CHANGE actions retrieved from index to agent. (#1963)

* Don't send POLICY_CHANGE actions retrieved from index to agent.

The fleet-server should not send any policy change actions that are
written to the actions index to an agent on checkin. The server will
remove these actions in the convert method and emit a warning message.
The ack token that is used is not altered in this case. Policy change
actions are dynamically generated by the fleet-server when it detects
that the agent is not running an up to date version of the policy.

* move filtering to its own method

* Fix linter, tests, fix file name

* [Automation] Update elastic stack version to 8.6.0-cae815eb for testing (#2000)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-6545f2df for testing (#2005)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-055acc83 for testing (#2011)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-baf193e8 for testing (#2016)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-22d60ec9 for testing (#2020)

Co-authored-by: apmmachine <[email protected]>

* Allow upgrade action to signal retry (#1887)

* Allow upgrade action to signal retry

Allow the ack of an upgrade action to set the upgrade status to
retrying.

* fix tests set failed state

* Fix broken test

* nil upgrade status by default

* Set agent to healthy in case of upgrade failure

* fix upgrade fields

* Fix tests

* [Automation] Update elastic stack version to 8.6.0-b8b35931 for testing (#2024)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-a892f234 for testing (#2030)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Add GH action to add issues to ingest board

Issues in this repo labeled with `Team:Fleet` will be added to the ingest board automatically w/ the `Fleet Server` area.

* Update add-issues-to-ingest-board.yml

* [Automation] Update elastic stack version to 8.6.0-89d224d2 for testing (#2034)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-949a38d2 for testing (#2039)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-26dc1164 for testing (#2045)

Co-authored-by: apmmachine <[email protected]>

* Add active filter for enrollment key queries. (#2044)

* Add active filter for enrollment key queries.

Add an active: true filter to enrollment key queries. This allows
fleet-server to handle cases where there may be 10+ inactive keys
associated with a policy.

* review feedback

* fix linter

* fix tests

* Fix test cases

* [Automation] Update elastic stack version to 8.6.0-4765d2b0 for testing (#2048)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-8a615646 for testing (#2050)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-3f5f98b7 for testing (#2051)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-f20b7179 for testing (#2056)

Co-authored-by: apmmachine <[email protected]>

* Run mod tidy.

* Run make notice.

* Fix intergration tests.

* Run go mod tidy and make notice.

* Fix path to fleet-server.yml in integration test.

* Fix race condition.

* Fix try 2.

* Fix race.

* Fix race try 2.

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Victor Martinez <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Michel Laterman <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
Co-authored-by: Michal Pristas <[email protected]>
Co-authored-by: Julien Lind <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Kyle Pollich <[email protected]>
blakerouse added a commit that referenced this pull request Nov 9, 2022
* Support for Elastic Agent V2 status (#1747)

* Support for Elastic Agent V2 status

* Make 'make check-ci' happy

* Add a check that 'components' is valid array

* Rename variable to better reflect its meaning

* [v2] Switch to Elastic Agent v2 control protocol (#1751)

* Switch to new client.V2 for communication with Elastic Agent.

* Fix tests.

* Fix integration tests.

* Update go.sum.

* Fix some lint issues.

* Fix panic with agentInfo.

* Fix panic in logger reconfigure.

* Fixes for switching units.

* updated version (#2014)

* Update the elastic-agent-client to latest version. (#2061)

* [v2] Merge main as of Nov 7 (#2062)

* [Automation] Update elastic stack version to 8.5.0-6b9f92c0 for testing (#1756)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-0616acda for testing (#1760)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-dd6f2bb0 for testing (#1765)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-feb644de for testing (#1768)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-7783a03c for testing (#1776)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-17b8a62d for testing (#1780)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-9aed3b11 for testing (#1784)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-440e0896 for testing (#1788)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-fedc3e60 for testing (#1791)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-b5001a6d for testing (#1795)

Co-authored-by: apmmachine <[email protected]>

* ci: move to fleet-ci (#1199)

* Fix path to the packaging (#1806)

* Fix gcs credentials for packaging (#1807)

* [Automation] Update elastic stack version to 8.5.0-de69302b for testing (#1822)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-1bd77fc1 for testing (#1826)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-167dfc80 for testing (#1831)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-6b7dda2d for testing (#1835)

Co-authored-by: apmmachine <[email protected]>

* Allow multiple ES outputs as long as they are the same ES (#1684)

* add 'outputs' field to the ES agent schema to store the API key data and permission hash for each ES output

* add output name to API key metadata

* add v8.5 migration to migration.go

* add migration docs and improve logging

* group migration functions per version

* [Automation] Update elastic stack version to 8.5.0-4140365c for testing (#1837)

Co-authored-by: apmmachine <[email protected]>

* updating upgrade_status: completed (#1833)

* updating upgrade_status: completed

* updated schema.json and regenerated schema.go

* updated license headers

* Fix v8.5.0 migration painless script (#1839)

* fix v8.5.0 migration painless script

* [Automation] Update elastic stack version to 8.5.0-8e906f9f for testing (#1843)

Co-authored-by: apmmachine <[email protected]>

* ci: rename dra staging for release dra release staging (#1840)

* Remove events from agent checkin body. (#1842)

Remove the events attribute from the agent checkin body. Note that
removal of the attribute will not stop the server from issuing a 400 if
the response body is too long. The removal is so that the checkin code
on the fleet-server and agent remain comparable.

Co-authored-by: Blake Rouse <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-589a4a10 for testing (#1852)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-37418cf3 for testing (#1855)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-fcf3d4c2 for testing (#1862)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.5.0-c7913db3 for testing (#1868)

Co-authored-by: apmmachine <[email protected]>

* Add error detail to catch-all HTTP response (#1854)

* Make authc log debug and add cache hit field (#1870)

* Document Go 1.18 certificate change in changelog. (#1871)

* Revert "Fix v8.5.0 migration painless script" (#1878)

* Revert "Fix v8.5.0 migration painless script (#1839)"

This reverts commit de5d74b.

* Revert "Allow multiple ES outputs as long as they are the same ES (#1684)"

This reverts commit 63fdcbf.

* [Automation] Update elastic stack version to 8.5.0-56d2c52d for testing (#1880)

Co-authored-by: apmmachine <[email protected]>

* Bulk API Keys update (#1779)


* Fix and reintroduce "Allow multiple ES outputs as long as they are the same ES" (#1879)

* Revert "Revert "Fix v8.5.0 migration painless script" (#1878)"
  This reverts commit ef9ca2b.

* Revert "Revert "Allow multiple ES outputs as long as they are the same ES (#1684)""
  This reverts commit bb696ac.

* avoid new API keys being marked for invalidation

Co-authored-by: Michal Pristas <[email protected]>
  He fixed the merge conflicts that arose after Bulk API Keys update (#1779), commit 46ac14b, was merged

* [Automation] Update elastic stack version to 8.5.0-7dc445a0 for testing (#1888)

Co-authored-by: apmmachine <[email protected]>

* Update pre-sets limits to avoid overlap. (#1891)

Update the file max limits and regenerate env_defaults_test.go by running make defaults

* [Release] add-backport-next (#1892)

* Bump version to 8.6.0 (#1895)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places. (#1896)

* Catch error in waitBulkAction. Add bulk.WithRetryOnConflict(3) in multiple places.

* Add changelog entry.

* Update CHANGELOG.next.asciidoc

Co-authored-by: Craig MacKenzie <[email protected]>

Co-authored-by: Craig MacKenzie <[email protected]>

* Update apikey.cache_hit log field name to match convention (#1900)

* [Automation] Update elastic stack version to 8.6.0-21651da3 for testing (#1908)

Co-authored-by: apmmachine <[email protected]>

* LoadLimits does not override existing values (#1912)

Fleet-server will use any specified cache or server limit values over
whatever is returned by the default/agent number loader. For example, if
a max body size is explicitly set to a value such as 5MB, and the
default returned by the LoadLimits is 1MB, the 5MB value is used.
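
The precedence rule described above can be sketched as follows. This is a minimal illustration, not the fleet-server implementation: the `Limits` type, its fields, and `applyDefaults` are hypothetical names, and the sketch assumes a zero value means "not explicitly set".

```go
package main

import "fmt"

// Limits is a hypothetical, simplified stand-in for the server limit settings.
// Assumption: a zero value means the operator did not set the field.
type Limits struct {
	MaxBodyByteSize int64
	MaxConnections  int
}

// applyDefaults fills every unset (zero) field of cfg from defaults.
// Fields that were explicitly set are never overridden.
func applyDefaults(cfg, defaults Limits) Limits {
	if cfg.MaxBodyByteSize == 0 {
		cfg.MaxBodyByteSize = defaults.MaxBodyByteSize
	}
	if cfg.MaxConnections == 0 {
		cfg.MaxConnections = defaults.MaxConnections
	}
	return cfg
}

func main() {
	const mb = 1024 * 1024
	defaults := Limits{MaxBodyByteSize: 1 * mb, MaxConnections: 200}
	cfg := Limits{MaxBodyByteSize: 5 * mb} // explicitly set to 5MB

	out := applyDefaults(cfg, defaults)
	// The explicit 5MB value is kept; the unset connection limit takes the default.
	fmt.Println(out.MaxBodyByteSize/mb, out.MaxConnections)
}
```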

* [Automation] Update elastic stack version to 8.6.0-326f84b0 for testing (#1916)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-df00693f for testing (#1925)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-a2f4f140 for testing (#1928)

Co-authored-by: apmmachine <[email protected]>

* Revert "updating upgrade_status: completed (#1833)" (#1920)

* Revert "updating upgrade_status: completed (#1833)"

This reverts commit 23be42a.

* Leaving in upgrade_status field for retry functionality

* Storing checkin message in last_checkin_message (#1932)

* Storing checkin message in last_checkin_message

* added changelog

* fixed tests

* Unique limiters for each API listener (#1904)

* Unique limiters for each API listener

Refactor the limit.Limiter so it can wrap the separate API httprouter
endpoints. Limiter.WrapX() calls take the handler and stats incrementer
for metrics/error counting. api.Run() replaced with Router.Run(), which
will generate an httprouter for each listener in order to be able to
associate the httprouter with a unique Limiter.

* Add listener address labeled logs to limiter

* Review feedback

* Apply suggestions from code review

Co-authored-by: Anderson Queiroz <[email protected]>

* review feedback

* fix import

* Fix test

Co-authored-by: Anderson Queiroz <[email protected]>

* Cleanup cmd/fleet/main.go (#1886)

* Replace cache.Config with config.Cache

* Move server setup from cmd/fleet to new pkg/server

* Move constants

* Fix imports and integration tests

* fix linter

* [Automation] Update elastic stack version to 8.6.0-158a13db for testing (#1938)

Co-authored-by: apmmachine <[email protected]>

* [8.6](forwardport) Add extra protection against accessing null fields to 8.5 migration (#1921) (#1926)

* [Automation] Update elastic stack version to 8.6.0-aea1c645 for testing (#1942)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-0fca2953 for testing (#1948)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-e4c15f15 for testing (#1954)

Co-authored-by: apmmachine <[email protected]>

* Conditional log level for api key read (#1946)


* Updated migration query to match items with deprecated field present (#1959)

Co-authored-by: Anderson Queiroz <[email protected]>

* Fix fleet.migration.total log key overlap (#1951)

Co-authored-by: Anderson Queiroz <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-d939cfde for testing (#1964)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-7c9f25a9 for testing (#1969)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-c49fac70 for testing (#1976)

Co-authored-by: apmmachine <[email protected]>

* Update to Go 1.18.7. (#1978)

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#1981)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#1987)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-233dc5d4 for testing (#1990)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-54a302f0 for testing (#1995)

Co-authored-by: apmmachine <[email protected]>

* Don't send POLICY_CHANGE actions retrieved from index to agent. (#1963)

* Don't send POLICY_CHANGE actions retrieved from index to agent.

The fleet-server should not send any policy change actions that are
written to the actions index to an agent on checkin. The server will
remove these actions in the convert method and emit a warning message.
The ack token that is used is not altered in this case. Policy change
actions are dynamically generated by the fleet-server when it detects
that the agent is not running an up to date version of the policy.
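
The filtering step could look roughly like the sketch below. The `Action` struct and `filterPolicyChange` are hypothetical names used for illustration only; the actual convert method and action model in fleet-server differ.

```go
package main

import (
	"fmt"
	"log"
)

// typePolicyChange is the action type that must not be forwarded from the index.
const typePolicyChange = "POLICY_CHANGE"

// Action is a hypothetical, trimmed-down action document.
type Action struct {
	ID   string
	Type string
}

// filterPolicyChange returns the actions that may be sent to the agent.
// POLICY_CHANGE actions read from the index are dropped with a warning.
func filterPolicyChange(actions []Action) []Action {
	kept := make([]Action, 0, len(actions))
	for _, a := range actions {
		if a.Type == typePolicyChange {
			log.Printf("warning: dropping %s action %q read from the actions index", a.Type, a.ID)
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	actions := []Action{
		{ID: "a1", Type: "UPGRADE"},
		{ID: "a2", Type: typePolicyChange},
	}
	// Only the UPGRADE action remains after filtering.
	fmt.Println(len(filterPolicyChange(actions)))
}
```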

* move filtering to its own method

* Fix linter, tests, fix file name

* [Automation] Update elastic stack version to 8.6.0-cae815eb for testing (#2000)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-6545f2df for testing (#2005)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-055acc83 for testing (#2011)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-baf193e8 for testing (#2016)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-22d60ec9 for testing (#2020)

Co-authored-by: apmmachine <[email protected]>

* Allow upgrade action to signal retry (#1887)

* Allow upgrade action to signal retry

Allow the ack of an upgrade action to set the upgrade status to
retrying.

* fix tests set failed state

* Fix broken test

* nil upgrade status by default

* Set agent to healthy in case of upgrade failure

* fix upgrade fields

* Fix tests

* [Automation] Update elastic stack version to 8.6.0-b8b35931 for testing (#2024)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-a892f234 for testing (#2030)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Add GH action to add issues to ingest board

Issues in this repo labeled with `Team:Fleet` will be added to the ingest board automatically w/ the `Fleet Server` area.

* Update add-issues-to-ingest-board.yml

* [Automation] Update elastic stack version to 8.6.0-89d224d2 for testing (#2034)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-949a38d2 for testing (#2039)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-26dc1164 for testing (#2045)

Co-authored-by: apmmachine <[email protected]>

* Add active filter for enrollment key queries. (#2044)

* Add active filter for enrollment key queries.

Add an active: true filter to enrollment key queries. This allows
fleet-server to handle cases where there may be 10+ inactive keys
associated with a policy.
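
As an illustration, a query with such a filter might be built as below. This is a sketch under assumptions: `enrollmentKeyQuery` is a hypothetical helper, and the `policy_id` field name is assumed; only the `active: true` filter comes from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enrollmentKeyQuery builds an Elasticsearch bool query that matches only
// active enrollment keys for one policy. The "policy_id" field is an assumption.
func enrollmentKeyQuery(policyID string) ([]byte, error) {
	q := map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				"filter": []any{
					map[string]any{"term": map[string]any{"policy_id": policyID}},
					map[string]any{"term": map[string]any{"active": true}},
				},
			},
		},
	}
	return json.Marshal(q)
}

func main() {
	body, err := enrollmentKeyQuery("policy-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Filtering on the server side keeps inactive keys out of the result page, so a policy with many retired keys still returns its active one.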

* review feedback

* fix linter

* fix tests

* Fix test cases

* [Automation] Update elastic stack version to 8.6.0-4765d2b0 for testing (#2048)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-8a615646 for testing (#2050)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-3f5f98b7 for testing (#2051)

Co-authored-by: apmmachine <[email protected]>

* [Automation] Update elastic stack version to 8.6.0-f20b7179 for testing (#2056)

Co-authored-by: apmmachine <[email protected]>

* Run mod tidy.

* Run make notice.

* Fix integration tests.

* Run go mod tidy and make notice.

* Fix path to fleet-server.yml in integration test.

* Fix race condition.

* Fix try 2.

* Fix race.

* Fix race try 2.

Co-authored-by: apmmachine <[email protected]>
Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Victor Martinez <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Michel Laterman <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
Co-authored-by: Michal Pristas <[email protected]>
Co-authored-by: Julien Lind <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Kyle Pollich <[email protected]>

Co-authored-by: Aleksandr Maus <[email protected]>
Co-authored-by: Michal Pristas <[email protected]>
Co-authored-by: apmmachine <[email protected]>
Co-authored-by: apmmachine <[email protected]>
Co-authored-by: Victor Martinez <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Julia Bardi <[email protected]>
Co-authored-by: Michel Laterman <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
Co-authored-by: Julien Lind <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Kyle Pollich <[email protected]>
Labels
bug (Something isn't working), QA:Needs Validation (Needs validation by the QA Team), Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team), v8.5.0
Development

Successfully merging this pull request may close these issues.

Fleet-server does not correctly update API keys on policy output permissions change
10 participants