
Introduce direct agent configuration #5018

Closed · axw opened this issue Mar 26, 2021 · 6 comments · Fixed by #5177

Comments

@axw (Member) commented Mar 26, 2021

As part of the move to Fleet, we will need to move away from fetching agent config directly from Kibana, as the privileges APM Server is given do not cover this. Instead, agent config will be pushed down to APM Server via the server's policy.

In order to make the above possible, we will add configuration to APM Server for specifying agent config directly. Each entry will include the matching criteria, the config settings, and an optional Etag value. For example:

apm-server:
  agent_config:
  - service.name: ten_percent
    config:
      transaction_sample_rate: 0.1
  - service.environment: production
    etag: abc123 # optional, computed by APM Server if not specified
    config:
      transaction_sample_rate: 0.5

If the "etag" property is not set for a config block, the server will compute one by hashing the settings. The property is configurable so that Kibana can inject a value: when agent config is created or updated in Kibana, Kibana will take responsibility for calculating an Etag value and injecting it into the server config, and will later use this Etag to identify whether the config has been applied (more below).
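
For illustration, here is a rough sketch of how the server could derive an etag by hashing a block's settings when none is configured. This is only a sketch; the package name, function name, hash algorithm, and encoding are placeholders, not the final implementation:

// Sketch only: a hypothetical helper showing one way to derive an etag
// from a config block's settings when none is configured explicitly.
package agentcfg

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

func computeEtag(settings map[string]string) string {
	// Sort the keys so the hash is stable regardless of map iteration order.
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, settings[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}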

Configuring a list of agent config will disable fetching agent config from Kibana.

Currently, when APM Server queries Kibana for agent config, Kibana marks the config as applied when the provided Etag is found to be current. As we will no longer be communicating with Kibana, we'll need to write documents to Elasticsearch instead. The server will track whether, in a config query from an agent, the agent supplies the current Etag (via If-None-Match) for the config block that matches its criteria. The first time this is true, the server will index a document including the Etag value, indicating that the config has been applied.
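
As a sketch of that flow (the handler shape and names are illustrative, not the actual server code):

// Sketch only: illustrates the intended Etag flow for agent config queries.
// ConfigBlock and markApplied are hypothetical names.
package agentcfg

import "net/http"

type ConfigBlock struct {
	Etag     string
	Settings map[string]string
}

// handleConfigQuery responds to an agent's config request for the block that
// matched its service.name/service.environment criteria. If the agent already
// has the current config (If-None-Match equals the block's etag), respond 304;
// the markApplied callback records the applied etag, and is expected to index
// a document to Elasticsearch only the first time it sees it.
func handleConfigQuery(w http.ResponseWriter, r *http.Request, block ConfigBlock, markApplied func(etag string)) {
	w.Header().Set("Etag", block.Etag)
	if r.Header.Get("If-None-Match") == block.Etag {
		markApplied(block.Etag)
		w.WriteHeader(http.StatusNotModified)
		return
	}
	// The agent does not have the current config yet: return the settings
	// (JSON encoding of block.Settings elided for brevity).
	w.WriteHeader(http.StatusOK)
}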

Alternatively, we could have agents periodically send an internal agent statistics event which includes the current config Etag value. This would have the benefit of enabling us to see how many and which agents have applied the config.

@felixbarny (Member) commented:

> Alternatively, we could have agents periodically send an internal agent statistics event which includes the current config Etag value. This would have the benefit of enabling us to see how many and which agents have applied the config.

I like that idea, but I'd suggest treating it as an enhancement and not coupling it to the initial implementation. This also ensures we're backwards compatible with older agents. All agents will need to add an ephemeral_id (currently only the Java agent does that, IIRC).
Alternatively, APM Server could generate that event whenever it receives a config request from an agent, which would require fewer changes in the agents; an ephemeral_id would probably still be required, or we could auto-populate the ephemeral_id by hashing the metadata.
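
A rough sketch of what such an event could contain (purely illustrative; the field names and document shape are not decided):

// Sketch only: a hypothetical shape for the periodic agent statistics event.
// Field names are illustrative, not a spec.
package agentcfg

import "time"

type agentConfigStats struct {
	Timestamp   time.Time `json:"@timestamp"`
	ServiceName string    `json:"service.name"`
	EphemeralID string    `json:"agent.ephemeral_id"` // identifies a running agent instance
	ConfigEtag  string    `json:"config.etag"`        // etag of the agent config currently applied
}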

To show the count of agents that have successfully applied the config, Kibana could query for the unique count of ephemeral_ids for a given etag within the last 1-5 minutes. It should not look at all the data, as a single service that frequently restarts would skew that number. Also, restricting the time range should lead to faster queries.
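
For example, something along these lines (the index, field names, and exact time window are illustrative only):

// Sketch only: the kind of Elasticsearch query Kibana could run to count the
// distinct agents that have applied a given config etag in the last 5 minutes.
package main

import "fmt"

func main() {
	const etag = "abc123"
	body := fmt.Sprintf(`{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "config.etag": %q } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  },
  "aggs": {
    "applied_agents": {
      "cardinality": { "field": "agent.ephemeral_id" }
    }
  }
}`, etag)
	// POST this body to the _search endpoint of whatever index the
	// applied-config events end up in.
	fmt.Println(body)
}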

@jalvz (Contributor) commented Mar 26, 2021

> as the privileges APM Server is given do not cover this

Since that problem needs a solution either way, what other benefits/drawbacks does this approach have, in your opinion?

@axw (Member, Author) commented Mar 27, 2021

> I like that idea, but I'd suggest treating it as an enhancement and not coupling it to the initial implementation.

Yes, I think that's a good call @felixbarny. This shouldn't be too hard. APM Server can index a document (not sure what kind, maybe metric-apm.internal-*?), and later on we can either extend the docs to include a count or index individual agent metric docs which identify the config. I wonder if we shouldn't use service node name instead of ephemeral ID, but let's debate that later...

> > as the privileges APM Server is given do not cover this
>
> Since that problem needs a solution either way, what other benefits/drawbacks does this approach have, in your opinion?

@jalvz these are the main ones that come to mind:

  • not requiring a connection to Kibana means a simpler deployment. This isn't a simplification compared to what we have today, but it will be in the future: Fleet Server and Elasticsearch will be requirements for APM Server, and if we stick with the current approach, the connection to Kibana will be required only for central config. Overall, there's a desire within Fleet to not require any access from integrations to Kibana.
  • minimal privileges. Querying Kibana for central config requires assigning Kibana-specific privileges to APM Server. AFAIK it's only planned to allow Elasticsearch privileges for integrations, and not Kibana/application privileges. Giving Kibana privileges to APM Server means it has more privileges than just querying central config, which isn't ideal.
  • removes caching and possibility of request storms. If an agent queries the server for config, the server can respond without making any external queries as it will have complete knowledge of all agent config defined. This eliminates the possibility of cache busting and request storms due to many unique service name/environment combinations. See also #4350 (agentcfg: reduce number of Kibana queries by caching rules).
  • defining agent config directly enables more use cases. By enabling users to define agent config directly in apm-server.yml, we can enable additional infrastructure-as-code type use cases (e.g. define agent config in a Kubernetes manifest). This is not a goal at the moment, but I think it speaks to the loosely-coupled approach.

@jalvz (Contributor) commented Mar 29, 2021

great, thanks @axw

@simitt (Contributor) commented Mar 29, 2021

I think that @axw's proposal would also allow moving the agent config to the Fleet app instead of the APM app, if we decide that's where it should live long term. APM Server would then just always receive the config values from the Fleet configuration.

@simitt (Contributor) commented Apr 6, 2021

When apm-server.agent_config.* is added to the supported configuration options, users could theoretically configure these settings directly in the Fleet integration UI. While this might be the long-term solution, as long as the values are also overwritten by changes from the APM app, the Fleet UI config options should be read-only.
Fleet integrations will support allowlisting/blocklisting configuration settings in a future version (elastic/kibana#96319). Whenever a config option is blocked, it should either be shown as read-only or not shown at all (TBD).

The problems with allowing options to be configured from both the APM app and the Fleet app stem from the superuser privilege issues described in elastic/kibana#95501 (comment):
Whenever a non-superuser updates something, it cannot be passed directly to the apm integration due to lack of privileges. Therefore, whenever settings are passed from a Kibana superuser, all settings need to be passed to the integration policy. This makes it impossible to identify which of the config options in the APM integration policy were manually overwritten, and which are simply outdated and need to be updated.
