
Introduce direct agent configuration #5018

Closed · axw opened this issue Mar 26, 2021 · 6 comments · Fixed by #5177

Comments

@axw (Member) commented Mar 26, 2021

As part of the move to Fleet, we will need to move away from fetching agent config directly from Kibana, as the privileges APM Server is given do not cover this. Instead, agent config will be pushed down to APM Server via the server's policy.

In order to make the above possible, we will add configuration to APM Server for specifying agent config directly. Each entry will include the matching criteria, the config settings, and an optional Etag value. For example:

apm-server:
  agent_config:
  - service.name: ten_percent
    config:
      transaction_sample_rate: 0.1
  - service.environment: production
    etag: abc123 # optional, computed by APM Server if not specified
    config:
      transaction_sample_rate: 0.5

If the "etag" property is not set for a config block, the server will compute one by hashing the settings. The property is configurable so that Kibana can inject a value: when agent config is created or updated in Kibana, Kibana will take responsibility for calculating an Etag value and injecting it into the server config, and will later use this Etag to identify whether the config has been applied (more below).
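
For illustration, here is a rough sketch of how the server could derive an etag by hashing a block's settings when none is configured. This is only a sketch; the package name, function name, hash algorithm, and encoding are placeholders, not the final implementation:

// Sketch only: a hypothetical helper showing one way to derive an etag
// from a config block's settings when none is configured explicitly.
package agentcfg

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

func computeEtag(settings map[string]string) string {
	// Sort the keys so the hash is stable regardless of map iteration order.
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, settings[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}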

Configuring a list of agent config will disable fetching agent config from Kibana.

Currently, when APM Server queries Kibana for agent config, Kibana marks the config as applied when the provided Etag is found to be current. As we will no longer be communicating with Kibana, we'll need to write documents to Elasticsearch instead. The server will track whether, in a config query from an agent, the agent supplies the current Etag (via If-None-Match) for the config block that matches its criteria. The first time this is true, the server will index a document including the Etag value, indicating that the config has been applied.
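
As a sketch of that flow (the handler shape and names are illustrative, not the actual server code):

// Sketch only: illustrates the intended Etag flow for agent config queries.
// ConfigBlock and markApplied are hypothetical names.
package agentcfg

import "net/http"

type ConfigBlock struct {
	Etag     string
	Settings map[string]string
}

// handleConfigQuery responds to an agent's config request for the block that
// matched its service.name/service.environment criteria. If the agent already
// has the current config (If-None-Match equals the block's etag), respond 304;
// the markApplied callback records the applied etag, and is expected to index
// a document to Elasticsearch only the first time it sees it.
func handleConfigQuery(w http.ResponseWriter, r *http.Request, block ConfigBlock, markApplied func(etag string)) {
	w.Header().Set("Etag", block.Etag)
	if r.Header.Get("If-None-Match") == block.Etag {
		markApplied(block.Etag)
		w.WriteHeader(http.StatusNotModified)
		return
	}
	// The agent does not have the current config yet: return the settings
	// (JSON encoding of block.Settings elided for brevity).
	w.WriteHeader(http.StatusOK)
}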

Alternatively, we could have agents periodically send an internal agent statistics event which includes the current config Etag value. This would have the benefit of enabling us to see how many and which agents have applied the config.

@felixbarny (Member) commented:

> Alternatively, we could have agents periodically send an internal agent statistics event which includes the current config Etag value. This would have the benefit of enabling us to see how many and which agents have applied the config.

I like that idea, but I'd suggest treating it as an enhancement and not coupling it to the initial implementation. This also ensures we're backwards compatible with older agents. All agents will need to add an ephemeral_id (currently only the Java agent does that, IIRC).
Alternatively, APM Server could generate that event whenever it receives a config request from an agent, which would require fewer changes in the agents; an ephemeral_id would probably still be required, or we could auto-populate the ephemeral_id by hashing the metadata.
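
A rough sketch of what such an event could contain (purely illustrative; the field names and document shape are not decided):

// Sketch only: a hypothetical shape for the periodic agent statistics event.
// Field names are illustrative, not a spec.
package agentcfg

import "time"

type agentConfigStats struct {
	Timestamp   time.Time `json:"@timestamp"`
	ServiceName string    `json:"service.name"`
	EphemeralID string    `json:"agent.ephemeral_id"` // identifies a running agent instance
	ConfigEtag  string    `json:"config.etag"`        // etag of the agent config currently applied
}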

To show the count of agents that have successfully applied the config, Kibana could query for the unique count of ephemeral_ids for a given etag within the last 1-5 minutes. It should not look at all the data, as a single service that frequently restarts would skew that number. Also, restricting the time range should lead to faster queries.
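
For example, something along these lines (the index, field names, and exact time window are illustrative only):

// Sketch only: the kind of Elasticsearch query Kibana could run to count the
// distinct agents that have applied a given config etag in the last 5 minutes.
package main

import "fmt"

func main() {
	const etag = "abc123"
	body := fmt.Sprintf(`{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "config.etag": %q } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  },
  "aggs": {
    "applied_agents": {
      "cardinality": { "field": "agent.ephemeral_id" }
    }
  }
}`, etag)
	// POST this body to the _search endpoint of whatever index the
	// applied-config events end up in.
	fmt.Println(body)
}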

@jalvz (Contributor) commented Mar 26, 2021

> as the privileges APM Server is given do not cover this

Since that problem needs a solution either way, what other benefits/drawbacks does this approach have, in your opinion?

@axw (Member, Author) commented Mar 27, 2021

> I like that idea, but I'd suggest treating it as an enhancement and not coupling it to the initial implementation.

Yes, I think that's a good call @felixbarny. This shouldn't be too hard. APM Server can index a document (not sure what kind, maybe metric-apm.internal-*?), and later on we can either extend the docs to include a count or index individual agent metric docs which identify the config. I wonder if we shouldn't use service node name instead of ephemeral ID, but let's debate that later...

> > as the privileges APM Server is given do not cover this
>
> Since that problem needs a solution either way, what other benefits/drawbacks does this approach have, in your opinion?

@jalvz these are the main ones that come to mind:

  • not requiring a connection to Kibana means a simpler deployment. This isn't a simplification compared to what we have today, but it will be in the future: Fleet Server and Elasticsearch will be requirements for APM Server, and if we stick with the current approach, the connection to Kibana will be required only for central config. Overall, there's a desire within Fleet to not require any access from integrations to Kibana.
  • minimal privileges. Querying Kibana for central config requires assigning Kibana-specific privileges to APM Server. AFAIK it's only planned to allow Elasticsearch privileges for integrations, and not Kibana/application privileges. Giving Kibana privileges to APM Server means it has more privileges than just querying central config, which isn't ideal.
  • removes caching and possibility of request storms. If an agent queries the server for config, the server can respond without making any external queries as it will have complete knowledge of all agent config defined. This eliminates the possibility of cache busting and request storms due to many unique service name/environment combinations. See also #4350 (agentcfg: reduce number of Kibana queries by caching rules).
  • defining agent config directly enables more use cases. By enabling users to define agent config directly in apm-server.yml, we can enable additional infrastructure-as-code type use cases (e.g. define agent config in a Kubernetes manifest). This is not a goal at the moment, but I think it speaks to the loosely-coupled approach.

@jalvz (Contributor) commented Mar 29, 2021

great, thanks @axw

@simitt (Contributor) commented Mar 29, 2021

I think that @axw's proposal would also allow moving the agent config to the Fleet app instead of the APM app, if we decide that's where it should live long term. APM Server would then just always receive the config values from the Fleet configuration.

@simitt (Contributor) commented Apr 6, 2021

When apm-server.agent_config.* is added to the supported configuration options, users could theoretically configure these settings directly in the Fleet integration UI. While this might be the long-term solution, as long as the values are also overwritten by changes from the APM app, the Fleet UI config options should be read-only.
Fleet integrations will support allowlisting/blocklisting configuration settings in a future version (elastic/kibana#96319). Whenever a config option is blocked, it should either be shown as read-only or not shown at all (TBD).

The problems with allowing options to be configured from both the APM app and the Fleet app stem from the superuser privilege issues described in elastic/kibana#95501 (comment):
Whenever a non-superuser updates something, it cannot be passed directly to the apm integration due to lack of privileges. Therefore, whenever settings are passed from a Kibana superuser, all settings need to be passed to the integration policy. This makes it impossible to identify which of the config options in the APM integration policy were manually overwritten, and which are simply outdated and need to be updated.
