Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration to disable attribute enrichment #58

Merged
merged 4 commits into from
Jul 26, 2024

Conversation

lahsivjar
Copy link
Contributor

@lahsivjar lahsivjar commented Jul 22, 2024

Allows enabling/disabling enrichment of specific attributes.

Related to: https://github.com/elastic/opentelemetry-dev/issues/304

@lahsivjar lahsivjar requested a review from a team as a code owner July 22, 2024 11:13
@lahsivjar lahsivjar requested a review from axw July 22, 2024 11:14
@axw
Copy link
Member

axw commented Jul 23, 2024

What's the reason behind this change? Could it also be achieved with a subsequent transform processor?

@lahsivjar
Copy link
Contributor Author

lahsivjar commented Jul 23, 2024

What's the reason behind this change?

It was mostly driven by introducing transaction.root as a enriched attribute (draft PR). By my knowledge, the root attribute would ONLY be required in the APM-aggregation flow and a way to disable the metric would give a way to customize the attribute as required by a specific pipeline.

Could it also be achieved with a subsequent transform processor?

Hmm, that makes sense. I am overall a bit confused on what configuration should be a separate processor and what should be done within a processor. For example, in the case of elastic-traces processor (which would use this library for enrichment) there could be 2 types of configuration:

  1. Controlling the list of attributes that will be added to a trace, enabled by the current PR.
  2. Controlling the list of attributes that will be produced in the final trace which could be different than the ones used to process the batch. For example, we may process the batch with transaction.root as a bucket for the histogram but want to drop the bucket in the final metric. This is what is being discussed here.

In my opinion, 1 is a good to have as a processor configuration but 2 makes the boundaries of what the processor should do a bit confusing and should be taken care of by another processor. @axw does this reasoning make sense to you?

@axw
Copy link
Member

axw commented Jul 24, 2024

My understanding of how this enrichment works is broadly as follows, let me know if I'm off.

(trace spans) -> elastictraceprocessor -> spanmetricsconnectorv2 -> lsmintervalprocessor -> elasticsearchexporter

Then we have the following requirements:

  • we DO want a transaction.root attribute on transaction metrics
  • we DO NOT want a transaction.root attribute on any other metrics
  • we DO NOT want a transaction.root attribute on the span documents

What I'm a bit confused about is how making transaction.root optional in the enrichment (i.e. in elastictraceprocessor) helps, since I assume we would do this enrichment just once and use the result for producing different metrics. Am I mistaken?

So the alternative I'm thinking of would be:

  • always set transaction.root in enrichment
  • add a transform processor before lsmintervalprocessor that drops the attribute for non-transaction metrics
  • add a transform processor to the traces pipeline that drops the attribute after the connector

@lahsivjar
Copy link
Contributor Author

My understanding of how this enrichment works is broadly as follows, let me know if I'm off.
(trace spans) -> elastictraceprocessor -> spanmetricsconnectorv2 -> lsmintervalprocessor -> elasticsearchexporter

This looks correct to me. IIANM, in addition to this path, the elatictraceprocessor will also be required for the following:

  1. (trace spans) -> elastictraceprocessor -> elasticsearchexporter
  2. (trace spans) -> elastictraceprocessor -> elasticsearchexporter in ECS mode with translation lib

transaction.root will not be required for either of the 2 additional flows. The additional flows will basically enrich the traces with attributes required for elastic UI to work for traces.

What I'm a bit confused about is how making transaction.root optional in the enrichment (i.e. in elastictraceprocessor) helps, since I assume we would do this enrichment just once and use the result for producing different metrics. Am I mistaken?

Making the fields configurable would allow us to accommodate the pipelines to produce the required attributes for each path.

always set transaction.root in enrichment
add a transform processor before lsmintervalprocessor that drops the attribute for non-transaction metrics

I was thinking why not only set the transaction.root: true for transaction metrics (nothing required for non-transaction metric). There won't be a need of the above transform processor -- this is what the draft PR does.

add a transform processor to the traces pipeline that drops the attribute after the connector

+1 on this, we will require this bit.

@axw
Copy link
Member

axw commented Jul 25, 2024

Isn't the (trace spans) -> elastictraceprocessor part common to all of those paths though? Meaning we wouldn't have an opportunity to choose to set transaction.root for different metrics?

I'm probably missing something here - would you be able to share a collector config YAML to illustrate how you see the final pipeline being configured?

@axw
Copy link
Member

axw commented Jul 25, 2024

@lahsivjar and I chatted and he clarified that we would have multiple instances of the elastictrace processor, with each one setting a subset of attributes according to requirements. So only the one destined for producing transaction metrics would set transaction.root. TIL you can create multiple named pipelines, just like you can receivers, processors, etc.

Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


// AttributeConfig is the configuration options for each attribute.
type AttributeConfig struct {
Disabled bool `mapstructure:"disabled"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it comes to the collector config, I think it may be a bit easier to understand if we use enabled: true|false, with the default being enabled for all attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated a bit on this (enabled vs disabled). If we choose enabled then we would have to provide something like AllEnabledConfig that can be used in the processor and be careful when we add new attributes. That is also okay. Do you think it makes sense to use enabled all the way (here in the lib as well as in the processor)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to use enabled as the config and also provided a method to get a configuration with all enrichments enabled.

@lahsivjar lahsivjar requested a review from axw July 25, 2024 12:07
Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. YMMV, but I think it may be simpler to use option args here, like:

Enrich(traces, trace.WithTransactionRoot(false))

Then keep the mapstructure config in the collector-components connector code.

@lahsivjar
Copy link
Contributor Author

@axw The issue with functional options is that any changes to the lib will require a followup in the collector components repo. My current plan is to directly embed the config here as a collector config and use the Enabled method to configure the default config, this would require no changes in the component itself.

@axw
Copy link
Member

axw commented Jul 26, 2024

I would personally rather keep the config for the connector with the connector code, and keep this code just for the business logic shared across apm-data and the components. But I don't have such a strong opinion that I'd block it.

As an aside, I think the user-facing configuration could be a little less verbose. Something like:

enabled_attributes:
  - event.outcome
  - transaction.*

And that would be translated to the more efficient structure you have. But maybe let's get this merged and follow up on what the ideal config looks like, I don't want to delay this further than I already have :)

@lahsivjar lahsivjar merged commit d024baa into elastic:main Jul 26, 2024
3 checks passed
@lahsivjar lahsivjar deleted the config-enrich branch July 26, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants