-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter Processor Proposal #560
Comments
@anniefu thanks for writing the proposal. Couple of questions/comments
|
@anniefu thank you for the detailed description. This would be a very reasonable proposal if we did not already have the include/exclude filters in existing processors. Unfortunately because they exist we have to take that into account and aim for uniform configuration syntax across all processors in this repo.
This is true, but I do not think it is significant enough to warrant introducing a new way of filtering.
This can be done by defining multiple filter processors, each specifying a pair. Processors chaining is already a feature.
It is possible to specify different match types for each processor. This may be less flexible than allowing match type per property but probably good enough for simple cases. Generally, the declarative approach to specifying filters is going to remain limited one way or another unless we end up re-creating an SQL-like language that is turing-complete. My view on how the filters can be made more powerful and flexible is to introduce a new match type "eval", which allows to specify arbitrary expression (e.g. using something like https://github.com/antonmedv/expr or similar). This would also answer the question raised by @rghetia - AND and OR would be explicit operators in the eval expression (e.g. Implementing "eval" in a way that is performant is non-trivial though and will require a significant upfront design and thought to do correctly. How I would suggest to move forward with this proposal:
processors:
filter/allow-auth:
include:
match_type: regexp
services: ["auth.*", "login.*"]
filter/drop-authenticator:
exclude:
match_type: strict
services: ["authenticator"]
service:
pipelines:
traces:
processors: [filter/allow-auth, filter/drop-authenticator] Note how you do not have to use both include and exclude in each processor and how using only one allows you to control the order of their application.
|
Thanks for the quick feedback, everyone! Full disclosure, I have not worked with traces/spans much :) @rghetia I meant filter spans instead of traces to be parallel with filtering the metrics array. I'm not very familiar with traces/spans, but it seems like being able to filter by span name or perhaps Resource just to reduce unnecessary processing could be worth implementing. On the AND/OR, to match the existing attribute/span processor, it would be an OR between different metric/span properties. @rghetia, @annanay25 do you think it's worth implementing a filterprocessor for spans or should that just be omitted and left to the tailsamplingprocessor? (I don't think I would be the one to implement that). @tigrannajaryan Good point that different match types can be achieved already by chaining. I agree that having two filter configs is not ideal, so I am happy to match the filter format of the attribute/span processor. Knative's use case is actually very simple, just one include filter of strict matches. But for consistency, I'll implement the components necessary to match the full attribute/span filter config. I thought I would put this one out there just in case we were willing to diverge between processor, since the mutating nature of attribute/span processors have a few more restrictions to meet than one that's just dropping entire metrics/spans. The eval match type would be very powerful, but I agree with you that it is unlikely that most users would need such flexibility, Based off the feedback, I will update the design proposal:
|
@anniefu I think a filter for individual spans can still be useful. For example there may be a particularly chatty service or part of service that generates large volume of useless or even incorrect spans (as a result of a bug) that we want to suppress until the bug is fixed.
Let's see the modified proposal here before you proceed to implementation. |
Proposal updated, please take another look when you get a chance. Thanks! |
Bump on this. I'm happy to get the implementation done as soon as we reach a consensus. |
@anniefu Thank you for updating, sorry for the delay. A few questions: Q1.
Is this how we currently specify attributes in include/exclude filters? The syntax appears a bit different (perhaps it denotes the same thing in YAML, I am not sure). Q2. No support to filter metrics by attributes/labels? This is fine, we can add it later, just double checking. Q3.
Can you clarify how the cache is intended to work? Q4.
Why is |
@tigrannajaryan No worries, thanks for taking another look!
I believe so (I just copy-pasted that part from the processor README).
I think first iteration will just be setting up the interfaces/helper classes and the name field, which should set an example for adding more fields. The amount of code/test for that will already be large, so I am not planning on including attributes/labels at this time.
The cache will key off the string field being compared to (like name) and return true/false to whether it previously matched the filter. That way the cost of a regex match is replaced with the cost of a dictionary lookup. This might be an over optimization, so it is a setting that can be used at will. I do know for Knative we have a fixed set of metric names that will be repeatedly used, so it's unnecessary to repeatedly regexp match the same names and the cache of unique metric names will be relatively small (maybe 1k strings).
Oh, this is because I was not planning on implementing support for serviceName yet, and it would just be dead code. I guess we have a 1:1 interface:implementation match here, so maybe the best option is to remove the interface and just call MatchMetric implementation directly. That way in the future when people want to add more match fields, they don't have to update the interface AND the implementation stub. We probably don't want different processors implementing different MatchMetric's anyway as that would lead to inconsistency in behavior across processors. |
Any additional thoughts or concern? I would like to get an implementation out before I get caught up in other projects :) |
Very nice initiative! I'm looking exactly this functionality in order to setup several pipeline that needs to drop spans based on an attribute on them. |
@tigrannajaryan I working on composite policy as discussed earlier. Looks like the functionality proposed here overlaps with composite policy. |
@anniefu sorry for the delay. The proposal mostly looks good to me. Let's go ahead with implementation and we can discuss/decide the rest of the details in the PR. |
@ConverJens Great! I'm only planning on setting up the interfaces and name filtering right now, so maybe adding attributes filtering for spans would be something you'd be interested in adding after? I will CC you on the PRs to keep you in the loop. @vikrambe @tigrannajaryan it does seem like there's some overlap in configuration with the composite tail filtering proposal, but functionally it is different enough that I don't see a problem with both of them existing. I'm planning to split the PRs into smaller chunks as follows:
|
@anniefu Sounds good! Let me know when you get going! @vikrambe @tigrannajaryan I agree with @anniefu that the functionality is different enough to warrant both processors. |
…#597) Add filterset helper package to internal/processor that can be used by processors to filter metrics and spans by string properties. import "github.com/open-telemetry/opentelemetry-collector/internal/processor/filterset" This adds two types of filtersets - regexp: filter strings using https://golang.org/pkg/regexp/ patterns - strict: filter strings using exact string matches Link to tracking Issue: First part of Filter Processor Proposal #560, though the package can be used by other processors too This can be used for the MatchSpan implementation and the upcoming MatchMetric function. Testing: Unit tests Documentation: None, no public surface changes yet
Sorry for the scope creep, but I'll throw this out there. I'd like to apply filtering to span events as well as just spans. I'm playing with the idea of application loggers logging to OpenTracing span logs as a way to unify tracing and logging, and take advantage of trace sampling decisions. I want to apply the concept of enabling certain log levels to span events using this filter, e.g. filter out all |
@william-tran I'm more familiar with metrics than spans, but I will put out traceprocessor that filters spans as part of completing this proposal. My intent is to use the MatchSpan implementation, so the processor will be able to filter on anything that implementation can match on. I will be updating MatchSpan to consolidate the implementation with the MatchMetric one proposed in this proposal, but I wasn't personally intending on adding additional span matching properties. But in general, if the MatchSpan implementation is updated to work with span events too, the filterprocessor will pick up the ability to filter on that too. |
Add metric helper package to internal/process that adds a MatchMetric function that can be used by processors to filter metrics by metric name. MatchMetric can be extended to match on additional metric properties down the line. `import "github.com/open-telemetry/opentelemetry-collector/internal/processor/metric"` In addition - Add a Factory class for creating FilterSets easily - Add "testutils/configtestutils" to help with testing yaml configs of subpackages **Link to tracking Issue:** Second part of Filter Processor Proposal #560, though the package can be used by other processors too **Testing:** Unit tests
Hello, I have not forgotten about this, just got a bit busy with other projects. I'll have a filter metrics processor PR out sometime this week. |
I'm a bit late to the party but would make one UX suggestion for this. Would suggest not having match_type and have it based on the match text instead. For example: metrics:
# include and/or exclude can be specified. However, the include properties
# are always checked before the exclude properties.
exclude:
# Configuration section for any additional filtering options for
# a particular match_type. The match_type name is the name of the section.
# This is an optional field.
{strict, regexp}:
< see "match configuration" below >
# The metric name must match at least one of the items.
# This is an optional field.
meric_names: [
"host/cpu/*", # glob matching
"host/cpu/usage", # exact matching
"/cpu/", # regex matching
"!host/cpu/total" # inverse match so that host/cpu/total isn't dropped
] Many times a regex isn't needed and a glob will suffice. This allows the user to use whichever form of matching is more convenient for each entry. |
I've made a dedicated proposal for this here: #1081 |
I would like that rules (also in other processors) had an enable attribute (true by default) so configuration can be easily modified by turning on/off different rules and not deleting/inserting them. This would allow for quick and flexible reconfiguration when needed, without losing the declarations of rules that are not temporarily being removed. The true by default value would make it backwards compatible with existing config files while allowing for the new feature to be used. |
…metry#640) Add metric helper package to internal/process that adds a MatchMetric function that can be used by processors to filter metrics by metric name. MatchMetric can be extended to match on additional metric properties down the line. `import "github.com/open-telemetry/opentelemetry-collector/internal/processor/metric"` In addition - Add a Factory class for creating FilterSets easily - Add "testutils/configtestutils" to help with testing yaml configs of subpackages **Link to tracking Issue:** Second part of Filter Processor Proposal open-telemetry#560, though the package can be used by other processors too **Testing:** Unit tests
The MatchSpan interface is used by the attributes/span processor, however this change does not change the yaml configuration or behavior, just implementation. FilterSet has the same filtering logic as MatchSpan previously had, but operates more generically on just strings instead of spans and adds some additional options, such as caching. **Link to tracking Issue:** Side effect of open-telemetry#560 to help keep metrics and trace processors using matching configs the same. **Testing:** unit tests updated, no new behavior, just implementation swapout
Given that both the attribute and span processor can filter spans already as is documented, do we see it as valuable to add the same functionality for spans to the filterprocessor? As previously discussed, filtering entire traces is a bit more complex and would need design that is out of scope of the filterprocessor proposed here. |
@anniefu what is the latest on this? |
This is complete and available to use. For metrics, it's the filter processor. For spans, it's the attribute or span processor. |
Is it possible to drop spans without setting any attribute/span mutations? I tried the following and it errors:
The error is:
Am I doing something wrong? My hope is to drop these traces/spans altogether. |
@johanbrandhorst Any luck with dropping spans? I have a similar use case and not able to drop spans.
the error observed is
|
Nope, not had any luck. |
@sribalakumar @johanbrandhorst Any updates regarding the above? I am facing the same issue. |
I still don't have a solution for this. |
hi @anniefu and @tigrannajaryan
What i understand from this is its only work for processors. But i didn't see any drop-like action for attribute or span processor. Can you give an example to how to do this ? |
Looking at the attributes processor and span processor, the The tail sampling processor adopts an inclusion filter approach, based on span attributes and does not support exclusion at the moment. There are also plans to deprecate the processor. The filterprocessor currently only supports metrics, and it does have the desired behaviour of dropping metrics; just not spans. What I understand from this #560 (comment) is that it would still be useful to have a span filter. If so, I feel it would make most sense to have that logic within the filterprocessor (so it supports filtering both metrics and spans). What do folks think @tigrannajaryan @anniefu ? I'd be happy to contribute to this effort if you're both happy to go ahead with this proposal. |
* Create MeterImpl interface * Checkpoint w/ sdk.go building * Checkpoint working on global * api/global builds (test fails) * Test fix * All tests pass * Comments * Add two tests * Comments and uncomment tests * Precommit part 1 * Still working on tests * Lint * Add a test and a TODO * Cleanup * Lint * Interface()->Implementation() * Apply some feedback * From feedback * (A)Synchronous -> (A)Sync * Add a missing comment * Apply suggestions from code review Co-Authored-By: Krzesimir Nowak <[email protected]> * Rename a variable Co-authored-by: Krzesimir Nowak <[email protected]>
… 0.30.0 (open-telemetry#560) * Fix invalid struct tag in resource metrics * add integration-vet make target for breaking change detection * Adopt 0.30 in integration tests
Filter Processor
A processor for dropping (AKA filtering) metrics and spans from the Collector pipeline based on specified string properties, for example, metric name or span name.
Motivation
From #297, "Processing unnecessary metrics causes undue burden on resources in otelsvc pipeline. It also increase the backend cost."
A filter processor combined with multiple otelvc pipeline's also enables increased versatility from a single collector service. Each pipeline can filter for a certain subset of metrics and apply very different processing on those subsets. This is a use case for Knative, which aims to support multiple types of metrics from both knative/serving and knative/eventing with the same collector service.
Proposed Configuration
Intent
match configuration
Depending on the
match_type
, the optional additional filtering configuration are:Implementation Interface
Goals
Metric Interface
Similar to the attribute/span processor interface, MatchSpan, a MatchMetric interface will be implemented so that different MetricProcessor's can re-use the same filtering logic for consistency.
Shared Interface
Both the MatchSpan implementation and the MatchMetric implementation can use the same implementation of the FilterSet interface to match strings properties.
The Span/Metric implementations can maintain one FilterSet per property to filter against. To add new string properties for filtering, a new FilterSet is added.
MatchSpan and MatchMetric implementor's handle parsing the span or metric data for the string input to FilterSet.Matches(string). This helps keep metric & trace specific logic independent while sharing the core filtering logic.
To add a new string match pattern, implement a new FilterSet and add it to the FilterType enum. Add fields for
match_type
specific configuration to the config yaml as appropriate.The text was updated successfully, but these errors were encountered: