Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate Limit Processor #35204

Open
juergen-kaiser-by opened this issue Sep 16, 2024 · 14 comments
Open

Rate Limit Processor #35204

juergen-kaiser-by opened this issue Sep 16, 2024 · 14 comments
Labels
Accepted Component New component has been sponsored enhancement New feature or request

Comments

@juergen-kaiser-by
Copy link

juergen-kaiser-by commented Sep 16, 2024

Component(s)

No response

Is your feature request related to a problem? Please describe.

We run an observability backend (Elasticsearch) shared by many teams and services (thousands). The services run in kubernetes clusters and we want to collect the logs of all pods.

Problem: If a service/pod becomes very noisy for some reason, it can burden the backend so much that all of other teams feel it. In short: One team can ruin the day for all others.

We would like to limit the effect a single instance or service can have on the observability backend.

Describe the solution you'd like

  • There should be a method to limit the flow of logs based on some atributes. We apply a standardized set of labels to each Kubernetes deployment, so we would work with those.
  • Dropping log lines would be okay (for us) since we can not assume that a noisy service stops being noisy soon. Sampling is not okay because of the nature of logs.
  • If rate limiting happens then we and our users should be able to see it.
    • A single metric ingested into some (other) pipeline would suffice.
    • Let us choose a set of attributes the metric should have (copied) from the rate limited logs so that we can map it to the affected service. In our case, we would like to have our standard pod labels copied to the metric.

Considering the points above, we think that there should be a processor for this.

We have no requirements regarding the algorithm backing the rate limiting. It seems that a token bucket filter (example blog entry) is a reasonable choice here.

Describe alternatives you've considered

Rate Limiter in Receivers

Receiver rate limiting is okay if you only work with attributes available in the receivers. In our case, those are insufficient because we need pod labels. As a workaround, we could inject them into the collector config as env variables and focus the collector on a single pod by deploying it as a sidecar. However, a sidecar deployment consumes too much resources across all pods because we have large clusters (>= 10K pods).

A benefit of a rate limiting in receivers could give collector users to choose between dropping incoming telemetry and just not receive it, effectively creating backpressure.

Rate limiting in Receivers is discussed in #6908.

Rate Limiter as Extension

We lack knowledge about how extensions work internally to say anything about it. Rate limiting as extensions also is discussed in #6908.

Additional context

@juergen-kaiser-by juergen-kaiser-by added enhancement New feature or request needs triage New item requiring triage labels Sep 16, 2024
@juergen-kaiser-by juergen-kaiser-by changed the title Rate Limit based on Atributes Rate Limit Processor Sep 16, 2024
@atoulme
Copy link
Contributor

atoulme commented Sep 17, 2024

Can you explain why sampling doesn't work for you here? What would the configuration of the processor look like?

@juergen-kaiser-by
Copy link
Author

juergen-kaiser-by commented Sep 18, 2024

The sampling existing today does not work because it always is enabled. We would like to only limit too noisy pods/services. Note that a service suddenly can become noisy, so we do not know the beforehand.

Once noisyness is detected, there are different options how to limit the flow. Whether we need sampling or some other rate limiting algorithm boils down to how useful the output is afterwards. For logs, we think that larger chunks of consecutive log lines are more useful than smaller ones (think about a samples stack trace vs. having a full one), therefore the limiting algorithm should create large once before starting to drop log lines. A simple sampling creates small chunks.

That being said, I do not see why it should not be possible to specify different algorithms in the long run. For the other telemetry types, other algorithms are better.

I do not have a good example configuration, yet. Let me think about that.

@juergen-kaiser-by
Copy link
Author

As a first draft:

#[...]

processors:
  ratelimit:

    # defines the rate limiting for log signals. Idea for the structure is borrowed from the transformprocessor. We can have own sections for log, metric, and trace if we want this processor to be generic to all types.
    log:

      # grouping defines wheter and how to group the telemetry by a set of attributes.
      grouping:

        #   mode defines how to deal with logs/metrics/traces not not falling into one of the defined groups (not having the fields)
        #   - strict (default): nonexisting fields are treated as if they do (having a default value) => always rate limit. Fall back to the last group if existent.
        #   - relaxed: rate limiting applies only if all fields are present
        mode: strict

        groups:
        - tbf: # used rate limiting algorithm. Others are possible but only one must be set. (tbf = token bucket filter)
            average_rate_per_sec: 1000
            max_burst_per_sec: 5000

          # defines the list of attributes to use for grouping.
          attributes:
          - context: resource
            attribute: k8s.pod.uid
          - context: resource
            attribute: k8s.pod.tag.my-project-name

#[...]

The structures allows user to:

  • define rate limiting for different signal types
  • define different rate limiting strategies
  • define multiple groups

We could also add conditions to the group attributes to enable users to define more specialized groups.

@axw
Copy link
Contributor

axw commented Oct 11, 2024

Receiver rate limiting is okay if you only work with attributes available in the receivers. In our case, those are insufficient because we need pod labels. As a workaround, we could inject them into the collector config as env variables and focus the collector on a single pod by deploying it as a sidecar. However, a sidecar deployment consumes too much resources across all pods because we have large clusters (>= 10K pods).

Do you have any special authn/z requirements? If not, an option that may work for you is Kubernetes service account tokens in conjuction with the oidcauthextension.

Service account tokens are OIDC-compatible, and their claims contain the namespace and service account name, among other things. They do not contain pod labels - maybe you can enforce service account names for your customers?

If all of those stars align, then you could extract those claims and use them as keys for a rate limiter extension...

Rate limiting in Receivers is discussed in #6908.

We have built a distributed rate limiter extension for internal purposes, and we are planning to offer it to the contrib repo. The extension is currently rate limiting in the exporter, but it would be fairly straightforward to update it to also rate limit in receivers, and it's our plan to do so.

@juergen-kaiser-by
Copy link
Author

In our case, we consume logs generated by pods/containers in a Kubernetes clusters. Those logs are stored in files by Kubernetes and are mounted into the otel collector's container for consumption. There is no authentication involved in this path (in my understanding).

If you need some unique key for the "entity" which should be rate limited then we could work with the pod id or container id. However, those must be extracted first from the log file name. If the extension you mention works with the output of a receiver then this can work. However, it would not allow us to get useful notifications/metrics telling us about rate limited services. For that, we need more information than just the pod/container id.

@atoulme atoulme removed the needs triage New item requiring triage label Oct 12, 2024
@axw
Copy link
Contributor

axw commented Oct 14, 2024

@juergen-kaiser-by I see, thanks for elaborating. The extension we have built is based on the auth framework (not the ideal interface, but it works) which would not fit with filelog anyway. The core functionality could also be extracted into a processor.

I think creating back pressure on the producer is ideal, which makes rate limiting on resource/record attributes tricky: there are potentially multiple resources per OTLP batch, so rate limiting on one resource may cause back pressure for the other. I think that probably doesn't apply in your case, I'm just thinking about whether it would lead to surprising behaviour for others.

Another option would be to allow the processor to only rate limit based on batch/request-level metadata. For example you might use this to rate limit a batch of data based on some HTTP header (e.g. a tenant ID), if you were using it with the OTLP receiver. Then you would find a way to get the metadata you need (i.e. pod labels) in there. We could potentially extend receivercreator to set this metadata based on pod labels. Something like:

extensions:
  k8s_observer:

processors:
  ratelimiter:
    metadata_keys: [tenant]
    # other config ...

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      filelog:
        rule: type == "pod.container"
        metadata:
          - key: tenant
            value: `pod.labels["tenant"]`
        config:
          include:
            - /var/log/pods/`pod.namespace`_`pod.name`_`pod.uid`/`container_name`/*.log
          include_file_name: false
          include_file_path: true
          operators:
            - id: container-parser
              type: container

@juergen-kaiser-by
Copy link
Author

I think creating back pressure on the producer is ideal, which makes rate limiting on resource/record attributes tricky: there are potentially multiple resources per OTLP batch, so rate limiting on one resource may cause back pressure for the other. I think that probably doesn't apply in your case, I'm just thinking about whether it would lead to surprising behaviour for others.

Please remember that we would like to be able to see when log messages get dropped.

If you work with backpressure, then it is up to the previous instances to decide what to do if there is too much telemetry. In our case, the source of the log messages can be seen as the logging service or Kubernetes which stores the logs into files. Obviously, we do not want to slow down the service. Kubernetes itself does not care whether you consumed the logs. It does not wait for you and will rollover and delete the log messages as necessary. Consequently, our only chance to get information about dropped logs is to get them into the collector and drop the messages there as necessary.

@axw
Copy link
Contributor

axw commented Oct 16, 2024

@juergen-kaiser-by makes sense. I think the ratelimit processor could have configuration to either return an error, silently discard, or create backpressure. In your case could configure it to silently discard the data, and the processor would record metrics about rate limiting. I think that aligns with the solution you originally described.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Dec 16, 2024
@axw
Copy link
Contributor

axw commented Dec 16, 2024

We're in the process of open sourcing a rate limiter processor that we've been using internally: elastic/opentelemetry-collector-components#247

Eventually we intend to propose adding it opentelemetry-collector-contrib, that repo is just an interim home.

@dmitryax
Copy link
Member

dmitryax commented Dec 18, 2024

Discussed the PR during the Collector SIG call on Dec 17 and agreed on:

I can sponsor this component

@dmitryax dmitryax added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Dec 18, 2024
@anuraj381
Copy link

Here is the proposal for the config for ratelimit processor, please review -

Example Config

processors:
    ratelimit:
        logs:
            conditions:
                - 'attributes["log_level"] == "error"'
                - 'resource.attributes["k8s.namespace.name"] == "my-k8s-ns-name"'
            allowed_rate: 30000
            interval: 60s
            rate_limit_type: 'BACKPRESSURE'
            rate_limit_fields: 
                - resource.k8s\.container\.name
        traces:
            conditions:
                - 'attributes["log_level"] == "error"'
                - 'resource.attributes["k8s.namespace.name"] == "my-k8s-ns-name"'
            allowed_rate: 80000
            interval: 120s
            rate_limit_type: 'DROP'
            rate_limit_fields: 
                - resource.service_name
        metrics:
            conditions:
                - 'attributes["log_level"] == "error"'
                - 'resource.attributes["k8s.namespace.name"] == "my-k8s-ns-name"'
            allowed_rate: 30000
            interval: 60s
            rate_limit_type: 'BACKPRESSURE'
            rate_limit_fields: 
                - resource.k8s\.namespace\.name

Description of fields

Field Type Default Description
conditions []string [] A slice of OTTL expressions used to evaluate which logs records to be rate-limited. See OTTL Boolean Expressions for more details.
allowed_rate int 30000 Allowed rate of logs/metrics/traces per combination of rate_limit_fields in configured interval
interval duration 60s The interval in which rate-limit is applied after data crosses allowed_rate.
rate_limit_type string DROP Defines the rate_limiter behaviour when rate from any contributor is higher, drop the data (DROP), or create backpressure (BACKPRESSURE)
rate_limit_fields []string [] rate-limit is applied for each combination of values of these fields, fields can be added from resource Map. All the fields added in resource from source OR any preceding processors can be considered.

Notes:

  • The rate_limit_fields will accept only fields from resources, as of attributes are not considered (so we can have backpressure possibility with ratelimiter - suggested by @dmitryax )
  • I am doubtful with respect to ratelimiter with Metrics data, as with metrics data it is very common pattern to calculate result time-series with aggregations (sum, avg etc.) on different metrics data points collected, if we somehow drop some metrics data-points and other metrics data-points for same time are allowed, it can disturb the calculations over these time-series in Grafana dashboards. I think backpressure might not disturb this , but drop should disturb this.

@axw
Copy link
Contributor

axw commented Jan 6, 2025

@anuraj381 some thoughts, comparing to what we implemented in https://github.com/elastic/opentelemetry-collector-components/blob/main/processor/ratelimitprocessor/config.go

Units

It would be useful to have different units for limiting:

  • Requests: the number of plogs.Logs, pmetrics.Metrics, etc.
  • Records: the number of log records, metric data points, spans, or profile samples within a request
  • Bytes: the number of bytes, when encoding the request to OTLP (if necessary could use encoding extension for other codecs)

IIUC, what you've described is essentially "records" as the implied unit. If that's the case, then we could add configuration later to choose the other units.

I have a question though: with your configuration, if the processor encounters log records and spans, would they update the same rate limiter, or would log records and spans be rate limited independently? Having a single rate limit would be important for our use of bytes-based rate limiting, where we need to prevent individual tenants from overwhelming a downstream service's network-level rate limiter.

Cross-signal support

I am doubtful with respect to ratelimiter with Metrics data, as with metrics data it is very common pattern to calculate result time-series with aggregations (sum, avg etc.) on different metrics data points collected, if we somehow drop some metrics data-points and other metrics data-points for same time are allowed, it can disturb the calculations over these time-series in Grafana dashboards. I think backpressure might not disturb this , but drop should disturb this.

There's potential for misuse, but still important in some cases to rate limit across all signal types (see above section).

Rate limit fields

If rate_limit_fields can only access resource attributes, should that be conveyed by the configuration name somehow? e.g. resource_attributes.

Rate limiting by client metadata (HTTP headers, gRPC metadata, Kafka message headers, etc.) is also important, e.g. for rate limiting by a tenant ID that is extracted from an auth token. It should be possible to add configuration for this later, in a separate configuration attribute.

Rate limiting algorithm

IIUC, the algorithm described is that the rate limiters are initialised to allowed_rate and reset every interval. Is that correct? Did you consider a leaky bucket rate limiter, where you specify a burst (effectively allowed_rate) and refill rate, e.g. allow a burst of 1000, and then 1 every second? I think that would be a bit more flexible.

What is the expected behaviour if rate_limit_type: BACKPRESSURE is configured, and there are multiple Resource<Signal>s in a request with different values for fields in rate_limit_fields?

Distributed rate limiting

We have a need for distributed rate limiting, i.e. multiple collectors coordinating rate limiting. In https://github.com/elastic/opentelemetry-collector-components/blob/main/processor/ratelimitprocessor/config.go this is enabled by configuring the gubernator: ... config, but I think we could/should extract this into an extension so there could be multiple independently maintained implementations.

@juergen-kaiser-by
Copy link
Author

@anuraj381 Do you have plans for reporting of happening rate limiting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Component New component has been sponsored enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants