Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate Limit Processor #35204

Open
juergen-kaiser-by opened this issue Sep 16, 2024 · 8 comments
Open

Rate Limit Processor #35204

juergen-kaiser-by opened this issue Sep 16, 2024 · 8 comments
Labels
enhancement New feature or request Sponsor Needed New component seeking sponsor

Comments

@juergen-kaiser-by
Copy link

juergen-kaiser-by commented Sep 16, 2024

Component(s)

No response

Is your feature request related to a problem? Please describe.

We run an observability backend (Elasticsearch) shared by many teams and services (thousands). The services run in kubernetes clusters and we want to collect the logs of all pods.

Problem: If a service/pod becomes very noisy for some reason, it can burden the backend so much that all of other teams feel it. In short: One team can ruin the day for all others.

We would like to limit the effect a single instance or service can have on the observability backend.

Describe the solution you'd like

  • There should be a method to limit the flow of logs based on some atributes. We apply a standardized set of labels to each Kubernetes deployment, so we would work with those.
  • Dropping log lines would be okay (for us) since we can not assume that a noisy service stops being noisy soon. Sampling is not okay because of the nature of logs.
  • If rate limiting happens then we and our users should be able to see it.
    • A single metric ingested into some (other) pipeline would suffice.
    • Let us choose a set of attributes the metric should have (copied) from the rate limited logs so that we can map it to the affected service. In our case, we would like to have our standard pod labels copied to the metric.

Considering the points above, we think that there should be a processor for this.

We have no requirements regarding the algorithm backing the rate limiting. It seems that a token bucket filter (example blog entry) is a reasonable choice here.

Describe alternatives you've considered

Rate Limiter in Receivers

Receiver rate limiting is okay if you only work with attributes available in the receivers. In our case, those are insufficient because we need pod labels. As a workaround, we could inject them into the collector config as env variables and focus the collector on a single pod by deploying it as a sidecar. However, a sidecar deployment consumes too much resources across all pods because we have large clusters (>= 10K pods).

A benefit of a rate limiting in receivers could give collector users to choose between dropping incoming telemetry and just not receive it, effectively creating backpressure.

Rate limiting in Receivers is discussed in #6908.

Rate Limiter as Extension

We lack knowledge about how extensions work internally to say anything about it. Rate limiting as extensions also is discussed in #6908.

Additional context

@juergen-kaiser-by juergen-kaiser-by added enhancement New feature or request needs triage New item requiring triage labels Sep 16, 2024
@juergen-kaiser-by juergen-kaiser-by changed the title Rate Limit based on Atributes Rate Limit Processor Sep 16, 2024
@atoulme
Copy link
Contributor

atoulme commented Sep 17, 2024

Can you explain why sampling doesn't work for you here? What would the configuration of the processor look like?

@juergen-kaiser-by
Copy link
Author

juergen-kaiser-by commented Sep 18, 2024

The sampling existing today does not work because it always is enabled. We would like to only limit too noisy pods/services. Note that a service suddenly can become noisy, so we do not know the beforehand.

Once noisyness is detected, there are different options how to limit the flow. Whether we need sampling or some other rate limiting algorithm boils down to how useful the output is afterwards. For logs, we think that larger chunks of consecutive log lines are more useful than smaller ones (think about a samples stack trace vs. having a full one), therefore the limiting algorithm should create large once before starting to drop log lines. A simple sampling creates small chunks.

That being said, I do not see why it should not be possible to specify different algorithms in the long run. For the other telemetry types, other algorithms are better.

I do not have a good example configuration, yet. Let me think about that.

@juergen-kaiser-by
Copy link
Author

As a first draft:

#[...]

processors:
  ratelimit:

    # defines the rate limiting for log signals. Idea for the structure is borrowed from the transformprocessor. We can have own sections for log, metric, and trace if we want this processor to be generic to all types.
    log:

      # grouping defines wheter and how to group the telemetry by a set of attributes.
      grouping:

        #   mode defines how to deal with logs/metrics/traces not not falling into one of the defined groups (not having the fields)
        #   - strict (default): nonexisting fields are treated as if they do (having a default value) => always rate limit. Fall back to the last group if existent.
        #   - relaxed: rate limiting applies only if all fields are present
        mode: strict

        groups:
        - tbf: # used rate limiting algorithm. Others are possible but only one must be set. (tbf = token bucket filter)
            average_rate_per_sec: 1000
            max_burst_per_sec: 5000

          # defines the list of attributes to use for grouping.
          attributes:
          - context: resource
            attribute: k8s.pod.uid
          - context: resource
            attribute: k8s.pod.tag.my-project-name

#[...]

The structures allows user to:

  • define rate limiting for different signal types
  • define different rate limiting strategies
  • define multiple groups

We could also add conditions to the group attributes to enable users to define more specialized groups.

@axw
Copy link
Contributor

axw commented Oct 11, 2024

Receiver rate limiting is okay if you only work with attributes available in the receivers. In our case, those are insufficient because we need pod labels. As a workaround, we could inject them into the collector config as env variables and focus the collector on a single pod by deploying it as a sidecar. However, a sidecar deployment consumes too much resources across all pods because we have large clusters (>= 10K pods).

Do you have any special authn/z requirements? If not, an option that may work for you is Kubernetes service account tokens in conjuction with the oidcauthextension.

Service account tokens are OIDC-compatible, and their claims contain the namespace and service account name, among other things. They do not contain pod labels - maybe you can enforce service account names for your customers?

If all of those stars align, then you could extract those claims and use them as keys for a rate limiter extension...

Rate limiting in Receivers is discussed in #6908.

We have built a distributed rate limiter extension for internal purposes, and we are planning to offer it to the contrib repo. The extension is currently rate limiting in the exporter, but it would be fairly straightforward to update it to also rate limit in receivers, and it's our plan to do so.

@juergen-kaiser-by
Copy link
Author

In our case, we consume logs generated by pods/containers in a Kubernetes clusters. Those logs are stored in files by Kubernetes and are mounted into the otel collector's container for consumption. There is no authentication involved in this path (in my understanding).

If you need some unique key for the "entity" which should be rate limited then we could work with the pod id or container id. However, those must be extracted first from the log file name. If the extension you mention works with the output of a receiver then this can work. However, it would not allow us to get useful notifications/metrics telling us about rate limited services. For that, we need more information than just the pod/container id.

@atoulme atoulme removed the needs triage New item requiring triage label Oct 12, 2024
@axw
Copy link
Contributor

axw commented Oct 14, 2024

@juergen-kaiser-by I see, thanks for elaborating. The extension we have built is based on the auth framework (not the ideal interface, but it works) which would not fit with filelog anyway. The core functionality could also be extracted into a processor.

I think creating back pressure on the producer is ideal, which makes rate limiting on resource/record attributes tricky: there are potentially multiple resources per OTLP batch, so rate limiting on one resource may cause back pressure for the other. I think that probably doesn't apply in your case, I'm just thinking about whether it would lead to surprising behaviour for others.

Another option would be to allow the processor to only rate limit based on batch/request-level metadata. For example you might use this to rate limit a batch of data based on some HTTP header (e.g. a tenant ID), if you were using it with the OTLP receiver. Then you would find a way to get the metadata you need (i.e. pod labels) in there. We could potentially extend receivercreator to set this metadata based on pod labels. Something like:

extensions:
  k8s_observer:

processors:
  ratelimiter:
    metadata_keys: [tenant]
    # other config ...

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      filelog:
        rule: type == "pod.container"
        metadata:
          - key: tenant
            value: `pod.labels["tenant"]`
        config:
          include:
            - /var/log/pods/`pod.namespace`_`pod.name`_`pod.uid`/`container_name`/*.log
          include_file_name: false
          include_file_path: true
          operators:
            - id: container-parser
              type: container

@juergen-kaiser-by
Copy link
Author

I think creating back pressure on the producer is ideal, which makes rate limiting on resource/record attributes tricky: there are potentially multiple resources per OTLP batch, so rate limiting on one resource may cause back pressure for the other. I think that probably doesn't apply in your case, I'm just thinking about whether it would lead to surprising behaviour for others.

Please remember that we would like to be able to see when log messages get dropped.

If you work with backpressure, then it is up to the previous instances to decide what to do if there is too much telemetry. In our case, the source of the log messages can be seen as the logging service or Kubernetes which stores the logs into files. Obviously, we do not want to slow down the service. Kubernetes itself does not care whether you consumed the logs. It does not wait for you and will rollover and delete the log messages as necessary. Consequently, our only chance to get information about dropped logs is to get them into the collector and drop the messages there as necessary.

@axw
Copy link
Contributor

axw commented Oct 16, 2024

@juergen-kaiser-by makes sense. I think the ratelimit processor could have configuration to either return an error, silently discard, or create backpressure. In your case could configure it to silently discard the data, and the processor would record metrics about rate limiting. I think that aligns with the solution you originally described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Sponsor Needed New component seeking sponsor
Projects
None yet
Development

No branches or pull requests

4 participants