Honoring of Sampling Priority Headers #236

Open
matthalbersma opened this issue Aug 18, 2022 · 5 comments

@matthalbersma

It appears that the work for honoring sampling priority headers was added in 9b9248f.

Do you have an estimated release time for this? Does it work as expected in NGINX where passing an x-datadog-sampling-priority header will force ingestion?

@dgoffredo
Contributor

The work done in 9b9248f is not about X-Datadog-Sampling-Priority exactly. Instead, it's about keeping track of the "sampling mechanism" associated with the sampling decision.

The behavior of extracting X-Datadog-Sampling-Priority from an incoming request, honoring it if found, and injecting it into outgoing requests has always been supported.
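
For illustration only (not from this thread; the IDs are made-up values, and 2 is the "user keep" priority), a request that carries the propagated Datadog headers might look like:

$ curl http://localhost/ \
    -H 'x-datadog-trace-id: 1234567890' \
    -H 'x-datadog-parent-id: 1234567890' \
    -H 'x-datadog-sampling-priority: 2'

The same headers are then injected into the requests that nginx proxies upstream.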

What kind of setup do you have, and which behavior are you looking for?

@matthalbersma
Author

We have nginx with the OpenTracing module. I was expecting that adding that header and setting its value to 1 or 2 would force ingest, but I have found that not to be the case. It appears that the Datadog Agent is still filtering out requests. Do you have another recommended method for forcing trace ingest?

@dgoffredo
Contributor

Yes, there are a few ways that you could do this.

If you want to set the probability that any trace is ingested, you can specify the "sample_rate" property in the JSON configuration file passed to opentracing_load_tracer, e.g.

$ cat /etc/nginx/dd-config.json
{
    "sample_rate": 1.0
}

$ grep 'opentracing_load_tracer' /etc/nginx/nginx.conf
    opentracing_load_tracer /usr/local/lib/libdd_opentracing_plugin.so /etc/nginx/dd-config.json;

Here 1.0 means "ingest 100% of the time."

You can do the same thing by setting the DD_TRACE_SAMPLE_RATE environment variable, though then you must be careful to also forward that environment variable to worker processes in the nginx configuration:

env DD_TRACE_SAMPLE_RATE;

You could even hard-code a value in the configuration:

env DD_TRACE_SAMPLE_RATE=1.0;

You can also override the sampling decision for a particular location by setting one of the manual.keep or manual.drop tags, which overrides the sampling decision for any request handled by that location, e.g.

http {
    # ...
    server {
        # ...

        location / {
            # Always keep these traces.
            opentracing_tag manual.keep 1;

            proxy_pass http://upstream;
        }

        location /healthcheck {
            # Never keep these traces.
            opentracing_tag manual.drop 1;

            proxy_pass http://upstream;
        }
    }
}

Finally, if you already vary the "operation name" in different locations, then you can define sampling rules that change the sampling probability (sample rate) based on the request's operation name.
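
For example (a sketch only; "handle.request.qa" here is just an illustrative operation name), a rule that always keeps spans with a given operation name could look like:

$ cat /etc/nginx/dd-config.json
{
    "sampling_rules": [{"name": "handle.request.qa", "sample_rate": 1.0}]
}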

Note that if you use any of "sample_rate", DD_TRACE_SAMPLE_RATE, or sampling rules, then there is also a configurable throttle that defaults to 100 traces per second maximum. You can alter this limit by setting the "sampling_limit_per_second" property in the configuration JSON, or by setting the DD_TRACE_RATE_LIMIT environment variable (with the same caveat as before that you must tell nginx to forward the environment variable to its workers via the env configuration directive).
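
As a sketch (500 is just an example value), raising that limit could look like:

$ cat /etc/nginx/dd-config.json
{
    "sample_rate": 1.0,
    "sampling_limit_per_second": 500
}

or, using the environment variable instead (forwarded to the workers as above):

env DD_TRACE_RATE_LIMIT=500;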

@matthalbersma
Author

I'm aware of manual.keep and the sample rate. At scale, our sample rate can be 1, but traces will still get dropped by the Agent (rate-limited). I want to ensure that traces I absolutely need (e.g. for QA testing) are ingested into Datadog. I was hoping that the sampling priority header would do that, but it does not. My problem with manual.keep is that I need manual.keep: 1 set conditionally, only when a header is present and matches a predetermined value. The opentracing_tag directive can't be used in a conditional block, and I need this for all locations. I guess I'm leaning toward an nginx variable and Lua solution, where Lua sets a variable to 1 or 0 and I then put the manual.keep tag on all requests. I was just hoping to avoid that.

@dgoffredo
Contributor

dgoffredo commented Aug 31, 2022

My problem with manual.keep is that I need manual.keep: 1 set conditionally, only when a header is present and matches a predetermined value.
[...]
I guess I'm leaning toward an nginx variable and Lua solution, where Lua sets a variable to 1 or 0 and I then put the manual.keep tag on all requests.

That would have the side effect of dropping everything except QA requests.

I have a few ideas about workarounds that you could use.

First, though, I'll mention that there is a new nginx module specifically for Datadog that is currently in beta: https://github.com/DataDog/nginx-datadog. This module does allow tags to be set in if blocks: https://github.com/DataDog/nginx-datadog/blob/master/doc/API.md#datadog_tag. However, I notice that I didn't write a test for that case in particular (https://github.com/DataDog/nginx-datadog/tree/master/test/cases/tags/conf). I'll have to add one.
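
As a rough sketch (untested here; the header name and value are placeholders), setting manual.keep conditionally with that module could look like:

location / {
    if ($http_x_internal_thingy = "QA") {
        datadog_tag manual.keep 1;
    }

    proxy_pass http://upstream;
}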

If you don't want to switch over to bleeding edge software, then here are some workarounds to try instead:

  1. Here's a hack that will spew logging all over your stderr, but it could otherwise achieve what you want. Say that you want to sample 100% of traces for which the X-Internal-Thingy request header has the value QA. Define a variable $sampling_priority_for_internal_thingy by adding a map to the http block:
map $http_x_internal_thingy $sampling_priority_for_internal_thingy {
    QA    2;
    default 999;
}

Sampling priority 2 corresponds to "manual keep." Sampling priority 999 is bogus, and will not affect sampling, but will produce a line in stderr. This is probably a dealbreaker, but in case it's not, then you can use the new variable in opentracing_tag:

location / {
    # ...
    opentracing_propagate_context;
    opentracing_tag sampling.priority $sampling_priority_for_internal_thingy;
}

  2. You can make the "operation name" of the spans you want to keep different from other spans. You can then use sampling rules to associate the special operation name with 100% sampling:
$ cat /etc/nginx/dd-config.json
{
    "sampling_rules": [{"name": "handle.request.qa", "sample_rate": 1.0}]
}

http {
    map $http_x_internal_thingy $operation_name_for_internal_thingy {
        QA      handle.request.qa;
        default handle.request;
    }

    opentracing_load_tracer ... /etc/nginx/dd-config.json;

    # ...

    server {
        # ...

        location / {
            # ...
            opentracing_propagate_context;
            opentracing_operation_name $operation_name_for_internal_thingy;
        }

        # ...
    }
}

This is better than the log spew, but it has the drawback that different requests will have different operation names, which you might not want.

Note that option (2) will not work if nginx is not the first service in the trace. Option (1) would work all the time, but sucks because of the log spew.
