-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make the rules sample limiter and default sampling rate configurable #215
make the rules sample limiter and default sampling rate configurable #215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small bunch of comments, but generally looks OK to me.
I'll mark this approved after you have considered them and possibly pushed additional changes.
src/tracer_options.cpp
Outdated
error << name_for_diagnostic << " has trailing non-floating-point characters: " << text; | ||
return ot::make_unexpected(error.str()); | ||
} | ||
if (value < minimum || value > maximum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the value is "NaN"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NaN
is out of range, which is the desired behavior but maybe technically odd. I'll let it fail here rather than having a special case forbidding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I need to check that it's inside the interval, not that it's not outside the interval.
src/tracer_options.cpp
Outdated
@@ -85,6 +87,36 @@ std::vector<std::string> tokenize_propagation_style(const std::string &input) { | |||
return result; | |||
} | |||
|
|||
ot::expected<double, std::string> parseDouble(const std::string &text, double minimum, | |||
double maximum, | |||
const char *name_for_diagnostic) try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd typically make it the caller's responsibility to detail the error, but the "it's a parser" exclusion can be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callers were always saying "$name_for_diagnostic
whatever the error condition was," so I combined them. An alternative might be:
auto maybe_value = parseDouble(analytics_rate, 0.0, 1.0);
if (!maybe_value) {
maybe_value.error().insert(0, "While parsing DD_TRACE_ANALYTICS_SAMPLE_RATE: ");
return ot::make_unexpected(maybe_value.error());
}
Yes, I'll change it to that.
test/limiter_test.cpp
Outdated
} | ||
|
||
SECTION("dedicated constructor configures based on desired allowed-per-second") { | ||
const double per_second = 23.887; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
23.97 (24FPS) would have been a cooler choice.
REQUIRE(metrics.find("_dd.agent_psr") != metrics.end()); | ||
REQUIRE(metrics.find("_dd.rule_psr") != metrics.end()); | ||
REQUIRE(metrics.find("_dd.limit_psr") != metrics.end()); | ||
REQUIRE(metrics.find("_dd.agent_psr") == metrics.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the rates sent by the agent are never used? Or is there a new test required to handle that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any configuration now that will cause the priority sampler to be used. There's always a matching rule.
New test for which case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation yesterday, now I understand what you mean. Have since pushed changes to correct the behavior. I'll update the PR description, too...
@@ -61,45 +78,52 @@ TEST_CASE("tracer options from environment variables") { | |||
{"DD_TRACE_REPORT_HOSTNAME", "true"}, | |||
{"DD_TRACE_ANALYTICS_ENABLED", "true"}, | |||
{"DD_TRACE_ANALYTICS_SAMPLE_RATE", "0.5"}, | |||
{"DD_TAGS", "host:my-host-name,region:us-east-1,datacenter:us,partition:5"}}, | |||
{"DD_TAGS", "host:my-host-name,region:us-east-1,datacenter:us,partition:5"}, | |||
{"DD_TRACE_RATE_LIMIT", "200"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for invalid values that a user might set for these variables?
include/datadog/opentracing.h
Outdated
uint64_t trace_tags_propagation_max_length = 512; | ||
// `sampling_max_per_second` is the limit on the number of rule-controlled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it max or limit?
{"DD_TAGS", "host:my-host-name,region:us-east-1,datacenter:us,partition:5"}}, | ||
{"DD_TAGS", "host:my-host-name,region:us-east-1,datacenter:us,partition:5"}, | ||
{"DD_TRACE_RATE_LIMIT", "200"}, | ||
{"DD_TRACE_SAMPLE_RATE", "0.7"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just curious: should we test this for different locales (such as ones that use comma instead of period), or would that just be unit-testingstd::stod
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question. std::stod
does what std::strtod
does, which does what strtod
does, which changes based on the C locale.
My preference is to not think about it, and to hope that the tests never have to run in a different locale. I do wonder about people pulling our code and running the tests, though. Hm.
One option is to interpolate always from a double
serialized using to_string
.
Let's leave it like this for now.
Thanks for taking a look. I pushed some commits addressing your points, and commented on others. |
665ea8e
to
184d485
Compare
@@ -10,7 +10,7 @@ namespace opentracing { | |||
|
|||
ot::expected<TracerOptions> optionsFromConfig(const char *configuration, | |||
std::string &error_message) { | |||
TracerOptions options{"localhost", 8126, "", "web", "", 1.0}; | |||
TracerOptions options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line almost destroyed me. Hours lost...
5484578
to
9e03984
Compare
This is ready for another round of review. See from the commit |
510c208
to
d443d66
Compare
…sampling rate, and allow configuration using DD_TRACE_SAMPLE_RATE
…, fall back to priority sampling
This reverts commit 80d11a7.
d443d66
to
3b2bdcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few comments for additional consideration.
Thanks!
// traces that may be sampled per second. This includes traces that match | ||
// the implicit "catch-all" rule appended to `sampling_rules`. This option | ||
// is also configurable as the environment variable DD_TRACE_RATE_LIMIT. | ||
double sampling_limit_per_second = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an integer instead? It's a quantity, and I'm not certain values of 0.2 or 1.5 make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it's a use case anyone would want, but I did verify that values less than one have the expected behavior: 0.5 means at most one every two seconds. Might have even added a unit test.
It does no harm and allows for that kind of thing, so I'll keep it.
// by the environment variable DD_PROPAGATION_STYLE_INJECT. | ||
// | ||
// Extra keys will be ignored. | ||
// Accepts configuration as a JSON object. See `optionsFromConfig` in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - this comment wasn't being maintained when changes in the actual implementation occurred.
Does that part (.cpp) now need similar comments about what each item is for and the expected values, or just "read the code" for that detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose the latter because I didn't feel like writing the documentation myself, and saw that it might fall out of sync anyway :)
I couldn't find a place in the public documentation, either, that describes everything supported here. One thing to consider is making this a public contract, maintenance and all, and then linking to it from the public docs.
Not today, though.
@@ -85,6 +87,30 @@ std::vector<std::string> tokenize_propagation_style(const std::string &input) { | |||
return result; | |||
} | |||
|
|||
ot::expected<double, std::string> parseDouble(const std::string &text, double minimum, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for amusement, what happens if minimum or maximum are NaN, -Inf, +Inf etc?
(Don't change any code as a result of this question)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infinities I didn't test, but NaNs I did: https://github.com/DataDog/dd-opentracing-cpp/pull/215/files#diff-d7661a9a1ea89e9e8db7b4aa21e0bbc645e212a65a13cbafe65e2eab4c2aba2eR141-R143
const TracerOptions &input) { | ||
using namespace std::string_literals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion on this, but for people unfamiliar with the syntax, it's surprising to see the trailing s on the literals.
There's other parts of the code that could be modified to use string literals, but the choice at the time was to be explicit instead of subtle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the trailing s
is both explicit and subtle.
We could adopt a house style that says "avoid use of special string literals, because they are easily confused with built-in literals and may surprise the reader."
I'd prefer to use the literals instead.
test/integration/nginx/Dockerfile
Outdated
@@ -85,4 +85,5 @@ RUN go get github.com/jakm/msgpack-cli | |||
# Copy these last so that edits don't need as many image rebuild steps | |||
COPY ./test/integration/nginx/nginx_integration_test.sh ./ | |||
COPY ./test/integration/nginx/expected_*.json ./ | |||
# CMD [ "bash", "-x", "./nginx_integration_test.sh"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably left over from some hackering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, will fix.
"_dd.limit_psr": 1, | ||
"_dd.rule_psr": 1, | ||
"_sampling_priority_v1": 2 | ||
"_dd.agent_psr": 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tags might need double-checking for valid combinations of limit_psr, rule_psr and agent_psr being present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From standup with Caleb:
In this example, those limit/rule metrics are not present. Wasn't there something about the omission of these meaning that the service is labeled "partially configured"? Double-check that to make sure that we don't introduce a case where a tracer with rules that don't match ends up being labeled "partially configured" even though it is "configured."
"CodeQL / Analyze (cpp)" takes its time. I didn't consider that when I reviewed the addition. Might be a good idea to get CircleCI to run that in parallel with the integration test and coverage report. |
Tip: Tell github to "hide whitespace" for this diff. I moved some
try
blocks around, which changed indentation.These revisions address the issue raised in customer issue APMS-6277 ("Attempting to use configuration 'max_traces_per_second' to limit tracing to 5 per second") and the work item OKR2,KR1 ("
DD_TRACE_SAMPLE_RATE
implemented and documented").Brett mentioned that in
dd-trace-py
, theDD_TRACE_SAMPLE_RATE
environment variable results in the equivalent sampling rule being appended to any other rules that the user might have specified. This means that agent-led priority sampling only happens ifDD_TRACE_SAMPLE_RATE
doesn't have a value. Additionally, the rule-based sampler's limiter can be configured using the environment variableDD_TRACE_RATE_LIMIT
.All of that behavior is implemented here.