-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ddtrace/tracer,profiler: share agent default URL resolution logic #2719
Conversation
2ff5739
to
6aef0fd
Compare
BenchmarksBenchmark execution time: 2024-06-07 13:22:10 Comparing candidate commit b8a865b in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metrics. scenario:BenchmarkStartSpanConcurrent-24
|
6aef0fd
to
0189499
Compare
The ddtrace/tracer library has code to resolve the agent URL from the program environment, including checking for the default Unix domain socket, which would be good to use from the profiler. Put all this logic in internal so that the tracer and profiler can both use it.
0189499
to
8b0cf45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM for the most parts, but PTAL at my comments.
// - Then, DD_AGENT_HOST and DD_TRACE_AGENT_PORT if either are set | ||
// - Then, the default UDS path given here, if the path exists | ||
// (the path is replacable for testing purposes, otherwise use DefaultTraceAgentUDSPath) | ||
// - Finally, localhost:8126 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My google drive link for this is here (datadog internal): https://drive.google.com/file/d/134CKDYMwWRaphRD5-JjlZq7u71D79cKt/view?usp=sharing
Maybe you can refactor the code itself to make this more clear as well? In particular moving the os.Getenv("DD_AGENT_HOST")
and os.Getenv("DD_TRACE_AGENT_PORT")
calls below os.Stat(defaultSocketAPM)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a test case which sets those environment variables as well as creating the (test) UDS file, and confirms that the env vars are used: https://github.com/DataDog/dd-trace-go/pull/2719/files#diff-8db259852f0d34c3adcab6352683e75d796c0618397f08656c197403c6667256R51. Is the test inaccurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this agent discovery algorithm is a bit cursed (not your fault : p). I didn't carefully study this line:
if _, err := os.Stat(defaultSocketAPM); host == "" && port == "" && err == nil {
So in my mind I thought it was:
if _, err := os.Stat(defaultSocketAPM); err == nil {
But given the actual code, I guess the real discovery algorithm implemented by this function is this one?
So either I'm still confused, or I think the comment doesn't quite match the reality. In particular:
Then, DD_AGENT_HOST and DD_TRACE_AGENT_PORT if either are set
If this was correct, then DD_AGENT_HOST= DD_TRACE_AGENT_PORT=1234
should produce :1234
and DD_AGENT_HOST=foo DD_TRACE_AGENT_PORT=
should produce foo:
. But in reality they produce localhost:1234
and foo:8126
respectively. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the diagram you've shared matches how it should work according to this internal RFC: https://github.com/DataDog/architecture/blob/master/rfcs/apm/integrations/trace-autodetect-agent-config/rfc.md#explicit-configuration. The doc comment is incomplete. I've updated the doc comment to specify the defaults if only one of DD_AGENT_HOST
or DD_TRACE_AGENT_PORT
are defined, which are "localhost" and "8126", respectively. I've also added a few more test cases for this. See b2c2809.
if url.Scheme == "unix" { | ||
WithUDS(url.Path)(&c) | ||
} else { | ||
c.agentURL = url.String() + "/profiling/v1/input" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discovery Algorithm Change
Am I correctly understanding that the current profiler agent discovery algorithm looked like this:
And we're replacing it with this new algorithm that's used by the tracer already?
If yes, I think we should include these graphics in the release notes to make it clear what we're changing. We should also advise people who run into issues with the new algorithm to open a GH issue and downgrade until we have a workaround.
Alternatively we could implement a DD_PROFILING_LEGACY_AGENT_DISCOVERY
env var that would allow people to switch back to the old algorithm.
But for now I can't imagine there to be a plausible need for this. What are the chances that an agent socket exists and for some reason that's not where they want to send the profiles, but they're okay with using that socket for traces (or traces are not enabled).
Hostname detection
Unrelated to this PR, but while studying the code I noticed that the tracer is using a sophisticated hostname resolution approach introduced in #1712 when not using a UDS for the connection. I think the profiler should do the same ... and I'm kinda annoyed we were not pinged on the PR that introduced this 😞. Maybe you can create a follow-up task to look into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discovery Algorithm Change
See the other comment for the intended resolution order. I think we should document this change in the release that introduces. We can specify that it is a bug fix (or enhancement?) for the profiler, to check for the default trace agent Unix domain socket if the agent URL is not explicitly configured. I don't think we'll need a fallback option.
Hostname detection
Opened PROF-9903 for this.
The comment describing the agent URL resolution did not specify what happens if only one of DD_AGENT_HOST or DD_TRACE_AGENT_PORT are defined. Do so, according to the internal RFC for agent URL resolution. Rearrange the code so that it more clearly matches the priority order. Before this commit, the check for whether the host or port are defined was not clearly visible, hidden away in the check for the existence of the default UDS path. Add some more tests to check the host/port behavior. Also, the assert package functions expect the "want" before the "got", so fix that up in the test cases.
Suggested in review. Make DefaultTraceAgentUDSPath a variable in the internal package and have the tracer and profiler tests modify that, rather than keep their own variables. That way, the actual tracer and profiler implementations don't need to rely on it.
Specifically set up a handler for /profiling/v1/input to test that we don't use the filesystem path to the Unix domain socket in the HTTP request path.
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
profiler/options.go
Outdated
@@ -63,6 +63,9 @@ const ( | |||
defaultAgentPort = "8126" | |||
) | |||
|
|||
// Path to the default trace agent Unix domain socket, replaced by tests | |||
var defaultSocketAPM = internal.DefaultTraceAgentUDSPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be dead code? I don't see anything referencing it. Tests still pass if I remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's dead code, not needed after 9b525fc. Thanks for catching it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🙇🏻♂️. Sorry for the delay.
What does this PR do?
Move the agent default URL resolution logic to
internal
so it can be sharedby the tracer and profiler.
This is motivated by the profiler failing to connect to the default trace agent
Unix domain socket in some situations. That gets picked up as a benefit of
sharing more logic, so this PR adds a test for that functionality.
Motivation
The ddtrace/tracer library has code to resolve the agent URL from the program
environment, including checking for the default Unix domain socket. Right now
the profiler doesn't check for the default Unix domain socket and so will fail
to send profiles to the right place in some circumstances. The tracer and
profiler should ideally share all this logic.
Reviewer's Checklist