APMLP-350 fix crash in crashtracker when agent url is an ipv6 address #4237

Merged
merged 6 commits into from
Dec 19, 2024


p-datadog

What does this PR do?

  • Fixes a crash in crashtracker caused by insufficient error handling: adds a check for endpoint construction failing (returning NULL) and raises an exception in that case
  • Fixes incorrect agent URL construction from IPv6 addresses: an IPv6 address must be enclosed in square brackets when it is used as the host of a URL
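
The bracketing rule in the second bullet can be sketched in Ruby roughly as follows. This is a minimal illustration, not the code from this PR: `agent_url` and `url_host` are hypothetical helper names, and the sketch assumes the hostname arrives as a plain string without a zone ID or prefix length.

```ruby
require "ipaddr"

# Hypothetical helper (not the dd-trace-rb implementation): returns the
# host component for a URL, adding brackets only around IPv6 literals.
def url_host(hostname)
  # Assumption: a value that already starts with "[" is already bracketed.
  return hostname if hostname.start_with?("[")

  IPAddr.new(hostname).ipv6? ? "[#{hostname}]" : hostname
rescue IPAddr::Error
  # Not an IP literal at all (for example a DNS name): use it unchanged.
  hostname
end

# Hypothetical helper: builds the agent base URL from host, port and scheme.
def agent_url(hostname, port, ssl: false)
  scheme = ssl ? "https" : "http"
  "#{scheme}://#{url_host(hostname)}:#{port}/"
end
```

With these definitions, `agent_url("2001:db8:1::2", 8126)` returns `http://[2001:db8:1::2]:8126/`, while IPv4 addresses and DNS names pass through unchanged. Without the brackets, the colons in the address cannot be told apart from the colon that introduces the port.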

Motivation:

Crash reported by system tests

Change log entry

Yes: fix a crash in crashtracker when agent hostname is an IPv6 address

Additional Notes:

The code that this PR touches may be missing error handling for other situations as well; I did not check it exhaustively.

How to test the change?

Unit tests have been added.
I also wrote DataDog/libdatadog#809 for libdatadog to verify it works OK with our provided input (which it does).

@p-datadog p-datadog requested review from a team as code owners December 19, 2024 13:51
@github-actions github-actions bot added the core Involves Datadog core libraries label Dec 19, 2024
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 bot commented Dec 19, 2024

Datadog Report

Branch report: crashtracking-crash
Commit report: 20aac55
Test service: dd-trace-rb

✅ 0 Failed, 22121 Passed, 1477 Skipped, 6m 5.16s Total Time

@codecov-commenter

codecov-commenter commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.75%. Comparing base (11b9ae1) to head (20aac55).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4237   +/-   ##
=======================================
  Coverage   97.74%   97.75%           
=======================================
  Files        1355     1355           
  Lines       82333    82365   +32     
  Branches     4226     4230    +4     
=======================================
+ Hits        80477    80516   +39     
+ Misses       1856     1849    -7     


@pr-commenter

pr-commenter bot commented Dec 19, 2024

Benchmarks

Benchmark execution time: 2024-12-19 15:00:58

Comparing candidate commit 20aac55 in PR branch crashtracking-crash with baseline commit 7fd1feb in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 30 metrics, 2 unstable metrics.

scenario:tracing - Propagation - Trace Context

  • 🟩 throughput [+4487.343op/s; +4606.193op/s] or [+13.147%; +13.495%]


@Strech Strech left a comment


LGTM

@p-datadog p-datadog merged commit c116d75 into master Dec 19, 2024
337 checks passed
@p-datadog p-datadog deleted the crashtracking-crash branch December 19, 2024 23:03
@github-actions github-actions bot added this to the 2.9.0 milestone Dec 19, 2024
ivoanjo added a commit that referenced this pull request Jan 2, 2025
**What does this PR do?**

This PR builds atop #4237 and fixes a similar-ish issue in the profiler
caused by the same mishandling of ipv6 addresses.

In particular, when provided with an ipv6 address in the agent url,
the profiler would fail with an exception:

```
$ env DD_AGENT_HOST=2001:db8:1::2 DD_PROFILING_ENABLED=true \
bundle exec ddprofrb exec ruby -e "sleep 2"

dd-trace-rb/lib/datadog/profiling/http_transport.rb:27:in `initialize':
Failed to initialize transport: invalid authority (ArgumentError)
```

**Motivation:**

Luckily we didn't have any customers using this, as it fails immediately
and loudly, but it's still a bug on a configuration that should be
supported.

**Additional Notes:**

Since we had similar buggy logic copy-pasted in crashtracking and
profiling (crashtracking had been fixed in #4237) I chose to extract
out the relevant logic into the `AgentSettings` class, so that
both can reuse it.
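
As a rough sketch of that kind of extraction (not the actual `AgentSettings` code), a single settings object can own the URL logic so that each consumer only asks for the finished URL. `AgentEndpoint`, `ToyCrashtracker` and `ToyProfilerExporter` are hypothetical names used only for illustration, and the sketch assumes the hostname is given without brackets.

```ruby
require "ipaddr"

# Hypothetical stand-in for a shared settings object. The names here are
# illustrative and are not the real AgentSettings API.
AgentEndpoint = Struct.new(:hostname, :port, :ssl, keyword_init: true) do
  # The only place that knows IPv6 hosts need brackets in a URL.
  def url
    host = ipv6_literal?(hostname) ? "[#{hostname}]" : hostname
    "#{ssl ? "https" : "http"}://#{host}:#{port}/"
  end

  private

  def ipv6_literal?(value)
    IPAddr.new(value).ipv6?
  rescue IPAddr::Error
    false # not an IP literal, e.g. a DNS name
  end
end

# Toy consumers: both reuse the same URL instead of rebuilding it.
class ToyCrashtracker
  attr_reader :agent_url

  def initialize(endpoint)
    @agent_url = endpoint.url
  end
end

class ToyProfilerExporter
  attr_reader :agent_url

  def initialize(endpoint)
    @agent_url = endpoint.url
  end
end
```

Keeping the bracketing in one method means a future fix to the host handling reaches every consumer, rather than having to be repeated in each copy.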

**How to test the change?**

I've added unit test coverage for this issue to profiling, and
the snippet above can be used to verify end to end that it works.

Here's how it looks on my machine now:

```
E, [2025-01-02T17:32:32.398756 #359317] ERROR -- datadog: [datadog]
(dd-trace-rb/lib/datadog/profiling/http_transport.rb:68:in `export')
Failed to report profiling data (agent: http://[2001:db8:1::2]:8126/):
failed ddog_prof_Exporter_send: error trying to connect: tcp connect
error: Network is unreachable (os error 101): tcp connect error:
Network is unreachable (os error 101): Network is unreachable (os error 101)
```

I.e., we correctly try to connect to the dummy address, and fail :)

(Note: The error message is a bit ugly AND repeats itself a bit.
That's being tracked separately
in DataDog/libdatadog#283 )