ares_gethostbyname() and exception safety #219
Comments
I think option 2 is the way to go. c-ares is a C library and so isn't really in a position to do anything about C++ exceptions. As a result, the callbacks it invokes really need to act like C code too (e.g. by having an outer wrapper that consumes all exceptions) -- which is option 2.
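A minimal sketch of the "outer wrapper that consumes all exceptions" idea, assuming a simplified callback shape. `CHostCallback`, `fake_c_resolve`, `ResolveContext`, and `safe_trampoline` are hypothetical names used for illustration only; they are not c-ares or Envoy APIs, and `fake_c_resolve` merely imitates the allocate, call back, free pattern discussed in this issue.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical C-style callback type; NOT the real c-ares prototype.
using CHostCallback = void (*)(void* arg, int status, const char* name);

// Number of copies allocated by fake_c_resolve and not yet freed.
int g_live_allocations = 0;

// Hypothetical stand-in for a C routine that allocates, calls back, then frees.
void fake_c_resolve(const char* name, CHostCallback cb, void* arg) {
  const std::size_t len = std::strlen(name) + 1;
  char* copy = static_cast<char*>(std::malloc(len));
  if (copy == nullptr) return;
  std::memcpy(copy, name, len);
  ++g_live_allocations;
  cb(arg, 0, copy);  // an exception escaping here would skip the cleanup below
  std::free(copy);
  --g_live_allocations;
}

// State shared between the caller and the wrapper.
struct ResolveContext {
  std::function<void(int, const std::string&)> on_resolve;  // may throw
  std::exception_ptr error;  // set if on_resolve threw
};

// Outer wrapper handed to the C code: it behaves like C by never letting
// an exception propagate; the exception is stored for later inspection.
void safe_trampoline(void* arg, int status, const char* name) noexcept {
  auto* ctx = static_cast<ResolveContext*>(arg);
  try {
    ctx->on_resolve(status, name);
  } catch (...) {
    ctx->error = std::current_exception();
  }
}
```

A caller would pass `safe_trampoline` and a `ResolveContext*` to the C routine, then check `ctx.error` (and rethrow with `std::rethrow_exception` if desired) once control is back in C++ code, after the C routine has finished its own cleanup.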
Previously, we were taking an exception due to the late validation of update_merge_window duration in ClusterManagerImpl::scheduleUpdate, which happened under a c-ares strict DNS host resolution callback. There are several related issues here:
1. c-ares is exception unsafe, see c-ares/c-ares#219.
2. We should be validating Durations with PGV, see bufbuild/protoc-gen-validate#97.
3. We should defer the c-ares resolution callbacks to be outside the c-ares callback context for exception safety.
This PR addresses (3) by moving callbacks, even when they are "immediate", to a dispatcher post, so that we never take an exception under a c-ares callback. A workaround for (2) is provided, in lieu of bufbuild/protoc-gen-validate#97, which is blocked on our ability to bump PGV version in Envoy, see lyft/protoc-gen-star#28.
Fixes oss-fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9868.
Risk level: Medium (DNS clusters will have some timing changes).
Testing: Updated DNS implementation unit tests, server fuzz corpus entry added.
Signed-off-by: Harvey Tuch <[email protected]>
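The deferral described in (3) can be sketched roughly as follows. This is an illustration under stated assumptions, not Envoy's implementation: `SimpleDispatcher`, `DeferredResolve`, and `deferring_trampoline` are hypothetical names, and the callback signature is simplified rather than taken from c-ares. The C-style callback only copies the result and queues a task; the user callback, which may throw, runs later from the dispatcher loop, outside the C library's call stack.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal dispatcher: a FIFO queue of tasks run by the owner.
class SimpleDispatcher {
 public:
  void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Runs queued tasks in order; an exception from a task propagates to the
  // caller of run(), which is ordinary C++ code.
  void run() {
    while (!tasks_.empty()) {
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

  std::size_t pending() const { return tasks_.size(); }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Per-query state passed through the C library as an opaque pointer.
struct DeferredResolve {
  SimpleDispatcher* dispatcher;
  std::function<void(int, const std::string&)> on_resolve;  // may throw
};

// C-style callback: copies what it needs and posts, even for "immediate"
// results. The user callback is never invoked from inside this function.
void deferring_trampoline(void* arg, int status, const char* name) {
  auto* ctx = static_cast<DeferredResolve*>(arg);
  std::string name_copy(name);  // the C library may free `name` on return
  auto cb = ctx->on_resolve;
  ctx->dispatcher->post(
      [cb, status, name_copy]() { cb(status, name_copy); });
}
```

The trade-off matches the risk note in the commit message: results are delivered one dispatcher iteration later than before, so timing changes, but any exception now surfaces from `run()` rather than unwinding through C frames.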
Yeah, I think in general this is hard to do, although in this specific case it would be possible to save a pointer to the allocated string somewhere (not on the stack) that Anyway, totally reasonable that c-ares as a C library doesn't want to be in the business of trying to handle C++ exceptions; I have a fix Envoy-side in envoyproxy/envoy#4307, so I will close this out. Thanks.
While investigating a memory leak in Envoy, a c-ares consumer (the leak was spotted by https://github.com/google/oss-fuzz), I noticed the following trace:
What's happening is that we're doing a DNS resolution on an IP address: the strdup at c-ares/ares_gethostbyname.c (line 290 in ad58527) is followed by the callback at c-ares/ares_gethostbyname.c (line 293 in ad58527), which throws an exception, so the free at c-ares/ares_gethostbyname.c (line 304 in ad58527) is never reached.
There are two ways to fix this: