Support async networking #63
What major use cases for asynchronous networking do you have in mind? Is this intended to be used by exporters?
Asynchronous networking is more efficient. If the tracer uses synchronous networking, it can be blocked writing data to one socket even when it could make progress on other sockets or events. Exporters can use the interface, but they're also welcome to do their own synchronous networking (though that will block the background thread). Additionally, asynchronous networking is important for allowing an application to exit quickly. If you do a synchronous write to an endpoint that's not responding and the app exits, the app can hang until the write's timeout is exceeded.
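To make the blocking concern concrete, here is a minimal POSIX sketch (not code from this PR). It puts a descriptor into non-blocking mode and shows that a write to a full kernel buffer returns `EAGAIN` instead of stalling the calling thread. The helper names `SetNonBlocking` and `WriteUntilWouldBlock` are hypothetical, and a pipe stands in for a socket whose peer is not reading.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical helper: switch `fd` to non-blocking mode. Returns false on failure.
inline bool SetNonBlocking(int fd)
{
  int flags = ::fcntl(fd, F_GETFL, 0);
  return flags != -1 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: keep writing until the kernel buffer is full.
// Returns true if write() reported "would block" instead of blocking the thread.
inline bool WriteUntilWouldBlock(int fd)
{
  char chunk[4096] = {};
  for (;;)
  {
    ssize_t written = ::write(fd, chunk, sizeof(chunk));
    if (written >= 0)
    {
      continue;  // buffer still has room; keep filling it
    }
    if (errno == EINTR)
    {
      continue;  // interrupted by a signal; retry
    }
    return errno == EAGAIN || errno == EWOULDBLOCK;
  }
}
```

With a blocking descriptor the same loop would stall inside `write()` once the buffer filled up, which is the hang on exit described above.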
Yes, I completely agree. I mostly wonder where in the SDK it will be used, apart from exporters.
It could also be used in the SDK to set up the timers that regularly trigger the span buffer to flush (and similarly for metrics). Asynchronous DNS resolution can also be supported over an interface like this with c-ares.
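The timer use case can be sketched as follows. This is an illustration only, loosely modelled on the `events_` loop quoted later in this review; the class `TimerLoop`, its `Schedule` method, and the helper `CountPeriodicFlushes` are hypothetical names, not the PR's API.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <thread>
#include <utility>

// Hypothetical single-threaded timer loop: callbacks are ordered by deadline.
class TimerLoop
{
public:
  using Clock    = std::chrono::steady_clock;
  using Callback = std::function<void()>;

  // Run `callback` once, `delay` from now.
  void Schedule(Clock::duration delay, Callback callback)
  {
    events_.emplace(Clock::now() + delay, std::move(callback));
  }

  void Exit() noexcept { running_ = false; }

  // Run callbacks in deadline order until none are left or Exit() is called.
  void Run()
  {
    running_ = true;
    while (running_ && !events_.empty())
    {
      auto next_event = events_.begin();
      std::this_thread::sleep_until(next_event->first);
      Callback callback = std::move(next_event->second);
      events_.erase(next_event);
      callback();  // may call Schedule() again to re-arm itself
    }
  }

private:
  std::multimap<Clock::time_point, Callback> events_;
  bool running_ = false;
};

// Re-arms a "flush" timer until it has fired `max_flushes` times.
inline int CountPeriodicFlushes(int max_flushes, std::chrono::milliseconds period)
{
  TimerLoop loop;
  int flushes = 0;
  std::function<void()> flush = [&] {
    ++flushes;  // a real SDK would export the buffered spans here
    if (flushes < max_flushes)
    {
      loop.Schedule(period, flush);
    }
  };
  loop.Schedule(period, flush);
  loop.Run();
  return flushes;
}
```

Calling `CountPeriodicFlushes(3, std::chrono::milliseconds{5})` runs three flushes on the calling thread and then returns, because the loop stops once no timers remain.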
bazel/libevent.BUILD
Outdated
    }),
    make_commands = select({
        "@io_opentelemetry_cpp//bazel:windows": ["MSBuild.exe INSTALL.vcxproj"],
        "//conditions:default" : None,
Looks like this file didn't go through buildifier.
Update format script to run on these files
Done in #69
/**
 * Run the event loop until.
 */
virtual void Run() noexcept = 0;
Where is this documented? As a user of the SDK, what do I need to do with the Dispatcher?
It's intended to support a design where you have a single background thread and asynchronous networking. The background thread makes successive calls to the system's I/O multiplexing function (e.g. epoll, kqueue, etc.) and invokes callbacks as events become ready.
If you're a user of the SDK and you want to use async networking for your vendor exporter, you'd call CreateFileEvent and pass the file descriptors for the sockets you used to connect to your endpoint.
Or, if you don't want to do asynchronous I/O, you can ignore the dispatcher, use synchronous sockets, and block the background thread when you want to do I/O (less efficient, and it can keep the process from exiting promptly, but it is supported).
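The design described above can be sketched with plain POSIX `poll()`. This is a simplified stand-in, not the PR's libevent-based implementation: the class name `MiniDispatcher`, the callback signature, and the 100 ms poll timeout are assumptions made for illustration, and only the method names `CreateFileEvent`, `Run`, and `Exit` come from the discussion.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified dispatcher: one thread, one poll() loop, callbacks per fd.
class MiniDispatcher
{
public:
  using Callback = std::function<void(int fd)>;

  // Invoke `callback` whenever `fd` becomes readable.
  void CreateFileEvent(int fd, Callback callback)
  {
    pollfd entry{};
    entry.fd     = fd;
    entry.events = POLLIN;
    pollfds_.push_back(entry);
    callbacks_.push_back(std::move(callback));
  }

  // Only meant to be called from a callback on the loop's own thread.
  void Exit() noexcept { running_ = false; }

  void Run()
  {
    running_ = true;
    while (running_ && !pollfds_.empty())
    {
      int ready = ::poll(pollfds_.data(), pollfds_.size(), /*timeout_ms=*/100);
      if (ready < 0 && errno != EINTR)
      {
        break;  // unrecoverable poll() error
      }
      if (ready <= 0)
      {
        continue;  // timeout or interrupted; re-check running_ and poll again
      }
      // Callbacks in this sketch must not register new events while iterating.
      for (std::size_t i = 0; i < pollfds_.size() && running_; ++i)
      {
        if (pollfds_[i].revents & (POLLIN | POLLHUP))
        {
          callbacks_[i](pollfds_[i].fd);
        }
      }
    }
  }

private:
  std::vector<pollfd> pollfds_;
  std::vector<Callback> callbacks_;
  bool running_ = false;
};

// Demo: a pipe stands in for an exporter socket. `message` must be non-empty.
inline std::string ReadOneMessageViaDispatcher(const std::string &message)
{
  int fds[2];
  if (::pipe(fds) != 0)
  {
    return {};
  }
  MiniDispatcher dispatcher;
  std::string received;
  dispatcher.CreateFileEvent(fds[0], [&](int fd) {
    char buffer[64];
    ssize_t count = ::read(fd, buffer, sizeof(buffer));
    if (count > 0)
    {
      received.assign(buffer, static_cast<std::size_t>(count));
    }
    dispatcher.Exit();
  });
  if (::write(fds[1], message.data(), message.size()) < 0)
  {
    dispatcher.Exit();
  }
  else
  {
    dispatcher.Run();
  }
  ::close(fds[0]);
  ::close(fds[1]);
  return received;
}
```

An exporter would register its connected socket in the same way and do its reads or writes inside the callback, so the background thread never blocks on a single endpoint.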
Can we capture this in a doc please? :)
Added commenting
t1 = std::chrono::steady_clock::now();
dispatcher.Run();
auto duration = t2 - t1;
EXPECT_TRUE(duration > std::chrono::milliseconds{50});
On different CI systems we might need to adjust the tolerance of timing.
Consider making this a configurable thing which we could leverage in other test cases (in particular metrics and async exporter).
I tried to pick a constant that should be large enough for most environments, though we can probably go larger without slowing down the tests too much.
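One way the tolerance could be made configurable is sketched below. The environment variable name `TEST_TIMING_TOLERANCE_MS` and both helper functions are hypothetical; nothing like them exists in the PR. Tests would compare against `TimingTolerance()` instead of a hard-coded `std::chrono::milliseconds{50}`.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical helper: parse a millisecond count, falling back to `fallback`
// when `raw` is null, empty, negative, or not a plain decimal number.
inline std::chrono::milliseconds ParseToleranceMs(const char *raw,
                                                  std::chrono::milliseconds fallback)
{
  if (raw == nullptr)
  {
    return fallback;
  }
  char *end   = nullptr;
  long parsed = std::strtol(raw, &end, 10);
  if (end == raw || *end != '\0' || parsed < 0)
  {
    return fallback;
  }
  return std::chrono::milliseconds{parsed};
}

// Hypothetical helper: tolerance shared by timing-sensitive tests.
// TEST_TIMING_TOLERANCE_MS is an assumed variable name that slow CI systems could set.
inline std::chrono::milliseconds TimingTolerance()
{
  return ParseToleranceMs(std::getenv("TEST_TIMING_TOLERANCE_MS"),
                          std::chrono::milliseconds{50});
}
```

A slow CI runner could then export a larger value without touching the test sources, and the metrics and async exporter tests could reuse the same helper.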
LGTM.
Just needs an autoformat to pass CI.
- void Dispatcher::Exit() noexcept {}
+ void Dispatcher::Exit() noexcept
+ {
+   running_ = false;
Not meant to be threadsafe?
Yes, meant for interacting with only a single background thread.
while (running_ && !events_.empty())
{
  auto next_event = events_.begin();
  std::this_thread::sleep_until(next_event->first);
How can this sleep be interrupted?
I'd recommend a short polling interval over an interrupt mechanism.
I'd rather people use the libevent implementation. This one is meant to be a drop-in replacement that provides the same interface, so you can use the same codebase without an event library (in case you only care about synchronous networking and don't want to add an external dependency).
When the dispatcher is built on top of libevent, the background thread makes successive calls to the system's polling function (e.g. epoll or an equivalent). Epoll blocks waiting for either a timeout or an event on a list of file descriptors.
I believe it is possible to artificially interrupt epoll if you create a dummy file descriptor for it to include in its list and then close the file descriptor, forcing an event.
But I prefer, and I think it's simpler, to use a polling function with a short timer interval that monitors for termination, closure, and flush events. That is basically what's done here:
https://github.com/lightstep/lightstep-tracer-cpp/blob/master/src/recorder/stream_recorder/stream_recorder_impl.cpp#L57-L82
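The short-polling-interval approach can be sketched as below. This is an illustration, not the linked code: `SleepUntilOrExit` is a hypothetical helper, the 10 ms default interval is arbitrary, and the `std::atomic<bool>` flag is an assumption added so that another thread could request the exit (the PR's `running_` flag is only meant to be touched from the background thread).

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: sleep until `deadline`, waking every `poll_interval`
// to check `exit_requested`. Returns true if the deadline was reached and
// false if an exit was requested first.
inline bool SleepUntilOrExit(
    std::chrono::steady_clock::time_point deadline,
    const std::atomic<bool> &exit_requested,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds{10})
{
  while (!exit_requested.load())
  {
    auto now = std::chrono::steady_clock::now();
    if (now >= deadline)
    {
      return true;
    }
    // Sleep for the shorter of "time remaining" and "one polling interval".
    std::chrono::steady_clock::duration step = poll_interval;
    if (deadline - now < step)
    {
      step = deadline - now;
    }
    std::this_thread::sleep_for(step);
  }
  return false;
}
```

The trade-off is a small amount of periodic wake-up work in exchange for not needing any interrupt mechanism: the worst-case delay before noticing an exit request is one polling interval.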
sleep_until may hang "forever" in gcc-4.8 and vs2017 15.4 when system clock changes, i.e. after NTP time sync.
It uses the steady clock (std::chrono::steady_clock), which shouldn't have that problem.
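For reference, the standard library exposes this property directly: `std::chrono::steady_clock::is_steady` is required to be true, so deadlines computed from that clock do not move when the wall clock is adjusted. The helper name `DeadlineAfter` below is hypothetical, and the sketch says nothing about bugs in specific standard library versions.

```cpp
#include <chrono>

// The standard requires steady_clock to be monotonic; system_clock has no such guarantee.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must not be affected by wall-clock adjustments");

// Hypothetical helper: compute a timer deadline on the monotonic clock.
inline std::chrono::steady_clock::time_point DeadlineAfter(std::chrono::milliseconds delay)
{
  return std::chrono::steady_clock::now() + delay;
}
```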
Re "it is possible to artificially interrupt epoll": according to man epoll_wait, the call can be interrupted by a signal handler, in addition to the approach where "you create a dummy file descriptor for it to include in its list and then close the file descriptor, forcing an event."
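A common variant of the dummy-descriptor idea is the self-pipe pattern, sketched here with portable `poll()` rather than epoll. Note the swap: instead of closing a descriptor, the waker writes one byte to a pipe whose read end is in the polled set. The class name `WakeupPipe` and its methods are hypothetical and not part of the PR.

```cpp
#include <poll.h>
#include <unistd.h>

// Hypothetical wake-up channel: Notify() makes a pending or future Wait() return early.
class WakeupPipe
{
public:
  WakeupPipe()
  {
    if (::pipe(fds_) != 0)
    {
      fds_[0] = fds_[1] = -1;  // creation failed; Wait() will simply time out
    }
  }

  ~WakeupPipe()
  {
    if (fds_[0] != -1) ::close(fds_[0]);
    if (fds_[1] != -1) ::close(fds_[1]);
  }

  WakeupPipe(const WakeupPipe &) = delete;
  WakeupPipe &operator=(const WakeupPipe &) = delete;

  // May be called from another thread: write() on a pipe is thread-safe.
  void Notify() noexcept
  {
    const char byte = 'x';
    ssize_t ignored = ::write(fds_[1], &byte, 1);
    (void)ignored;
  }

  // Blocks for up to `timeout_ms`. Returns true if woken by Notify(), false on timeout.
  bool Wait(int timeout_ms)
  {
    pollfd entry{};
    entry.fd     = fds_[0];
    entry.events = POLLIN;
    int ready    = ::poll(&entry, 1, timeout_ms);
    if (ready <= 0 || (entry.revents & POLLIN) == 0)
    {
      return false;
    }
    char drained[16];
    ssize_t ignored = ::read(fds_[0], drained, sizeof(drained));  // consume the wake-up byte(s)
    (void)ignored;
    return true;
  }

private:
  int fds_[2];
};
```

In a real event loop the read end would be one entry among the polled sockets, so a shutdown request wakes the loop immediately instead of waiting for the next timeout.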
Codecov Report
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
- Coverage   93.61%   90.70%   -2.91%
==========================================
  Files          71       85     +14
  Lines        1754     1894    +140
==========================================
+ Hits         1642     1718     +76
- Misses        112      176     +64
Assigning to @lalitb, as discussed in the SIG meeting on October 21st. The main challenge here would be rebasing and making all GH actions pass.
Triaged Dec 7 2020: This is a good change to integrate but non-trivial. The maintainers need to figure out what items are blocked on this. This is not an immediate priority. Will keep open until re-evaluated.
@lalitb do we need rework on this PR to get it merged?
Keeping it open, for someone to take a dig at it.
Adds an interface for managing async file and timer events within the io polling system calls.
Will support asynchronous networking for tracers and meters.
A concrete implementation is provided with libevent.
The dispatcher interface was derived from Envoy.