Worker threads not going idle #114

I've noticed that the worker threads in my program go into some form of infinite loop when there is no input data. Specifically, I use worker 0 to receive input from an I/O thread. That worker behaves well and blocks waiting for data, but all the other workers consume 100% CPU even when there is no data to process. The stack trace of a spinning worker shows it polling inside the timely runtime, and I'd appreciate any hints on how to investigate this.

The high-level structure of the program: worker 0 reads updates from an I/O channel and feeds them into the dataflow, while the remaining workers just step the computation.
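A minimal sketch of that shape, with every name and the channel type illustrative rather than taken from the actual program: worker 0 drains an `std::sync::mpsc` channel fed by the I/O thread, while the remaining workers simply step.

```rust
use std::sync::{mpsc, Mutex};
use timely::dataflow::operators::{Input, Probe};

fn run(rx: mpsc::Receiver<u64>) {
    // Wrap the receiver so the (Sync) worker closure can hand it to
    // exactly one thread: worker 0.
    let rx = Mutex::new(Some(rx));
    timely::execute_from_args(std::env::args(), move |worker| {
        let mut input = timely::dataflow::InputHandle::<u64, u64>::new();
        let probe = worker.dataflow(|scope| scope.input_from(&mut input).probe());

        if worker.index() == 0 {
            // Worker 0 blocks on the I/O channel and feeds the dataflow.
            let rx = rx.lock().unwrap().take().unwrap();
            for (round, datum) in rx.iter().enumerate() {
                input.send(datum);
                input.advance_to(round as u64 + 1);
                while probe.less_than(input.time()) {
                    worker.step();
                }
            }
        } else {
            // The other workers relinquish their input capability and just
            // run; these are the threads observed spinning at 100% CPU.
            input.close();
            while worker.step() { }
        }
    })
    .unwrap();
}
```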
The worker threads in timely busy-wait, so all of the threads that have returned are spinning at 100% polling their input channels. That is, at the moment, by design: thread wake-up was a non-trivial source of latency in the Naiad design. Depending on what your goals are (lower power utilization, cooler laptop) you can have the other workers not spin. Is your goal to use less power at the expense of some latency when events arrive? This is a noble goal, but I am just trying to grok your desiderata.

Edit: for context, in most streaming timely computations workers rarely wait for input; even when there is no data, there is information that the low watermark for timestamps has advanced, and the workers need to scurry a bit to determine and communicate that the output has not changed. What you are seeing in the stack above is the workers doing exactly this polling. An arguably better thing to do would be for each trace to put some effort towards progressive merging in what is otherwise downtime. This would work towards reducing the memory footprint of in-progress merges in between "actual work".
Interesting, thanks for the explanation! I cannot afford to busy-wait, because my differential program runs in a bigger system with many other things going on. The differential computation only kicks in when a certain trigger event happens, e.g., a system configuration change. Hogging the CPU would affect all the other computations, and power and thermal considerations are important as well.
Probably the best thing to do is coordinate a wake-up, then. If you are just running single-process, you should be able to have the input thread signal each of the worker threads, then everyone works until the timestamp has advanced, at which point they can go back to sleep. There is a bit of a concurrency mechanism issue there, but it should be tractable (e.g. an `AtomicBool` that the workers test).

Probably the first thing to do is not have the other threads return, but rather spin explicitly so that you can add a condition that they test. E.g.

```rust
// `time` is the largest timestamp worker zero has advanced the input to;
// how it is shared with the other workers is up to you.
while probe.less_than(&time) {
    worker.step();
}
// double-check `time`, consider suspending the thread
```

Edit: I wrote a small example of this.
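One way to make that tested condition actually suspend the thread rather than spin is `std::thread::park`/`unpark`; unpark tokens are buffered, so a wake-up that races ahead of the park is not lost. A sketch, in which the shared `target` atomic and all names are assumptions of mine rather than a timely API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Step while the probe lags the announced target time; park once caught up.
// The input thread stores a new target and then unparks each worker thread.
fn worker_loop<A: timely::communication::Allocate>(
    worker: &mut timely::worker::Worker<A>,
    probe: &timely::dataflow::ProbeHandle<u64>,
    target: &Arc<AtomicU64>,
) {
    let mut done = 0;
    loop {
        let goal = target.load(Ordering::Acquire);
        if probe.less_than(&goal) {
            worker.step();           // outstanding work: keep stepping
        } else if goal == done {
            std::thread::park();     // caught up: sleep until unparked
        } else {
            done = goal;             // record completion of this round
        }
    }
}
```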
Good idea! I just tried it, and the coordinated wake-up works.
The "next-gen" worker implementations, with all the zero-copy goodness, actually each have one of those |
That would be lovely. I really don't think that my use case is unique: there must be many applications of differential where you want to respond quickly to a change in the input and then go idle. In the meantime, the coordinated wake-up you proposed should work just fine. I wish they had not removed semaphores from the Rust standard library; that would have made the implementation easier :)
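For what it's worth, a serviceable stand-in for the removed `std::sync::Semaphore` is a dozen lines over `Mutex` and `Condvar`. A minimal sketch:

```rust
use std::sync::{Condvar, Mutex};

// A counting semaphore built from std primitives.
pub struct Semaphore {
    count: Mutex<usize>,
    cond: Condvar,
}

impl Semaphore {
    pub fn new(count: usize) -> Self {
        Semaphore { count: Mutex::new(count), cond: Condvar::new() }
    }

    // Block until a permit is available, then take it.
    pub fn acquire(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cond.wait(count).unwrap();
        }
        *count -= 1;
    }

    // Return a permit and wake one waiter.
    pub fn release(&self) {
        *self.count.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}
```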
The plan is to head towards something like that. It's possible that there are applications that go idle, but so far most of the reports have been a mix of batch (go as fast as possible, then terminate) and real-time streaming, where you are always taking in data or the signal that there is no data. Yours isn't unreasonable, and at least one person asked about an even longer-latency version (run a DD step; freeze the computation; start up an hour later and make a change). We'll get there eventually; it is mostly gated on my programming cycles, unfortunately. T.T
Btw, timely has (recently) a mechanism for tracking which operators actually have work to do. I'm planning (based on this convo) to expose a `step_or_park` method, so that a worker with nothing outstanding can park its thread instead of spinning.
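In the timely releases that followed, this surfaced as `Worker::step_or_park(timeout)`. A small self-contained loop using it (the toy dataflow here is mine, for illustration):

```rust
use std::time::Duration;
use timely::dataflow::operators::{Input, Probe};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let mut input = timely::dataflow::InputHandle::<u64, u64>::new();
        let probe = worker.dataflow(|scope| scope.input_from(&mut input).probe());

        for round in 0..10u64 {
            if worker.index() == 0 {
                input.send(round);
            }
            input.advance_to(round + 1);
            // Like `step`, but parks the thread (here for up to 100 ms)
            // when there is no outstanding work, instead of spinning.
            while probe.less_than(input.time()) {
                worker.step_or_park(Some(Duration::from_millis(100)));
            }
        }
    })
    .unwrap();
}
```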
The solution I ended up with uses a combination of a mutex and a condition variable. I was not able to get away with, e.g., the simpler signaling scheme, because of an issue in the first round. Does this make sense, and do you see any value in packaging this solution in some reusable form?
I started up an issue (https://github.com/frankmcsherry/timely-dataflow/issues/189) about this in the timely repo. Ideally, medium/long term, it would probably be best to make sure the channels all wake up workers, and to expose a mechanism to park workers (in the issue: park operators, with the worker becoming parkable if all of its operators are). However, I'm not 100% clear on the issue you mention about the first round. Can you explain that in more detail?
Actually I was wrong, the problem is not in the first round. Here is the high-level structure I am trying to implement:

- worker 0: receive input from the I/O thread; signal workers 1..N; step until the probe catches up with the new input time.
- workers 1..N: block waiting for the signal; step until the probe catches up; go back to sleep.

The system livelocks if one thread enters the while loop while another thread is blocked waiting for a signal. Here is the simplest livelock scenario: w0 signals and enters its stepping loop before w1 has started waiting, so the signal is lost. w0 will then loop forever (since without w1 it cannot make progress), and w1 will wait forever. I could not think of an alternative scheme that avoids this using only the signal/wait primitives.
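One fix under the same constraints is to make the signal carry state, so that a waiter arriving late still observes it: wait on a monotonically increasing round number guarded by a mutex, rather than on a bare signal. A sketch with illustrative names:

```rust
use std::sync::{Condvar, Mutex};

struct Rounds {
    current: Mutex<u64>,
    cond: Condvar,
}

impl Rounds {
    // Worker 0 announces a new round before it starts stepping. Because the
    // round number is stored under the mutex, the announcement cannot be
    // lost: a worker that was not yet waiting still sees it on arrival.
    fn announce(&self, round: u64) {
        *self.current.lock().unwrap() = round;
        self.cond.notify_all();
    }

    // Workers 1..N block until the round advances past what they have done.
    fn await_round(&self, done: u64) -> u64 {
        let mut cur = self.current.lock().unwrap();
        while *cur <= done {
            cur = self.cond.wait(cur).unwrap();
        }
        *cur
    }
}
```

With this, w0 can announce and enter its stepping loop in either order relative to w1's wait; w1 either receives the notification or finds the advanced round already recorded.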
Does the `event_driven` feature address the issue discussed in this thread, i.e., put an end to busy-waiting in workers?
No, it does not. That's on the roadmap, but it involves the cooperation of a few parts that don't cooperate yet (the typed inter-thread allocator uses MPSC channels, and they don't wake a thread in the way we want on message send).
One anecdotal thing is that our server's CPU usage when idle dropped significantly with `event_driven`. I haven't had time to investigate this yet, but I was surprised.
I believe this issue is addressed with the worker parking mechanism discussed above.