When spawning many tasks, some tasks never run. #1388
Comments
@DevQps: Are you able to post your current project?
Hey Aaron, I am sorry for not doing so before. I created a more or less minimal project to reproduce the bug. It's around 200 lines, but most of it is about setting up the connection and spawning tasks, so it should not be too hard to follow. It contains many comments as well (hope that helps!). Here it is:
Cargo.toml
Output:
As you can see here, it reached the part where it actually sends the request, but it seems like the first queried tasks are never executed to completion (tasks with index 0 to 12000). If you have more questions please ask! So my basic question is: Is this due to tokio's scheduling algorithm? (And do you have any cool tips and tricks on how I can fix this?) Or is this because of something else I did wrong? P.S. If you cannot reproduce the problem, increasing the number of tasks in main might help to reproduce it! I was able to reproduce it on my desktop and laptop. Thanks!
@Aaron1011 Hey Aaron! Were you able to take a look at it? Just wondering whether you missed my previous message because I didn't tag you!
+1 on this. I'm seeing similar behavior where some tasks just stop getting run, especially in cases where you have more tasks than cores. My next step is to adjust the runtime by setting a higher number of core threads to see if it alleviates the problem.
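For reference, a minimal sketch of that adjustment, assuming tokio 0.1.x and its `runtime::Builder::core_threads` knob (the value 32 here is purely illustrative, not a recommendation):

```rust
// Sketch: build a tokio 0.1 runtime with more core threads than the default,
// spawn work on it, and wait for the pool to drain before exiting.
extern crate futures;
extern crate tokio;

use futures::future::lazy;
use futures::Future;

fn main() {
    let mut runtime = tokio::runtime::Builder::new()
        .core_threads(32) // illustrative value; default is one per CPU
        .build()
        .expect("failed to build runtime");

    runtime.spawn(lazy(|| {
        println!("running on the enlarged pool");
        Ok(())
    }));

    // Shut down only once all spawned work has completed.
    runtime.shutdown_on_idle().wait().unwrap();
}
```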
It could be Google disliking many requests at once.
I don't think that is it, though, because only the first requests get stalled; the ones after that are running great. Thanks for the suggestion though!
I have a similar/the same issue, but I can only reproduce it using PollEvented and the current_thread executor:

```toml
[package]
name = "untitled"
version = "0.1.0"
edition = "2018"

[dependencies]
tun = { version = "0.4.4", features = ["mio"] }
tokio = "0.2.0-alpha.1"
```

```rust
#![feature(async_await)]

use std::error::Error;
use tokio;
use tokio::io::AsyncReadExt;
use tokio::reactor::PollEvented;
use tun;

#[tokio::main(single_thread)]
async fn main() -> Result<(), Box<dyn Error>> {
    let mut config = tun::Configuration::default();
    config
        .address((10, 0, 0, 1))
        .netmask((255, 255, 255, 0))
        .up();

    let dev = tun::create(&config)?;
    let mut dev = PollEvented::new(dev);

    let mut buf = [0; 1024];
    loop {
        tokio::spawn(async { println!("spawn") });
        let size = dev.read(&mut buf).await?;
        println!("read: {}", size);
    }
}
```

Any future spawned before the first await on the PollEvented works fine, but afterwards it doesn't.
What does CPU usage look like?
Usage is zero; checking strace shows it's actually blocking on a read.
I just tried tokio 0.1 without async and got the same result there.
Even with an enlarged thread pool and no timeout on threads, this still occurs for me, and with low CPU usage as well. I would love to be able to debug this better, but threadpool::worker logging is trace-only and a little spammy for figuring out what is going on. Luckily I can debug on the box, but perf wasn't informative; I will try attaching gdb next.
I'm having the same issue. I was trying to do a tokio 'hello world' exercise to get started in the tokio/async world.

```rust
#![feature(async_await)]
extern crate tokio;
#[tokio::main(multi_thread)]
```

Cargo.toml:

```toml
[dependencies]
```
@colepoirier Your issue is likely due to an early exit. You'll need some way to indicate that your main is done, possibly using a channel to send a result when each async function completes and having your main receive all the completions.
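For readers landing here later, a minimal sketch of that pattern with the current tokio 1.x API (not the 0.2-alpha attribute used in this thread), where `tokio::spawn` returns a `JoinHandle` that main can await before exiting:

```rust
// Sketch: main waits for every spawned task, so none are cut off when the
// runtime shuts down. Uses tokio 1.x with the "full" feature set.
#[tokio::main]
async fn main() {
    let mut handles = Vec::new();
    for x in 0..10 {
        handles.push(tokio::spawn(async move {
            println!("Hello {}", x);
        }));
    }

    // Await every JoinHandle; main only returns once all ten tasks have run.
    for handle in handles {
        handle.await.expect("task panicked");
    }
}
```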
Thanks for your help @dbcfd! Unfortunately, with the modification of the channel sending a result, it's somewhat better but still not consistent. Am I not using the channel correctly? With the code:

```rust
#![feature(async_await)]
extern crate crossbeam_channel;
extern crate tokio;

#[tokio::main(multi_thread)]
async fn main() {
    let (s, r) = crossbeam_channel::bounded::<Result<(), ()>>(10);
    for x in 0..10 {
        let s = s.clone();
        tokio::spawn(async move {
            println!("Hello {}", x);
            if let Ok(_) = s.try_send(Ok(())) {
                ()
            };
        });
    }
    if let Ok(_) = r.recv() {
        ()
    }
}
```

I still get these results:

```
tokio_test_2 colepoirier $ cargo run --release
    Finished release [optimized] target(s) in 0.07s
     Running `target/release/tokio_test_2`
Hello 3
Hello 0
Hello 1
Hello 2
Hello 4
Hello 5
```
Possibly it's erroring on the send, but I'm going to grab your code and try it as well. I'm still seeing my issue after upgrading to tokio 0.2, but this gives me a much better test to debug with.
Oh, you need to receive all 10; your main only waits for one.
```rust
use std::time::{Duration, Instant};

#[tokio::main(multi_thread)]
async fn main() -> Result<(), ()> {
    let tasks = 100;
    let (s, r) = crossbeam_channel::bounded::<usize>(tasks);
    for x in 0..tasks {
        let s = s.clone();
        tokio::spawn(async move {
            println!("Running {}", x);
            tokio::timer::Delay::new(Instant::now() + Duration::from_millis(100)).await;
            if let Err(e) = s.try_send(x) {
                panic!("Failed to send: {:?}", e);
            }
        });
    }
    let mut received_messages = 0;
    let mut received_values = vec![];
    while received_messages < tasks {
        match r.recv() {
            Err(e) => panic!("Failed to receive: {:?}", e),
            Ok(v) => {
                received_values.push(v);
                received_messages += 1;
            }
        }
    }
    received_values.dedup_by_key(|x| *x);
    println!("Received: {:?}", received_values);
    assert_eq!(received_values.len(), tasks);
    Ok(())
}
```

Although all the tasks run, some tasks take quite a while to run.
@ntkoopman I've done some digging with your test case and it seems that it's not related to tokio. The blocking read indicates that the underlying file descriptor is not in non-blocking mode. I've modified rust-tun so that setting the fd to non-blocking mode is possible (https://github.com/cynecx/rust-tun/tree/non_blocking). The test case seems to run correctly with the tun fd set to non-blocking mode.
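For illustration, a minimal sketch (not the actual rust-tun change, and assuming the libc crate is available) of what putting a file descriptor into non-blocking mode means; reactors built on PollEvented need reads to return WouldBlock instead of blocking the worker thread:

```rust
use std::io;
use std::os::unix::io::{AsRawFd, RawFd};

fn set_nonblocking(fd: RawFd) -> io::Result<()> {
    // Equivalent to `fcntl(fd, F_SETFL, flags | O_NONBLOCK)` in C.
    unsafe {
        let flags = libc::fcntl(fd, libc::F_GETFL);
        if flags < 0 {
            return Err(io::Error::last_os_error());
        }
        if libc::fcntl(fd, libc::F_SETFL, flags | libc::O_NONBLOCK) < 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Demonstrated on stdin's fd; for the test case it would be the tun fd.
    set_nonblocking(io::stdin().as_raw_fd())
}
```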
@dbcfd Btw, I am not 100% sure if using crossbeam-channel inside an async fn is a good idea. The issue is that the wake-up model is different with crossbeam-channel: a recv operation (with crossbeam) may actually block the worker thread instead of yielding to the executor.

```rust
use std::time::{Duration, Instant};

#[tokio::main(multi_thread)]
async fn main() -> Result<(), ()> {
    let tasks = 100;
    let (s, mut r) = tokio::sync::mpsc::channel(tasks);
    for x in 0..tasks {
        let mut s = s.clone();
        tokio::spawn(async move {
            println!("Running {}", x);
            tokio::timer::delay(Instant::now() + Duration::from_millis(100)).await;
            s.send(1).await;
        });
    }
    let mut received_messages = 0;
    while received_messages < tasks {
        match r.recv().await {
            None => panic!("Failed to receive"),
            Some(v) => {
                received_messages += v;
            }
        }
    }
    assert_eq!(received_messages, tasks);
    Ok(())
}
```

The modified test case completes instantaneously; however, I haven't really experienced a noticeable delay with the original test case.
@cynecx I do think that the issue I'm seeing is something in the interplay between crossbeam and tokio, even with a try_recv rather than a recv. My test case completes fairly quickly, but some tasks get starved, e.g. task 5: around 40 other tasks run before task 5 does.
Thank you for your help! Trying to use tokio became more trouble than it was worth, so I switched to the runtime crate, and it works flawlessly. No weird bugs like this so far.
Sorry to hear you have been hitting trouble, but I will close the issue as I see no actionable items. For others: the issue is the use of blocking operations on the event loop. The solution is to use tokio::sync::mpsc instead of crossbeam. Using runtime would have the same problem, so I am not sure why it is not immediately apparent there. My guess is that runtime spawns far more threads by default: tokio spawns threads based on the number of physical cores, and my guess is that runtime does not.
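For anyone hitting this later, a minimal sketch of the same advice with the current tokio 1.x API (not the 0.1/0.2-alpha APIs used in this thread): genuinely blocking work is handed to `tokio::task::spawn_blocking` so it cannot starve the async workers; the sleep and return value below are illustrative only.

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    let handle = tokio::task::spawn_blocking(|| {
        // Anything that blocks the thread (a blocking channel recv, a blocking
        // read on a file descriptor, heavy CPU work) belongs here, off the
        // async worker threads.
        std::thread::sleep(Duration::from_millis(100));
        42
    });

    // The async caller awaits the result without blocking a worker thread.
    let result = handle.await.expect("blocking task panicked");
    assert_eq!(result, 42);
}
```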
Now I’ve been using runtime for a little while and everything just works as you’d expect it to, i.e. it does not have the same problems with crossbeam queues and channels. I’ve had nothing but problems with tokio 0.1 and tokio 0.2-alpha.x.
Version
tokio = "0.1.22"
Platform
Windows 10
Description
Before I fill in the complete form, I'll first give a small description (hope that's fine).
I am trying to scrape the Google Certificate Transparency logs using Tokio, hyper, h2 and the new async/await syntax. The problem arises when I spawn many requests over a single HTTP/2 connection. This occurs with around 25 tasks, each looping the following pseudocode:
Worker:
The behavior I see is that the first 10 tasks get up to point 3 but then never go on to point 4. Other tasks spawned after that are able to complete the whole process.
Question: Can this be due to tokio's scheduling algorithm? That is, if many tasks are spawned that all have work to do when polled, can some older tasks never get polled again because newer tasks keep the runtime busy? If this is the case, is there any workaround?
Thanks in advance! If this is not the case, I will rework my code into a minimal example and post it here!