IORING_SETUP_SQPOLL: Understanding behaviour of the kernel thread that never goes to sleep. #729

Open
sohaibiftikhar opened this issue Nov 15, 2022 · 11 comments

Comments

@sohaibiftikhar

sohaibiftikhar commented Nov 15, 2022

I was running one of the examples from the loiu wiki to try out kernel-side polling, which could be quite efficient for our use case. Consider this tiny snippet. It closely follows that example, apart from the added sleep call.

params.flags |= IORING_SETUP_SQPOLL;
params.sq_thread_idle = 2000; // idle timeout in milliseconds, i.e. 2 seconds.
// I/O logic. Usually lasts less than 2 seconds.
sleep(1000); // calling thread goes to sleep before the kernel thread can go to sleep.

In this situation, monitoring my CPU (using basic htop filtering) shows that the kernel thread stays in state R (running) and CPU usage is high. If I adjust the above snippet slightly:

params.flags |= IORING_SETUP_SQPOLL;
params.sq_thread_idle = 2000; // idle timeout in milliseconds, i.e. 2 seconds.
// I/O logic. Usually lasts less than 2 seconds.
sleep(5); // An additional sleep call. By the time it returns, the kernel thread's idle timeout should have expired.
sleep(1000); // this time the calling thread goes to sleep after the kernel thread's idle timeout has expired.

The kernel thread goes to sleep after the first sleep returns. I am wondering what should be the relation b/w the calling thread being awake when the kernel thread needs to go back to sleep.

The question is simple. Is this a requirement of the API that I am unaware of or is this a bug?

@isilence
Collaborator

After preparing a request you have to call some variation of io_uring_submit(). It usually returns without making any syscalls, unless it finds that the kernel thread is sleeping and needs to be woken up. Also, the kernel thread won't go to sleep until it has processed all the SQEs it has seen.

In your example, if you submitted all your requests prior to the first sleep and you find that the kernel thread is sleeping after 5 seconds, it only means that the kernel has completed its job and has nothing left to do.
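
For readers following along, the "no syscall unless the thread needs waking" behaviour described above can be sketched roughly as below. This is only an illustration, not liburing's actual code: the helper name `sqpoll_submit_needs_syscall` is hypothetical, and the two flag values are copied from the kernel UAPI header `<linux/io_uring.h>` only so that the sketch compiles without it.

```c
#include <stdbool.h>

/* Values as defined in <linux/io_uring.h>; repeated here only so the
 * sketch builds on systems without that header. */
#ifndef IORING_SQ_NEED_WAKEUP
#define IORING_SQ_NEED_WAKEUP  (1U << 0)  /* set by the kernel in the SQ ring flags */
#endif
#ifndef IORING_ENTER_SQ_WAKEUP
#define IORING_ENTER_SQ_WAKEUP (1U << 1)  /* flag for io_uring_enter(2) */
#endif

/*
 * Hypothetical helper (not a liburing function).
 * Given the current SQ ring flags, decide whether a submit on an SQPOLL
 * ring has to enter the kernel. If the poller thread has gone idle it sets
 * IORING_SQ_NEED_WAKEUP, and the application must call io_uring_enter(2)
 * with IORING_ENTER_SQ_WAKEUP. Otherwise the poller picks up new SQEs on
 * its own and no syscall is needed.
 */
static bool sqpoll_submit_needs_syscall(unsigned sq_flags, unsigned *enter_flags)
{
    *enter_flags = 0;
    if (sq_flags & IORING_SQ_NEED_WAKEUP) {
        *enter_flags |= IORING_ENTER_SQ_WAKEUP;
        return true;
    }
    return false;
}
```

In a real program the flags would be read from the mapped SQ ring after publishing the new tail, and io_uring_submit() performs this check for you.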

@sohaibiftikhar
Author

sohaibiftikhar commented Nov 16, 2022

Thanks for the answer. Maybe I did not explain it correctly the first time around.

> In your example, if you submitted all your requests prior to the first sleep

I did. But I also received all responses before the first sleep. There is nothing to do in the SQ or to reap from the CQ. The bug/behavior I am hinting at is that the kernel thread continues to spin at 100% CPU if the application thread goes to sleep before the kernel thread has been put to sleep (the first snippet). So for the example:

  • Set up the rings with IORING_SETUP_SQPOLL and a kernel thread idle timeout (sq_thread_idle) of 2s.
  • Do some I/O. In the case of the example, make two requests and reap two completions.
  • Application thread goes to sleep.
    • Here I would expect that the kernel thread would go to sleep after 2 seconds. But it continues to spin at 100%.

My question is whether my expectation is correct and, if not, why.
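
The expectation in the steps above can be written down as a small model. This is a simplified sketch of the expected behaviour, not the kernel's implementation: the function name and its parameters are hypothetical, and it assumes the idle timer restarts whenever the poller last had work.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code) of when the SQPOLL thread is
 * expected to be asleep:
 *   - it never sleeps while there are SQEs it has not processed yet;
 *   - otherwise it sleeps once sq_thread_idle milliseconds have passed
 *     since it last had work.
 * Whether the application thread is awake does not appear anywhere.
 */
static bool sqpoll_expected_asleep(uint64_t last_work_ms, uint64_t now_ms,
                                   uint32_t sq_thread_idle_ms, bool sqes_pending)
{
    if (sqes_pending)
        return false;
    if (now_ms < last_work_ms)  /* guard against a clock that went backwards */
        return false;
    return (now_ms - last_work_ms) >= sq_thread_idle_ms;
}
```

With sq_thread_idle set to 2000, the model says the thread should still be spinning 1 second after the last completion and asleep 5 seconds after it, regardless of what the calling thread is doing.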

@isilence
Collaborator

Ok, I see what you mean.

> I am wondering what should be the relation b/w the calling thread being awake when the kernel thread needs to go back to sleep

There is no relation.

In both your examples the kernel thread should go to sleep shortly. There is certainly a bug somewhere if it doesn't.
Do you have a test program? I'll take a look.

@sohaibiftikhar
Author

Sure. Here is the test program I was testing with. If it helps, my kernel version is 5.15.0-52-generic.
https://gist.github.com/sohaibiftikhar/826baa2813a71c46d19bb8939cbb51cb#file-io_uring_sq-c-L127
This is the complete example with the couple of lines of change that I made.
To verify CPU usage you can just run top with a grep on the process name or use htop with filtering for a very basic diagnosis.
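
As an alternative to eyeballing top or htop, the thread state can be read from procfs. The sketch below only parses the state field out of a stat line; the helper name `stat_line_state` is hypothetical. It assumes, as an unverified detail for this setup, that on recent kernels the poller shows up as a task of the process under /proc/<pid>/task/<tid>/stat with a name like iou-sqp-<pid>.

```c
#include <string.h>

/*
 * Hypothetical helper: extract the one-character task state ('R' running,
 * 'S' sleeping, ...) from a line in the format of /proc/<pid>/stat:
 *
 *     pid (comm) state ppid ...
 *
 * The comm field may itself contain spaces or ')', so the state is taken
 * as the character following the LAST ") " in the line.
 * Returns '?' if the line does not look like a stat line.
 */
static char stat_line_state(const char *line)
{
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}
```

Reading each stat file under /proc/<pid>/task/ once a second and printing the state next to the thread name would show whether the poller moves from R to S after the idle timeout.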

@axboe
Owner

axboe commented Nov 16, 2022

I wrote a small test program and ran it in 5.15-stable (5.15.19 to be specific), and it seems to behave like it should. I then ran your test program, and it also seems to behave like it should - the sqpoll thread goes to sleep after 2 seconds when it has no work left to do. I do have one box that's running 5.15.0-53-generic and I ran it there too and it behaved like it should too. I have no insight into the distro kernels, does it still keep spinning for you if you update to -53?

@sohaibiftikhar
Author

Interesting. Ubuntu 20 doesn't seem to come with a stock version of the stable release. I am already sort of on the "bleeding edge"... I'm gonna patch it in a VM and try it out and report back.

@sohaibiftikhar
Author

Okay, so I repeated it with 5.15.0-53-generic and the behavior reproduces. I am not sure why it did not reproduce on your end. I will also try it with a later kernel and report back.
screenshare
I am using clang here but I can repeat it also with gcc.

@axboe
Owner

axboe commented Jan 24, 2023

I think I inadvertently figured out what this is... Are you able to run self built 5.15-stable kernels? Or how can you test a fix?

@sohaibiftikhar
Author

Awesome!
I should be able to test it, yes. I'm away from my Linux PC for a month or so though, so this would, unfortunately, have to wait until then.

@sohaibiftikhar
Author

Can you link me the commit for 5.15 for testing?

@Yukigaru

Interesting, what was that? Is the bug confirmed?
