
io-wait used 100% cpu after 6.1.39 and 6.5.1 #943

Open
beldzhang opened this issue Sep 4, 2023 · 23 comments

Comments

@beldzhang

After this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.1.39&id=f32dfc802e8733028088edf54499d5669cb0ef69
a running io_uring instance causes one CPU to show 100% io-wait usage; in my environment, 8 rings/threads put 8 CPUs at 100% io-wait. Apart from that, nothing else looks wrong.

Reverting this commit from 6.1.39 makes everything OK again.
6.1.51/6.5.1 behave the same, but the commit does not revert cleanly there, so I have not tested the revert on those yet.

All of the following tests gave the same performance result:
6.1.38
6.1.39
6.1.39(reverted)
6.1.51

To reproduce:
Build the test program from https://github.com/axboe/liburing/files/9571382/issue-643-v2.zip and just run the server; it will show 100% io-wait on one CPU.

@redbaron

redbaron commented Sep 5, 2023

Isn't IO wait time exactly what you want to see? That is the time the app is blocked waiting on I/O (at least that was the case for file I/O) while the CPU has nothing else to do, so that time is accounted as IO wait.

@isilence
Collaborator

isilence commented Sep 5, 2023

Note that io-wait doesn't burn CPU cycles; the task is sleeping, so it's not a problem apart from the reporting. There was a change that did this, and I think it actually makes more sense to report a task waiting for io_uring completions as io-wait.

@axboe
Owner

axboe commented Sep 5, 2023

Yes, this is expected. iowait literally just means "waiting on IO", which is what the task is doing. It does NOT mean the CPU is busy 100% of the time; in fact, 100% iowait means one task is sleeping waiting on IO 100% of the time.
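This definition can be checked directly against the kernel's accounting. A minimal sketch in Python that parses a /proc/stat-style sample (the field layout is from proc(5); the sample numbers are made up for illustration):

```python
# Per-CPU counters live in /proc/stat. Each "cpu" line lists ticks as:
# user nice system idle iowait irq softirq steal guest guest_nice,
# so iowait is the 5th numeric field. A task sleeping in io_uring_enter()
# with iowait accounting enabled bumps that counter even though the CPU
# itself is idle.
def iowait_ticks(stat_text):
    """Map each 'cpu*' line in /proc/stat-style text to its iowait ticks."""
    ticks = {}
    for line in stat_text.splitlines():
        if line.startswith("cpu"):
            name, *fields = line.split()
            ticks[name] = int(fields[4])  # iowait column
    return ticks

sample = (
    "cpu  1000 0 500 90000 30000 0 10 0 0 0\n"
    "cpu0 1000 0 500 90000 30000 0 10 0 0 0\n"
    "intr 12345\n"
)
print(iowait_ticks(sample))  # {'cpu': 30000, 'cpu0': 30000}
```

Tools like top and mpstat report the delta of this counter over an interval as %wa, which is why a single sleeping waiter can show as a CPU "at 100% io-wait" while the load average stays near zero.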

@beldzhang
Author

beldzhang commented Sep 5, 2023

> Yes this is expected. iowait literally just means "waiting on IO", which is what the task is doing. It does NOT mean that it's busy 100% of the time, in fact if you have 100% iowait it means 1 task is sleeping waiting on IO 100% of the time.

I noticed that when this happens, the system load is only 0.0x.

We use io_uring in a storage service, so we are very sensitive to storage/network load, and previously io-wait was a good indicator for that.
Before kernel 5.0, iostat's %util value could be used as well, but since 5.0 that number easily hits 100% even under a small load; many articles also say it is no longer reliable.
Is there any other way to check the disk load?

Also, I calculate a performance score after each test based on total CPU usage; just ignoring the io-wait part does not look like a good solution...
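For context on why %util became unreliable: iostat derives it from /proc/diskstats, where one field is the cumulative milliseconds the device spent with at least one I/O in flight. A minimal sketch of that calculation (the helper name and sample numbers are made up for illustration):

```python
# Sketch of where iostat's %util comes from: /proc/diskstats exposes, per
# device, a cumulative "time spent doing I/Os (ms)" counter. %util is the
# delta of that counter over the sampling interval. Because the counter
# only measures wall time with >=1 request in flight, a device serving
# many requests in parallel pins at 100% long before it is saturated,
# which is why the thread calls the metric unreliable on modern devices.
def util_percent(ms_io_before, ms_io_after, interval_ms):
    busy = ms_io_after - ms_io_before
    return min(100.0, 100.0 * busy / interval_ms)

# A device busy for 250 ms of a 1000 ms window reads as 25% utilised...
print(util_percent(1000, 1250, 1000))  # 25.0
# ...while a device kept busy by a steady trickle of overlapping
# requests reads as 100% regardless of its remaining headroom.
print(util_percent(1000, 2000, 1000))  # 100.0
```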

@beldzhang
Author

Looks like there are no more comments; closing.

@beldzhang beldzhang closed this as not planned Sep 13, 2023
@romange
Contributor

romange commented Nov 16, 2023

@axboe, one of Dragonfly's users also reported this as a behavioural change: Dragonfly, which does not use disk IO, bumps the IOWAIT metric up to 100%. If they run it using the epoll API, IOWAIT is not affected. I am just double-checking whether this change is indeed intended.

@rickytato

I see the same issue with kernel 6.5.11 (on Proxmox).

@beldzhang
Author

emmm... reopen?

@RX14

RX14 commented Feb 22, 2024

iowait has traditionally been thought of as "waiting for disk IO", which will always complete. Since io_uring can be used to wait on the network, which has unbounded waiting times, this changes the metric considerably. For example, many monitoring systems alert on high iowait, assuming (correctly or not) that it is a proxy for disk contention.

@axboe axboe reopened this Feb 22, 2024
@axboe
Owner

axboe commented Feb 22, 2024

Here's what I think we should do:

  1. Default to not using iowait, as it is indeed somewhat confusing for networked or mixed network/storage workloads. I do think iowait is an awful metric that makes very little sense for async workloads, even just pure storage based ones. Lots of consumers will assume it's busy time, or has a direct correlation with disk usage, which is just wrong.
  2. Add an IORING_ENTER_IOWAIT flag that can be used in conjunction with IORING_ENTER_GETEVENTS. If set, iowait will be used. Storage can use this, if they so wish.
  3. Add an IORING_FEAT_IOWAIT flag, which tells the app/liburing that this feature is available.
  4. Add liburing helpers, e.g. io_uring_set_iowait() and io_uring_clear_iowait(), which can be used to toggle this flag if IORING_FEAT_IOWAIT is set. Storage-based workloads can set this.

And that should be it. That gives the app control over whether iowait should be used or not.

@beldzhang
Author

Will test when available.

@isilence
Collaborator

@axboe, I just mentioned it on the mailing list: even though I don't understand why people are taken aback by high iowait from io_uring waits, I think we should just revert that change; there have been too many reports from different people about this one. We should be able to keep the optimisation that motivated the change without reporting iowait.

@axboe
Owner

axboe commented Feb 24, 2024

We can't just revert it, as it solved a real problem. I have my doubts that we can separate the cpufreq side from iowait in a way that would make the scheduler side happy. If we can, I'd be all for it, and would love to see a patch.

@romange
Contributor

romange commented Mar 15, 2024

A great discussion about this topic on lore.kernel.org.

Just so we understand: once we call io_uring_register_iowait, it will flag networking I/O as iowait, but io_uring will run more efficiently?

Another interesting comment I read is about multiple rings. Currently https://github.com/romange/helio has a ring-per-thread architecture. @axboe, are you saying that it sometimes makes sense to have two rings? For which use cases?

@isilence
Collaborator

> A great discussion about this topic on lore.kernel.org.
>
> Just for us to understand, once we call io_uring_register_iowait, it will flag networking I/O as iowait but iouring will run in more efficient manner?

The long story: there has been a patch upstream for a while that does two unrelated things. First, it enables a cpufreq-governor optimisation that is useful for QD1 (and not only); second, it changes the io-wait stat behaviour as described in this thread. They are coupled together for implementation reasons, because it's much easier that way. So the optimisation is already in the kernel and always enabled; call it a free lunch. The io_uring_register_iowait() patch would disable the optimisation by default and turn it back on only when you call the function.

I have to say it's quite a horrendous approach: an optimisation with user-visible side effects, mixing responsibilities and the levels at which the feature is enabled and the iowait stat is observed, and so on. I think the register_iowait patch should never see the light of day, at least as long as it mixes these things together.

@isilence
Collaborator

> Another interesting comment I read is about multiple rings. Currently https://github.com/romange/helio has ring-per-thread architecture. @axboe are you saying that sometimes it makes sense to have two rings? For what use-cases it makes sense?

IMHO, it doesn't make sense, apart maybe from some weird IOPOLL + normal ring cases. However, sometimes it happens (unfortunately): for instance, when a library or framework you use has io_uring support inside, and the app then creates another ring for its own purposes.

@axboe
Owner

axboe commented Mar 15, 2024

There will be no register_iowait; the current pending fixes are here:

https://git.kernel.dk/cgit/linux/log/?h=iowait.2

and will be posted for review soon, so they can get into the 6.10 kernel.

@beldzhang
Author

> https://git.kernel.dk/cgit/linux/log/?h=iowait.2

@axboe
Briefly tested: the io-wait is gone; will keep following up. Also following the mailing list thread.

@isilence
End users are sensitive to latency and high server response times, and for sysadmins, io-wait and load directly show what is going on. The storage parts are generally the slowest in the whole system; users/admins don't care whether the waiting on an I/O read/write is sync or async, they just want to know how loaded the entire server is.
I already removed the iostat %util display because it's meaningless now, but 100% io-wait on a CPU from io_uring terrifies a lot of users/admins.

@beldzhang
Author

Ready for testing; should I test for-6.10/io_uring, for-6.10/block, or for-next? Thanks.

@beldzhang
Author

emmm.... any updates?

@solarvm

solarvm commented Jul 22, 2024

Still happening for us too.

@isilence
Collaborator

Nothing has been merged yet, as it's a low-priority reporting issue. However, there is interest in it for some other reasons, and it's in the backlog; it will hopefully get picked up soon.

@beldzhang
Author

Tested the iowait.4 branch: no iowait usage. Detailed testing is pending; no regressions so far.


8 participants