High IO Wait CPU usage #2729
Comments
Duplicate of #2181, #2270, #2287 and #2444. For the explanation, please see #2270 (comment). Based on these discussions, I understand that high IOWAIT is not related to high CPU usage; it is part of the CPU IDLE time, i.e. it is the time a thread spends blocked on (networking and disk) I/O. So if you need to monitor high CPU usage, you may consider ignoring the IOWAIT contribution.
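Following that advice, iowait can be folded into idle time when computing a busy percentage from /proc/stat deltas. A minimal sketch (field order per man proc; the sample numbers are made up):

```python
# Sketch: compute CPU busy % from a delta of the /proc/stat "cpu" fields,
# counting iowait as idle time, per the advice above.
# Field order (man proc): user nice system idle iowait irq softirq steal ...
def busy_percent(delta):
    idle_like = delta[3] + delta[4]   # idle + iowait both treated as idle
    total = sum(delta[:8]) or 1       # user..steal; guest time is folded into user
    return 100.0 * (total - idle_like) / total

# Made-up sample: 900 ticks idle, 20 iowait, 80 busy -> 8% busy
print(busy_percent([50, 0, 30, 900, 20, 0, 0, 0]))  # → 8.0
```

With this metric, a Dragonfly host sitting at 100% iowait but otherwise idle reports as nearly 0% busy.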
Also related:
I understand that this is "working as intended" and we should not be worried. It is a bit unfortunate that this newish way of doing async I/O is not yet well understood or worked out, and that monitoring tools assume that high IO Wait is bad. I'm currently wading through information about this topic, hoping to find how we can best resolve it; any suggestions are welcome. Small nit: the documentation doesn't really explain these tradeoffs, and the CLI docs only say what arguments exist, not how they affect DragonflyDB. I found out about io_uring and more details via this code:
@hermanbanken, where did you search for the documentation?
I did search the documentation site:
And thanks for digging into this - it's a great discussion where rational logic tries to overcome tradition and, of course, loses. Jens' response summarises it all:
Once a kernel is released with this commit, Dragonfly will automatically revert to its "normal" behavior on that version :)
This is the second biggest issue I've had with Dragonfly. Sorry, I'm out now: MbinOrg/mbin#641. And I do believe a high IO wait can have a performance impact via the CPU scheduler. Even if I'm wrong about that, I would still like to have a low IOWait, so I know if there are problems with disk I/O (which also causes high IOWait in some cases).
@romange I'm running Dragonfly 1.23.0 on Ubuntu 24.04 with kernel 6.8.0-1016-aws with default options. According to the aforementioned links this… If I enable…
I do not think it has been fixed - see axboe/liburing#943 (comment)
Cool, thank you for the details. I'll exclude iowait from our alerts, and then let's see whether liburing ships a release with the fix (according to the last comment you mentioned, they're testing it).
Describe the bug
Slightly related to #66 but different: we see that when a GKE node VM runs one or more DragonflyDB containers, the CPU stats show a very high "IO Wait", up to 100%. When all DragonflyDB containers are stopped, this drops back to 0%. It seems that Dragonfly consumes all of the remaining CPU with IO wait.
Our DataDog monitoring for the CPU usage of our nodes has been alerting for a few weeks now. We dismissed it because we saw no high CPU usage (excluding IO Wait) in any of our containers, but it makes the automatic monitoring completely useless, as we cannot rely on the CPU usage being low. It also seems to be impossible to see the IO wait per process in Linux.
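As an aside, Linux does expose a rough per-process view of block I/O delay: field 42 of /proc/&lt;pid&gt;/stat (delayacct_blkio_ticks, per man proc). It requires kernel delay accounting to be enabled (it may read 0 otherwise) and does not capture io_uring's iowait accounting, so it would not have explained the alerts here. A minimal sketch:

```python
# Sketch: read delayacct_blkio_ticks (field 42 of /proc/<pid>/stat),
# the aggregated block I/O delay of a process, in clock ticks.
# Needs kernel delay accounting enabled; may be 0 otherwise.
import os

def blkio_delay_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The comm field (field 2) may contain spaces or parentheses,
    # so split on the last ')'; rest[0] is then field 3 (state).
    rest = data.rsplit(")", 1)[1].split()
    return int(rest[39])  # field 42 = index 42 - 3

print(blkio_delay_ticks(os.getpid()))
```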
We'd like to understand and prevent DragonFlyDB from causing 100% IO Wait. What is DragonFlyDB even using IO for if this is all stored in memory?
To Reproduce
Steps to reproduce the behavior:
vmstat 1 outputs >80 values in the wa column.
top outputs >80 values in the wa field.
Expected behavior
Low CPU usage on idle systems.
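For reference, the wa value that vmstat and top report can be derived directly from /proc/stat. A minimal Linux-only sketch:

```python
# Sketch: derive the "wa" percentage that vmstat/top report from two
# samples of the aggregate "cpu" line in /proc/stat (Linux only).
import time

def cpu_fields():
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def iowait_percent(interval=0.5):
    a = cpu_fields()
    time.sleep(interval)
    b = cpu_fields()
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta) or 1
    return 100.0 * delta[4] / total  # index 4 is the iowait field

print(f"wa: {iowait_percent():.1f}%")
```

On a node running Dragonfly with io_uring, this value can approach 100 even though the CPU is effectively idle.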
Screenshots
Environment (please complete the following information):
Linux hostname 5.15.133+ #1 SMP Sat Dec 30 13:01:38 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
To workaround
Adding --force-epoll seems to avoid the issue.
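For completeness, a hedged example of applying the workaround. Only the --force-epoll flag comes from this thread; the container image name and the other docker options are assumptions and should be adjusted for your setup:

```shell
# Assumed invocation; only --force-epoll is taken from this issue.
docker run --rm --network=host --ulimit memlock=-1 \
  docker.dragonflydb.io/dragonflydb/dragonfly \
  --force-epoll
```

This trades io_uring for the epoll backend, so the iowait accounting behavior discussed above no longer applies.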