-
-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VFS: read()/write() lock the vnode for too long #450
Comments
Gleb rightly pointed out that in the Cassandra benchmark, after the warmup all the data is cached in memory so read()s do not involve any actual I/O, and aren't supposed to block or be especially slow (just to copy the data from the ARC cache, which is pretty short). In that case, trying to hold the vnode lock for slightly shorter durations might not be very helpful. But it is definitely important when reads may block (/dev/console is an extreme example, but in this case it is probably a mistake that we continue to go through the VFS layer after the first open). |
It's possible that b5eadc3 should close this issue. UPDATE: Actually, the issue shouldn't be closed as the commit above only addresses ZFS. |
@raphaelsc, @nyh for the Cassandra case we have solved the issue - if so we should close this bug as it was created for cassandra - if for other cases there is a differenr issue letcs create an issue for that |
Please do not close this issue - there's nothing Cassandra-specific in it, and it is still very much an issue. |
Use: perf callstack tracepoint to list frequent callstacks for a tracepoint (from 'perf list')
The comment #504 (comment) could be helpful to track the attempt to address this issue and possibly re-use the original commit b5eadc3 and fix the bugs in it. |
As already noticed long ago in commit 907e633, our VFS implementation (copied from Prex) has a serious problem in vfs_file::read() and vfs_file::write(): These unduely hold the vnode lock for the entire duration of read() or write call.
In commit 907e633, we noticed one bad result of this fact: While one thread is blocked read()ing from /dev/console, another thread cannot write() to it! This is completely broken behavior, and the aforementioned commit just worked around it (not open /dev/console) instead of fixing it.
Today, the same bug is causing us slowdown in multi-cpu Cassandra runs: Multiple threads hold several fd's pointing to the same on-disk file, and doing lseek(); read() on those fd's concurrently. Because we hold the vnode lock throughout the entire read() call, we basically serialize the calls to read() instead of doing much of the work in parallel (and batching I/O).
The text was updated successfully, but these errors were encountered: