weride iouring writer #17
base: master
Conversation
Thank you for your work.

Some questions

While the benefit of an asynchronous …

How does the caller decide whether …

How does the caller decide about …

What happens if the calling code requests writing data at a higher pace than the fd allows? There should probably be some throttling mechanism. Maybe it should be based on the amount of memory kept for the asynchronous buffers, to avoid allocating an unbounded amount of memory in such a case.

Choosing a …
Hi Marcin,

Thank you for your patience. I read your comments carefully and thought about the details you pointed out. Though I have not figured out the whole structure yet, I thought it necessary to reply and discuss with you.

Some questions

The synchronous FdIoUringWriter also has some benefits. Even in synchronous mode, io_uring has less overhead than normal IO (write/pwrite). Due to the design of io_uring, the system can save some of the overhead of memory copies and system calls. In the write function, the kernel needs to copy the memory to complete the operation; for example, if you want to write 100 MB to a file, the kernel will copy this 100 MB first. In contrast, io_uring can bypass this operation. In my tests with io_uring in other projects, io_uring decreased CPU utilization and system time by about 30% compared with normal IO under the same conditions.

Of course, the asynchronous mode has higher throughput than the synchronous mode. But the problem here is that we need a reap thread to process the completion queue, which is extra CPU overhead. So there is a tradeoff: if you need higher throughput, choose asynchronous mode; if you want to minimize CPU overhead, synchronous mode is better. Either way, the speed and overhead of io_uring will be better than normal IO.

As for registering the fd: it is a mechanism in io_uring. io_uring allows us to register some fds in advance, so that the kernel can bypass the step of copying the fd reference on each IO operation. The effect is most visible for high-IOPS workloads. I exposed this option because I wanted to write a more general class for io_uring. But your point is correct: since the instance is associated with a fixed file, we can register it unconditionally.

The caller can choose the size from 1 to 4096; every power of 2 in that range is valid. This number is the size of the submission queue.
In some extreme situations, when the submission queue is full, you cannot get a new sq element from io_uring_get_sqe. As I mentioned, when the submission queue buffer is full, io_uring_get_sqe returns nullptr. This happens when user requests arrive much faster than the kernel can consume them, and submission cannot keep up with your expectation. For example, you want to submit 10 sqes, but the return value of io_uring_submit is only 4 because the underlying layer has no more space for the 6 extra elements; these 6 sqes remain in the submission queue.

It is necessary for us to design a throttling mechanism. There are two simple options: the first is that we simply reject the operation; the second is that we busy-loop, submitting elements continuously whenever io_uring_get_sqe returns nullptr, until we get enough space. In the second option, the writer is blocked for a while, which limits the throughput.

Choosing a writer at runtime

Yes, I think io_uring is a very well-done feature after kernel 5.10. I also think option 1 is simple and acceptable.

Avoiding copying the buffer

I will need more time to read the code and think about this part; I will reply to you later.

Miscellaneous

Thank you. I will fix these two parts. But could you please explain more about the technical reasons? Why is std::mutex inappropriate here? Also, what do you mean by "a run time switch"? If it refers to checking the environment and deciding automatically which IO we will apply, I think it is a useful and necessary feature.
I wonder why the kernel copies the data first in …

Threading

…

Choosing the …
Hi @QrczakMK
kernel will do the copy:
Yunhua will give you a specific explanation about the write function. By the way, do you have any opinions about the throttling mechanism in the last comment?

Avoiding copying the buffer

Your suggestion is good. We can implement a new custom buffer which frees its space only after the asynchronous write finishes. And the string_view should be copied to the buffer unconditionally, regardless of the size. In this case, an extra copy can be avoided. As for the Cord or the Chain, io_uring has an API for writev, which can provide support if you need it.

Choosing the Writer at runtime

I think 3b is important. The user should know that io_uring is a Linux feature, so we don't need to consider the situation where the user wants to use it on other platforms. What we should do is guarantee that the system falls back to normal IO if the kernel does not support io_uring. I think the structure you mentioned in the last comment (RecordWriter&lt;std::unique_ptr&gt;) is appropriate.
Copying by the kernel

If …

One of the reasons I asked this is that if only asynchronous mode is important, then I wanted to name this class …

Throttling

I think there should be a limit on the amount of data being buffered for asynchronous writing. But at least in some circumstances a reasonable limit seems to be …

These circumstances do not necessarily apply all the time. Maybe the client wants to write at a fast pace now, and will rest later. In this case it might be better to allow for a larger queue, which will eventually be flushed. I am not sure how this should be handled, and what should be the default. The default might depend on …

Choosing the …
I think this is a good question; I may not be able to give a perfect answer. I guess the most important thing is compatibility. As you can see, the io_uring API requires different setup/cleanup API calls. Historically, before io_uring, we had another Linux AIO interface, sometimes referred to as "Linux Native AIO".
Hi Marcin,
Hope your work goes well.
This is our first version of the io_uring writer. Files related to io_uring are in the /riegeli/iouring directory. This version is a first attempt and there is some room for improvement. I have some questions I want to discuss with you. Thank you very much!
In this version, I created a separate writer named fd_io_uring_writer to manage the io_uring instance. Each fd_io_uring_writer class has its own separate io_uring instance. The user can manually choose whether to use io_uring. But the final feature I want is to merge io_uring into fd_writer: the system would check the environment automatically, apply io_uring if it is available, and otherwise go through the old path. The difficulty now is that the byte writer is a template parameter of the record writer, so I am not sure how to organize the structure here.
Besides, the io_uring instance has two modes, sync and async. In async mode, the function returns immediately and reports errors later. In this case, we need to keep the write buffer alive until the result is returned. For now, I copy the buffer one more time to avoid changing the structure of your buffer class, but this overhead can be eliminated. Maybe we can write a new buffer class to meet this goal?
Thank you for your patience. If you have any suggestions about the code or the above questions, please feel free to reach out to me.