-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code cleanup and fix #1 #2
Conversation
Thanks for taking the time! I will merge when I have some time to look at it. Strange that we have -20% performance. |
I get about -75% performance when not sending redundant text. |
No, I don't. But
I will test
Really? |
I'd like to contribute my (customized) code to liburing examples, it could be useful for benchmarking for the creator of io_uring IMO. Nevertheless it's modified from your repo. Would you agree with that? |
|
||
struct io_uring_sqe *sqe = io_uring_get_sqe(ring); | ||
iovecs[fd].iov_len = size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason this line (iovecs[fd].iov_len = size;
) causes a ~70% performance drop:
without iovecs[fd].iov_len = size;
:
max@ubuntu:~/hdv/source/rust_echo_bench$ cargo run --release -- --address "localhost:6666" --number 50 --duration 30 --length 512
Finished release [optimized] target(s) in 0.01s
Running `target/release/echo_bench --address 'localhost:6666' --number 50 --duration 30 --length 512`
Benchmarking: localhost:6666
50 clients, running 512 bytes, 30 sec.
Speed: 387685 request/sec, 387685 response/sec
Requests: 11630554
Responses: 11630554
with iovecs[fd].iov_len = size;
:
max@ubuntu:~/hdv/source/rust_echo_bench$ cargo run --release -- --address "localhost:6666" --number 50 --duration 30 --length 512
Finished release [optimized] target(s) in 0.01s
Running `target/release/echo_bench --address 'localhost:6666' --number 50 --duration 30 --length 512`
Benchmarking: localhost:6666
50 clients, running 512 bytes, 30 sec.
Speed: 124698 request/sec, 124698 response/sec
Requests: 3740948
Responses: 3740947
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lack of this line is causing to wrong behavior.
readv does not modify iov_len
, which will cause writev to always send 2048 bytes.
Since bufs
is uninitialized, this may cause UB.
Ah, this seems to be expected behavior. It's marvellous, I'm interested in this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lack of this line is causing to wrong behavior.
readv does not modify
iov_len
, which will cause writev to always send 2048 bytes.
Sincebufs
is uninitialized, this may cause UB.Ah, this seems to be expected behavior. It's marvellous, I'm interested in this reason.
It might have something to do with the used test tool. I see the same performance jump/drop with the epoll echo server.
It's however odd that the performance drops with 70-80% in the case of io_uring and 50% with epoll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, rust_echo_bench
expects to return data of same length, and return too much data will cause shorter read times.
see https://github.com/haraldh/rust_echo_bench/blob/master/src/main.rs#L126
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed that io_uring-echo-server
did not implement full writes, which could lead to loss.
see
https://github.com/frevib/io_uring-echo-server/blob/master/io_uring_echo_server.c#L162
https://github.com/quininer/io-uring/blob/master/examples/tcp_echo.rs#L211
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks.
I might keep this echo server as simple as possible. The goal is to let people learn people the basics of io_uring. Maybe handling partial writes is a bit too much ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still don't know what does a partial write mean. Could you explain it further?
EDIT: did you mean that situations that message lengths are greater then MAX_MESSAGE_LEN
?
Also please see my fork repo, I fixed the issue there, and got some banchmark results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, are these benchmark tools really reliable? How can you ensure that it's not the tool that limits the performance results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still don't know what does a partial write mean. Could you explain it further?
EDIT: did you mean that situations that message lengths are greater then
MAX_MESSAGE_LEN
?
Partial write happens when we try to write more bytes to the socket send buffer than is actually free in this buffer. We write until the buffer is full and then return, writing only a part of the all the bytes. If the receiving end expects the full message to return, this could lead to odd behaviour.
Also please see my fork repo, I fixed the issue there, and got some banchmark results.
Nice, I’ll have a look asap!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, are these benchmark tools really reliable? How can you ensure that it's not the tool that limits the performance results.
No, they are not. @quininer found out there’s a problem when returning more bytes than the amount of bytes that was received: #2 (comment)
This would make the epoll echo server ~70% more performant than the io_uring echo server, on my machine. This also doesn’t make any sense :)
Absolutely, go for it :) |
Agree. On another note I've had the problem that when using
Yes, see #2 (review) |
Negative value returned by
I saw your twitter.
Wasn't the performance boost too huge? ;) |
If I remember correctly -22 and for and for a different echo server -14. I can rollback the kernel and try.
Yes, unrealistically huge. I can't believe it's only |
|
I got about 20% performance reduction for not sending redundant text, details unknown