
Connections might be silently not served #59

Closed
redbaron opened this issue Jul 10, 2016 · 8 comments

redbaron commented Jul 10, 2016

On current master (7d12511), the simple echo server might "forget" to serve responses to certain clients: no errors are reported on the server side, and the TCP connection is not dropped either.

This is very racy and you might need to play with the params to trigger it more or less reliably on your machine. The params I am using on my Core i3 M370 (2 cores + 2 HT):

server:

cd benchmark; cargo run --bin coio-tcp-echo-server --release -- --bind 127.0.0.1:3000 -t 2

client:

cd benchmark; GOMAXPROCS=1 go run tcp_bench.go -c 3 -t 10 -a "127.0.0.1:3000" -l 4096 -d 10s

The new -d param specifies the read/write timeout and was added in #58.

Output:

2016/07/10 14:01:52 read tcp 127.0.0.1:40988->127.0.0.1:3000: i/o timeout
2016/07/10 14:01:52 read tcp 127.0.0.1:40990->127.0.0.1:3000: i/o timeout
Benchmarking: 127.0.0.1:3000
3 clients, running 4096 bytes, 30 sec.

Speed: 18879 request/sec, 18879 response/sec
Requests: 566395
Responses: 566393

It is not happening on every run, but I found that it happens more often when the client opens fewer connections.

UPD1: it is not a performance issue. Set the timeout to "60s" and you'll see that the test completes and no more active communication happens, yet a few connections remain waiting indefinitely to read data back.


zonyitoo commented Jul 10, 2016

Hmm, bug confirmed. But when I tried to run it in debug mode, I found it works pretty well:

Benchmarking: 127.0.0.1:3000
3 clients, running 4096 bytes, 10 sec.

Speed: 11870 request/sec, 11870 response/sec
Requests: 118703
Responses: 118703

and if I run it with --release, then all of them end up timed out:

2016/07/10 23:05:51 read tcp 127.0.0.1:49317->127.0.0.1:3000: i/o timeout
2016/07/10 23:05:51 read tcp 127.0.0.1:49318->127.0.0.1:3000: i/o timeout
2016/07/10 23:05:51 read tcp 127.0.0.1:49316->127.0.0.1:3000: i/o timeout
Benchmarking: 127.0.0.1:3000
3 clients, running 4096 bytes, 10 sec.

Speed: 4400 request/sec, 4399 response/sec
Requests: 44000
Responses: 43997

Update: Oh! It will also happen in debug mode!


zonyitoo commented Jul 10, 2016

Interestingly... the problem is: read() returns WouldBlock, and then Mio does not trigger any events until tcp_bench times out:

## Skips...
TRACE:coio::net: GenericEvented(Token(1)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(1)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(1)): read() => WouldBlock
TRACE:coio::net: GenericEvented(Token(1)): wait(Readable)     <--- Client 1 starts to be blocked
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(3)): read() => Ok(4096)
## Skips...
TRACE:coio::net: GenericEvented(Token(3)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(3)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(3)): read() => WouldBlock
TRACE:coio::net: GenericEvented(Token(3)): wait(Readable) <--- Client 3 starts to be blocked
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): read() => WouldBlock
## Skips ....
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): read() => WouldBlock
TRACE:coio::net: GenericEvented(Token(2)): wait(Readable) <--- Client 2 starts to be blocked

###  Nothing goes here, and then tcp_bench reports timeouts

TRACE:coio::net: GenericEvented(Token(1)): read() => Ok(4096)  <-- Client 1 is woken up
TRACE:coio::net: GenericEvented(Token(1)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(1)): read() => Ok(0)
TRACE:coio::net: GenericEvented(Token(3)): read() => Ok(4096) <-- Client 3 is woken up
TRACE:coio::net: GenericEvented(Token(3)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(3)): read() => Ok(0)
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(4096) <-- Client 2 is woken up
TRACE:coio::net: GenericEvented(Token(2)): write() => Ok(4096)
TRACE:coio::net: GenericEvented(Token(2)): read() => Ok(0)

It could be that:

  1. Mio didn't trigger the event when the socket became readable (not likely), or
  2. the coroutine wasn't woken up when the socket became readable.
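
If case 2 is what happens, it would be a classic lost-wakeup race: the readiness event fires in the window between read() returning WouldBlock and the coroutine registering itself as a waiter. Here is a minimal, self-contained sketch of that hazard (illustrative only, not coio's actual code), using a bare std Condvar that silently drops a notification when nobody is waiting yet:

    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;
    use std::time::Duration;

    fn main() {
        let pair = Arc::new((Mutex::new(()), Condvar::new()));
        let pair2 = Arc::clone(&pair);

        // Event source: the socket becomes "readable" right now, i.e. in the
        // window after read() returned WouldBlock but before wait() is called.
        thread::spawn(move || pair2.1.notify_one()); // no waiter yet => dropped

        thread::sleep(Duration::from_millis(50)); // the waiter arrives late...

        let (lock, cond) = &*pair;
        let guard = lock.lock().unwrap();
        // ...and blocks forever, because the notification was never recorded.
        // (A timeout stands in for "forever" so the sketch terminates.)
        let (_guard, res) = cond.wait_timeout(guard, Duration::from_secs(1)).unwrap();
        println!("wakeup lost: {}", res.timed_out()); // almost always prints "true"
    }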

@redbaron

Tried with https://github.com/carllerche/mio.git#c3069801; the problem still exists.

Although, after making this change:

diff --git a/benchmarks/Cargo.toml b/benchmarks/Cargo.toml
index cf65c7f..20006bf 100644
--- a/benchmarks/Cargo.toml
+++ b/benchmarks/Cargo.toml
@@ -7,7 +7,7 @@ authors = ["Y. T. Chung <[email protected]>"]
 bytes = "0.3"
 clap = "2.1"
 #hyper = "*"
-mio = "0.5"
+mio = { git =  "https://github.com/carllerche/mio.git" }
 mioco = "0.7"

I see multiple versions of many crates in the output of cargo run:

   Compiling bitflags v0.4.0
   Compiling bitflags v0.3.3
   Compiling bitflags v0.5.0
   Compiling nix v0.5.0-pre (https://github.com/carllerche/nix-rust?rev=c4257f8a76#c4257f8a)
   Compiling nix v0.6.0
   Compiling nix v0.5.1
   Compiling slab v0.2.0
   Compiling slab v0.1.3
   Compiling slab v0.1.3 (https://github.com/carllerche/slab.git?rev=44f9f41a1680e69db7d370d1912898fb0f90b1f8#44f9f41a)
   Compiling mio v0.5.0 (https://github.com/carllerche/mio.git#c3069801)
   Compiling mio v0.5.1

So I am not sure whether mio master was used at all :)

zonyitoo added a commit that referenced this issue Jul 11, 2016

zonyitoo commented Jul 11, 2016

I found a bug in the event notifier (Condvar): it drops the event notification if no coroutine is waiting on it. Please try again; this bug should be fixed.
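
The usual fix for that class of bug, in miniature (a sketch only, not the actual coio patch): pair the Condvar with a pending flag in shared state, so a notification that arrives while nobody is waiting is recorded and consumed by the next waiter instead of being dropped:

    use std::sync::{Condvar, Mutex};

    // Hypothetical notifier: notify() records the event even with no waiter.
    pub struct Notifier {
        pending: Mutex<bool>,
        cond: Condvar,
    }

    impl Notifier {
        pub fn new() -> Self {
            Notifier { pending: Mutex::new(false), cond: Condvar::new() }
        }

        pub fn notify(&self) {
            *self.pending.lock().unwrap() = true; // remember the event
            self.cond.notify_one();
        }

        pub fn wait(&self) {
            let mut pending = self.pending.lock().unwrap();
            while !*pending {
                // The loop also guards against spurious wakeups.
                pending = self.cond.wait(pending).unwrap();
            }
            *pending = false; // consume the event
        }
    }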

zonyitoo added the bug label Jul 11, 2016
zonyitoo self-assigned this Jul 11, 2016
@redbaron

I can confirm that no timeouts are happening anymore.

Performance is also superb, with the slowdown being just 8%:

mio: 84000 req/s
coio (1 thread): 78000 req/s

Although this is achieved at the cost of much higher CPU utilisation:

Average:      UID       PID    %usr %system  %guest    %CPU   CPU  Command
Average:     1000     24915   19.23  112.68    0.00  131.91     -  coio-tcp-echo-s
Average:     1000     25130    7.55   88.89    0.00   96.44     -  mio-tcp-echo-se

@zonyitoo

There is still lots of work to do :)


redbaron commented Jul 11, 2016

That is kind of a worst-case scenario: no userspace work is done here. In more realistic scenarios the relative overhead will go down, while the benefits of multicore processing will shoot up.

Bugs like this are scary though: they can be missed easily, and we were lucky to be able to trigger this one relatively easily. Would it be possible to create a test case for problems like that? Custom condvars, spinlocks, notifiers: all of that might break simply because only two pairs of eyes are looking at them, which doesn't add confidence, I admit :(
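
One possible shape for such a test (hypothetical, assuming a Notifier type with notify()/wait() like the sketch above): run many short rounds where one thread notifies while another races to wait, so the "notify before anyone waits" interleaving is hit repeatedly; a lost notification then shows up as a hung wait() instead of a silent pass:

    use std::sync::Arc;
    use std::thread;

    // Hypothetical stress test; `Notifier` is the type sketched earlier.
    fn stress_notifier(rounds: usize) {
        for _ in 0..rounds {
            let n = Arc::new(Notifier::new());
            let n2 = Arc::clone(&n);
            // Fire the notification from one thread while another races to wait.
            let notify = thread::spawn(move || n2.notify());
            let wait = thread::spawn(move || n.wait());
            notify.join().unwrap();
            // A real harness would guard this join with a watchdog timeout so a
            // lost wakeup fails the test instead of hanging the whole suite.
            wait.join().unwrap();
        }
    }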

EDIT: grammar

@zonyitoo

Hmm, the current implementation is just a temporary solution. We wanted to make it work just like Go's Cond implementation, but it failed to compile, so I committed something that compiles and works, without any self-tests. That's my fault.

We will continue working on this, but currently we are focused more on #56.
