
Runtime Failures on TCP Stress Testing: (libuv) kqueue(): Too many open files and more #12666

Closed · ehsanul opened this issue Mar 3, 2014 · 8 comments · Fixed by #12823

@ehsanul (Contributor) commented Mar 3, 2014

I'm on OSX 10.8.4, x86_64, latest Rust on master. And to head off any concerns about my file handle limits:

$ ulimit
unlimited

I ran into this while stress-testing my WebSocket library. After just a couple hundred connections, some connections were being refused at random; the only sign of trouble was RUST_LOG=debug output saying (libuv) kqueue(): Too many open files. I've come up with a smaller test case that reproduces this with some spectacular errors:

use std::io::net::tcp::{TcpListener, TcpStream};
use std::io::net::ip::{Ipv4Addr, SocketAddr};
use std::io::{Acceptor, Listener};

fn main() {
    let addr = SocketAddr { ip: Ipv4Addr(127, 0, 0, 1), port: 8989 };
    // Server task: accept connections forever, logging (and surviving) accept errors.
    spawn(proc() {
        let mut acceptor = TcpListener::bind(addr).listen();
        loop {
            let mut stream = match acceptor.accept() {
                Ok(stream) => stream,
                Err(error) => {
                    debug!("accept failed: {:?}", error);
                    continue;
                }
            };
            println!("{:?}", stream.read_byte().unwrap());
        }
    });
    // Hammer the listener with 1000 concurrent client tasks.
    for _ in range(0, 1000) {
        spawn(proc() {
            match TcpStream::connect(addr) {
                Ok(stream) => {
                    let mut stream = stream;
                    stream.write([1]);   // send one byte
                    let mut buf = [0];
                    stream.read(buf);    // then wait until the server drops the connection
                },
                Err(e) => debug!("{:?}", e)
            }
        });
    }
}

It fails less often with 100-200 connections, and the errors are different every time I run this with RUST_LOG=debug.
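For reference, the failing runs below were produced by compiling the test case and running it with debug logging enabled, roughly:

$ rustc bug.rs
$ RUST_LOG=debug ./bug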

Here's the end of one run:

(libuv) kqueue(): Too many open files
bug(54921,0x10881c000) malloc: *** error for object 0x7fca4b620f70: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
std::io::IoError{kind: OtherIoError, desc: "too many open files", detail: None}
task '<unnamed>' failed at 'assertion failed: `(left == right) && (right == left)` (left: `140506823655696`, right: `140506824839680`)', /Users/harish/rust/rust/src/librustuv/homing.rs:110
(libuv) kqueue(): Too many open files
std::io::IoError{kind: OtherIoError, desc: "too many open files", detail: None}
task '<unnamed>' failed at 'didnt want an unwind during: going home', /Users/harish/rust/rust/src/librustuv/lib.rs:199
Abort trap: 6

Another:

task '<unnamed>' failed at 'didnt want an unwind during: going home', /Users/harish/rust/rust/src/librustuv/lib.rs:199
task failed during unwinding (double-failure - total drag!)
rust must abort now. so sorry.
Segmentation fault: 11

Another:

task '<unnamed>' failed at 'didnt want an unwind during: going home', /Users/harish/rust/rust/src/librustuv/lib.rs:199
task failed during unwinding (double-failure - total drag!)
rust must abort now. so sorry.
Illegal instruction: 4

And lastly:

fatal runtime error: leaked managed memory (140518327703008 objects)
(libuv) kqueue(): Too many open files
task '<unnamed>' failed at 'didnt want an unwind during: going home', /Users/harish/rust/rust/src/librustuv/lib.rs:Abort trap: 6
@emberian (Member) commented Mar 3, 2014

cc @alexcrichton

@alexcrichton (Member) commented

Oh dear, that's bad. Thanks for the small test case! I will definitely investigate.

@bharrisau (Contributor) commented

What is the result of `ulimit -n`, though? Plain `ulimit` reports only the maximum file size (unless OSX is different).

@ehsanul (Contributor, Author) commented Mar 4, 2014

@bharrisau you're right!

$ ulimit -n
256

So I set it to a higher value and things seem fine now. Sorry about that! Still, it's a surprising way for the runtime to fail, and clearly something is wrong there.
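For anyone hitting the same wall: the limit can also be checked and raised from inside the process. A minimal sketch, written in modern Rust against the `libc` crate (not the 2014 dialect used above), assuming `libc` is a dependency; this is just the POSIX getrlimit/setrlimit interface:

// Check and raise the per-process fd limit that `ulimit -n` reports.
fn main() {
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        assert_eq!(libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim), 0);
        println!("soft fd limit: {}, hard: {}", lim.rlim_cur, lim.rlim_max);

        // Raise the soft limit enough for ~1000 concurrent sockets. On OSX the
        // kernel caps this at OPEN_MAX, so pick a modest value, not infinity.
        let want: libc::rlim_t = 4096;
        lim.rlim_cur = if want < lim.rlim_max { want } else { lim.rlim_max };
        assert_eq!(libc::setrlimit(libc::RLIMIT_NOFILE, &lim), 0);
    }
}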

@alexcrichton (Member) commented

Despite OSX's small default file descriptor limit, this is a serious runtime bug that I'd like to address. Something funny is going on.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 10, 2014
This commit fixes a small bug in the green scheduler where a scheduler task
calling `maybe_yield` would trip the assertion that `self.yield_check_count > 0`

This behavior was seen when a scheduler task was scheduled many times
successively, sending messages in a loop (via the channel `send` method), which
in turn invokes `maybe_yield`. Yielding on a sched task doesn't make sense
because as soon as it's done it will implicitly do a yield, and for this reason
the yield check is just skipped if it's a sched task.

I am unable to create a reliable test for this behavior, as there's no direct
way to have control over the scheduler tasks.

cc rust-lang#12666, I discovered this when investigating that issue
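As a rough illustration of the guard that commit describes (made-up names in later Rust, not the actual librustgreen source), the fix amounts to bailing out of the yield bookkeeping when the caller is a scheduler task:

// Illustrative model only: sched tasks skip the yield accounting entirely,
// since they yield implicitly as soon as they finish.
struct Task {
    is_sched: bool,
    yield_check_count: usize,
}

impl Task {
    fn maybe_yield(&mut self) {
        if self.is_sched {
            return; // a sched task will yield implicitly when it's done
        }
        // For ordinary green tasks the old assertion is sound again.
        assert!(self.yield_check_count > 0);
        self.yield_check_count -= 1;
    }
}

fn main() {
    let mut sched = Task { is_sched: true, yield_check_count: 0 };
    sched.maybe_yield(); // previously this tripped the assertion
    let mut green = Task { is_sched: false, yield_check_count: 2 };
    green.maybe_yield();
    println!("green task yield budget left: {}", green.yield_check_count);
}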
@alexcrichton (Member) commented

With #12818 applied, I am unable to reproduce this on OSX. I am not convinced that it fixes anything, but I'm no longer seeing any scary assertions.

@alexcrichton (Member) commented

I take that back; I was using libnative, not libgreen (silly me).

@alexcrichton (Member) commented

I attempted to debug this issue on linux, and I quickly ran into two problems. The first is an assertion fixed by #12818, and the other is filed as joyent/libuv#1189. Sadly none of this seems related to the assertions on OSX, so I'll continue to investigate there. If only OSX still had valgrind...

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 12, 2014

bors added a commit that referenced this issue Mar 13, 2014
If a TTY fails to get initialized, it still needs to have uv_close invoked on
it. This fixes the problem by constructing the TtyWatcher struct before the call
to uv_tty_init. The struct has a destructor on it which will close the handle
properly.

Closes #12666
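The fix is the classic RAII pattern: construct the guard object before the fallible init call, so its destructor runs even on the error path. A minimal model in later Rust (illustrative stand-ins, not the librustuv source):

// The watcher owns the handle from the moment it is constructed, so its
// destructor runs and "closes" the handle even when init fails.
struct TtyWatcher {
    handle: i32, // stand-in for the uv_tty_t handle
}

impl Drop for TtyWatcher {
    fn drop(&mut self) {
        // Stands in for uv_close(): runs on every path, including init failure.
        println!("closing handle {}", self.handle);
    }
}

// Stand-in for uv_tty_init(), which can fail.
fn tty_init(fd: i32) -> Result<(), &'static str> {
    if fd < 0 { Err("uv_tty_init failed") } else { Ok(()) }
}

fn open_tty(fd: i32) -> Result<TtyWatcher, &'static str> {
    // Construct the watcher *before* the fallible init call, as the fix does.
    let watcher = TtyWatcher { handle: fd };
    tty_init(fd)?; // on error, `watcher` is dropped here and the handle closed
    Ok(watcher)
}

fn main() {
    assert!(open_tty(-1).is_err()); // the destructor still closed the handle
    let _tty = open_tty(1).expect("init succeeded");
}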