
file descriptor leak #3711

Closed
manio opened this issue Apr 17, 2021 · 6 comments

Labels
A-tokio (Area: The main tokio crate) · C-bug (Category: This is a bug.) · M-fs (Module: tokio/fs)

manio commented Apr 17, 2021

Version
tokio-fd-leak v0.1.0 (/home/tokio-fd-leak)
└── tokio v1.5.0
    └── tokio-macros v1.1.0 (proc-macro)

Platform
Linux x86_64

Description
Hi!
I think I found an interesting bug in tokio. The problem is that in some circumstances (when the device file is waiting for data) a read wrapped in tokio::time::timeout does not release/close the file descriptor.
I can see that the future times out with an error, but the descriptor is not closed, which eventually leads to problems once fds are exhausted.

I prepared a minimal code sample to reproduce the problem (a sketch of the core pattern follows the steps below).
Building/steps to reproduce:

  1. First we need to build and load a sample kernel driver which is necessary to show the problem:
git clone https://github.com/manio/wait.git
cd wait
make
insmod ./wait.ko
# this will create /dev/mychar0 for debugging purposes
  2. Now clone my Rust/tokio sample program:
git clone https://github.com/manio/tokio-fd-leak.git
cd tokio-fd-leak
cargo build
./target/debug/tokio-fd-leak
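
In essence, the sample boils down to something like this (an illustrative sketch; the exact code is in the tokio-fd-leak repo, and the function name, buffer size and timeout value here are approximate):

use tokio::fs::File;
use tokio::io::AsyncReadExt;
use tokio::time::{timeout, Duration};

// Open a device file with tokio::fs::File and wrap the read in a timeout.
// Against /dev/mychar0 (which never produces data) the timeout fires, but the
// read is still in flight on the blocking thread pool, so dropping `file`
// does not close the fd.
async fn open_and_read(path: &str) {
    println!("{}: Opening device", path);
    let mut file = match File::open(path).await {
        Ok(f) => f,
        Err(e) => {
            println!("{}: open error: {}", path, e);
            return;
        }
    };
    println!("{}: device opened successfully...", path);

    let mut buf = [0u8; 1];
    println!("{}: before read...", path);
    match timeout(Duration::from_secs(5), file.read(&mut buf)).await {
        Ok(Ok(n)) => println!("{}: read {} byte(s)", path, n),
        Ok(Err(e)) => println!("{}: read error: {}", path, e),
        Err(e) => println!("{}: response timeout: {}", path, e),
    }
    // `file` is dropped here, but the fd stays open until the blocked read
    // on the thread pool returns - which for /dev/mychar0 is never.
}

Looping over this for /dev/mychar0 and /dev/zero produces the output below and the climbing fd count.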

I expected to see this happen: the timeout closes the descriptor for /dev/mychar0.

Instead, this happened:
the program keeps running like this:

/dev/mychar0: Opening device
/dev/zero: Opening device
/dev/mychar0: device opened successfully...
/dev/mychar0: before read...
/dev/zero: device opened successfully...
/dev/zero: before read...
/dev/zero: read 1 byte(s)
/dev/mychar0: response timeout: deadline has elapsed
----> the descriptor is not closed!!!
/dev/mychar0: Opening device

When you look at the open descriptors:
ls -la /proc/`pidof tokio-fd-leak`/fdinfo | wc -l
the count keeps rising until it reaches 522.
From then on, even opening any new descriptor times out:

/dev/zero: Opening device
/dev/mychar0: Opening device
/dev/mychar0: file open timeout: deadline has elapsed
/dev/zero: file open timeout: deadline has elapsed
manio added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels on Apr 17, 2021
Darksonn added the M-fs (Module: tokio/fs) label on Apr 17, 2021
Darksonn (Contributor) commented Apr 17, 2021

If you drop a tokio::fs::File while there is any in-flight operation, it will wait for that operation to finish before it closes the file descriptor. If the operation never finishes, the file descriptor is therefore not closed. Files behave in this manner because you cannot use epoll with ordinary files, which forces us to put the file IO on a separate thread pool with ordinary blocking calls.

Ultimately, since we cannot use epoll here, there's nothing we can do about this. The timeout did not cancel the operation because there is no way to cancel it.

Regarding your use of thread::sleep, that's a really bad idea in async code because it blocks the thread. Don't do that.
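
For example, something along these lines (the helper name is just for illustration):

use tokio::time::{sleep, Duration};

// std::thread::sleep(Duration::from_secs(1)) would stall every other task
// scheduled on this worker thread; tokio::time::sleep only suspends this
// task and yields back to the runtime.
async fn pause_between_retries() {
    sleep(Duration::from_secs(1)).await;
}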

manio (Author) commented Apr 17, 2021

@Darksonn
Thanks, and I got your point about thread::sleep...
To sum it all up - is there a way to work around this problem? My production application first tries to open a file and then makes a read call. If either stage times out, it simply tries again (it assumes something is wrong with the device).

So do I have to close the descriptor myself somehow, with some unsafe raw fd call? Or maybe something else - rework the code flow somehow?

Darksonn (Contributor) commented Apr 17, 2021

I don't think there is really anything we can do here. If you close the fd yourself, you get a double close when you drop the Tokio File object. If you skip the destructor with mem::forget, you have a memory leak instead.

Maybe we could provide an operation for force-closing the fd immediately even if an operation is in progress, but this would be vulnerable to race conditions: e.g. you close it in one thread before another thread starts its operation on the fd, and by then the fd number could have been reassigned to some other resource.

What kind of files are you reading from that block forever? If they are device files that support epoll, consider using AsyncFd instead of tokio::fs::File.
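
To spell out those two dead ends in code (illustration only, not something to actually do; the function name is made up):

use std::mem;

// Illustration of the two problems described above.
async fn dead_ends(file: tokio::fs::File) {
    // (a) Closing the raw fd yourself (e.g. libc::close(file.as_raw_fd()))
    //     results in a double close when `file` is later dropped, and by then
    //     the fd number may already belong to some other resource.
    //
    // (b) Skipping the destructor keeps the fd open forever and leaks the
    //     File's internal state instead:
    mem::forget(file);
}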

manio (Author) commented Apr 17, 2021

@Darksonn
I have a device connected through a serial port converter exposed via cdc_acm as /dev/ttyACM0. It is a boiler to which I am sending a request packet and reading a response in a loop. Generally it is working fine, but when the device suddenly disconnects for whatever reason, I was trying to make the code reliable and re-open the device in a loop.
... and that's when I ran into the described fd leak problem.

Darksonn (Contributor) commented Apr 17, 2021

Right, it sounds like you shouldn't be using tokio::fs::File for this. Serial ports generally support epoll, so if you change your code to read from it via AsyncFd, the problems should go away, as epoll operations are easily cancelled.

I know that AsyncFd can be difficult to use, but unfortunately the tokio-serial crate is outdated.
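
Roughly, the approach looks something like this (an illustrative sketch, not tested code from this thread; it assumes the libc crate for O_NONBLOCK, Tokio's "net" feature for AsyncFd, and placeholder path/timeout values):

use std::fs::OpenOptions;
use std::io::Read;
use std::os::unix::fs::OpenOptionsExt;

use tokio::io::unix::AsyncFd;
use tokio::time::{timeout, Duration};

// Open the device non-blocking and drive reads through epoll via AsyncFd.
// Because the wait happens in epoll rather than on the blocking thread pool,
// a timeout simply drops the future, and the fd is closed when `afd` is dropped.
async fn read_device(path: &str) -> std::io::Result<Vec<u8>> {
    let file = OpenOptions::new()
        .read(true)
        .custom_flags(libc::O_NONBLOCK) // AsyncFd requires a non-blocking fd
        .open(path)?;
    let afd = AsyncFd::new(file)?;

    let mut buf = [0u8; 256];
    let n = timeout(Duration::from_secs(5), async {
        loop {
            let mut guard = afd.readable().await?;
            match guard.try_io(|inner| inner.get_ref().read(&mut buf)) {
                Ok(result) => return result,
                Err(_would_block) => continue, // spurious wakeup, wait again
            }
        }
    })
    .await
    .map_err(|_| std::io::Error::new(std::io::ErrorKind::TimedOut, "read timeout"))??;

    Ok(buf[..n].to_vec())
}

Sending the request packet would use the same try_io pattern with writable() instead of readable().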

manio (Author) commented Apr 17, 2021

Thanks, will dig into this...
Please close the issue.

manio added a commit to manio/hard that referenced this issue Apr 20, 2021
The remeha device handle was not properly destroyed when there was
some in-flight operation on the file.
Changed to use AsyncFd on std::fs::File instead.

References:
tokio-rs/tokio#3711