
A memory leak in Windows #919

Closed
oherrala opened this issue Feb 25, 2019 · 10 comments
Labels
windows Related to the Windows OS.

Comments

@oherrala

The following code can be used to trigger a memory leak on Windows. I first found this with trust-dns-client (see https://github.com/bluejekyll/trust-dns/issues/692), but the issue can be isolated to mio alone.

Cargo.toml:

[package]
name = "mio-mao"
version = "0.1.0"
authors = ["Ossi Herrala <[email protected]>"]
edition = "2018"

[dependencies]
env_logger = "*"
mio = "0.6.16"

main.rs:

use std::io;
use std::net::{SocketAddr, Ipv4Addr};
use std::thread::sleep_ms;

use env_logger;
use mio::{Poll, Token, PollOpt, Ready};
use mio::net::UdpSocket;

fn main() -> io::Result<()> {
    env_logger::init();

    let target = SocketAddr::new(Ipv4Addr::new(192,168,1,1).into(), 53);
    let buf = &[0x00; 1400];
    let socket = UdpSocket::bind(&SocketAddr::new(Ipv4Addr::UNSPECIFIED.into(), 0))?;

    let poll = Poll::new()?;
    poll.register(&socket, Token(0), Ready::readable(), PollOpt::edge())?;

    socket.send_to(buf, &target)?;
    sleep_ms(1_000);

    Ok(())
}

Trace-level debug output of a run:

[2019-02-25T12:21:48Z TRACE mio::poll] registering with poller
[2019-02-25T12:21:48Z TRACE mio::sys::windows::selector] register Token(0) Readable
[2019-02-25T12:21:48Z TRACE mio::sys::windows::selector] set readiness to (empty)
[2019-02-25T12:21:48Z TRACE mio::sys::windows::udp] scheduling a read
[2019-02-25T12:21:48Z TRACE mio::sys::windows::selector] set readiness to (empty)
[2019-02-25T12:21:48Z TRACE mio::sys::windows::udp] scheduling a send

And Dr. Memory results of the leak:

Dr. Memory version 1.11.0 build 2 built on Aug 29 2016 02:41:18
Dr. Memory results for pid 5244: "mio-mao.exe"
Application cmdline: "C:\Users\IEUser\AppData\Local\Temp\cargo\debug\mio-mao.exe"
Recorded 115 suppression(s) from default C:\Program Files (x86)\Dr. Memory\bin64\suppress-default.txt

Error #1: LEAK 352 direct bytes 0x000001d9588e2c90-0x000001d9588e2df0 + 137680 indirect bytes
# 0 replace_RtlAllocateHeap                                            [d:\drmemory_package\common\alloc_replace.c:3770]
# 1 ntdll.dll!RtlPcToFileHeader                                       +0x1e     (0x00007fff43500f6f <ntdll.dll+0x10f6f>)
# 2 ntdll.dll!LdrAddRefDll                                            +0x4b     (0x00007fff4354913c <ntdll.dll+0x5913c>)
# 3 core::cell::UnsafeCell<>::new<>                                    [/rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libcore\cell.rs:1471]
# 4 std::alloc::__default_lib_allocator::__rdl_alloc                   [/rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\alloc.rs:243]
# 5 alloc::alloc::alloc                                                [/rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\alloc.rs:82]
# 6 alloc::alloc::exchange_malloc                                      [/rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\alloc.rs:192]
# 7 core::sync::atomic::AtomicUsize::new                               [/rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libcore\sync\atomic.rs:1158]
# 8 mio::sys::windows::from_raw_arc::FromRawArc<>::new<>               [C:\Users\IEUser\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.16\src\sys\windows\from_raw_arc.rs:41]
# 9 mio::sys::windows::udp::UdpSocket::new                             [C:\Users\IEUser\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.16\src\sys\windows\udp.rs:61]
#10 mio::net::udp::UdpSocket::from_socket                              [C:\Users\IEUser\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.16\src\net\udp.rs:139]
#11 log::set_max_level                                                 [C:\Users\IEUser\.cargo\registry\src\github.com-1ecc6299db9ec823\log-0.4.6\src\lib.rs:1059]

===========================================================================
FINAL SUMMARY:

DUPLICATE ERROR COUNTS:

SUPPRESSIONS USED:

ERRORS FOUND:
      0 unique,     0 total unaddressable access(es)
      0 unique,     0 total invalid heap argument(s)
      0 unique,     0 total GDI usage error(s)
      0 unique,     0 total handle leak(s)
      0 unique,     0 total warning(s)
      1 unique,     1 total, 138032 byte(s) of leak(s)
      0 unique,     0 total,      0 byte(s) of possible leak(s)
ERRORS IGNORED:
      4 potential error(s) (suspected false positives)
         (details: C:\Users\IEUser\AppData\Roaming\Dr. Memory\DrMemory-mio-mao.exe.5244.000\potential_errors.txt)
      1 potential leak(s) (suspected false positives)
         (details: C:\Users\IEUser\AppData\Roaming\Dr. Memory\DrMemory-mio-mao.exe.5244.000\potential_errors.txt)
     22 unique,    22 total,   3609 byte(s) of still-reachable allocation(s)
         (re-run with "-show_reachable" for details)
Details: C:\Users\IEUser\AppData\Roaming\Dr. Memory\DrMemory-mio-mao.exe.5244.000\results.txt
@oherrala
Author

My investigation so far:

The Windows side of UdpSocket uses overlapped memory to communicate with IOCP. This seems to involve a mem::forget() of the UdpSocket's FromRawArc<Imp>.

The TCP side's documentation:

https://github.com/carllerche/mio/blob/f2c110363c2acc527db90eb5454508440da7be8c/src/sys/windows/tcp.rs#L36-L46

So the implementation should match every mem::forget() with an overlapped2arc!() to recover the struct and keep an accurate count of the memory.

I added trace debugging to FromRawArc's clone and drop, and to UdpSocket's drop, and indeed there seems to be a situation where the refcount is > 1 when the UdpSocket is finally dropped:

[2019-02-25T14:51:07Z TRACE mio::poll] registering with poller
[2019-02-25T14:51:07Z TRACE mio::sys::windows::selector] register Token(0) Readable | Writable
[2019-02-25T14:51:07Z TRACE mio::sys::windows::selector] set readiness to (empty)
[2019-02-25T14:51:07Z TRACE mio::sys::windows::udp] scheduling a read
[2019-02-25T14:51:07Z TRACE mio::sys::windows::from_raw_arc] FromRawArc clone to cnt 2
[2019-02-25T14:51:07Z TRACE mio::sys::windows::selector] set readiness to Writable
[2019-02-25T14:51:07Z TRACE mio::sys::windows::selector] set readiness to (empty)
[2019-02-25T14:51:07Z TRACE mio::sys::windows::udp] scheduling a send
[2019-02-25T14:51:07Z TRACE mio::sys::windows::from_raw_arc] FromRawArc clone to cnt 3
[2019-02-25T14:51:08Z TRACE mio::sys::windows::udp] UdpSocket Drop
[2019-02-25T14:51:08Z TRACE mio::sys::windows::udp] cancelling active UDP read
[2019-02-25T14:51:08Z TRACE mio::sys::windows::from_raw_arc] FromRawArc drop to cnt 2

Since the refcount is still over 1, the internals of the UdpSocket are never dropped.

It also looks like neither of UDP's send_done() nor recv_done() is ever called (there should be a "finished a send|recv" debug trace). These are the two functions with overlapped2arc!() calls in the UDP implementation.

@oherrala
Author

This might be related to #776.

yorickpeterse pushed a commit to inko-lang/inko that referenced this issue May 9, 2019
This adds support for performing non-blocking network operations, such
as reading and writing to/from a socket. The runtime API exposed is
similar to Erlang, allowing one to write code that uses non-blocking
APIs without having to resort to using callbacks. For example, in a
typical callback-based language you may write the following to read
from a socket:

    socket.create do (socket) {
      socket.read do (data) {

      }
    }

In Inko, you instead would (more or less) write the following:

    import std::net::socket::TcpStream

    let socket = try! TcpStream.new(ip: '192.0.2.0', port: 80)
    let message = try! socket.read_string(size: 4)

The VM then takes care of using the appropriate non-blocking operations,
and will reschedule processes whenever necessary.

This functionality is exposed through the following runtime modules:

* std::net::ip: used for parsing IPv4 and IPv6 addresses.
* std::net::socket: used for TCP and UDP sockets.
* std::net::unix: used for Unix domain sockets.

The VM uses the system's native polling mechanism to determine when a
file descriptor is available for a read or write. On Linux we use epoll,
while using kqueue for the various BSDs and Mac OS. For Windows we use
wepoll (https://github.com/piscisaureus/wepoll). Wepoll exposes an API
that is compatible with the epoll API, but uses Windows IO completion
ports under the hood.

When a process attempts to perform a non-blocking operation, the process
is registered (combined with the file descriptor to poll) in a global
poller and suspended. When the file descriptor becomes available for a
read or write, the corresponding process is rescheduled. The polling
mechanism is set up in such a way that a process can not be rescheduled
multiple times at once.

We do not use MIO (https://github.com/tokio-rs/mio), instead we use
epoll, kqueue, and wepoll (using
https://crates.io/crates/wepoll-binding) directly. At the time of
writing, while MIO offers some form of support for Windows it comes with
various issues:

1. tokio-rs/mio#921
2. tokio-rs/mio#919
3. tokio-rs/mio#776
4. tokio-rs/mio#913

It's not clear when these issues will be addressed, as the maintainers
of MIO do not appear to have the experience and resources to resolve them
themselves. MIO is part of the Google Summer of Code 2019, with the goal
of improving Windows support. Unfortunately, this likely won't be done
before the end of 2019, and we don't want to wait that long.

Another issue with MIO is its implementation. Internally, MIO uses
various forms of synchronisation which can make it expensive to use a
single poller across multiple threads; it certainly is not a zero-cost
library. It also offers more than we need, such as being able to poll
arbitrary objects.

We are not the first to run into these issues. For example, the Amethyst
video game engine also ran into issues with MIO as detailed in
https://community.amethyst.rs/t/sorting-through-the-mio-mess/561.

With all of this in mind, I decided it was not worth the time to wait
for MIO to get fixed, and to instead spend time directly using epoll,
kqueue, and wepoll. This gives us total control over the code, and
allows us to implement what we need in the way we need it. Most
important of all: it works on Linux, BSD, Mac, and Windows.
@Thomasdezeeuw Thomasdezeeuw added the windows Related to the Windows OS. label Jun 17, 2019
@Thomasdezeeuw Thomasdezeeuw added this to the v0.7 milestone Jun 17, 2019
@Thomasdezeeuw
Collaborator

@PerfectLaugh or @oherrala can either of you confirm the memory leak doesn't exist on current master? I don't think it does any more, as #776 is resolved.

@oherrala
Author

I assume it's this PR: #1034

Adding it here just for cross referencing.

@oherrala
Author

Thanks @PerfectLaugh for working on this!

I quickly tested the current master (e134e79) on Windows 10 using Dr. Memory 2.2.0 and my previous test code (adjusted for the new mio), and Dr. Memory seems to be happy. No more leaks like the one found above.

My program:

use std::io;
use std::net::{Ipv4Addr, SocketAddr};
use std::thread::sleep;
use std::time::Duration;

use env_logger;
use mio::net::UdpSocket;
use mio::{Interests, Poll, Token};

fn main() -> io::Result<()> {
    env_logger::init();

    let target = SocketAddr::new(Ipv4Addr::new(192, 168, 1, 1).into(), 53);
    let buf = &[0x00; 1400];
    let socket = UdpSocket::bind(SocketAddr::new(Ipv4Addr::UNSPECIFIED.into(), 0))?;

    let poll = Poll::new()?;
    poll.registry()
        .register(&socket, Token(0), Interests::READABLE)
        .unwrap();

    socket.send_to(buf, target)?;
    sleep(Duration::from_secs(1));

    Ok(())
}

@oherrala
Author

@Thomasdezeeuw @PerfectLaugh is there any chance to get this backported into mio 0.6?

Trust-DNS on Windows is affected by this issue, as are probably other users of Tokio on Windows. Without a fix in mio 0.6, I'm afraid it might take a while for the ecosystem to be updated and the fix deployed.

Issues hopefully solved by the fix:

@carllerche
Member

It could be possible to backport, but I would like the code to mature some on 0.7 first. There could be other bugs introduced in the change given that it is a rewrite.

The more testing we get on 0.7, the sooner we can backport :-)

@Thomasdezeeuw
Collaborator

Closing this for v0.7 and moving it to a new maybe backport to v0.6 milestone.

@PerfectLaugh
Contributor

PerfectLaugh commented Aug 21, 2019

> @Thomasdezeeuw @PerfectLaugh is there any chance to get this backported into mio 0.6?
>
> Trust-DNS on Windows is affected by this issue, as are probably other users of Tokio on Windows. Without a fix in mio 0.6, I'm afraid it might take a while for the ecosystem to be updated and the fix deployed.
>
> Issues hopefully solved by the fix:

I have made a backport. But since 0.7 only supports edge-triggered mode, using level-triggered mode in the backport will cause an error.

Here is the code: https://github.com/PerfectLaugh/mio-compat

@oherrala
Author

oherrala commented Jun 9, 2020

Congratulations on the 0.7.0 release!

I tested the release with the following program:

Cargo.toml:

[package]
name = "miotest"
version = "0.1.0"
authors = ["Ossi Herrala <[email protected]>"]
edition = "2018"

[dependencies]
env_logger = "0.7"
mio = { version = "0.7", features = [ "os-poll", "udp" ] }

main.rs:

use std::io;
use std::net::{Ipv4Addr, SocketAddr};
use std::thread::sleep;
use std::time::Duration;

use env_logger;
use mio::{Interest, Poll, Token};
use mio::net::UdpSocket;

fn main() -> io::Result<()> {
    env_logger::init();

    let target = SocketAddr::new(Ipv4Addr::new(192, 168, 1, 1).into(), 53);
    let buf = &[0x00; 1400];
    let mut socket = UdpSocket::bind(SocketAddr::new(Ipv4Addr::UNSPECIFIED.into(), 0))?;

    let poll = Poll::new()?;
    poll.registry()
        .register(&mut socket, Token(0), Interest::READABLE)
        .unwrap();

    socket.send_to(buf, target)?;
    sleep(Duration::from_secs(1));

    Ok(())
}

And Dr. Memory 2.3.0 reported no issues.

I'll close this issue and hope that upstreams pick up 0.7 instead of requiring a backport fix to 0.6.
