
UVTimer leaks #37

Closed
kaniini opened this issue Jul 5, 2016 · 32 comments

kaniini (Contributor) commented Jul 5, 2016

  • uvloop version: 0.4.29
  • python version: 3.5.1
  • platform: FreeBSD

We are seeing a serious memory leak that only shows up when using uvloop with Gunicorn. Examining the GC state does not show anything useful, which suggests the leak is in native code, so I am reporting it here. Under moderate load, each Gunicorn worker grows to 2 GB of RAM within 18-24 hours.

We are presently retesting under 0.4.32 and will update this issue if we still see the leak. Any suggestions for debugging it would also be appreciated. We only see this leak under uvloop and not under the default asyncio loop, so it has to be related to uvloop.

1st1 (Member) commented Jul 5, 2016

Would it be possible for you to share the program code, or to reduce it to a smaller size that can be shared?

I'd suggest building a debug version of uvloop with make debug. You can then use an undocumented function, loop.print_debug_info(), to print a bunch of information about the loop and the running process. For instance, you can use the following code:

import asyncio

async def print_debug(loop):
    while True:
        print(chr(27) + "[2J")  # clear the terminal screen
        loop.print_debug_info()
        await asyncio.sleep(0.5, loop=loop)

to create an asyncio task -- loop.create_task(print_debug(loop)) -- and have a live info screen.

1st1 (Member) commented Jul 5, 2016

Also, how exactly do you use Gunicorn with uvloop? Do you use aiohttp? Do you use the worker from aio-libs/aiohttp#878?

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

Unfortunately I can't share the program code in this instance as it contains trade secrets. We have a "scoreboard" facility (like Apache's) -- would it be possible to get the text from loop.print_debug_info() into it using a StringIO object?

We used the aiohttp.worker.GunicornUVLoopWebWorker from aiohttp git, so it is probably the same worker as in aio-libs/aiohttp#878. As such, I don't think the example task will work for us, but our scoreboard facility should be helpful.

uvloop provides a significant CPU usage reduction over the default loop, so we are very eager to help in whatever way we can. Of note: we haven't yet ported the application from the old yield from syntax to the Python 3.5 async/await syntax.

kaniini (Contributor, Author) commented Jul 5, 2016

Based on what I see here: https://github.com/MagicStack/uvloop/blob/master/uvloop/loop.pyx#L832-L918, it will need to be modified, but no worries -- we can make that modification and send it as a pull request.

1st1 (Member) commented Jul 5, 2016

loop.print_debug_info() just prints to stdout, so you can use contextlib.redirect_stdout to redirect it to a file or, perhaps, a StringIO object.
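Something along these lines should work (a minimal sketch; capture_debug_info is a hypothetical helper name, and print_debug_info() is only available on a debug build of uvloop):

import contextlib
import io

def capture_debug_info(loop):
    # Hypothetical helper: capture the text that loop.print_debug_info()
    # writes to stdout so it can be fed into a scoreboard or a log
    # instead of the terminal.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        loop.print_debug_info()
    return buf.getvalue()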

> Unfortunately I can't share the program code in this instance as it contains trade secrets.

NP. Can you tell me more about the app though? What modules does it use; for instance, do you use some async driver for memcache/redis/database/etc? Also, can you try to create a simple version of your app that still has the leak? It might help us to understand what's going on here.

kaniini (Contributor, Author) commented Jul 5, 2016

We use a custom asyncio driver for Aerospike (code available at http://github.com/kaniini/aerospike-py) as well as aioredis. We also use an async client to talk to the BIRD BGP routing daemon, but it is being replaced for various reasons and has been determined not to be responsible for the leak. Other than that, it is a pretty simple aiohttp.web application.

1st1 (Member) commented Jul 5, 2016

And, just to confirm, you can see the leak only when you're running the app behind Gunicorn?

kaniini (Contributor, Author) commented Jul 5, 2016

We haven't really bothered to test outside of Gunicorn as we use Gunicorn to manage our workers -- retooling to use aiohttp directly (even though the aiohttp worker basically replaces the entirety of Gunicorn's actual web serving infrastructure) is not something we're eager to do right now. I will add the loop debug info to the scoreboard tomorrow and post some stats.

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

We have added the uvloop debug output to the scoreboard module, and I wonder if there is an issue with the way callback handles are being cleaned up, because the "total callback handles" number is going up at a rate proportional to the memory increase.

TCPTransport and UVTimer objects are the most heavily used, which is unsurprising.

An interesting note is that the Read callbacks number is much smaller than the total callback handles number.

1st1 (Member) commented Jul 5, 2016

> We have added the uvloop debug output to the scoreboard module, and I wonder if there is an issue with the way callback handles are being cleaned up, because the "total callback handles" number is going up at a rate proportional to the memory increase.

But the "current" number of callbacks is still low (not steadily increasing?)

Please try removing the @cython.freelist(DEFAULT_FREELIST_SIZE) decorator from the Handle and TimerHandle classes in cbhandles.pyx. Then try removing the @cython.no_gc_clear decorators as well (although that could, in theory, cause a segfault).

> An interesting note is that the Read callbacks number is much smaller than the total callback handles number.

Those are a different kind of callback -- "Read callbacks" is the number of times libuv reads something from a socket and passes it back to uvloop.

1st1 (Member) commented Jul 5, 2016

How many UVTimer objects do you have?

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

Yes, the current number of callbacks is low.

Here is one of the UVHandles dumps from a worker, which should answer the other questions:

                        alive  | closed  |
UVHandles               python | libuv   | total
                        objs   | handles |
-------------------------------+---------+---------
    TCPServer                1 |       0 |       1
    TCPTransport            57 |  414552 |  414609
    UVAsync                  1 |       0 |       1
    UVCheck                  1 |       0 |       1
    UVIdle                   1 |       0 |       1
    UVPoll                   0 |      10 |      10
    UVSignal                 8 |       0 |       8
    UVTimer                  2 |  558155 |  558157
    UnixTransport            1 |  139022 |  139023

I will make the requested modifications shortly, and report back.

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

I applied the modifications and now it is leaking heavily. Roughly 1MB per second per worker.

kaniini (Contributor, Author) commented Jul 5, 2016

It seems the leak rate is now slowing down on the modified server, so I will monitor it a bit longer and give an update.

1st1 (Member) commented Jul 5, 2016 via email

kaniini (Contributor, Author) commented Jul 5, 2016

Right now what I am seeing is heavy CPU use (probably due to GC pressure) and a much slower leak rate, but it is still leaking a little: an asyncio default-event-loop worker started at the same time under the same load is using about 210 MB RSS, while the uvloop worker is using about 250 MB.

I agree that it may be UVTimer. Do you want me to put the freelists back for the non-timer cases?

1st1 (Member) commented Jul 5, 2016

It's definitely UVTimer. I wrote a small program that creates thousands of them, and it's leaking super fast. I don't think it's related to freelists...
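Not the actual test program, but a reproducer along these lines exercises the same code path (a sketch, assuming uvloop is installed and that loop.call_later() timers are what back the UVTimer handles; the function names are made up for illustration):

import asyncio
import resource

import uvloop


def spam_timers(loop, n=100000):
    # Schedule and immediately cancel a large number of timer callbacks.
    # Under uvloop each call_later() is backed by a UVTimer handle, so a
    # steadily growing RSS here points at timer handles not being freed.
    for _ in range(n):
        loop.call_later(60, lambda: None).cancel()


async def main(loop):
    for _ in range(20):
        spam_timers(loop)
        # ru_maxrss is reported in kilobytes on Linux and FreeBSD.
        print("maxrss (KB):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        await asyncio.sleep(0.1)


if __name__ == "__main__":
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))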

kaniini (Contributor, Author) commented Jul 5, 2016

Awesome! Is there anything I can do on my end to help with getting a fix going? I'm unfortunately not very familiar with libuv internals, but if it is a bug on their side I may be able to help fix it.

1st1 (Member) commented Jul 5, 2016

No, it's a uvloop bug. Somehow UVTimers do not free memory of their uv_timer_t handles. Looking into that now.

I also have a refactoring branch where I remove most of PyMem_Malloc calls from uvloop, and it doesn't leak in that branch.

kaniini (Contributor, Author) commented Jul 5, 2016

If you can make that branch public, I'm more than happy to deploy it and give it a go.

1st1 added a commit that referenced this issue Jul 5, 2016
This should fix a memory leak discovered in [1].  It will also
make everything slightly faster.

[1] #37
1st1 (Member) commented Jul 5, 2016

Pushed to the master branch. Please test.

kaniini (Contributor, Author) commented Jul 5, 2016

Looks good so far, will run it for 24 hours and see where it's at. Thanks!

1st1 (Member) commented Jul 5, 2016

Thanks a lot for discovering this and for your help with debugging. I'll release a new version of uvloop shortly, as the bug is pretty severe; perhaps it affected other handles too, such as Poll or TCP.

kaniini (Contributor, Author) commented Jul 5, 2016

Three hours later and no leak; I think we can call this one done. We will deploy uvloop 0.4.33 to all workers. Thanks so much!

kaniini closed this as completed Jul 5, 2016

1st1 (Member) commented Jul 5, 2016

Awesome! Thanks!

1st1 (Member) commented Jul 5, 2016

BTW, did you guys do any benchmarking of your app? How much faster is it when it's run on uvloop?

kaniini (Contributor, Author) commented Jul 5, 2016

The main advantage for the app in question (a real-time ad server DSP/SSP engine) is that uvloop provides more stable latencies, thanks to the more performant socket I/O and polling provided by libuv. CPU usage is also cut by about 30% over baseline, which is a nice improvement, but the key thing is the latency stability. We have been testing with libuv for about a month now and had finally diagnosed uvloop as the only remaining culprit, hence this report :)

Each server has 8 workers and processes approximately 6000-12000 requests per second.

1st1 added the resolved label Jul 5, 2016
1st1 changed the title from "aiohttp.worker.GunicornUVLoopWebWorker has memory leak" to "UVTimer leaks" Jul 6, 2016
1st1 added a commit that referenced this issue Jul 12, 2016
This is a different fix for issue #37. The original fix that removed
dynamic memory allocation of uv_handles had to be reverted.
1st1 reopened this Jul 12, 2016

1st1 (Member) commented Jul 12, 2016

@kaniini I had to revert the original fix for this issue (see issue #41) and it's now fixed in a different way. Will issue a new release soon.

1st1 (Member) commented Jul 12, 2016

Please try uvloop v0.4.34. I've made several changes to make sure we won't introduce similar memory leaks in the future:

  • In debug builds, the loop has two new counters, _debug_uv_handles_total and _debug_uv_handles_freed, which count the number of PyMem_Malloc and PyMem_Free calls for uv_handle_t structs.
  • The unit tests, when run against a debug build, check that those counters are equal at the end of each test (a rough sketch of such a check is shown below).
  • Travis now tests both the production and debug builds of uvloop.
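A rough sketch of that kind of counter check (it requires a debug build of uvloop, where the _debug_uv_handles_total and _debug_uv_handles_freed counters exist; the class and method names here are hypothetical):

import asyncio
import unittest

import uvloop


class UVHandleCountersTest(unittest.TestCase):
    # Hypothetical test: every uv_handle_t allocated during the test
    # should also have been freed by the time the loop is closed.

    def test_uv_handles_freed(self):
        loop = uvloop.new_event_loop()
        try:
            # Allocate and release a bunch of uv_handle_t-backed timers.
            for _ in range(1000):
                loop.call_later(60, lambda: None).cancel()
            # Let the loop run briefly so libuv close callbacks fire.
            loop.run_until_complete(asyncio.sleep(0.1, loop=loop))
        finally:
            loop.close()
        self.assertEqual(loop._debug_uv_handles_total,
                         loop._debug_uv_handles_freed)


if __name__ == "__main__":
    unittest.main()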

kaniini (Contributor, Author) commented Jul 12, 2016

We will deploy it on one of our nodes and verify that there are no regressions. Thanks!

kaniini (Contributor, Author) commented Jul 13, 2016

We've deployed uvloop 0.4.34 and see no regressions in terms of memory leaks.

1st1 (Member) commented Jul 13, 2016

Awesome, thanks!

1st1 closed this as completed Jul 13, 2016