
UVTimer leaks #37

Closed
kaniini opened this issue Jul 5, 2016 · 32 comments

kaniini (Contributor) commented Jul 5, 2016

  • uvloop version: 0.4.29
  • python version: 3.5.1
  • platform: FreeBSD

We are seeing a serious memory leak that only shows up when using uvloop with Gunicorn. Examining the GC state does not show anything useful, which suggests the leak is in native code, so I am reporting it here. Under moderate load, each Gunicorn worker grows to 2 GB of RAM within 18-24 hours.

We are presently retesting under 0.4.32 and will update this issue if we still see the leak. Any suggestions for debugging it would also be appreciated. We only see this leak under uvloop and not under the default asyncio loop, so it has to be related to uvloop.

1st1 (Member) commented Jul 5, 2016

Would it be possible for you to share the program code, or to reduce it to a smaller size that can be shared?

I'd suggest building a debug version of uvloop with make debug. You can then use an undocumented function, loop.print_debug_info(), to print a bunch of information about the loop and the running process. For instance, you can use the following code:

import asyncio

async def print_debug(loop):
    while True:
        print(chr(27) + "[2J")  # clear the terminal screen
        loop.print_debug_info()
        await asyncio.sleep(0.5, loop=loop)

to create an asyncio task -- loop.create_task(print_debug(loop)) -- and have a live info screen.

1st1 (Member) commented Jul 5, 2016

Also, how exactly do you use Gunicorn with uvloop? Do you use aiohttp? Do you use the worker from aio-libs/aiohttp#878?

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

Unfortunately I can't share the program code in this instance as it contains trade secrets. We have a "scoreboard" facility (like Apache's) -- would it be possible to get the text from loop.print_debug_info() into it using a StringIO object?

We used the aiohttp.worker.GunicornUVLoopWebWorker from aiohttp git, so it is probably the same worker as in aio-libs/aiohttp#878. As such, I don't think the example task will work for us, but our scoreboard facility should be helpful.

uvloop provides a significant CPU usage reduction over the default loop, so we are very eager to help in whatever way we can. Of note: we haven't yet ported the application from the old yield from syntax to the Python 3.5 async/await syntax.

kaniini (Contributor, Author) commented Jul 5, 2016

Based on what I see here: https://github.com/MagicStack/uvloop/blob/master/uvloop/loop.pyx#L832-L918, it will need to be modified, but no worries -- we can make that modification and send it as a pull request.

1st1 (Member) commented Jul 5, 2016

loop.print_debug_info() just prints to stdout, so you can use contextlib.redirect_stdout to redirect it to a file or, perhaps, a StringIO object.
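Something along these lines should work (a minimal sketch; capture_debug_info is a hypothetical helper name, and print_debug_info() is only available on a debug build of uvloop):

import contextlib
import io

def capture_debug_info(loop):
    # Hypothetical helper: capture the text that loop.print_debug_info()
    # writes to stdout so it can be fed into a scoreboard or a log
    # instead of the terminal.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        loop.print_debug_info()
    return buf.getvalue()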

> Unfortunately I can't share the program code in this instance as it contains trade secrets.

NP. Can you tell me more about the app though? What modules does it use; for instance, do you use some async driver for memcache/redis/database/etc? Also, can you try to create a simple version of your app that still has the leak? It might help us to understand what's going on here.

kaniini (Contributor, Author) commented Jul 5, 2016

We use a custom asyncio driver for Aerospike (code available at http://github.com/kaniini/aerospike-py) as well as aioredis. We also use an async client to talk to the BIRD BGP routing daemon, but it is being replaced for various reasons and has been determined not to be responsible for the leak. Other than that, it is a pretty simple aiohttp.web application.

1st1 (Member) commented Jul 5, 2016

And, just to confirm, you can see the leak only when you're running the app behind Gunicorn?

kaniini (Contributor, Author) commented Jul 5, 2016

We haven't really bothered to test outside of Gunicorn as we use Gunicorn to manage our workers -- retooling to use aiohttp directly (even though the aiohttp worker basically replaces the entirety of Gunicorn's actual web serving infrastructure) is not something we're eager to do right now. I will add the loop debug info to the scoreboard tomorrow and post some stats.

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

We have added the uvloop debug output to the scoreboard module, and I wonder if there is an issue with the way callback handles are being cleaned up, because the "total callback handles" number is going up at a rate proportional to the memory increase.

TCPTransport and UVTimer objects are the most heavily used, which is unsurprising.

An interesting note is that the Read callbacks number is much smaller than the total callback handles number.

1st1 (Member) commented Jul 5, 2016

> We have added the uvloop debug output to the scoreboard module, and I wonder if there is an issue with the way callback handles are being cleaned up, because the "total callback handles" number is going up at a rate proportional to the memory increase.

But the "current" number of callbacks is still low (not steadily increasing?)

Please try removing the @cython.freelist(DEFAULT_FREELIST_SIZE) decorator from the Handle and TimerHandle classes in cbhandles.pyx. Then try removing the @cython.no_gc_clear decorators as well (although that could, in theory, cause a segfault).

> An interesting note is that the Read callbacks number is much smaller than the total callback handles number.

Those are a different kind of callback -- "Read callbacks" is the number of times libuv reads something from a socket and passes it back to uvloop.

1st1 (Member) commented Jul 5, 2016

How many UVTimer objects do you have?

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

Yes, the current number of callbacks is low.

Here is one of the UVHandles dumps from a worker, which should answer the other questions:

                        alive  | closed  |
UVHandles               python | libuv   | total
                        objs   | handles |
-------------------------------+---------+---------
    TCPServer                1 |       0 |       1
    TCPTransport            57 |  414552 |  414609
    UVAsync                  1 |       0 |       1
    UVCheck                  1 |       0 |       1
    UVIdle                   1 |       0 |       1
    UVPoll                   0 |      10 |      10
    UVSignal                 8 |       0 |       8
    UVTimer                  2 |  558155 |  558157
    UnixTransport            1 |  139022 |  139023

I will make the requested modifications shortly, and report back.

kaniini (Contributor, Author) commented Jul 5, 2016

Hello,

I applied the modifications and now it is leaking heavily. Roughly 1MB per second per worker.

kaniini (Contributor, Author) commented Jul 5, 2016

It seems the leak rate is now slowing down on the modified server, so I will monitor it a bit longer and give an update.

1st1 (Member) commented Jul 5, 2016 via email

kaniini (Contributor, Author) commented Jul 5, 2016

Right now what I am seeing is heavy CPU use (probably due to GC pressure) and a much slower leak rate, but it is still leaking a little: an asyncio default-event-loop worker started at the same time under the same load is using about 210 MB RSS, while the uvloop worker is using about 250 MB.

I agree that it may be UVTimer. Do you want me to put the freelists back for the non-timer cases?

1st1 (Member) commented Jul 5, 2016

It's definitely UVTimer. I wrote a small program that creates thousands of them, and it's leaking super fast. I don't think it's related to freelists...
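Not the actual test program, but a reproducer along these lines exercises the same code path (a sketch, assuming uvloop is installed and that loop.call_later() timers are what back the UVTimer handles; the function names are made up for illustration):

import asyncio
import resource

import uvloop


def spam_timers(loop, n=100000):
    # Schedule and immediately cancel a large number of timer callbacks.
    # Under uvloop each call_later() is backed by a UVTimer handle, so a
    # steadily growing RSS here points at timer handles not being freed.
    for _ in range(n):
        loop.call_later(60, lambda: None).cancel()


async def main(loop):
    for _ in range(20):
        spam_timers(loop)
        # ru_maxrss is reported in kilobytes on Linux and FreeBSD.
        print("maxrss (KB):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        await asyncio.sleep(0.1)


if __name__ == "__main__":
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))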

kaniini (Contributor, Author) commented Jul 5, 2016

Awesome! Is there anything I can do on my end to help with getting a fix going? I'm unfortunately not very familiar with libuv internals, but if it is a bug on their side I may be able to help fix it.

1st1 (Member) commented Jul 5, 2016

No, it's a uvloop bug. Somehow UVTimers do not free memory of their uv_timer_t handles. Looking into that now.

I also have a refactoring branch where I remove most of PyMem_Malloc calls from uvloop, and it doesn't leak in that branch.

kaniini (Contributor, Author) commented Jul 5, 2016

If you can make that branch public, I'm more than happy to deploy it and give it a go.

1st1 added a commit that referenced this issue Jul 5, 2016
This should fix a memory leak discovered in [1].  It will also
make everything slightly faster.

[1] #37
1st1 (Member) commented Jul 5, 2016

Pushed to the master branch. Please test.

kaniini (Contributor, Author) commented Jul 5, 2016

Looks good so far, will run it for 24 hours and see where it's at. Thanks!

1st1 (Member) commented Jul 5, 2016

Thanks a lot for discovering this and for your help with debugging. I'll release a new version of uvloop shortly, as the bug is pretty severe; perhaps it affected other handles too, such as Poll or TCP.

kaniini (Contributor, Author) commented Jul 5, 2016

Three hours later and no leak; I think we can call this one done. We will deploy uvloop 0.4.33 to all workers. Thanks so much!

kaniini closed this as completed Jul 5, 2016

1st1 (Member) commented Jul 5, 2016

Awesome! Thanks!

1st1 (Member) commented Jul 5, 2016

BTW, did you guys do any benchmarking of your app? How much faster is it when it's run on uvloop?

kaniini (Contributor, Author) commented Jul 5, 2016

The main advantage for the app in question (a real-time ad server DSP/SSP engine) is that uvloop provides more stable latencies, thanks to the more performant socket I/O and polling provided by libuv. CPU usage is also cut by about 30% over baseline, which is a nice improvement, but the key thing is the latency stability. We have been testing with libuv for about a month now and had finally diagnosed uvloop as the only remaining culprit, hence this report :)

Each server has 8 workers and processes approximately 6000-12000 requests per second.

1st1 added the resolved label Jul 5, 2016
1st1 changed the title from "aiohttp.worker.GunicornUVLoopWebWorker has memory leak" to "UVTimer leaks" Jul 6, 2016
1st1 added a commit that referenced this issue Jul 12, 2016
This is a different fix for issue #37. The original fix that removed
dynamic memory allocation of uv_handles had to be reverted.
1st1 reopened this Jul 12, 2016

1st1 (Member) commented Jul 12, 2016

@kaniini I had to revert the original fix for this issue (see issue #41) and it's now fixed in a different way. Will issue a new release soon.

1st1 (Member) commented Jul 12, 2016

Please try uvloop v0.4.34. I've made several changes to make sure we won't introduce similar memory leaks in the future:

  • In debug builds, the loop has two new counters, _debug_uv_handles_total and _debug_uv_handles_freed, which count the number of PyMem_Malloc and PyMem_Free calls for uv_handle_t structs.
  • The unit tests, when run against a debug build, check that those counters are equal at the end of each test (a rough sketch of such a check is shown below).
  • Travis now tests both the production and debug builds of uvloop.
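A rough sketch of that kind of counter check (it requires a debug build of uvloop, where the _debug_uv_handles_total and _debug_uv_handles_freed counters exist; the class and method names here are hypothetical):

import asyncio
import unittest

import uvloop


class UVHandleCountersTest(unittest.TestCase):
    # Hypothetical test: every uv_handle_t allocated during the test
    # should also have been freed by the time the loop is closed.

    def test_uv_handles_freed(self):
        loop = uvloop.new_event_loop()
        try:
            # Allocate and release a bunch of uv_handle_t-backed timers.
            for _ in range(1000):
                loop.call_later(60, lambda: None).cancel()
            # Let the loop run briefly so libuv close callbacks fire.
            loop.run_until_complete(asyncio.sleep(0.1, loop=loop))
        finally:
            loop.close()
        self.assertEqual(loop._debug_uv_handles_total,
                         loop._debug_uv_handles_freed)


if __name__ == "__main__":
    unittest.main()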

kaniini (Contributor, Author) commented Jul 12, 2016

We will deploy it on one of our nodes and verify that there are no regressions. Thanks!

kaniini (Contributor, Author) commented Jul 13, 2016

We've deployed uvloop 0.4.34 and see no regressions in terms of memory leaks.

1st1 (Member) commented Jul 13, 2016

Awesome, thanks!

1st1 closed this as completed Jul 13, 2016