Memory leak with Python 3.11.2 #7252

Closed
scotloach opened this issue Apr 12, 2023 · 26 comments

@scotloach

Describe the bug

I am seeing a memory leak with aiohttp using Python 3.11.2.

When I downgrade to Python 3.11.0, it does not leak.

This is a script that makes a large number of client requests to a large number of different servers. The script leaks hundreds of MB over the course of a few hours. I haven't yet reduced it to a form that can be easily reproduced and shared, but I'm filing this in case anyone else is seeing similar symptoms.

To Reproduce

The script I am running creates about 700 HTTPS client sessions to different servers. Each session does a simple GET request; most succeed, but some fail in various ways. The script then sleeps for 5 seconds and repeats.

Expected behavior

This should not leak memory over time, and it doesn't with 3.11.0. (I don't currently have an easy way to run it with 3.11.1.)

Logs/tracebacks

n/a

Python Version

Python 3.11.2

aiohttp Version

Name: aiohttp
Version: 3.8.4
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /usr/lib64/python3.11/site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl

multidict Version

Name: multidict
Version: 6.0.4
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/lib64/python3.11/site-packages
Requires: 
Required-by: aiohttp, yarl

yarl Version

Name: yarl
Version: 1.8.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/local/lib64/python3.11/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Fedora release 37 (Thirty Seven)

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@scotloach scotloach added the bug label Apr 12, 2023
@Dreamsorcerer (Member) commented Apr 13, 2023

If the memory leak appears only in a specific patch version of Python, I suspect the bug report will need to go to cpython.
If you can profile the memory to at least figure out which objects are causing the memory leak, that would certainly help figure out the cause.
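
For reference, a minimal tracemalloc sketch for this kind of profiling (illustrative, not specific to the reporter's script):

import tracemalloc

tracemalloc.start(25)  # keep 25 frames so traces point past the raw allocation site

# ... run the workload for a while, then:
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)  # top allocation sites by total size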

@Dreamsorcerer (Member)

Also, sharing the script would be useful.

@scotloach (Author) commented Apr 20, 2023

tracemalloc profile after running for a while (this is Python 3.11.3, which also exhibits the leak):
/usr/lib64/python3.11/asyncio/sslproto.py:274: size=349 MiB, count=2788, average=128 KiB
/usr/lib64/python3.11/asyncio/sslproto.py:374: size=14.7 MiB, count=241614, average=64 B
/usr/lib64/python3.11/asyncio/sslproto.py:343: size=2156 KiB, count=2788, average=792 B
/usr/lib64/python3.11/site-packages/aiohttp/connector.py:598: size=1505 KiB, count=112, average=13.4 KiB
/usr/lib64/python3.11/asyncio/sslproto.py:305: size=1035 KiB, count=2788, average=380 B
/usr/lib64/python3.11/site-packages/aiohttp/connector.py:666: size=1015 KiB, count=40, average=25.4 KiB
/usr/lib64/python3.11/site-packages/aiohttp/client_proto.py:213: size=525 KiB, count=7042, average=76 B
/usr/lib64/python3.11/site-packages/aiohttp/streams.py:571: size=507 KiB, count=1366, average=380 B
/usr/lib64/python3.11/site-packages/aiohttp/connector.py:365: size=467 KiB, count=42, average=11.1 KiB

This doesn't fully account for the heap sizes I see, but I suspect that second line with 241614 allocations is the issue.
This is with long-running ClientSession objects. If I close the ClientSession objects and re-initialize them after each request, it doesn't leak (see the sketch after the script below).

This is the main body of the script I'm using to reproduce it; it loads a set of around 700 private-IP endpoints that respond in a variety of ways.

import asyncio
import ssl
import tracemalloc

import aiohttp

def init():
    timeout = aiohttp.ClientTimeout(connect=5, sock_read=5)
    ssl_ctx = ssl._create_unverified_context()
    conn = aiohttp.TCPConnector(ssl_context=ssl_ctx, enable_cleanup_closed=True)
    session = aiohttp.ClientSession(connector=conn, timeout=timeout, cookie_jar=aiohttp.CookieJar(unsafe=True))
    return session
    
async def fetch(client, ip):
    try:
        async with client.request('GET', url=f'https://{ip}:8000/status.cgi') as r:
            msg = await r.text()
    except asyncio.CancelledError:
        raise
    except Exception as err:
        pass

async def main(): 
    ips = [...] # load IPs to poll here
    clients = {ip: init() for ip in ips}
    tracemalloc.start()
    try:
        while True:
            await asyncio.gather(*[fetch(client, ip) for ip, client in clients.items()])
            await asyncio.sleep(5)
    except asyncio.CancelledError:
        pass  # end and clean things up
    finally:
        memory_used = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
        stats = snapshot.statistics('lineno')
        for stat in stats[:10]:
            print(stat)
        try:
            await asyncio.gather(*[client.close() for client in clients.values()])
        except:
            pass

asyncio.run(main()) 
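
For comparison, a sketch of the per-request variant mentioned above (reuses init() from the script; illustrative only): a fresh session per request avoids the long-lived sessions that leak.

async def fetch_once(ip):
    # Create a fresh session per request instead of keeping ~700 long-lived ones.
    session = init()
    try:
        async with session.request('GET', url=f'https://{ip}:8000/status.cgi') as r:
            await r.text()
    except asyncio.CancelledError:
        raise
    except Exception:
        pass
    finally:
        # Closing the session releases its connector and any parked transports.
        await session.close()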

@scotloach (Author)

I have an update. It seems that removing enable_cleanup_closed=True from my test script has resolved the leak.

I know I had to add that at some earlier time to prevent a memory leak. I wonder if something has changed in recent Python versions that fixed the original cause, and now that flag ends up causing a leak instead.

@Dreamsorcerer (Member)

I'm not familiar with the code, but at a glance it just appends to a list:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L314
And creates a weakref:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L363

Maybe check the size of that list? I think your tracemalloc output might be indicating something with that too. No idea why a cpython change would affect the behaviour, though.
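
For context, a minimal model of that cleanup-closed mechanism (not aiohttp's actual source; names are illustrative): transports whose graceful SSL shutdown stalls are parked in a list and hard-aborted on a timer, after which the list is cleared so they can be garbage collected.

import asyncio
from typing import List, Optional

class CleanupClosedModel:
    def __init__(self) -> None:
        self._cleanup_closed_transports: List[Optional[asyncio.Transport]] = []

    def park(self, transport: Optional[asyncio.Transport]) -> None:
        # Corresponds to the list append linked above.
        self._cleanup_closed_transports.append(transport)

    def cleanup(self) -> None:
        # Runs periodically; aborting drops the last strong references.
        for transport in self._cleanup_closed_transports:
            transport.abort()
        self._cleanup_closed_transports = []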

@bdraco (Member) commented May 12, 2023

I'm seeing the leak as well with enable_cleanup_closed=True

I see a lot of _SSLProtocolTransport objects in memory with no socket attached to them.

2023-05-12 23:34:58.325 CRITICAL (MainThread) [homeassistant.components.profiler] _SSLProtocolTransport <asyncio.sslproto._SSLProtocolTransport object at 0x7fa1382d6210> ssl_proto=None ssl_object=<ssl.SSLObject object at 0x7fa156521650> peercert={'subject': ((('commonName', 'new-home.sensibo.com'),),), 'issuer': ((('countryName', 'US'),), (('organizationName', 'Amazon'),), (('commonName', 'Amazon RSA 2048 M02'),)), 'version': 3, 'serialNumber': '0FFA0B652D4CB4ADA67C967B5136739D', 'notBefore': 'Feb 23 00:00:00 2023 GMT', 'notAfter': 'Mar 23 23:59:59 2024 GMT', 'subjectAltName': (('DNS', 'new-home.sensibo.com'), ('DNS', 'home.sensibo.com')), 'OCSP': ('http://ocsp.r2m02.amazontrust.com',), 'caIssuers': ('http://crt.r2m02.amazontrust.com/r2m02.cer',), 'crlDistributionPoints': ('http://crl.r2m02.amazontrust.com/r2m02.crl',)} sock=None

@bdraco (Member) commented May 12, 2023

python/cpython#98539 looks related

@bdraco (Member) commented May 12, 2023

python/cpython#98540

@bdraco (Member) commented May 12, 2023

related elastic/rally#1712

@bdraco (Member) commented May 12, 2023

related elastic/rally#1714

bdraco added a commit to home-assistant/core that referenced this issue May 13, 2023
There is currently a relatively fast memory leak when using
cpython 3.11.2+ and cleanup_closed with aiohttp

For my production instance it was leaking ~450MiB per day
of `MemoryBIO`, `SSLProtocol`, `SSLObject`, `_SSLProtocolTransport`
`memoryview`, and `managedbuffer` objects

see aio-libs/aiohttp#7252
see python/cpython#98540
@webknjaz (Member)

related elastic/rally#1712

Looks like this was backported via python/cpython@bd8b32b?
Maybe 3.11 nightly has the fix already?

@Dreamsorcerer (Member)

It was backported the same day that 3.11.0 was released, so I think the suspicion is that the "fix" may actually be causing the memory leak: it must have landed in 3.11.1+ but just missed the 3.11.0 release, which aligns with the reports.

bdraco added a commit to home-assistant/core that referenced this issue May 13, 2023
@Dreamsorcerer (Member)

Going by elastic/rally#1714, it looks like we just need to add some checks for None. Before that change, it would never be None, so the code worked correctly. I guess something about that exception could also end up resulting in a memory leak?
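
A sketch of the kind of None guard being discussed, applied to the model sketched earlier (hypothetical; see #7280 for the actual change):

class CleanupClosedModelFixed(CleanupClosedModel):
    def cleanup(self) -> None:
        # An entry can already be torn down (None) by the time the cleanup
        # timer fires; skip it so one bad entry cannot stop the whole loop.
        for transport in self._cleanup_closed_transports:
            if transport is not None:
                transport.abort()
        self._cleanup_closed_transports = []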

@bdraco (Member) commented May 13, 2023

I haven't seen the exception, but I have a Home Assistant instance leaking about 450 MiB of RAM per day. I just turned off enable_cleanup_closed and should know tomorrow if the leak continues.

@bdraco (Member) commented May 13, 2023

The leaked objects are MemoryBIO, SSLProtocol, SSLObject, _SSLProtocolTransport, memoryview, and managedbuffer objects.

The SSLProtocol objects have no socket attached to them.

The asyncio.sslproto._SSLProtocolTransport objects have no SSLProtocol and no socket.

That's about as far as I've gotten with the debugging.

@Dreamsorcerer (Member)

Well, if you could also test with the changes in #7280 later, that'd be great.

@Dreamsorcerer (Member)

I don't know if the exceptions might get suppressed for some reason, but I suppose that if an exception occurs on that abort(), then none of the other transports will get aborted, and none of the transports will get removed from that list in order to be garbage collected. So, it makes sense that a memory leak could occur.
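
A tiny self-contained demonstration of that failure mode (nothing aiohttp-specific): one raising element keeps the whole list alive.

class Bad:
    def abort(self):
        raise AttributeError("already torn down")

parked = [Bad() for _ in range(3)]
try:
    for t in parked:
        t.abort()   # raises on the first element, ending the loop early
    parked = []     # never reached, so all three objects stay referenced
except AttributeError:
    pass
print(len(parked))  # -> 3: nothing was released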

@bdraco (Member) commented May 13, 2023

I'll switch my production instance to use #7280 after the enable_cleanup_closed=False test runs overnight

@bdraco (Member) commented May 13, 2023

Setting enable_cleanup_closed=False significantly reduces the leak. Not 100% sure whether that means we are trading the fast leak for a slow one, though.

Trying #7280 now.

@SishaarRao commented May 13, 2023

Hey all, seems like you're all triaging and attempting to solve the issue, so best of luck 🙏
Not sure if this is helpful, but I'm running aiohttp==3.8.4 and Python 3.11.2 on a production server. We have observed a steady increase in Inactive memory (/proc/meminfo) at a rate correlated with the rate of HTTP requests. I haven't used the enable_cleanup_closed setting before, however, so I'm not sure whether we're affected by this issue.

For the raw delta, we saw an increase of ~600 MB in Inactive memory over ~6.5k HTTP requests via aiohttp.

@bdraco (Member) commented May 13, 2023

While it's too soon to be sure (I need to give it a few more hours), keeping enable_cleanup_closed=True along with #7280 does seem to have fixed the leak I am seeing.

@Dreamsorcerer (Member)

I haven't used the enable_cleanup_closed setting before, however, so I'm not sure whether we're affected by this issue.

I think this change will only have an effect with enable_cleanup_closed. If there is a memory leak without that option, then it is likely a different issue. It's worth testing with Python 3.11.0 to see if the memory leak is still there (which would rule out the same change as the cause).

@bdraco (Member) commented May 13, 2023

I can no longer replicate the issue after the change in #7280.

It would be great to get a 3.8.5 since this affects Home Assistant users in production.

@Dreamsorcerer (Member)

I'm thinking it might be better to just get this fixed in cpython; 3.11.4 will be released in three weeks, so if we get it fixed there in the next week, it'll probably beat aiohttp to a new release anyway.

balloob pushed a commit to home-assistant/core that referenced this issue May 14, 2023
Disable cleanup_closed for aiohttp.TCPConnector with cpython 3.11.2+ (#93013)

* Disable cleanup_closed for aiohttp.TCPConnector with cpython 3.11.2+

There is currently a relatively fast memory leak when using
cpython 3.11.2+ and cleanup_closed with aiohttp

For my production instance it was leaking ~450MiB per day
of `MemoryBIO`, `SSLProtocol`, `SSLObject`, `_SSLProtocolTransport`
`memoryview`, and `managedbuffer` objects

see aio-libs/aiohttp#7252
see python/cpython#98540

* Update homeassistant/helpers/aiohttp_client.py
@Dreamsorcerer (Member)

python/cpython#104485

Should be fixed in 3.11.4. So, the only affected versions are 3.11.1, 3.11.2 and 3.11.3.
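
For downstreams that need to keep cleanup_closed in the meantime, a hedged sketch of a version gate along the lines of the Home Assistant workaround above (illustrative; the bounds follow the affected versions listed here):

import sys

import aiohttp

def make_connector() -> aiohttp.TCPConnector:
    # cleanup_closed leaks on CPython 3.11.1-3.11.3 (fixed in 3.11.4),
    # so only enable it on unaffected interpreters.
    affected = (3, 11, 1) <= sys.version_info < (3, 11, 4)
    return aiohttp.TCPConnector(enable_cleanup_closed=not affected)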

@bdraco (Member) commented May 14, 2023

Thanks 👍

ollo69 added a commit to ollo69/ha-smartthinq-sensors that referenced this issue May 16, 2023
Enabling cleanup closed on python 3.11.1+ and before python 3.11.4 leaks memory relatively quickly (see aio-libs/aiohttp#7252)
ollo69 added a commit to ollo69/ha-smartthinq-sensors that referenced this issue May 16, 2023
* Disable cleanup_closed on cpython <= 3.11.3

Enabling cleanup closed on python 3.11.1+ and before python 3.11.4 leaks memory relatively quickly (see aio-libs/aiohttp#7252)
teharris1 pushed a commit to teharris1/home-assistant that referenced this issue May 23, 2023

Disable cleanup_closed for aiohttp.TCPConnector with cpython 3.11.2+ (home-assistant#93013)