-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asyncio.StreamReader.read hangs for reused socket file descriptors when asyncio.StreamWriter.close() is not called #88968
Comments
Problem When using asyncio streams via
with a already connected socket
again, the Client and server programs that reproduce the bug Run the server in one shell and then run the client in the other Usage: python3 client.py CLOSE_MODE Where CLOSE_MODE can be
These are also attached, but here's the source. The import asyncio, socket, sys
async def client(src_ip, close_mode):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
src_ip_port = (src_ip, 0)
s.bind(src_ip_port)
dst_ip_port = ('127.0.0.2', 12345)
s.connect(dst_ip_port)
print(f'Connected from {src_ip}')
print(s)
try:
(r,w) = await asyncio.open_connection(sock=s)
print('<- ', end='', flush=True)
in_line = await r.read(100)
print(in_line)
out_line = b'client'
print('-> ', end='', flush=True)
w.write(out_line)
await w.drain()
print(out_line)
finally:
if 'S' in close_mode:
s.close()
if 'A' in close_mode:
w.close()
await w.wait_closed()
print('Closed socket')
print()
async def main(close_mode):
await client('127.0.0.3', close_mode)
await client('127.0.0.4', close_mode)
close_mode = sys.argv[1]
asyncio.run(main(close_mode)) The import asyncio, socket, sys
async def server(close_mode):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
ip = '127.0.0.2'
port = 12345
print(f'Listening on {ip}:{port}')
print(s)
try:
s.bind((ip, port))
s.listen()
while True:
(a, (a_ip, a_port)) = s.accept()
print(f'Client connected from {a_ip}:{a_port}')
print(a)
try:
(r,w) = await asyncio.open_connection(sock=a)
print('-> ', end='', flush=True)
out_line = b'server'
w.write(out_line)
await w.drain()
print(out_line)
print('<- ', end='', flush=True)
in_line = await r.read(100)
print(in_line)
finally:
if 'S' in close_mode:
a.close()
if 'A' in close_mode:
w.close()
await w.wait_closed()
print('Closed client socket')
print()
finally:
s.close()
print('Closed server socket')
close_mode = sys.argv[1]
asyncio.run(server(close_mode)) Example session: Note that file descriptor 7 is reused on the server side, before the Run the server in one shell: $ python3 server.py S
Listening on 127.0.0.2:12345
<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>
Client connected from 127.0.0.3:34135
<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 34135)>
-> b'server'
<- b'client'
Closed client socket
Client connected from 127.0.0.4:46639
<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.4', 46639)>
-> b'server'
<- The server is hanging on Run the client in another shell, after starting the server: $ python3 client.py A
Connected from 127.0.0.3
<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 34135), raddr=('127.0.0.2', 12345)>
<- b'server'
-> b'client'
Closed socket
$ lsof -ni @127.0.0.2
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python3 26692 conathan 6u IPv4 9992763 0t0 TCP 127.0.0.2:12345 (LISTEN)
python3 26692 conathan 7u IPv4 9991149 0t0 TCP 127.0.0.2:12345->127.0.0.4:46639 (CLOSE_WAIT) Example session: Note that file descriptors are not reused on the server side now, and Server: $ python3 server.py ''
Listening on 127.0.0.2:12345
<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>
Client connected from 127.0.0.3:37833
<socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 37833)>
-> b'server'
<- b'client'
Closed client socket
Client: $ python3 client.py A
Connected from 127.0.0.3
<socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 37833), raddr=('127.0.0.2', 12345)>
<- b'server'
-> b'client'
Closed socket
Behavior for different combinations of closure modes Perhaps this is overkill, but here's what happens for all 15 possible For example, "Client=S, Server=S" below means we run Client=S, Server='';
Client=S, Server=S;
Client=S, Server=A;
Client=S, Server=SA Client hangs on Client='', Server=S; Server hangs on Client=SA, Server=S Everything works the first time, but if you run the client in a loop, $ while true; do python3 client.py SA; done then the server will eventually hang on Client='', Server=''; Everything works! But here we see that the client and/or server Client=A, Server=A;
Client=A, Server=SA;
Client=SA, Server=A;
Client=SA, Server=SA Everything works, but this is not surprising because both sides closed Possibly related bugs https://bugs.python.org/issue43253: Windows only, calling https://bugs.python.org/issue41317: reused file descriptors across https://bugs.python.org/issue34795, https://bugs.python.org/issue35065: reading from a closed stream can https://bugs.python.org/issue43183: sockets used with asyncio getting System info $ python3 --version
Python 3.9.6
$ lsb_release -a
LSB Version: core-9.20170808ubuntu1-noarch:printing-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic |
Oh, and I can't count: there are 16 = 4x4 possible combinations of socket closure modes for the client and server. The one I missed was Client='', Server=SA, where everything works because the client doesn't reuse file descriptors and the server closes its sockets correctly. |
Just wanted to clarify: my previous "where everything works" comment is not saying this bug doesn't exist, I just mean I missed one case in my analysis of the bug. The bug is very much there, and easy to reproduce using the example programs I attached. Bump! |
@ntc2 Is this still an issue? This issue description is very large and is hard to understand what the problem is. |
Since the OP has invested much effort in the analysis, maybe they can submit a PR using a fix? We're still short of asyncio experts and it's a lot easier to review a PR than it is to come up with a fix for a complex bug like this. Also, the first step should be if this is still a problem with 3.11 (or the main branch). |
Yes, this is still an issue, I just tested against 6cbbc26. For example, the Sorry that the issue description is so large, I may have gotten carried away describing the test cases. But the high level description of the issue is not complex. I'll repeat it again here, trying to be more clear. Let me know if it still doesn't make sense. ProblemThe
There are two different Example Programs Illustrating the BugThere are simple example programs attached to the original bug report. You can reproduce the problem by running I don't know anything about how |
The cause is a bug in the user code, right? Because you should be calling The cause of the hang seems to be that, since Note: I haven't attempted to run the example code, so I may be off about this. It's possible that the selector is confused about the reader end of the socket instead of the writer, or that what's confused is the transport. In any case I'm not sure I want to look more into this unless you feel this is a legitimate usage pattern and the problem is a bug in asyncio. One thing we could do (assuming we agree it's a user bug) would be to make it easier to detect the problem, e.g. when opening a new connection it might check with the selector if the fd still appears to be in use and then raise an exception instead of proceeding. t's possible that this is as simple as calling the selector's |
Yes, this is a minimal example illustrating a real bug I ran into porting part of a large program to use As to whether my usage pattern was legitimate, I think what happened was that I had some code like this, that didn't use socket = create_and_connect_socket()
try:
long_running_network_stuff(socket)
finally:
socket.close() Then I refactored |
Yeah, we should at least make sure the docs clarify this. I haven't thought about it too much yet, but when you pass a socket into |
Please have a look at GH-88968 to review the doc changes I am proposing. |
I am looking into providing better diagnostics. Since you mentioned hanging in def add_reader(self, fd, callback, *args):
"""Add a reader callback."""
self._ensure_fd_no_transport(fd)
self._add_reader(fd, callback, *args) And guess what! The Next I'm going to run your test scenario and check whether my theory is right. |
Yup, it seems to go so. If I change
Unfortunately there seems to be something in the client where |
…97936) (cherry picked from commit 74ea204) Co-authored-by: Guido van Rossum <[email protected]>
…97936) (cherry picked from commit 74ea204) Co-authored-by: Guido van Rossum <[email protected]>
Hm, it looks like that method (added in GH-72555) is for a slightly different cross-check. I have to ponder this some more. UPDATE: More discussion in this old PR and this old issue. |
(cherry picked from commit 74ea204) Co-authored-by: Guido van Rossum <[email protected]>
(cherry picked from commit 74ea204) Co-authored-by: Guido van Rossum <[email protected]>
It's perhaps possible to call |
I looked at your doc changes in #97936. In conjunction with a defensive "fail fast" fix, like making sure that Anyway, thanks for debugging this! Making sure that |
* main: pythonGH-88050: fix race in closing subprocess pipe in asyncio (python#97951) pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962) pythongh-95986: Fix the example using match keyword (python#95989) pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944) pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929) pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610) pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703) pythonGH-88968: Add notes about socket ownership transfers (python#97936) pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
I consider this good enough. |
* main: (53 commits) pythongh-94808: Coverage: Test that maximum indentation level is handled (python#95926) pythonGH-88050: fix race in closing subprocess pipe in asyncio (python#97951) pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962) pythongh-95986: Fix the example using match keyword (python#95989) pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944) pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929) pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610) pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703) pythonGH-88968: Add notes about socket ownership transfers (python#97936) pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528) pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879) docs(typing): add "see PEP 675" to LiteralString (python#97926) pythongh-97850: Remove all known instances of module_repr() (python#97876) I changed my surname early this year (python#96671) pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768) pythongh-91539: improve performance of get_proxies_environment (python#91566) build(deps): bump actions/stale from 5 to 6 (python#97701) pythonGH-95172 Make the same version `versionadded` oneline (python#95172) pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073) pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774) ...
* main: (38 commits) pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058) pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060) pythongh-57179: Add note on symlinks for os.walk (python#94799) pythongh-94808: Fix regex on exotic platforms (python#98036) pythongh-90085: Remove vestigial -t and -c timeit options (python#94941) pythonGH-83901: Improve Signature.bind error message for missing keyword-only params (python#95347) pythongh-61105: Add default param, note on using cookiejar subclass (python#95427) pythongh-96288: Add a sentence to `os.mkdir`'s docstring. (python#96271) pythongh-96073: fix backticks in NEWS entry (pythonGH-98056) pythongh-92886: [clinic.py] raise exception on invalid input instead of assertion (pythonGH-98051) pythongh-97997: Add col_offset field to tokenizer and use that for AST nodes (python#98000) pythonGH-88968: Reject socket that is already used as a transport (python#98010) pythongh-96346: Use double caching for re._compile() (python#96347) pythongh-91708: Revert params note in urllib.parse.urlparse table (python#96699) pythongh-96265: Fix some formatting in faq/design.rst (python#96924) pythongh-73196: Add namespace/scope clarification for inheritance section (python#92840) pythongh-97646: Change `.js` and `.mjs` files mimetype to conform to RFC 9239 (python#97934) pythongh-97923: Always run Ubuntu SSL tests with others in CI (python#97940) pythongh-97956: Mention `generate_global_objects.py` in `AC How-To` (python#97957) pythongh-96959: Update HTTP links which are redirected to HTTPS (python#98039) ...
(cherry picked from commit 74ea204) Co-authored-by: Guido van Rossum <[email protected]>
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: