Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix upload timeouts with sock_read set #7150

Merged
merged 3 commits into from
Feb 11, 2023

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented Dec 28, 2022

What do these changes do?

Prevent the sock_read timeout callback from firing by only scheduling it afterthe payload (if any) has been fully written.

Are there changes in behavior for the user?

No

Related issue number

Fixes #7149

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch 2 times, most recently from bd62e41 to 4651b63 Compare January 2, 2023 14:07
@dtrifiro dtrifiro marked this pull request as ready for review January 2, 2023 14:07
@dtrifiro dtrifiro requested a review from asvetlov as a code owner January 2, 2023 14:07
@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch from 4651b63 to aa11d7f Compare January 2, 2023 14:10
@dtrifiro dtrifiro requested a review from webknjaz as a code owner January 2, 2023 14:10
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 2, 2023
@codecov
Copy link

codecov bot commented Jan 8, 2023

Codecov Report

Merging #7150 (2f702a3) into master (4635161) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 2f702a3 differs from pull request most recent head 303d3c0. Consider uploading reports for the commit 303d3c0 to get more accurate results

@@           Coverage Diff           @@
##           master    #7150   +/-   ##
=======================================
  Coverage   97.37%   97.38%           
=======================================
  Files         106      106           
  Lines       31093    31113   +20     
  Branches     3875     3876    +1     
=======================================
+ Hits        30278    30298   +20     
  Misses        613      613           
  Partials      202      202           
Flag Coverage Δ
CI-GHA 97.28% <100.00%> (+<0.01%) ⬆️
OS-Linux 96.94% <100.00%> (-0.01%) ⬇️
OS-Windows 95.35% <100.00%> (+<0.01%) ⬆️
OS-macOS 96.50% <100.00%> (+<0.01%) ⬆️
Py-3.10.9 97.03% <100.00%> (-0.01%) ⬇️
Py-3.11.0 95.60% <100.00%> (+<0.01%) ⬆️
Py-3.7.15 96.76% <100.00%> (+<0.01%) ⬆️
Py-3.7.9 95.24% <100.00%> (+<0.01%) ⬆️
Py-3.8.10 95.14% <100.00%> (+<0.01%) ⬆️
Py-3.8.15 96.65% <100.00%> (+<0.01%) ⬆️
Py-3.8.16 ?
Py-3.9.13 95.12% <100.00%> (-0.01%) ⬇️
Py-3.9.16 96.66% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.11 94.17% <100.00%> (-0.02%) ⬇️
VM-macos 96.50% <100.00%> (+<0.01%) ⬆️
VM-ubuntu 96.94% <100.00%> (-0.01%) ⬇️
VM-windows 95.35% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohttp/client_proto.py 96.73% <100.00%> (+0.02%) ⬆️
aiohttp/client_reqrep.py 97.72% <100.00%> (+<0.01%) ⬆️
tests/test_client_functional.py 98.84% <100.00%> (+<0.01%) ⬆️
tests/test_client_proto.py 100.00% <100.00%> (ø)
aiohttp/web.py 99.18% <0.00%> (ø)
aiohttp/http.py 100.00% <0.00%> (ø)
aiohttp/client.py 94.16% <0.00%> (ø)
aiohttp/__init__.py 92.85% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Dreamsorcerer
Copy link
Member

I'm wondering if this is the correct behaviour. If the timeout relates to reading data, shouldn't we start the timeout after we've finished writing? Or do we need to handle cases where reading happens simultaneously to writing?

Currently, it looks like we start writing as a task:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client_reqrep.py#L621

Then we start the response immediately after, which initialises the timer:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L529

Also, wouldn't this issue also occur if you tried to send a large payload without chunking? I'm not sure this fix works in that case.

@webknjaz Thoughts?

@dtrifiro
Copy link
Contributor Author

If the timeout relates to reading data, shouldn't we start the timeout after we've finished writing? Or do we need to handle cases where reading happens simultaneously to writing?

I also asked myself that question. I guess one of the use cases is to start reading the response while the file is being uploaded in order to catch potential errors early and abort the upload, although there's currently no mechanism for that: whole file is uploaded even if the server immediately returns an error (e.g. 401). In my limited testing, curl does this:

curl -X PUT -H "Transfer-Encoding: chunked" http://localhost:8472/file -T <large file>

Fails immediately if the server returns a 401:

async def handler(request: aiohttp.web.Request) -> aiohttp.web.Response:
    return aiohttp.web.Response(status=401)

Also note that the sock read timeout is actually started before we send the request (via _reschedule_timeout):

def set_response_params(
self,
*,
timer: Optional[BaseTimerContext] = None,
skip_payload: bool = False,
read_until_eof: bool = False,
auto_decompress: bool = True,
read_timeout: Optional[float] = None,
read_bufsize: int = 2**16,
timeout_ceil_threshold: float = 5,
) -> None:
self._skip_payload = skip_payload
self._read_timeout = read_timeout
self._reschedule_timeout()


Also, wouldn't this issue also occur if you tried to send a large payload without chunking? I'm not sure this fix works in that case.

Good point, that's still broken, and the current fix is also broken when the time to upload a single chunk is comparable to the timeout value.

Another proposal to fix this could be something like this:

diff --git a/aiohttp/client.py b/aiohttp/client.py
index c4074577..8ceeeb09 100644
--- a/aiohttp/client.py
+++ b/aiohttp/client.py
@@ -525,6 +525,9 @@ class ClientSession:
                     try:
                         try:
                             resp = await req.send(conn)
+                            conn.protocol.pause_reading()
+                            await resp.wait_for_close()
+                            conn.protocol.resume_reading()
                             try:
                                 await resp.start(conn)
                             except BaseException:

would fix the issue for all cases although I'm not sure it makes sense to completely pause reading while uploading.

Any thoughts?

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Jan 10, 2023

I'm not sure it makes sense to completely pause reading while uploading.

I think you're probably right. I wonder if there's a reasonable way to only start the read timeout once the write has completed, even if we're already waiting on the read...

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Jan 10, 2023

If we can use that resp.wait_for_closed() and it's not too messy to get a reference to that in the right place, then maybe we can modify the timeout code.

e.g. The timeout might be modified to do something like:

async def timeout():
    await resp.wait_for_closed()
    await asyncio.sleep(...)
    task.cancel()

CHANGES/7149.bugfix Outdated Show resolved Hide resolved
aiohttp/http_writer.py Outdated Show resolved Hide resolved
@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jan 11, 2023

@Dreamsorcerer

I think it might be easier to do something like this:

diff --git a/aiohttp/client_reqrep.py b/aiohttp/client_reqrep.py
index 9478e8d3..a11a600f 100644
--- a/aiohttp/client_reqrep.py
+++ b/aiohttp/client_reqrep.py
@@ -525,6 +525,7 @@ class ClientRequest:
 
         protocol = conn.protocol
         assert protocol is not None
+        protocol.drop_timeout()
         try:
             if isinstance(self.body, payload.Payload):
                 await self.body.write(writer)
@@ -551,6 +552,8 @@ class ClientRequest:
                 protocol.set_exception(exc)
         except Exception as exc:
             protocol.set_exception(exc)
+        else:
+            protocol.reschedule_timeout()
         finally:
             self._writer = None

This works in every case we discussed above and does not block reading (note that I renamed _drop_timeout and _reschedule_timeout).

Somewhat related: if the total timeout is not set, this could result in hangs while uploading since there's no write_timeout (could be added to StreamWriter?)

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Jan 11, 2023

This works in every case we discussed above and does not block reading (note that I renamed _drop_timeout and _reschedule_timeout).

Somewhat related: if the total timeout is not set, this could result in hangs while uploading since there's no write_timeout (could be added to StreamWriter?)

This looks promising to me. Might be cleaner to skip the reschedule_timeout() call in set_response_params() so the timeout is not started initially, and then have a start_timeout() call or something. Or maybe an extra kwarg start_timeout=False?
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client_proto.py#L160

The only other code I see using this is in connector.py, and appears that it should adopt the same logic.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Jan 11, 2023

@webknjaz There's no reason to expect ResponseHandler.set_response_params() to be used outside of the library, is there?

So, should be safe to change the timeout logic in that class?

@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch from aa11d7f to 9159c7c Compare January 11, 2023 19:22
@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jan 11, 2023

This looks promising to me. Might be cleaner to skip the reschedule_timeout() call in set_response_params() so the timeout is not started initially, and then have a start_timeout() call or something.

If we want to modify set_response_params, we can just get rid of the _reschedule_timeout and schedule it only when we are finished writing, see the latest changes.

@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch from 159730b to bb708d9 Compare January 11, 2023 19:40
Copy link
Member

@Dreamsorcerer Dreamsorcerer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait for @webknjaz to confirm as well, but this looks good to me.

@Dreamsorcerer
Copy link
Member

Adding a write timeout, as you mentioned before, is probably a good idea. But, maybe best to do it in a second PR.

CHANGES/7149.bugfix Outdated Show resolved Hide resolved
@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch from ca7cc0b to 2c8c17b Compare January 12, 2023 18:58
@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jan 12, 2023

Thanks @Dreamsorcerer. I'll try and find time to implement the write timeout in the next few weeks. I'm still concerned about the whole file being uploaded even if the server immediately replies with an error, but I haven't yet figured it out. Will probably open an issue for that too. edit: this is not an issue

@dtrifiro dtrifiro changed the title client: fix chunked upload timeouts with sock_read client: fix upload timeouts with sock_read set Jan 12, 2023
@Dreamsorcerer
Copy link
Member

Thanks @Dreamsorcerer. I'll try and find time to implement the write timeout in the next few weeks. I'm still concerned about the whole file being uploaded even if the server immediately replies with an error, but I haven't yet figured it out. Will probably open an issue for that too.

Might need a bit of testing, but I'd expect the server to close the connection in that case, which should cause the write code to error.

@dtrifiro
Copy link
Contributor Author

Might need a bit of testing, but I'd expect the server to close the connection in that case, which should cause the write code to error.

That's indeed the behaviour, a simpler test setup shows this. The error I was seeing was due to a blocking call in my code.

dtrifiro and others added 3 commits January 23, 2023 11:12
Start the sock_read timeout handler after the payload body has been fully written.
@dtrifiro dtrifiro force-pushed the fix-chunked-upload-timeouts branch from 2f702a3 to 303d3c0 Compare January 23, 2023 10:12
@dtrifiro
Copy link
Contributor Author

Hi, is there anything I can do to help this get merged? Thanks!

@Dreamsorcerer
Copy link
Member

Think this is good now. Would be great to get that write_timeout in a new PR.

@Dreamsorcerer Dreamsorcerer merged commit fecbe99 into aio-libs:master Feb 11, 2023
@patchback
Copy link
Contributor

patchback bot commented Feb 11, 2023

Backport to 3.9: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply fecbe99 on top of patchback/backports/3.9/fecbe999c7a110fbeba8aa6ba269497435b2870d/pr-7150

Backporting merged PR #7150 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.9/fecbe999c7a110fbeba8aa6ba269497435b2870d/pr-7150 upstream/3.9
  4. Now, cherry-pick PR client: fix upload timeouts with sock_read set #7150 contents into that branch:
    $ git cherry-pick -x fecbe999c7a110fbeba8aa6ba269497435b2870d
    If it'll yell at you with something like fatal: Commit fecbe999c7a110fbeba8aa6ba269497435b2870d is a merge but no -m option was given., add -m 1 as follows intead:
    $ git cherry-pick -m1 -x fecbe999c7a110fbeba8aa6ba269497435b2870d
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR client: fix upload timeouts with sock_read set #7150 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.9/fecbe999c7a110fbeba8aa6ba269497435b2870d/pr-7150
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Dreamsorcerer pushed a commit that referenced this pull request Feb 11, 2023
Prevent the `sock_read` timeout callback from firing by only scheduling
it afterthe payload (if any) has been fully written.

No

Fixes #7149

- [x] I think the code is well written
- [x] Unit tests for the changes exist
- [x] Documentation reflects the changes
- [x] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [x] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

---------

Co-authored-by: Sam Bull <[email protected]>
(cherry picked from commit fecbe99)
@Dreamsorcerer
Copy link
Member

#7206

Dreamsorcerer added a commit that referenced this pull request Feb 11, 2023
Prevent the `sock_read` timeout callback from firing by only scheduling
it afterthe payload (if any) has been fully written.

No

Fixes #7149

- [x] I think the code is well written
- [x] Unit tests for the changes exist
- [x] Documentation reflects the changes
- [x] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [x] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
* ensure type is one of the following: * `.feature`: Signifying a new
feature. * `.bugfix`: Signifying a bug fix. * `.doc`: Signifying a
documentation improvement. * `.removal`: Signifying a deprecation or
removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

---------

Co-authored-by: Sam Bull <[email protected]>
(cherry picked from commit fecbe99)

<!-- Thank you for your contribution! -->

## What do these changes do?

<!-- Please give a short brief about these changes. -->

## Are there changes in behavior for the user?

<!-- Outline any notable behaviour for the end users. -->

## Related issue number

<!-- Are there any issues opened that will be resolved by merging this
change? -->

## Checklist

- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to
`CONTRIBUTORS.txt`
  * The format is &lt;Name&gt; &lt;Surname&gt;.
  * Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `CHANGES` folder
  * name it `<issue_id>.<type>` for example (588.bugfix)
* if you don't have an `issue_id` change it to the pr id after creating
the pr
  * ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
* `.misc`: A ticket has been closed, but it is not of interest to users.
* Make sure to use full sentences with correct case and punctuation, for
example: "Fix issue with non-ascii contents in doctest text files."

Co-authored-by: daniele <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

client: unexpected timeouts with chunked uploads when setting sock_read timeout
3 participants