Response file object (returned by InfoExtractor._request_webpage) may be closed for failed requests (matching expected_status) on Python 3.4.1+ #17195

Closed
5 of 9 tasks
puxlit opened this issue Aug 9, 2018 · 2 comments · Fixed by #17199

Comments

@puxlit
Contributor

puxlit commented Aug 9, 2018

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.08.04. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.08.04

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

Since bpo-15002 (which landed in Python 3.4.1), an HTTPError closes its fp when the error object is destroyed. The current implementation of InfoExtractor._request_webpage (used by InfoExtractor._download_webpage_handle and, in turn, by InfoExtractor._download_webpage, _download_xml, and _download_json) accommodates expected_status by catching HTTPErrors and returning this fp. Unfortunately, this means subsequent reads against this file object by the caller are unreliable.

  • If fp is an instance of http.client.HTTPResponse, we read out an empty response body.
  • If fp is an instance of urllib.response.addinfourl (for when youtube-dl handles gzip and deflate responses), the attempted read raises a ValueError: I/O operation on closed file exception, as demonstrated in #17447 ("I/O operation on closed file. error on Python 3.7").
  • On Windows, tempfile._TemporaryFileCloser omits an implementation of __del__ that would close fp, so reads return successfully. This platform inconsistency has been reported as bpo-34958.
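
The failure mode can be sketched without a network by building an HTTPError by hand. This is only an illustration, not youtube-dl code: `make_error` and `body_snapshot` are hypothetical helpers, the URL is a placeholder, and the explicit `err.close()` call stands in for what happens when the error object is destroyed on affected Python versions.

```python
import io
import urllib.error


def make_error(body, code=418):
    """Build an HTTPError by hand (hypothetical helper; no network involved)."""
    return urllib.error.HTTPError(
        'https://example.invalid/', code, "I'm a teapot", {}, io.BytesIO(body))


def body_snapshot(err):
    """Copy the body into a fresh buffer while the error is still alive."""
    return io.BytesIO(err.read())


err = make_error(b'short and stout')
fp = err.fp                      # the object a caller would be handed back
snapshot = body_snapshot(err)    # taken before the error goes away
err.close()                      # stand-in for the error being destroyed

try:
    fp.read()
    original_readable = True
except ValueError:               # "I/O operation on closed file"
    original_readable = False

print(original_readable)
print(snapshot.read())
```

Copying the body eagerly is one possible way around the problem, assumed here for illustration; keeping a reference to the error alive for as long as fp is in use would be another.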

Fortunately, the number of extractors that make use of expected_status is small; as of 2018.08.04, it's just bbc, lynda, markiza, and twitch.

Issue encountered whilst debugging reports of problems running @bato3's fix for #17116 on Python 3.7.

@dstftw
Collaborator

dstftw commented Aug 9, 2018

I don't see any problem here. By using expected_status you treat potentially failed outcomes as normal, so you should be ready for consequences like a closed connection. It's the responsibility of the client code to check whether the connection was closed in such cases.
Also, if you want an error, then don't use expected_status and catch the exception instead. The whole point of expected_status is to simplify code in cases where success and failure are both expressed in the same way. For example, when _download_json always returns JSON (e.g. for both 404 and 403), you don't need to catch HTTPError, read the output, and parse it in client code for the 403 scenario.

@dstftw dstftw closed this as completed Aug 9, 2018
@puxlit
Contributor Author

puxlit commented Aug 9, 2018

@dstftw, we're not talking about a closed connection here, we're talking about the response body being inaccessible, which defeats the point of expected_status. The expectation is that if you call _download_webpage with expected_status, you'll get the response body back if the request was successful or if the response code matches expected_status. But this is not the case; on expected errors, the response file object returned by _request_webpage will be closed by the time _download_webpage_handle tries to extract its contents with _webpage_read_content.

So unless you're saying that in usages like page = self._download_webpage('https://httpbin.org/status/418', None, expected_status=418), we're to expect an empty string if the request returns a 418, this is definitely a bug.
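
To make that expectation concrete, here is a rough standalone model of the contract. It is a sketch under stated assumptions, not the real `_download_webpage`: `fetch_body` and `fake_open` are hypothetical names, and `fake_open` simulates a server that always answers 418 so that nothing touches the network.

```python
import io
import urllib.error


def fetch_body(open_url, url, expected_status=None):
    """Return the body on success or when the error code is expected.

    `open_url` is any callable that behaves like urllib.request.urlopen.
    """
    try:
        return open_url(url).read()
    except urllib.error.HTTPError as err:
        if expected_status is not None and err.code == expected_status:
            # Read while `err` is still referenced, so its fp is still open.
            return err.read()
        raise


def fake_open(url):
    """Hypothetical opener that always fails with 418 and a short body."""
    raise urllib.error.HTTPError(
        url, 418, "I'm a teapot", {}, io.BytesIO(b'teapot'))


body = fetch_body(fake_open, 'https://httpbin.org/status/418',
                  expected_status=418)
print(body)
```

Under this model a 418 with expected_status=418 yields the response body, while the same 418 without expected_status propagates as an HTTPError.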

puxlit added a commit to puxlit/youtube-dl that referenced this issue Aug 10, 2018
…efore it can be read if it matches expected_status (resolves ytdl-org#17195)
puxlit added a commit to puxlit/youtube-dl that referenced this issue Oct 9, 2018
puxlit added a commit to puxlit/youtube-dl that referenced this issue Oct 9, 2018
…efore it can be read if it matches expected_status (resolves ytdl-org#17195)
dstftw pushed a commit that referenced this issue Nov 2, 2018
…efore it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447)
Khang-NT referenced this issue in Khang-NT/youtube-dl Nov 7, 2018
* 'master' of https://github.com/rg3/youtube-dl: (186 commits)
  release 2018.11.07
  [ChangeLog] Actualize [ci skip]
  [youtube] Add another JS signature function name regex (closes #18091, closes #18093, closes #18094)
  [facebook] fix tahoe request(closes #17171)
  [cliphinter] Fix extraction (closes #18083)
  [youtube:playlist] Add support for invidio.us (closes #18077)
  [osnateltv] Update host
  [zattoo] Arrange API hosts for derived extractors (closes #18035)
  [README.md] Improve documentation on safe metadata extraction and add more examples
  [youtube] Add fallback metadata extraction from videoDetails (closes #18052)
  release 2018.11.03
  [ChangeLog] Actualize [ci skip]
  [laola1tv:embed] Set correct stream access URL scheme (closes #16341)
  [ehftv] Add extractor (closes #15408)
  [azmedien] Simplify (closes #17746)
  [azmedien] Adopt to major site redesign (closes #17745)
  [extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447)
  [twitcasting] Improve extraction and fix issues (closes #17981)
  [twitcasting] Add extractor
  [orf:tvthek] Improve extraction and remove unused code (closes #17956, closes #18024)
  ...
lkho referenced this issue in lkho/youtube-dl Dec 24, 2018
…efore it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447)
2 participants