Response file object (returned by `InfoExtractor._request_webpage`) may be closed for failed requests (matching `expected_status`) on Python 3.4.1+ #17195

puxlit · 2018-08-09T12:26:47Z

Make sure you are using the latest version: run `youtube-dl --version` and ensure your version is 2018.08.04. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

I've verified and I assure that I'm running youtube-dl 2018.08.04

Before submitting an issue make sure you have:

At least skimmed through the README, most notably the FAQ and BUGS sections
Searched the bugtracker for similar issues including closed ones
Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

Bug report (encountered problems with youtube-dl)
Site support request (request for adding support for a new site)
Feature request (request for a new functionality)
Question
Other

Since bpo-15002 (introduced in Python 3.4.1), an HTTPError closes its fp when the error is destroyed. The current implementation of InfoExtractor._request_webpage (used by InfoExtractor._download_webpage_handle and, in turn, by InfoExtractor._download_webpage, _download_xml, and _download_json) accommodates expected_status by catching the HTTPError and returning its fp. Unfortunately, the error is destroyed once the exception handler exits, so subsequent reads against this file object by the caller are unreliable:

If fp is an instance of http.client.HTTPResponse, we read out an empty response body.
If fp is an instance of urllib.response.addinfourl (for when youtube-dl handles gzip and deflate responses), the attempted read raises "ValueError: I/O operation on closed file", as demonstrated in #17447 ("I/O operation on closed file." error on Python 3.7).
On Windows, tempfile._TemporaryFileCloser omits an implementation of __del__ that would close fp, so reads return successfully. This platform inconsistency has been reported as bpo-34958.
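
The failure mode described above can be reproduced without any network access. The following is a minimal sketch, not youtube-dl's actual code: `FakeHTTPError`, `fake_urlopen`, `request_webpage`, and `request_webpage_keepalive` are hypothetical names, and `FakeHTTPError.__del__` only imitates the close-on-destroy behaviour attributed to bpo-15002. It assumes CPython, where the caught error is destroyed as soon as the `except` block exits. The second function shows one possible workaround, which is to keep the error alive by attaching it to the file object that is returned.

```python
import gc
import io


class FakeHTTPError(Exception):
    """Hypothetical stand-in for HTTPError on Python 3.4.1+:
    it owns the response file object and closes it when destroyed."""

    def __init__(self, code, fp):
        super().__init__('HTTP Error %d' % code)
        self.code = code
        self.fp = fp

    def __del__(self):
        self.fp.close()


def fake_urlopen(code, body):
    # Hypothetical opener that always fails with the given status code.
    raise FakeHTTPError(code, io.BytesIO(body))


def request_webpage(code, body, expected_status):
    # The problematic pattern: catch the error and hand back its fp.
    try:
        fake_urlopen(code, body)
    except FakeHTTPError as err:
        if err.code == expected_status:
            return err.fp  # err is destroyed when this handler exits
        raise


def request_webpage_keepalive(code, body, expected_status):
    # One possible workaround: retain a reference to the error on fp,
    # so the error (and therefore fp) outlives the handler.
    try:
        fake_urlopen(code, body)
    except FakeHTTPError as err:
        if err.code == expected_status:
            fp = err.fp
            fp._error = err
            return fp
        raise


broken = request_webpage(418, b"I'm a teapot", 418)
gc.collect()  # not required on CPython, but makes the intent explicit
print(broken.closed)

working = request_webpage_keepalive(418, b"I'm a teapot", 418)
gc.collect()
print(working.closed)
```

With the first function, the caller receives a handle that is already closed, so a later `read()` fails. With the second, the handle stays open for as long as the caller holds it.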

Fortunately, the number of extractors that make use of expected_status is small; as of 2018.08.04, it's just bbc, lynda, markiza, and twitch.

Issue encountered whilst debugging reports of problems running @bato3's fix for #17116 on Python 3.7.


dstftw · 2018-08-09T16:57:45Z

I don't see any problem here. By using expected_status you treat potentially failed outcomes as normal, so you should be ready for consequences like a closed connection. It's the responsibility of client code to check whether the connection was closed in such cases.
Also, if you want an error, don't use expected_status; catch the exception instead. The whole point of expected_status is to simplify code in cases where success and failure are both expressed in the same way. For example, when _download_json always returns JSON (e.g. for 404 and for 403), so that in the 403 scenario you don't need to catch HTTPError, read the output, and parse it in client code.
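
For contrast, the manual handling that expected_status is meant to spare client code might look roughly like the sketch below. It uses only the standard library; `fetch_json` and `forbidden_opener` are hypothetical names for illustration and are not part of youtube-dl. Note that the body is read inside the handler, while the error is still alive.

```python
import io
import json
from urllib.error import HTTPError


def fetch_json(opener, url, expected_status=None):
    """Hypothetical helper: return parsed JSON for successful responses and
    for error responses whose status code equals expected_status."""
    try:
        fp = opener(url)
        return json.loads(fp.read().decode('utf-8'))
    except HTTPError as err:
        if err.code != expected_status:
            raise
        # Read the error body before the handler exits and err is destroyed.
        return json.loads(err.fp.read().decode('utf-8'))


def forbidden_opener(url):
    # Hypothetical opener simulating a 403 response that carries a JSON body.
    body = io.BytesIO(b'{"error": "geo-blocked"}')
    raise HTTPError(url, 403, 'Forbidden', {}, body)


result = fetch_json(forbidden_opener, 'https://example.invalid/api',
                    expected_status=403)
print(result)
```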

puxlit · 2018-08-09T17:41:09Z

@dstftw, we're not talking about a closed connection here, we're talking about the response body being inaccessible, which defeats the point of expected_status. The expectation is that if you call _download_webpage with expected_status, you'll get the response body back if the request was successful or if the response code matches expected_status. But this is not the case; on expected errors, the response file object returned by _request_webpage will be closed by the time _download_webpage_handle tries to extract its contents with _webpage_read_content.

So unless you're saying that in usages like page = self._download_webpage('https://httpbin.org/status/418', None, expected_status=418), we're to expect an empty string if the request returns a 418, this is definitely a bug.


dstftw closed this as completed Aug 9, 2018

puxlit added a commit to puxlit/youtube-dl that referenced this issue Aug 10, 2018

[extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves ytdl-org#17195)

8baec14

puxlit mentioned this issue Aug 10, 2018

[extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195) #17199

Merged


puxlit added a commit to puxlit/youtube-dl that referenced this issue Oct 9, 2018

[test_InfoExtractor] Add test case for ytdl-org#17195

9e351ab

puxlit mentioned this issue Oct 9, 2018

[test_InfoExtractor] Add test case for #17195 #17846

Closed


puxlit added a commit to puxlit/youtube-dl that referenced this issue Oct 9, 2018

[extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves ytdl-org#17195)

43403f2

puxlit mentioned this issue Oct 11, 2018

I/O operation on closed file. error on Python 3.7 #17447

Closed


dstftw pushed a commit that referenced this issue Nov 2, 2018

[extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447)

95e42d7

lkho referenced this issue in lkho/youtube-dl Dec 24, 2018

[extractor/common] Ensure response handle is not prematurely closed before it can be read if it matches expected_status (resolves #17195, closes #17846, resolves #17447)

555a737
