Prevent sticky Pypdl._kwargs['headers']['range'] reference bug when reusing Pypdl object with user-supplied headers and multiple multi-segment downloads #24
A simple workaround for a bug with a complicated chain of events, due to an oversight in the usage of the `Pypdl._kwargs` attribute. This PR fixes #23.
When creating a `Pypdl` object, one can pass in arbitrary headers, which are later forwarded to an `aiohttp.ClientSession()` object. These headers are collected within `**kwargs` and then stored in the `self._kwargs` attribute:
pypdl/pypdl/pypdl_manager.py
Lines 44 to 48 in 12a20ad
This dictionary is passed to multiple objects during normal operation, such as in the `Pypdl._get_header()` method...
pypdl/pypdl/pypdl_manager.py
Lines 247 to 252 in 12a20ad
...and the `Pypdl._multi_segment()` method:
pypdl/pypdl/pypdl_manager.py
Lines 259 to 270 in 12a20ad
To request byte ranges for multi-segment downloads, `Multidown.worker()` collects the user-supplied arguments in `**kwargs` and later adds a `range` header, as this needs to be passed on to another `aiohttp.ClientSession()` object:
pypdl/pypdl/downloader.py
Lines 46 to 48 in 12a20ad
Line 65 updates the `kwargs['headers']` dictionary.
pypdl/pypdl/downloader.py
Lines 63 to 66 in 12a20ad
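The aliasing behind this can be reproduced without the library. The following is a minimal sketch, not pypdl's actual code: `worker` and `stored_kwargs` are hypothetical stand-ins for `Multidown.worker()` and `Pypdl._kwargs`. Unpacking with `**` builds a new outer dictionary for the callee, but the nested `headers` dictionary is still the same object.

```python
def worker(segment, **kwargs):
    # Hypothetical stand-in for Multidown.worker(); not the library's code.
    # kwargs is a fresh dict, but kwargs["headers"] is the caller's object.
    kwargs.setdefault("headers", {})
    kwargs["headers"]["range"] = f"bytes={segment[0]}-{segment[1]}"
    return kwargs


# Hypothetical stand-in for Pypdl._kwargs with user-supplied headers.
stored_kwargs = {"headers": {"User-Agent": "demo"}}

worker((0, 99), **stored_kwargs)

# The caller's dictionary now carries the range header as well.
print(stored_kwargs["headers"])
# {'User-Agent': 'demo', 'range': 'bytes=0-99'}
```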
This has the unintended side effect that:
- If headers are supplied via `Pypdl(headers=headers)`, then Line 65 permanently alters `Pypdl._kwargs`, because Python shares the nested `headers` dictionary of `Pypdl._kwargs` by reference instead of copying it.
- On the next call to `Pypdl._get_header()` (see Line 249 above), the `range` header is still present and is included in a request that is not meant to include a `range` header.
- `Pypdl._get_header()` only expects to receive status code `200` in response, however the MDN web docs suggest that servers should respond with `206` if a `range` header is supplied.
- If the server responds with `206`, then the `Pypdl._get_header()` method falls off the end and returns `None`. This has dire consequences when used by `Pypdl._get_info()`, as it sets the local attribute `header` to `None`...
pypdl/pypdl/pypdl_manager.py
Lines 228 to 230 in 12a20ad
- ...and the subsequent call to `get_filepath()` results in a `None` reference error.
pypdl/pypdl/utls.py
Lines 31 to 32 in 12a20ad
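The failure mode at the end of this chain can be sketched in isolation. This is an illustration under assumptions, not the library's code: `get_header` is a hypothetical stand-in for a function that only handles status `200`, and the `.get()` call is an assumed example of how the returned value might be used afterwards.

```python
def get_header(status, headers):
    # Hypothetical stand-in: only a 200 response is handled explicitly.
    if status == 200:
        return headers
    # Any other status, including 206, falls off the end of the function,
    # so Python implicitly returns None.


header = get_header(206, {"Content-Length": "1024"})

error = None
try:
    # Assumed usage: the caller treats the result as a mapping.
    header.get("Content-Disposition")
except AttributeError as exc:
    error = exc

print(header)                 # None
print(type(error).__name__)   # AttributeError
```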
Stacktrace:
This bug has likely gone unnoticed because it requires:
- a `Pypdl` object created with user-supplied headers,
- reuse of that object for more than one multi-segment download, and
- a server that responds with `206` because the `range` header was included, instead of ignoring it and simply returning `200`.
This PR includes a simple addition to the `Multidown.worker()` method that creates a shallow copy of `kwargs['headers']` if it already exists, which disconnects it from the original reference and allows the `range` header to be added without affecting `Pypdl._kwargs['headers']`.
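The idea of the fix can be sketched as follows. This is a minimal illustration, not the actual patch: `worker_fixed` and `stored_kwargs` are hypothetical names, and the real method does considerably more than set a header.

```python
def worker_fixed(segment, **kwargs):
    # Hypothetical stand-in for the patched Multidown.worker().
    if "headers" in kwargs:
        # Shallow copy: a new dict with the same keys and values, so adding
        # a key here no longer touches the caller's dictionary.
        kwargs["headers"] = kwargs["headers"].copy()
    else:
        kwargs["headers"] = {}
    kwargs["headers"]["range"] = f"bytes={segment[0]}-{segment[1]}"
    return kwargs


# Hypothetical stand-in for Pypdl._kwargs with user-supplied headers.
stored_kwargs = {"headers": {"User-Agent": "demo"}}

request_kwargs = worker_fixed((0, 99), **stored_kwargs)

print(request_kwargs["headers"])  # {'User-Agent': 'demo', 'range': 'bytes=0-99'}
print(stored_kwargs["headers"])   # {'User-Agent': 'demo'}
```

A shallow copy is sufficient here as long as the header values themselves are immutable strings, since only a new key is added.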
This fix is faster than alternative patches, which may require `deepcopy` to properly disconnect or reset the `Pypdl._kwargs` attribute. As a consequence, however, this PR does not prevent the same class of issue from occurring again (i.e. unintentionally modifying attributes of `Pypdl` that have been passed by reference to callee functions).
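For comparison, the difference between copying only the outer dictionary and using `copy.deepcopy` can be seen in a small standalone sketch. The dictionary `stored` is a hypothetical stand-in for `Pypdl._kwargs`; nothing here is taken from the library.

```python
import copy

# Hypothetical stand-in for Pypdl._kwargs.
stored = {"headers": {"User-Agent": "demo"}}

# A deep copy duplicates nested dictionaries too, so mutations stay local.
deep = copy.deepcopy(stored)
deep["headers"]["range"] = "bytes=0-99"
leaked_after_deep = "range" in stored["headers"]

# A shallow copy of the outer dict still shares the nested headers dict.
shallow = copy.copy(stored)
shallow["headers"]["range"] = "bytes=0-99"
leaked_after_shallow = "range" in stored["headers"]

print(leaked_after_deep)     # False
print(leaked_after_shallow)  # True
```

A deep copy at the call site would protect every nested object at the cost of copying the whole structure on each call, whereas copying only `headers` inside the callee protects just that one dictionary.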