Error when downloading from servers that doesn't support Head requests #15

schoulten · 2024-04-05T13:33:44Z

I'm getting errors when trying to download some CSV files, even though there are no problems in the URL (tested via browser).

Is this something I'm doing wrong? Any tips on how to debug this?

Code:

from pypdl import Downloader
dl = Downloader()
dl.start('https://www.gov.br/anp/pt-br/centrais-de-conteudo/dados-abertos/arquivos/shpc/dsas/ca/ca-2004-01.csv')

Result:
ERROR:root:(ConnectionError) [Server Returned: Forbidden(403), Invalid URL]


mjishnu · 2024-04-06T05:43:46Z

This happens because the HEAD request failed. pypdl first sends a HEAD request to fetch metadata and to confirm that the file exists. The server behind the link you provided apparently either doesn't implement HEAD or doesn't allow it, so the HEAD request that pypdl sends fails and you get this error.

It's quite an easy fix: we just need to send a GET request if the HEAD request fails. Thanks for reporting; this was a bug that I didn't anticipate.

Edit: this should be fixed in v1.3.2. Also, the server you are trying to download from seems to have issues with multi-segment downloads.
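
The HEAD-then-GET fallback described above can be sketched roughly as follows. This is only an illustration built on the standard library's urllib, not pypdl's actual code: fetch_headers, NoHeadHandler and demo are hypothetical names, and the toy local server merely stands in for a host that rejects HEAD with 403 while serving GET normally.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore any proxy settings so the local demo server is reached directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_headers(url, timeout=10):
    """Return (method_used, headers); try HEAD first, fall back to GET."""
    try:
        head = urllib.request.Request(url, method="HEAD")
        with _opener.open(head, timeout=timeout) as resp:
            return "HEAD", dict(resp.headers.items())
    except urllib.error.URLError:
        # urllib raises HTTPError (a URLError subclass) for 4xx/5xx replies,
        # so a rejected HEAD lands here and we retry with GET.
        pass
    get = urllib.request.Request(url, method="GET")
    with _opener.open(get, timeout=timeout) as resp:
        return "GET", dict(resp.headers.items())


class NoHeadHandler(BaseHTTPRequestHandler):
    """Toy server (assumption): rejects HEAD with 403, answers GET with 200."""

    def do_HEAD(self):
        self.send_error(403)

    def do_GET(self):
        body = b"a,b\n1,2\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Start the toy server on a free port and fetch headers from it."""
    server = HTTPServer(("127.0.0.1", 0), NoHeadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/file.csv"
        return fetch_headers(url)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() exercises both paths: the HEAD attempt is refused, and the metadata comes back from the GET request instead.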

schoulten · 2024-04-06T13:10:22Z

I can confirm that it's working smoothly now with v1.3.2. Thanks a lot!

deepdelirious · 2024-12-11T19:04:30Z

@mjishnu - this seems to have regressed with the move to aiohttp. The first HEAD request raises an exception because of the default raise_for_status=True kwarg, so the second request is never hit.
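
To make the failure mode concrete, here is a small simulation of the control flow. It deliberately does not use aiohttp: fake_request, HTTPStatusError and both get_header_* functions are hypothetical stand-ins, and the fake server (405 for HEAD, 200 for GET) is an assumption chosen only to show why a status check alone stops working once the client raises on error statuses.

```python
import asyncio


class HTTPStatusError(Exception):
    """Stand-in for the error a client raises when raise_for_status is on."""


async def fake_request(method, raise_for_status):
    # Hypothetical server: rejects HEAD with 405, answers GET with 200.
    status = 405 if method == "HEAD" else 200
    if raise_for_status and status >= 400:
        raise HTTPStatusError(f"{method} returned {status}")
    return status


async def get_header_unguarded(raise_for_status):
    # Relies only on status checks: fine when errors are returned as
    # statuses, but an exception from HEAD skips the GET attempt entirely.
    for method in ("HEAD", "GET"):
        if await fake_request(method, raise_for_status) == 200:
            return method
    raise RuntimeError("no usable response")


async def get_header_guarded(raise_for_status):
    # Catching the HEAD failure lets the GET fallback run either way.
    try:
        if await fake_request("HEAD", raise_for_status) == 200:
            return "HEAD"
    except HTTPStatusError:
        pass
    if await fake_request("GET", raise_for_status) == 200:
        return "GET"
    raise RuntimeError("no usable response")
```

With raise_for_status off, the unguarded version falls through to GET; with it on, the exception from HEAD propagates before GET is tried, while the guarded version still reaches GET.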

mjishnu · 2024-12-12T06:17:43Z

> @mjishnu - this seems to have regressed with the move to aiohttp. The first HEAD request raises an exception because of the default raise_for_status=True kwarg, so the second request is never hit.

(Pypdl)  12-12-24 11:37:52 - DEBUG: Reset download manager
(Pypdl)  12-12-24 11:37:52 - DEBUG: Downloading, url: https://gamedownloads.rockstargames.com/public/installer/Rockstar-Games-Launcher.exe attempt: 1
(Pypdl)  12-12-24 11:37:52 - DEBUG: Response code: 200
(Pypdl)  12-12-24 11:37:52 - DEBUG: Header acquired from head request
(Pypdl)  12-12-24 11:37:52 - DEBUG: Size acquired from header
(Pypdl)  12-12-24 11:37:52 - DEBUG: ETag acquired from header
(Pypdl)  12-12-24 11:37:52 - DEBUG: Segment table created: {'url': 'https://gamedownloads.rockstargames.com/public/installer/Rockstar-Games-Launcher.exe', 'segments': 2, 'overwrite': True, 0: {'start': 0, 'end': 70046963, 'segment_size': 70046964, 'segment_path': 'Rockstar-Games-Launcher.exe.0'}, 1: {'start': 70046964, 'end': 140093927, 'segment_size': 70046964, 'segment_path': 'Rockstar-Games-Launcher.exe.1'}}
(Pypdl)  12-12-24 11:37:52 - DEBUG: Initiated waiting loop
(Pypdl)  12-12-24 11:37:52 - DEBUG: Multi-Segment download started
(Pypdl)  12-12-24 11:37:55 - DEBUG: Downloaded all segments
(Pypdl)  12-12-24 11:37:56 - DEBUG: Combining files
(Pypdl)  12-12-24 11:37:56 - DEBUG: Exit waiting loop, download completed

(Pypdl)  12-12-24 11:38:00 - DEBUG: Reset download manager
(Pypdl)  12-12-24 11:38:00 - DEBUG: Downloading, url: https://github.com/M2Team/NanaZip/releases/download/3.0.1000.0/NanaZip_3.0.1000.0.msixbundle attempt: 1
(Pypdl)  12-12-24 11:38:01 - DEBUG: Response code: 302
(Pypdl)  12-12-24 11:38:02 - DEBUG: Response code: 200
(Pypdl)  12-12-24 11:38:02 - DEBUG: Header acquired from get request
(Pypdl)  12-12-24 11:38:02 - DEBUG: Size acquired from header
(Pypdl)  12-12-24 11:38:02 - DEBUG: ETag acquired from header
(Pypdl)  12-12-24 11:38:02 - DEBUG: Segment table created: {'url': 'https://github.com/M2Team/NanaZip/releases/download/3.0.1000.0/NanaZip_3.0.1000.0.msixbundle', 'segments': 2, 'overwrite': True, 0: {'start': 0, 'end': 5510801, 'segment_size': 5510802, 'segment_path': 'NanaZip_3.0.1000.0.msixbundle.0'}, 1: {'start': 5510802, 'end': 11021603, 'segment_size': 5510802, 'segment_path': 'NanaZip_3.0.1000.0.msixbundle.1'}}
(Pypdl)  12-12-24 11:38:02 - DEBUG: Initiated waiting loop
(Pypdl)  12-12-24 11:38:02 - DEBUG: Multi-Segment download started
(Pypdl)  12-12-24 11:38:04 - DEBUG: Downloaded all segments
(Pypdl)  12-12-24 11:38:05 - DEBUG: Combining files
(Pypdl)  12-12-24 11:38:05 - DEBUG: Exit waiting loop, download completed

@deepdelirious thanks for reporting. It's kind of weird: the URL I was testing with returns 302 instead of 404 or 405, so I was not getting this error. I am working on 1.6, so I will fix it in that release.

In the meantime, just add a try/except in _get_header on your local machine:

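  # Method of the downloader class; the module needs `import aiohttp`.
  async def _get_header(self, url):
      # Keep one session open for both requests so the GET fallback can reuse it.
      async with aiohttp.ClientSession() as session:
          try:
              async with session.head(url, **self._kwargs) as response:
                  if response.status == 200:
                      self.logger.debug("Header acquired from head request")
                      return response.headers
          except Exception:
              # HEAD failed or raised (e.g. via raise_for_status); fall back to GET.
              pass

          async with session.get(url, **self._kwargs) as response:
              if response.status == 200:
                  self.logger.debug("Header acquired from get request")
                  return response.headers

      # Reached only when the GET returned a non-200 status without raising.
      raise Exception(
          f"Failed to get header (Status: {response.status}, Reason: {response.reason})"
      )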

this should fix it

mjishnu · 2025-01-03T11:37:07Z

> @mjishnu - this seems to have regressed with the move to aiohttp. The first HEAD request raises an exception because of the default raise_for_status=True kwarg, so the second request is never hit.

I have fixed this in the latest version; you can check it out at https://test.pypi.org/project/pypdl/

mjishnu changed the title ~~ERROR:root:(ConnectionError) [Server Returned: Forbidden(403), Invalid URL]~~ Error when downloading from servers that doesn't support Head requests Apr 6, 2024

mjishnu added the bug label Apr 6, 2024

mjishnu added a commit that referenced this issue Apr 6, 2024

retrive metadata using GET if HEAD request fails(#15)

fd06bf2

schoulten closed this as completed Apr 6, 2024

mjishnu reopened this Dec 12, 2024

mjishnu linked a pull request Dec 24, 2024 that will close this issue

v1.5 #30

Merged

mjishnu closed this as completed in #30 Jan 3, 2025
