Error when downloading from servers that doesn't support Head requests #15

Closed
schoulten opened this issue Apr 5, 2024 · 5 comments · Fixed by #30
Labels
bug Something isn't working

Comments

@schoulten

I'm getting errors when trying to download some CSV files, even though the URL itself works (tested in a browser).

Am I doing something wrong? Any tips on how to debug this?

Code:

from pypdl import Downloader
dl = Downloader()
dl.start('https://www.gov.br/anp/pt-br/centrais-de-conteudo/dados-abertos/arquivos/shpc/dsas/ca/ca-2004-01.csv')

Result:
ERROR:root:(ConnectionError) [Server Returned: Forbidden(403), Invalid URL]

@mjishnu
Owner

mjishnu commented Apr 6, 2024

This happens because the HEAD request failed. pypdl first sends a HEAD request to fetch metadata and to confirm that the file exists. The server behind the link you provided apparently doesn't support HEAD requests, or doesn't allow them, so the HEAD request that pypdl sends fails and you get this error.

It's quite an easy fix: we just need to send a GET request if the HEAD request fails. Thanks for reporting; this was a bug that I didn't anticipate.

Edit: this should be fixed in v1.3.2. Also, the server you are trying to download from seems to have issues with multi-segment downloads.
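The HEAD-then-GET fallback described in this comment can be sketched with the standard library alone. This is an illustration of the idea, not pypdl's actual code: `fetch_headers` and its `opener` parameter are hypothetical names, and treating any HTTP error status on HEAD as "try GET next" is an assumption.

```python
import urllib.error
import urllib.request


def fetch_headers(url, opener=urllib.request.urlopen, timeout=10):
    """Return (method, headers), trying HEAD first and falling back to GET.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be exercised without a real server.
    """
    last_error = None
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with opener(request, timeout=timeout) as response:
                # Only headers are needed, so the GET body is never read.
                return method, dict(response.headers.items())
        except urllib.error.HTTPError as error:
            # Servers without HEAD support often answer 403, 404 or 405.
            last_error = error
    raise ConnectionError(f"Failed to get headers for {url}: {last_error}")
```

Calling `fetch_headers(url)` on a server that rejects HEAD but serves GET would return `"GET"` as the method together with the response headers; if both requests fail, a `ConnectionError` carrying the last HTTP error is raised.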

mjishnu changed the title from "ERROR:root:(ConnectionError) [Server Returned: Forbidden(403), Invalid URL]" to "Error when downloading from servers that doesn't support Head requests" on Apr 6, 2024
mjishnu added the bug (Something isn't working) label on Apr 6, 2024
@schoulten
Author

I can confirm that it's working smoothly now with v1.3.2. Thanks a lot!

@deepdelirious

@mjishnu - this seems to have regressed with the move to aiohttp. The first HEAD request raises an exception because of the default raise_for_status=True kwarg, so the fallback GET request is never reached.
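The control-flow problem described here can be shown with a toy model that performs no real HTTP. Every name below (`StatusError`, `send`, `header_source_unguarded`, `header_source_guarded`) is hypothetical, and `send` only imitates a client that raises on error statuses when `raise_for_status` is set; it is not pypdl or aiohttp code.

```python
class StatusError(Exception):
    """Stand-in for the exception an HTTP client raises on a 4xx/5xx status."""


def send(method, statuses, raise_for_status):
    """Pretend to send a request; `statuses` maps method name to status code."""
    status = statuses[method]
    if raise_for_status and status >= 400:
        raise StatusError(f"{method} returned {status}")
    return status


def header_source_unguarded(statuses, raise_for_status):
    """HEAD then GET, with only status checks between the two attempts."""
    if send("HEAD", statuses, raise_for_status) == 200:
        return "HEAD"
    if send("GET", statuses, raise_for_status) == 200:
        return "GET"
    raise RuntimeError("Failed to get header")


def header_source_guarded(statuses, raise_for_status):
    """Same flow, but a raising HEAD no longer prevents the GET attempt."""
    try:
        if send("HEAD", statuses, raise_for_status) == 200:
            return "HEAD"
    except StatusError:
        pass
    if send("GET", statuses, raise_for_status) == 200:
        return "GET"
    raise RuntimeError("Failed to get header")
```

With `statuses = {"HEAD": 403, "GET": 200}`, the unguarded version reaches the GET only while `raise_for_status` is false; once it is true, the exception from HEAD escapes before the fallback runs, while the guarded version still falls through to GET.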

@mjishnu
Owner

mjishnu commented Dec 12, 2024

> @mjishnu - this seems to have regressed with the move to AIOHttp. The first HEAD request raises an exception because the default raise_for_status=True kwarg, so the second request is never hit.

(Pypdl)  12-12-24 11:37:52 - DEBUG: Reset download manager
(Pypdl)  12-12-24 11:37:52 - DEBUG: Downloading, url: https://gamedownloads.rockstargames.com/public/installer/Rockstar-Games-Launcher.exe attempt: 1
(Pypdl)  12-12-24 11:37:52 - DEBUG: Response code: 200
(Pypdl)  12-12-24 11:37:52 - DEBUG: Header acquired from head request
(Pypdl)  12-12-24 11:37:52 - DEBUG: Size acquired from header
(Pypdl)  12-12-24 11:37:52 - DEBUG: ETag acquired from header
(Pypdl)  12-12-24 11:37:52 - DEBUG: Segment table created: {'url': 'https://gamedownloads.rockstargames.com/public/installer/Rockstar-Games-Launcher.exe', 'segments': 2, 'overwrite': True, 0: {'start': 0, 'end': 70046963, 'segment_size': 70046964, 'segment_path': 'Rockstar-Games-Launcher.exe.0'}, 1: {'start': 70046964, 'end': 140093927, 'segment_size': 70046964, 'segment_path': 'Rockstar-Games-Launcher.exe.1'}}
(Pypdl)  12-12-24 11:37:52 - DEBUG: Initiated waiting loop
(Pypdl)  12-12-24 11:37:52 - DEBUG: Multi-Segment download started
(Pypdl)  12-12-24 11:37:55 - DEBUG: Downloaded all segments
(Pypdl)  12-12-24 11:37:56 - DEBUG: Combining files
(Pypdl)  12-12-24 11:37:56 - DEBUG: Exit waiting loop, download completed

(Pypdl)  12-12-24 11:38:00 - DEBUG: Reset download manager
(Pypdl)  12-12-24 11:38:00 - DEBUG: Downloading, url: https://github.com/M2Team/NanaZip/releases/download/3.0.1000.0/NanaZip_3.0.1000.0.msixbundle attempt: 1
(Pypdl)  12-12-24 11:38:01 - DEBUG: Response code: 302
(Pypdl)  12-12-24 11:38:02 - DEBUG: Response code: 200
(Pypdl)  12-12-24 11:38:02 - DEBUG: Header acquired from get request
(Pypdl)  12-12-24 11:38:02 - DEBUG: Size acquired from header
(Pypdl)  12-12-24 11:38:02 - DEBUG: ETag acquired from header
(Pypdl)  12-12-24 11:38:02 - DEBUG: Segment table created: {'url': 'https://github.com/M2Team/NanaZip/releases/download/3.0.1000.0/NanaZip_3.0.1000.0.msixbundle', 'segments': 2, 'overwrite': True, 0: {'start': 0, 'end': 5510801, 'segment_size': 5510802, 'segment_path': 'NanaZip_3.0.1000.0.msixbundle.0'}, 1: {'start': 5510802, 'end': 11021603, 'segment_size': 5510802, 'segment_path': 'NanaZip_3.0.1000.0.msixbundle.1'}}
(Pypdl)  12-12-24 11:38:02 - DEBUG: Initiated waiting loop
(Pypdl)  12-12-24 11:38:02 - DEBUG: Multi-Segment download started
(Pypdl)  12-12-24 11:38:04 - DEBUG: Downloaded all segments
(Pypdl)  12-12-24 11:38:05 - DEBUG: Combining files
(Pypdl)  12-12-24 11:38:05 - DEBUG: Exit waiting loop, download completed
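The segment tables in the logs above split each file into contiguous byte ranges. A minimal sketch of that arithmetic follows, assuming equal-size segments with any remainder going to the last one (the logs only show evenly divisible sizes, so the remainder handling is a guess). `make_segments` and `range_header` are hypothetical helper names, not pypdl functions.

```python
def make_segments(size, count):
    """Split `size` bytes into `count` contiguous, inclusive byte ranges."""
    if count < 1 or size < count:
        raise ValueError("need at least one byte per segment")
    base = size // count
    segments = []
    for index in range(count):
        start = index * base
        # The last segment absorbs any remainder left by integer division.
        end = size - 1 if index == count - 1 else start + base - 1
        segments.append(
            {"start": start, "end": end, "segment_size": end - start + 1}
        )
    return segments


def range_header(segment):
    """Build the HTTP Range header that requests exactly this segment."""
    return {"Range": f"bytes={segment['start']}-{segment['end']}"}
```

For the first download in the logs, `make_segments(140093928, 2)` reproduces the logged boundaries: 0 to 70046963 and 70046964 to 140093927.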

@deepdelirious thanks for reporting. It's kind of weird: the URL I was testing on returned 302 instead of 404 or 405, so I was not getting this error. I am working on 1.6 and will fix it in that release.

In the meantime, just add a try/except around the HEAD request in _get_header on your local machine:

  # Method excerpt; the module needs `import aiohttp`.
  async def _get_header(self, url):
      # One session for both attempts, so it is still open for the GET fallback.
      async with aiohttp.ClientSession() as session:
          try:
              async with session.head(url, **self._kwargs) as response:
                  if response.status == 200:
                      self.logger.debug("Header acquired from head request")
                      return response.headers
          except Exception:
              # HEAD was rejected or raised (e.g. raise_for_status=True); try GET.
              pass

          async with session.get(url, **self._kwargs) as response:
              if response.status == 200:
                  self.logger.debug("Header acquired from get request")
                  return response.headers

      # Reached only when GET returned a non-200 status without raising.
      raise Exception(
          f"Failed to get header (Status: {response.status}, Reason: {response.reason})"
      )

This should fix it.

mjishnu reopened this on Dec 12, 2024
mjishnu linked a pull request on Dec 24, 2024 that will close this issue
Merged
@mjishnu
Owner

mjishnu commented Jan 3, 2025

> @mjishnu - this seems to have regressed with the move to AIOHttp. The first HEAD request raises an exception because the default raise_for_status=True kwarg, so the second request is never hit.

I have fixed this in the latest version; you can check it out at https://test.pypi.org/project/pypdl/
