Make greg work with broken enclosure links
The approach is taken from
http://stackoverflow.com/questions/120951/how-can-i-normalize-a-url-in-python,
which refers to http://bugs.python.org/issue918368.

Enclosure links containing spaces caused greg to fail when it tried to
download them or to store them in the list of seen enclosures. This fix
uses 'quote' from 'urllib.parse' to replace unsafe characters in the URL
with percent-escape sequences, e.g. " " becomes "%20".
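
As a rough illustration of what the fix does, the sketch below applies 'quote' with the same 'safe' string as the commit to a made-up enclosure link. The helper name clean_url and the example.com URL are assumptions for illustration only and do not appear in greg.

```python
from urllib.parse import quote

# Safe set copied from the commit. "%" is included so that escape
# sequences already present in a URL are not encoded a second time.
SAFE = "%/:=&?~#+!$,;'@()*[]"


def clean_url(url):
    """Percent-encode every character outside SAFE (hypothetical helper)."""
    return quote(url, safe=SAFE)


# Hypothetical enclosure link containing a space.
broken = "http://example.com/feeds/my episode.mp3"
cleaned = clean_url(broken)

print(cleaned)                        # http://example.com/feeds/my%20episode.mp3
print(clean_url(cleaned) == cleaned)  # True: a cleaned URL is left unchanged
```

Because "%" is treated as safe, running the cleanup on a link that is already escaped leaves it as it is, which matters for feeds that mix well-formed and broken links.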
Filip Balos committed Jun 22, 2016
1 parent 3a079e4 commit f8c7c97
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions greg/greg.py
@@ -27,6 +27,7 @@
from itertools import filterfalse
from urllib.request import urlretrieve
from urllib.parse import urlparse
from urllib.parse import quote
from urllib.error import URLError
from lxml import etree as ET

@@ -291,8 +292,10 @@ def download_entry(self, entry):
        downloaded = False
        ignoreenclosures = self.retrieve_config('ignoreenclosures', 'no')
        notype = self.retrieve_config('notype', 'no')
        # Clean up urls
        if ignoreenclosures == 'no':
            for enclosure in entry.enclosures:
                enclosure["href"] = quote(enclosure["href"], safe="%/:=&?~#+!$,;'@()*[]") #Clean up url
                if notype == 'yes':
                    downloadlinks[urlparse(enclosure["href"]).path.split(
                        "/")[-1]] = enclosure["href"]
@@ -313,6 +316,7 @@ def download_entry(self, entry):
                              "option in your greg.conf", file=sys.stderr,
                              flush=True)
        else:
            entry.link = quote(entry.link, safe="%/:=&?~#+!$,;'@()*[]")
            downloadlinks[urlparse(entry.link).query.split(
                "/")[-1]] = entry.link
        for podname in downloadlinks:
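
One consequence of quoting the link before building 'downloadlinks' is that the dictionary key, which is taken from the last path segment, also carries the escape sequence. The sketch below mirrors the notype == 'yes' branch with a made-up URL; whether greg decodes the name again later is not visible in this diff, so the 'unquote' call is shown only as one possible way to recover the readable name.

```python
from urllib.parse import quote, unquote, urlparse

# Safe set copied from the commit.
SAFE = "%/:=&?~#+!$,;'@()*[]"

# Hypothetical enclosure link with a space in the file name.
href = quote("http://example.com/audio/episode 12.mp3?session=abc", safe=SAFE)

# Same key derivation as the notype == 'yes' branch: last path segment.
name = urlparse(href).path.split("/")[-1]
downloadlinks = {name: href}

print(href)           # http://example.com/audio/episode%2012.mp3?session=abc
print(name)           # episode%2012.mp3
print(unquote(name))  # episode 12.mp3
```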