[web, url] Move header-based encoding detection to web.get
url.py did a really good thing: it looked for the encoding in the
HTTP headers and used it to .decode() the bytes returned from web.get.

web.get() gained (not too long ago) the ability to decode text itself, but it
was hardcoding utf-8 as the encoding.

So instead of hardcoding utf-8 in web.get, I moved that detection functionality
from url to web, so now all modules using web.get can enjoy it.
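For context, the detection that moved is essentially "parse the charset out of the
Content-Type header, otherwise fall back to UTF-8". A minimal sketch of that logic
(the helper name guess_encoding is made up for illustration and is not the actual
code in web.py):

    import re

    def guess_encoding(headers, fallback='utf-8'):
        # Hypothetical illustration, not the real web.py helper: pull the charset
        # out of a header like "text/html; charset=ISO-8859-1".
        content_type = headers.get('Content-Type') or ''
        match = re.match(r'.*?charset *= *(\S+)', content_type)
        return match.group(1) if match else fallback

    # e.g. decoding the raw bytes of a response:
    # text = raw_bytes.decode(guess_encoding(headers))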
Elad Alfassa committed Aug 6, 2014
1 parent bbed119 commit 228827e
Showing 1 changed file with 4 additions and 16 deletions.
20 changes: 4 additions & 16 deletions url.py
@@ -207,22 +207,10 @@ def check_callbacks(bot, trigger, url, run=True):

 def find_title(url):
     """Return the title for the given URL."""
-    content, headers = web.get(url, return_headers=True, limit_bytes=max_bytes,
-                               dont_decode=True)
-    content_type = headers.get('Content-Type') or ''
-    encoding_match = re.match('.*?charset *= *(\S+)', content_type)
-    # If they gave us something else instead, try that
-    if encoding_match:
-        try:
-            content = content.decode(encoding_match.group(1))
-        except:
-            encoding_match = None
-    # They didn't tell us what they gave us, so go with UTF-8 or fail silently.
-    if not encoding_match:
-        try:
-            content = content.decode('utf-8')
-        except:
-            return
+    try:
+        content, headers = web.get(url, return_headers=True, limit_bytes=max_bytes)
+    except UnicodeDecodeError:
+        return  # Fail silently when data can't be decoded
 
     # Some cleanup that I don't really grok, but was in the original, so
     # we'll keep it (with the compiled regexes made global) for now.
