Document encoding detection for unknown encodings. (#2733)
Sometimes, cchardet might detect encodings that Python doesn't know. In
such cases, the `.text()` function might raise a `LookupError`, and
`get_encoding` may return values that are unsafe to pass to `bytes.decode()`
or to `.text()`.

Closes #2732
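
A minimal sketch of how calling code might guard against this, assuming the raw body bytes and the detected encoding name are already in hand. The helper name `decode_body` and the UTF-8 fallback are hypothetical choices for illustration, not part of aiohttp or of this commit.

```python
def decode_body(body: bytes, detected: str, fallback: str = "utf-8") -> str:
    """Decode ``body`` with ``detected``, falling back if Python lacks that codec.

    ``decode_body`` and ``fallback`` are hypothetical names for this sketch.
    """
    try:
        # bytes.decode() raises LookupError when the codec name is unknown
        # to Python, which is the case for VISCII.
        return body.decode(detected, errors="replace")
    except LookupError:
        # Assumption: UTF-8 with replacement characters is an acceptable
        # last resort for the caller.
        return body.decode(fallback, errors="replace")


# With aiohttp this could be fed from `await resp.read()` and
# `resp.get_encoding()`; here a literal payload stands in for the response.
print(decode_body("héllo".encode("utf-8"), "viscii"))
```

Catching `LookupError` around the decode call keeps the fast path unchanged for encodings Python does know, and only the unknown-codec case pays for the fallback.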
gilbsgilbs authored and asvetlov committed Feb 13, 2018
1 parent 185e3f7 commit c54f1a8
Showing 2 changed files with 8 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGES/2732.doc
@@ -0,0 +1 @@
Document behavior when cchardet detects encodings that are unknown to Python.
7 changes: 7 additions & 0 deletions docs/client_reference.rst
@@ -1170,6 +1170,9 @@ Response object

:return str: decoded *BODY*

:raise LookupError: if the encoding detected by chardet or cchardet is
unknown by Python (e.g. VISCII).

.. note::

If response has no ``charset`` info in ``Content-Type`` HTTP
@@ -1223,6 +1226,10 @@ Response object
are no appropriate codecs for encoding then :term:`cchardet` /
:term:`chardet` is used.

Beware that it is not always safe to use the result of this function to
decode a response. Some encodings detected by cchardet are not known by
Python (e.g. VISCII).

.. versionadded:: 3.0

