diff --git a/CHANGES/5930.feature b/CHANGES/5930.feature
new file mode 100644
index 00000000000..17cecee40d9
--- /dev/null
+++ b/CHANGES/5930.feature
@@ -0,0 +1 @@
+Switched ``chardet`` to ``charset-normalizer`` for guessing the HTTP payload body encoding -- :user:`Ousret`.
diff --git a/CONTRIBUTORS.txt b/CONTRIBUTORS.txt
index d6d9197cccd..bef1c39b6be 100644
--- a/CONTRIBUTORS.txt
+++ b/CONTRIBUTORS.txt
@@ -7,6 +7,7 @@ Adam Horacek
 Adam Mills
 Adrian Krupa
 Adrián Chaves
+Ahmed Tahri
 Alan Tse
 Alec Hanefeld
 Alejandro Gómez
diff --git a/README.rst b/README.rst
index 7ee16ecb365..d057acbe2f1 100644
--- a/README.rst
+++ b/README.rst
@@ -161,14 +161,14 @@ Requirements
 - Python >= 3.6
 - async-timeout_
 - attrs_
-- chardet_
+- charset-normalizer_
 - multidict_
 - yarl_
 
 Optionally you may install the cChardet_ and aiodns_ libraries (highly
 recommended for sake of speed).
 
-.. _chardet: https://pypi.python.org/pypi/chardet
+.. _charset-normalizer: https://pypi.org/project/charset-normalizer
 .. _aiodns: https://pypi.python.org/pypi/aiodns
 .. _attrs: https://github.com/python-attrs/attrs
 .. _multidict: https://pypi.python.org/pypi/multidict
diff --git a/aiohttp/client_reqrep.py b/aiohttp/client_reqrep.py
index db1f379ab55..f69e2242a48 100644
--- a/aiohttp/client_reqrep.py
+++ b/aiohttp/client_reqrep.py
@@ -69,7 +69,7 @@
 try:
     import cchardet as chardet
 except ImportError:  # pragma: no cover
-    import chardet  # type: ignore[no-redef]
+    import charset_normalizer as chardet  # type: ignore[no-redef]
 
 
 __all__ = ("ClientRequest", "ClientResponse", "RequestInfo", "Fingerprint")
diff --git a/docs/client_reference.rst b/docs/client_reference.rst
index 13697a3a718..340f7adf5b2 100644
--- a/docs/client_reference.rst
+++ b/docs/client_reference.rst
@@ -1374,10 +1374,10 @@ Response object
       specified *encoding* parameter.
 
       If *encoding* is ``None`` content encoding is autocalculated
-      using ``Content-Type`` HTTP header and *chardet* tool if the
+      using ``Content-Type`` HTTP header and *charset-normalizer* tool if the
       header is not provided by server.
 
-      :term:`cchardet` is used with fallback to :term:`chardet` if
+      :term:`cchardet` is used with fallback to :term:`charset-normalizer` if
       *cchardet* is not available.
 
       Close underlying connection if data reading gets an error,
@@ -1389,14 +1389,14 @@ Response object
 
       :return str: decoded *BODY*
 
-      :raise LookupError: if the encoding detected by chardet or cchardet is
+      :raise LookupError: if the encoding detected by cchardet is
                           unknown by Python (e.g. VISCII).
 
       .. note::
 
          If response has no ``charset`` info in ``Content-Type`` HTTP
-         header :term:`cchardet` / :term:`chardet` is used for content
-         encoding autodetection.
+         header :term:`cchardet` / :term:`charset-normalizer` is used for
+         content encoding autodetection.
 
          It may hurt performance. If page encoding is known passing
          explicit *encoding* parameter might help::
@@ -1411,7 +1411,7 @@ Response object
       a ``read`` call will be done,
 
       If *encoding* is ``None`` content encoding is autocalculated
-      using :term:`cchardet` or :term:`chardet` as fallback if
+      using :term:`cchardet` or :term:`charset-normalizer` as fallback if
       *cchardet* is not available.
 
       if response's `content-type` does not match `content_type` parameter
@@ -1449,11 +1449,11 @@ Response object
       Automatically detect content encoding using ``charset`` info in
       ``Content-Type`` HTTP header. If this info is not exists or there
       are no appropriate codecs for encoding then :term:`cchardet` /
-      :term:`chardet` is used.
+      :term:`charset-normalizer` is used.
 
       Beware that it is not always safe to use the result of this function to
       decode a response. Some encodings detected by cchardet are not known by
-      Python (e.g. VISCII).
+      Python (e.g. VISCII). *charset-normalizer* is not concerned by that issue.
 
       :raise RuntimeError: if called before the body has been read,
                            for :term:`cchardet` usage
diff --git a/docs/glossary.rst b/docs/glossary.rst
index bc5e1169c33..42b063a95e0 100644
--- a/docs/glossary.rst
+++ b/docs/glossary.rst
@@ -32,11 +32,12 @@
       Any object that can be called. Use :func:`callable` to check
       that.
 
-   chardet
+   charset-normalizer
 
-      The Universal Character Encoding Detector
+      The Real First Universal Charset Detector.
+      Open, modern and actively maintained alternative to Chardet.
 
-      https://pypi.python.org/pypi/chardet/
+      https://pypi.org/project/charset-normalizer/
 
    cchardet
 
diff --git a/docs/index.rst b/docs/index.rst
index 4091c001993..c6e7086be33 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -34,7 +34,7 @@ Library Installation
    $ pip install aiohttp
 
 You may want to install *optional* :term:`cchardet` library as faster
-replacement for :term:`chardet`:
+replacement for :term:`charset-normalizer`:
 
 .. code-block:: bash
 
@@ -51,7 +51,7 @@ This option is highly recommended:
 Installing speedups altogether
 ------------------------------
 
-The following will get you ``aiohttp`` along with :term:`chardet`,
+The following will get you ``aiohttp`` along with :term:`charset-normalizer`,
 :term:`aiodns` and ``Brotli`` in one bundle. No need to type
 separate commands anymore!
 
@@ -149,11 +149,11 @@ Dependencies
 - Python 3.6+
 - *async_timeout*
 - *attrs*
-- *chardet*
+- *charset-normalizer*
 - *multidict*
 - *yarl*
 - *Optional* :term:`cchardet` as faster replacement for
-  :term:`chardet`.
+  :term:`charset-normalizer`.
 
   Install it explicitly via:
 
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 1d9a374e47a..728beebb0cf 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -122,6 +122,7 @@ canonicalization
 canonicalize
 cchardet
 ceil
+Chardet
 charset
 charsetdetect
 chunked
@@ -226,6 +227,7 @@ namespace
 netrc
 nginx
 noop
+normalizer
 nowait
 optimizations
 os
diff --git a/requirements/base.txt b/requirements/base.txt
index 90c77b083ac..51c1e4a5705 100644
--- a/requirements/base.txt
+++ b/requirements/base.txt
@@ -8,7 +8,7 @@ asynctest==0.13.0; python_version<"3.8"
 attrs==21.2.0
 Brotli==1.0.9
 cchardet==2.1.7
-chardet==4.0.0
+charset-normalizer==2.0.4
 frozenlist==1.2.0
 gunicorn==20.1.0
 idna-ssl==1.1.0; python_version<"3.7"
diff --git a/requirements/dev.txt b/requirements/dev.txt
index 58a87943713..d036984fda5 100644
--- a/requirements/dev.txt
+++ b/requirements/dev.txt
@@ -49,7 +49,7 @@ cfgv==3.2.0
     # via
     #   -r requirements/lint.txt
     #   pre-commit
-chardet==4.0.0
+charset-normalizer==2.0.4
     # via
     #   -r requirements/base.txt
     #   requests
diff --git a/setup.py b/setup.py
index 61462129b40..56d8814c631 100644
--- a/setup.py
+++ b/setup.py
@@ -64,7 +64,7 @@ def build_extension(self, ext):
 
 install_requires = [
     "attrs>=17.3.0",
-    "chardet>=2.0,<5.0",
+    "charset-normalizer>=2.0,<3.0",
     "multidict>=4.5,<7.0",
     "async_timeout>=4.0.0a3,<5.0",
     'asynctest==0.13.0; python_version<"3.8"',