Bump chardet from 3.0.4 to 4.0.0 #5334

Merged
merged 1 commit into from
Dec 11, 2020

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Dec 11, 2020

Bumps chardet from 3.0.4 to 4.0.0.

Release notes

Sourced from chardet's releases.

chardet 4.0.0

⚠️ This will be the last release of chardet to support Python 2.7. chardet 5.0 will only support 3.6+ ⚠️

Major Changes

This release is multiple years in the making, and provides some quality-of-life improvements to chardet. The primary user-facing changes are:

  1. Single-byte charset probers now use nested dictionaries under the hood, so they are usually a little faster than before. (See #121 for details)
  2. The CharsetGroupProber class now properly short-circuits when one of the probers in the group is considered a definite match. This led to a substantial speedup.
  3. There is now a chardet.detect_all function that returns a list of possible encodings for the input with associated confidences.
  4. We have dropped support for Python 2.6, 3.4, and 3.5 as they are all past end-of-life.
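
To illustrate the short-circuiting described in item 2, here is a minimal sketch of a group prober that stops feeding its members once one of them reports a definite match. This is not chardet's actual implementation: `KeywordProber`, `GroupProber`, and the marker-byte matching are hypothetical simplifications, and only the state names loosely follow the `FOUND_IT` state mentioned in the commit list.

```python
from enum import Enum


class ProbingState(Enum):
    """Simplified prober states (hypothetical, for illustration only)."""
    DETECTING = 0
    FOUND_IT = 1
    NOT_ME = 2


class KeywordProber:
    """Toy prober: reports FOUND_IT once it sees its marker bytes."""

    def __init__(self, name, marker):
        self.name = name
        self.marker = marker
        self.state = ProbingState.DETECTING
        self.calls = 0  # counts how often this prober was fed

    def feed(self, data):
        self.calls += 1
        if self.marker in data:
            self.state = ProbingState.FOUND_IT
        return self.state


class GroupProber:
    """Feeds data to member probers, stopping at the first definite match."""

    def __init__(self, probers):
        self.probers = probers
        self.state = ProbingState.DETECTING
        self.best = None

    def feed(self, data):
        # Once the group has a definite match, skip all further work.
        if self.state is ProbingState.FOUND_IT:
            return self.state
        for prober in self.probers:
            if prober.feed(data) is ProbingState.FOUND_IT:
                # Record the match on the group itself and stop early.
                self.best = prober
                self.state = ProbingState.FOUND_IT
                break
        return self.state


first = KeywordProber("first", b"\xff\xfe")
second = KeywordProber("second", b"\x1b$B")
group = GroupProber([first, second])

group.feed(b"\xff\xfe some payload")  # "first" matches, "second" is never fed
group.feed(b"more payload")           # group already matched, nothing is fed
```

In this sketch the saving comes from two places: the remaining probers in the group are skipped as soon as one matches, and later chunks are not processed at all once the group's own state is `FOUND_IT`.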

The changes in this release have also laid the groundwork for retraining the models to make them more accurate, and to support some more encodings/languages (see #99 for progress). This is our main focus for chardet 5.0 (beyond dropping Python 2 support).

Benchmarks

Running on a MacBook Pro (15-inch, 2018) with 2.2GHz 6-core i7 processor and 32GB RAM

old version (chardet 3.0.4)

Benchmarking chardet 3.0.4 on CPython 3.7.5 (default, Sep  8 2020, 12:19:42)
[Clang 11.0.3 (clang-1103.0.32.62)]
--------------------------------------------------------------------------------
Calls per second for each encoding:
ascii: 25559.439366240098
big5: 7.187002209518091
cp932: 4.71090956645177
cp949: 2.937256786994428
euc-jp: 4.870580412090848
euc-kr: 6.6910755971933416
euc-tw: 87.71098043480079
gb2312: 6.614302607154443
ibm855: 27.595893549680685
ibm866: 29.93483661732791
iso-2022-jp: 3379.5052775763434
iso-2022-kr: 26181.67290886392
iso-8859-1: 120.63424740403983
iso-8859-5: 32.65106262196898
iso-8859-7: 62.480089080556084
koi8-r: 13.72481001727257
maccyrillic: 33.018537255804496
shift_jis: 4.996013583677438
tis-620: 14.323112928341818
utf-16: 166771.53081510935
utf-32: 198782.18009478672
utf-8: 13.966236809766901
utf-8-sig: 193732.28637413395
windows-1251: 23.038910006925768

... (truncated)
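
The "calls per second" figures above can be approximated with a simple timing loop. The sketch below is an assumption about how such a number might be measured, not the benchmark script used for the release notes; `calls_per_second` and `fake_detect` are hypothetical names, and `fake_detect` is only a stand-in for a real detector call.

```python
import time


def calls_per_second(func, payload, min_seconds=0.2):
    """Call func(payload) repeatedly for at least min_seconds and
    return the average number of calls completed per second."""
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        func(payload)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


def fake_detect(data):
    """Stand-in for a detector: checks only whether data is pure ASCII."""
    try:
        data.decode("ascii")
        return {"encoding": "ascii"}
    except UnicodeDecodeError:
        return {"encoding": None}


rate = calls_per_second(fake_detect, b"hello world" * 100)
print(f"fake_detect: {rate} calls per second")
```

Absolute numbers from a loop like this depend heavily on the machine and the input, so they are best used for comparing two versions on the same hardware, as the release notes do.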

Commits
  • a808ed1 Merge pull request #140 from chardet/master
  • 53854fb Add language to detect_all output
  • 1e208b7 Properly set CharsetGroupProber.state to FOUND_IT (#203)
  • a9286f7 Try to switch from Travis to GitHub Actions (#204)
  • 1db0347 Handle weird logging edge case in universaldetector.py
  • 056a2a4 Remove shebang and executable bit from chardet/cli/chardetect.py (#171)
  • 55ef330 Update links (#152)
  • e4290b6 Remove unnecessary numeric placeholders from format strings (#176)
  • 6a59c4b Remove use of deprecated 'setup.py test' (#187)
  • 4650dbf Remove shebang from nonexecutable script (#192)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the autosquash and dependencies (Pull requests that update a dependency file) labels Dec 11, 2020
@asvetlov asvetlov merged commit 0c4e57f into 3.8 Dec 11, 2020
@asvetlov asvetlov deleted the dependabot/pip/3.8/chardet-4.0.0 branch December 11, 2020 08:22