Unicode everywhere #442

Closed
pombredanne opened this issue Jan 10, 2017 · 3 comments

Comments

@pombredanne
Member

As a prep for #295, and for general sanity, we should use Unicode everywhere by default.

Today several pieces of code deal with a hodgepodge of bytes and Unicode strings in a semi-organized way (and that is a euphemism).

We should instead have a simpler approach:

  • use Unicode as the default everywhere (e.g., for now, add from __future__ import unicode_literals to every Python file, and use six text types or similar to handle Python 2/3 compatibility where needed)
  • always decode to Unicode at the boundaries when ingesting content, files, or paths. By boundaries I mean the points where files or paths are ingested or reported. From then on, process only Unicode.
  • convert back to encoded bytes at the boundaries when reporting out, where needed (e.g., UTF-8, or the filesystem encoding for paths consumed externally)
  • handle bytes rather than Unicode explicitly, and only by exception, wherever low-level byte handling is required. Only a few places should need this.
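The rules above can be sketched as a handful of boundary helpers. This is a minimal sketch assuming Python 3 only (the version the project eventually settled on), so the unicode_literals and six shims from the first rule are not needed. The helper names `to_text`, `to_bytes`, `path_to_text`, `path_to_bytes` and `looks_gzipped` are hypothetical, not ScanCode APIs, and decoding content as UTF-8 with `errors="replace"` is an assumed policy.

```python
import os


# Hypothetical boundary helpers (not ScanCode APIs), Python 3 only.

def to_text(value, encoding="utf-8"):
    """Input boundary: decode bytes content to str; pass str through."""
    if isinstance(value, bytes):
        # errors="replace" is an assumed policy: never raise, but substitute
        # U+FFFD for bytes that are not valid in `encoding`.
        return value.decode(encoding, errors="replace")
    return value


def to_bytes(text, encoding="utf-8"):
    """Output boundary: encode str to bytes for external consumers."""
    if isinstance(text, str):
        return text.encode(encoding)
    return text


def path_to_text(path):
    """Input boundary for paths: decode with the filesystem encoding."""
    # os.fsdecode uses the filesystem encoding and error handler
    # (surrogateescape on POSIX), so undecodable bytes survive a round trip.
    return os.fsdecode(path)


def path_to_bytes(path):
    """Output boundary for paths: re-encode with the filesystem encoding."""
    return os.fsencode(path)


def looks_gzipped(data):
    """Bytes by exception: low-level check of the gzip magic number."""
    # Deliberately operates on bytes, not str: magic numbers are raw bytes.
    return data[:2] == b"\x1f\x8b"
```

Everything between the two boundaries then handles only str. Paths go through os.fsdecode/os.fsencode rather than plain UTF-8 so that file names that are not valid in the filesystem encoding can still be written back out unchanged.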
@mjherzog
Member

ScanCode TK v2.0.0rc2 incorrectly reports the "@" sign from a scan as "%40". This occurs with the Angular sub-components from https://github.com/angular/angular/tree/2.0.x/modules/%40angular and may be related to the string "%40angular" in the URL for "https://github.com/angular/angular/tree/2.0.x/modules/@angular".
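For reference, "%40" is the URL percent-encoding of "@" (code point 0x40), so a path segment taken verbatim from a URL keeps the escaped form unless it is percent-decoded. The minimal Python 3 illustration below uses only the standard library to show that relationship; `decode_url_segment` is a hypothetical helper, and this is not how the issue was actually fixed in ScanCode.

```python
from urllib.parse import quote, unquote


def decode_url_segment(segment):
    """Hypothetical helper: turn a percent-encoded URL path segment into text."""
    # unquote replaces %XX escapes with the characters they encode (UTF-8).
    return unquote(segment)


# "@" is not in quote()'s default safe set, so it is escaped as "%40";
# decoding reverses that.
encoded = quote("@angular")
decoded = decode_url_segment(encoded)
```

Here `encoded` is "%40angular" and `decoded` is "@angular" again.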

@pombredanne
Member Author

@mjherzog the problem where the "@" sign is reported as "%40" is fixed in develop; it was tracked in #542.

@pombredanne
Member Author

We have now dropped support for Python 2 and use Unicode internally everywhere. At last!
