In preparation for #295, and for general sanity, we should use Unicode everywhere by default.
Today, several pieces of code deal with a hodgepodge of bytes and Unicode strings in a semi-organized way (and that is a euphemism).
We should instead take a simpler approach:
- Use Unicode as the default everywhere: for now, add `from __future__ import unicode_literals` to every Python file, and use `six` text types (or similar) to handle Python 2/3 compatibility where needed.
- Always decode to Unicode at the boundaries, where content, files, or paths are ingested. By boundaries I mean the points where files or paths are ingested or reported. From then on, process only Unicode.
- Encode back to bytes at the boundaries when reporting out, where needed (e.g. UTF-8, or the filesystem encoding for paths consumed externally).
- Handle bytes explicitly, and only by exception, where low-level byte handling is required. Only a few places should need this.
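The boundary approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names `to_text`, `to_bytes`, `path_to_text`, `path_to_bytes`, `ingest`, `count_lines`, `report` and `looks_binary` are hypothetical, the small `text_type`/`binary_type` shim stands in for `six.text_type`/`six.binary_type` to avoid a third-party dependency, and it assumes UTF-8 for file content and the filesystem encoding for paths.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import os
import sys

# Hypothetical shim standing in for six.text_type / six.binary_type so this
# sketch has no third-party dependency.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3
binary_type = bytes

FS_ENCODING = sys.getfilesystemencoding() or 'utf-8'


def to_text(value, encoding='utf-8', errors='strict'):
    """Inbound boundary: decode bytes to text; pass text through unchanged."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    raise TypeError('expected bytes or text, got %r' % type(value))


def to_bytes(value, encoding='utf-8', errors='strict'):
    """Outbound boundary: encode text to bytes; pass bytes through unchanged."""
    if isinstance(value, binary_type):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    raise TypeError('expected bytes or text, got %r' % type(value))


def path_to_text(path):
    """Decode a path using the filesystem encoding."""
    if hasattr(os, 'fsdecode'):
        # Python 3: undecodable bytes round-trip via surrogateescape.
        return os.fsdecode(path)
    return to_text(path, FS_ENCODING)


def path_to_bytes(path):
    """Encode a path for external consumers using the filesystem encoding."""
    if hasattr(os, 'fsencode'):
        return os.fsencode(path)
    return to_bytes(path, FS_ENCODING)


def ingest(raw_location, raw_content):
    """Boundary in: the path and the content both become text here."""
    return path_to_text(raw_location), to_text(raw_content, 'utf-8', 'replace')


def count_lines(text):
    """Core logic: only ever handles text, never bytes."""
    if not isinstance(text, text_type):
        raise TypeError('core code expects text, got %r' % type(text))
    return len(text.splitlines())


def report(location, line_count):
    """Boundary out: encode the final message as UTF-8 bytes (strict)."""
    return to_bytes('%s: %d lines\n' % (location, line_count))


def looks_binary(raw_head):
    """Explicit exception: low-level sniffing works on raw bytes by design."""
    if not isinstance(raw_head, binary_type):
        raise TypeError('looks_binary expects bytes')
    return b'\x00' in raw_head[:1024]


if __name__ == '__main__':
    location, text = ingest(b'/tmp/notes.txt', b'caf\xc3\xa9\nsecond line\n')
    # Python 3 prints: b'/tmp/notes.txt: 2 lines\n'
    print(report(location, count_lines(text)))
```

Everything between `ingest` and `report` sees only text, while `looks_binary` is the kind of narrowly scoped, explicitly bytes-only helper the last point allows. Note that with strict encoding, `report` would raise on a path that `os.fsdecode` decoded with surrogate escapes; real code would need to choose an error policy for that case.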