Solving 10 years of unicode problem? #4563

Closed
xoriole opened this issue Jun 13, 2019 · 7 comments · Fixed by #5025

Comments

@xoriole
Contributor

xoriole commented Jun 13, 2019

Investigate possible libraries for universal Unicode support in Tribler. The library should work on all platforms we support (Windows, macOS and Linux).

Note that on Windows there are issues with the availability of codecs and encodings. We already remap the codecs to support 'mbcs' and 'cp65001', so any solution should be tested with various encodings to avoid future issues.

I suggest creating a utility to encode and decode Python strings (and paths) and using it across Tribler. This utility could depend on some other library, but Tribler should not depend on that library directly, so that we can switch the underlying library later if necessary.
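
A minimal sketch of what such a utility layer could look like, assuming Python 3 and only the standard library underneath. The function names (`ensure_text`, `ensure_bytes`, `path_to_text`) are hypothetical and not part of Tribler; the point is only that callers go through one module, so the implementation behind it can be swapped later.

```python
# Hypothetical helper module: names are illustrative, not Tribler's API.
import os


def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return value as text (str), decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def ensure_bytes(value, encoding="utf-8", errors="strict"):
    """Return value as bytes, encoding it first if it is text (str)."""
    if isinstance(value, str):
        return value.encode(encoding, errors)
    if isinstance(value, bytes):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def path_to_text(path):
    """Convert a filesystem path (str, bytes or os.PathLike) to text.

    os.fsdecode uses the platform's filesystem encoding, so callers do
    not have to know which encoding the OS expects.
    """
    return os.fsdecode(path)
```

The rest of the code base would import only these helpers, never the underlying library directly.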

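Note that the UTF-8 character set should be utf8mb4, where a character is represented by one to four bytes, rather than utf8mb3, which is limited to three bytes per character and therefore cannot represent supplementary characters (code points above U+FFFF, such as emoji).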
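
To illustrate the difference, here is a small Python 3 sketch showing how many bytes UTF-8 needs for characters in different ranges. The helper names `utf8_width` and `fits_utf8mb3` are made up for this example; the check simply tests whether every code point lies at or below U+FFFF, which is the range a three-byte encoding can cover.

```python
def utf8_width(char):
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(char.encode("utf-8"))


def fits_utf8mb3(text):
    """True if every character is at or below U+FFFF (at most 3 bytes)."""
    return all(ord(char) <= 0xFFFF for char in text)


samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji
for char in samples:
    print("U+%04X needs %d byte(s)" % (ord(char), utf8_width(char)))
```

Text for which `fits_utf8mb3` returns False needs full four-byte UTF-8 support.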

@ichorid
Contributor

ichorid commented Jun 13, 2019

I propose using pathlib to handle the path-related part of the problem. There are lots of peculiar specifics in the path handling on different OSes. pathlib abstracts these out for us by separating "OS Path" type from "character string" type.
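
As a rough illustration of that separation, the sketch below uses pathlib's pure path classes, which can model Windows and POSIX paths on any OS without touching the filesystem. The directory and file names are invented for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same logical path built for two platforms; the "/" operator joins
# components using the separator rules of each path flavour.
win = PureWindowsPath("C:/Users/tribler") / "downloads" / "файл.torrent"
posix = PurePosixPath("/home/tribler") / "downloads" / "файл.torrent"

# Path objects expose structured parts instead of raw string slicing.
print(win.name, win.suffix, win.parent)
print(posix.name, posix.suffix, posix.parent)

# Conversion to a character string happens only when explicitly requested.
print(str(win))
print(str(posix))
```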

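For everything else, we must conform to the "Unicode sandwich" approach: decode bytes to text at the input boundary, work only with Unicode strings internally, and encode back to bytes only on output.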
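
A minimal sketch of the "Unicode sandwich" pattern, assuming Python 3 and UTF-8 at both boundaries: bytes are decoded once on the way in, all processing happens on text, and encoding happens once on the way out. The function `normalize_titles` and its input are hypothetical.

```python
def normalize_titles(raw, encoding="utf-8"):
    """Strip whitespace and drop empty lines from a byte payload of titles."""
    # Bottom slice: decode bytes to text as soon as they enter the program.
    text = raw.decode(encoding)

    # Filling: work on text only, never on encoded bytes.
    titles = [line.strip() for line in text.splitlines()]
    titles = [title for title in titles if title]

    # Top slice: encode back to bytes only when the data leaves the program.
    return "\n".join(titles).encode(encoding)


payload = "  Am\u00e9lie \n\n\u6771\u4eac\u7269\u8a9e\n".encode("utf-8")
print(normalize_titles(payload).decode("utf-8"))
```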

@qstokkink
Contributor

qstokkink commented Jun 13, 2019

I support pathlib.

There is a backport available here: https://pypi.org/project/pathlib2/
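
One common way to use the backport is an import fallback, sketched below: the standard-library module is preferred when present (pathlib ships with Python 3.4 and later), and pathlib2 is used otherwise. The directory names are placeholders.

```python
try:
    from pathlib import Path  # standard library on Python 3.4+
except ImportError:
    from pathlib2 import Path  # backport from PyPI, e.g. for Python 2.7

# From here on the code is identical regardless of which import succeeded.
state_dir = Path("state") / "sqlite"
print(state_dir.parts)
```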

@xoriole
Contributor Author

xoriole commented Jun 13, 2019

Since Python 3.6, pathlib.Path objects work nearly everywhere you’re already using path strings. So I see no reason not to use pathlib if you’re on Python 3.6 (or higher).
https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/
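
A short sketch of what this looks like in practice, assuming Python 3.6 or later: `open()` and the `os.path` functions accept `Path` objects directly, and `os.fspath()` gives back a plain string for APIs that still require one. The function name `write_and_read` and the file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_and_read(directory):
    """Write a file via a Path object and read it back.

    Returns the file content and the path converted to a plain string.
    """
    target = Path(directory) / "notes.txt"

    # open() accepts path-like objects since Python 3.6.
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("hello")

    # os.path functions accept path-like objects as well.
    if not os.path.exists(target):
        raise RuntimeError("file was not created")

    # os.fspath() returns the str form for APIs that need a real string.
    return target.read_text(encoding="utf-8"), os.fspath(target)


with tempfile.TemporaryDirectory() as tmp:
    content, raw_path = write_and_read(tmp)
    print(content, raw_path)
```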

Since the default Python version for Ubuntu 18.04 is Python 3.6.7 and we bundle Python on macOS and Windows, we could use it after some testing.

@synctext
Member

How do we believe this should be scheduled relative to the PY3 upgrade?

@ichorid
Contributor

ichorid commented Jun 13, 2019

I believe the PY3 upgrade will go more smoothly if we first move to pathlib.

@egbertbouman
Member

egbertbouman commented Jun 14, 2019

@ichorid I have a working Python 3 branch (https://github.com/egbertbouman/tribler/commits/py3tests, ~1400 line changes). We can merge this into devel once v7.3 is out.

@synctext
Member

As Egbert has the PY3 GUI and core operational, please build the Unicode work on top of that branch once it is merged.


5 participants