fix: input file encoding #596
The input is now read as bytes; its encoding is detected and then used to decode the bytes into a string. Signed-off-by: JCHacking <[email protected]> Refs: #448
Why did you use a niche library, https://pypi.org/project/faust-cchardet? If it was for the py3.6 compatibility,
That was one of the reasons. The other is that some encodings, such as cp1252, are returned as Windows-1252, so the returned name is not exactly what gets passed to the decode method. Also, faust-cchardet is a maintained fork of cchardet, which is written in C, so it performs better.
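A minimal sketch of one way to reconcile a detector's label with the codec names Python registers, using the standard library's `codecs.lookup`, which resolves known aliases to a canonical name. The helper name `canonical_codec_name` is hypothetical and is not part of this PR.

```python
import codecs


def canonical_codec_name(label):
    """Return Python's canonical codec name for a label, or None if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        # codecs.lookup resolves registered aliases and returns a CodecInfo
        # whose .name attribute is the canonical codec name.
        return codecs.lookup(label).name
    except LookupError:
        # Raised when no codec is registered under this label.
        return None
```

A `None` result signals that the label cannot be passed to `bytes.decode` as-is and needs explicit handling.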
Performance should not be a concern at this point. See #448 (comment). Could you test the following on your system?
Perfect. I will switch the library to chardet, pinned to a version that supports Python 3.6, and then try the other option you mentioned: only checking the encoding of requirements.txt.
Changed to use the chardet library and now only the encoding in requirements.txt is inspected. Signed-off-by: JCHacking <[email protected]> Refs: #448
The change is already done. Basically, if the file is opened in byte mode, the encoding is inspected; otherwise it is assumed that everything is OK. Regarding chardet, I added a replace to make it work on Windows, since it returns Windows-1252 but Python only understands cp1252.
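The approach described in this comment could look roughly like the sketch below. It is not the code from this PR: the function names, the `detect` parameter, and the UTF-8 fallback are assumptions made for illustration. The only third-party call used is `chardet.detect`, which takes bytes and returns a dict that includes an `encoding` key; `chardet` is assumed to be installed when no detector is passed in.

```python
def decode_detected(raw, detect=None, fallback="utf-8"):
    """Decode bytes using a detected encoding (hypothetical helper)."""
    if detect is None:
        # Third-party dependency, imported lazily so that a custom
        # detector can be injected instead.
        import chardet
        detect = chardet.detect
    # The detector may report None when it cannot decide; fall back then.
    label = detect(raw).get("encoding") or fallback
    # Mirror the replace mentioned in the comment above.
    label = label.replace("Windows-1252", "cp1252")
    return raw.decode(label)


def read_input(path, detect=None):
    """Open a file in byte mode and decode it with the detected encoding."""
    with open(path, "rb") as handle:
        return decode_detected(handle.read(), detect)
```

Passing the detector as a parameter keeps the decoding step testable without depending on what a particular detector guesses for short inputs.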
Signed-off-by: Jan Kowalleck <[email protected]>
I had to make minor version range adjustments and do other chores. I will add a regression test, and then this fix is ready to go.
Thank you for letting me collaborate on the solution.
The fix is available as of v3.11.3.
input files in lock-format are expected in a certain encoding,
other input file encodings are detected.
fixes #448