Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1: we should always be writing out bytes, in specific encoding (utf8).
What exactly is the problem/error this is trying to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to segmentize a wiki and write results to file. But I've got error:
I can make a convertation to bytes but sys.stdout requires str, not bytes and I'd like to keep this flexible approach for writing.
I have Python v3.5, for Python2 is all good.
Please, explain why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that's what I/O layers understand: bits and bytes. It makes the logic more explicit and simpler to have "unicode inside" and "bytes on I/O".
I didn't look in detail, but to me that
json.dumps() + "\n"
looks like a bug. @menshikh-iv shouldn't that be encoded into a bytestring (utf8) before writing to a binary file?@horpto thanks for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @horpto suggested,
sys.stdout
opened in 'w' mode (not 'wb') (for python3 addedencoding='UTF-8'
explicitly)I can convert this line to bytes explicitly, but potentially, we'll have problems with
sys.stdout
(or need to split this two cases).@horpto can you test your code with 'wb' mode and explicit conversion for
json.dumps() + "\n"
(I can test this only for linux, sometimes, encoding problems on windows behaves not obviously)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@menshikh-iv
It's OK and should work fine for python2 and python3.
I think, it's uselessly when we are writing to text file, as file-object does it already inside. When we are reading content from file - it's OK, files can contain some trash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not useless, it's a Python best practice.
Newlines are messed up on Windows, we always want to have full control over what we write. For this reason, explicit conversions between string and byte are preferred on all I/O boundaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, interesting use case.
ok.