Fixes #35: fd{Read,Write} with ByteString payload (Breaking change) #219
For the record: I favor this one.
If this is merged, the AFPP variant also needs fixing.
Could someone review: is it the only place where
I think only
I think this PR is fine as-is and doesn't need a wider scope. @Bodigrim @hs-viktor let's vote on this issue:
I clearly vote for 1. 👍
If we're introducing an API break, we could also deprecate the `String` variants; they are semantically dubious.
-- | Read data from an 'Fd' and convert it to a 'String' using the locale encoding.
-- Throws an exception if this is an invalid descriptor, or EOF has been
-- reached.
fdRead :: Fd
It may be worth noting that unless the input is known to be ASCII data, or an 8-bit locale is in use, reading n bytes and expecting a `String` is rather dubious. One can read and decode lines (in a UTF-8 locale) because UTF-8 is self-synchronising, and LF or CRLF is never in the middle of a multi-byte sequence. Otherwise, reading "n bytes" from a file into a string is a recipe for failure. Use of this function should be discouraged.
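To make the concern concrete, here is a minimal sketch (an illustration, not code from this PR) that assumes a UTF-8 payload and uses `Data.ByteString` and `Data.Text.Encoding` from the bytestring and text packages as a stand-in for the locale-based `String` conversion. Cutting the bytes at an arbitrary offset splits a two-byte character, so both halves are rejected, while splitting on the newline byte, or decoding the whole buffer, succeeds. The names `payload`, `chunkA`, `chunkB` and the 3-byte cut are hypothetical.

```haskell
import qualified Data.ByteString as BS
import Data.Either (isLeft, isRight)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "Grüße\nCafé\n" as UTF-8: 'ü', 'ß' and 'é' each take two bytes.
payload :: BS.ByteString
payload = TE.encodeUtf8 (T.pack "Gr\252\223e\nCaf\233\n")

-- A read of n = 3 bytes stops after "Gr" plus the first byte of 'ü'
-- (0xC3); the next read starts with the continuation byte 0xBC.
chunkA, chunkB :: BS.ByteString
(chunkA, chunkB) = BS.splitAt 3 payload

-- Decoding each n-byte chunk on its own: both halves are rejected.
chunksRejected :: Bool
chunksRejected = all (isLeft . TE.decodeUtf8') [chunkA, chunkB]

-- Decoding line by line succeeds: the newline byte 0x0A never occurs
-- inside a multi-byte UTF-8 sequence, whose bytes are all >= 0x80.
linesDecode :: Bool
linesDecode = all (isRight . TE.decodeUtf8') (BS.split 10 payload)

-- Decoding the complete payload at once also succeeds.
wholeDecodes :: Bool
wholeDecodes = isRight (TE.decodeUtf8' payload)
```

In GHCi, `chunksRejected`, `linesDecode` and `wholeDecodes` all evaluate to `True`, which is the self-synchronisation argument in miniature.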
I'm not sure I follow.
This issue seems to be present everywhere, including in the handling of filepaths. It's broken: https://gist.github.com/hasufell/c600d318bdbe010a7841cc351c835f92
Anything that uses `getLocaleEncoding`, `getFileSystemEncoding` or `getForeignEncoding` to convert `CString` to `String` potentially runs into bugs, no?
Following that, we'd need to deprecate half of base and most of the unix String based API.
(note: I'm up for that... but it's probably out of scope of this PR)
> Anything that uses `getLocaleEncoding`, `getFileSystemEncoding` or `getForeignEncoding` to convert `CString` to `String` potentially runs into bugs, no?
No, the issue here is that we're not reading a complete string; we're reading n bytes from a file. These n bytes may well end in the middle of a multi-byte grapheme.
Converting entire file names or similar octet data (rather than n-byte fragments of strings) to `String`s is much safer under reasonable assumptions.
> These n bytes may well end in the middle of a multi-byte grapheme.

Right, got it. And when you append the strings, you get garbage.
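A small sketch of that "garbage" case, again assuming UTF-8 and using the text package's lenient decoder (`lenientDecode`, which substitutes U+FFFD for invalid input) in place of decoding each n-byte read to a `String` via the locale. The names `original`, `chunkSize` and `appendedChunks` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- "naïve": 'ï' (U+00EF) is encoded as the two bytes 0xC3 0xAF.
original :: T.Text
original = T.pack "na\239ve"

-- Hypothetical read size that cuts 'ï' in half: "na" 0xC3 | 0xAF "ve".
chunkSize :: Int
chunkSize = 3

-- Decode each chunk separately (invalid bytes become U+FFFD), then append.
appendedChunks :: T.Text
appendedChunks =
  let (a, b) = BS.splitAt chunkSize (TE.encodeUtf8 original)
  in TE.decodeUtf8With lenientDecode a <> TE.decodeUtf8With lenientDecode b
```

Evaluating `appendedChunks` shows replacement characters where `ï` used to be, so the appended result no longer equals `original`.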
> These n bytes may well end in the middle of a multi-byte grapheme.
> Right, got it. And when you append the strings, you get garbage.

Yes, the description of the action says it all: read n bytes. That's a `ByteString` operation, not a `String` operation. `String`s don't consist of bytes.
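Following that reasoning, a caller would keep each n-byte read as a `ByteString` and decode only once the whole input is available. The sketch below assumes a hypothetical `ReadChunk` function that returns an empty `ByteString` at end of input (an assumption: the `fdRead` documentation quoted above says it throws on EOF instead) and simulates it with an in-memory buffer. `memoryReader`, `readAllBytes` and `decodeAll` are illustrative names, not part of the unix package.

```haskell
import qualified Data.ByteString as BS
import Data.IORef (atomicModifyIORef', newIORef)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical stand-in for a ByteString-returning "read at most n bytes";
-- by assumption, an empty result signals end of input.
type ReadChunk = Int -> IO BS.ByteString

-- Simulated descriptor backed by an in-memory buffer (illustration only).
memoryReader :: BS.ByteString -> IO ReadChunk
memoryReader bytes = do
  ref <- newIORef bytes
  pure $ \n -> atomicModifyIORef' ref $ \rest ->
    let (chunk, rest') = BS.splitAt n rest in (rest', chunk)

-- Keep every chunk as raw bytes; never decode a partial read.
readAllBytes :: ReadChunk -> Int -> IO BS.ByteString
readAllBytes readChunk n = go []
  where
    go acc = do
      chunk <- readChunk n
      if BS.null chunk
        then pure (BS.concat (reverse acc))
        else go (chunk : acc)

-- Decode once, after all bytes are in; Nothing if the input is not valid UTF-8.
decodeAll :: ReadChunk -> Int -> IO (Maybe T.Text)
decodeAll readChunk n =
  either (const Nothing) Just . TE.decodeUtf8' <$> readAllBytes readChunk n
```

For example, a reader over the UTF-8 bytes of "naïve" read with `decodeAll reader 3` yields the original text, even though the individual 3-byte chunks split `ï` in half.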
I'm tempted to accept this PR, but have no idea how much breakage would ensue downstream. How widely used are these functions?
I think this is an acceptable level of breakage. The usage is quite limited, e.g., https://hackage-search.serokell.io/?q=%5CbfdRead%5Cb, and most of these are for the non-ByteString version. I vote to accept.
I will accept this PR, modulo additional documentation discouraging the
Let's deprecate
I can make a follow-up PR. I'd argue the scope of this PR is just about the ByteString variants.
Version of #186 with API breakage. Feel free to close this one or the other one.