Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #35: fd{Read,Write} with ByteString payload (Breaking change) #219

Merged
merged 1 commit into from
Jul 16, 2022

Conversation

mmhat
Copy link
Contributor

@mmhat mmhat commented Jul 5, 2022

Version of #186 with API breakage. Feel free to close this one or the other one.

@mmhat mmhat changed the title Fixes #35: fd{Read,Write} with ByteString payload Fixes #35: fd{Read,Write} with ByteString payload (Breaking change) Jul 5, 2022
@mmhat
Copy link
Contributor Author

mmhat commented Jul 5, 2022

For the record: I favor this one.

@hasufell hasufell added this to the 2.8.0.0 milestone Jul 10, 2022
@hasufell
Copy link
Member

If this is merged, the AFPP variant also needs fixing.

@Bodigrim
Copy link
Contributor

Could someone review: is it the only place where *.ByteString modules use String?

@hasufell
Copy link
Member

I think this PR is fine as-is and doesn't need a wider scope.

@Bodigrim @hs-viktor let's vote on this issue:

  1. accept this PR
  2. accept PR Fixes #35: fd{Read,Write} with ByteString payload #186
  3. accept none of them

I clearly vote for 1. 👍

Copy link
Contributor

@hs-viktor hs-viktor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're introducing an API break, we could also deprecate the String variants, they are semantically dubious.

-- | Read data from an 'Fd' and convert it to a 'String' using the locale encoding.
-- Throws an exception if this is an invalid descriptor, or EOF has been
-- reached.
fdRead :: Fd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth noting that unless the input is known to be ASCII data, or an 8-bit locale is in use, reading n bytes and expecting a String is rather dubious. One can read and decode lines (in an UTF-8 locale) because UTF-8 is self-synchronising, and LF or CRLF is never in the middle of a multi-byte sequence. Otherwise, reading "n-bytes" from a file into a string is a recipe for failure. Use of this function should be discouraged.

Copy link
Member

@hasufell hasufell Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow.

This issues seems to be present everywhere, including in the handling of filepaths. It's broken: https://gist.github.com/hasufell/c600d318bdbe010a7841cc351c835f92

Anything that uses getLocaleEncoding, getFileSystemEncoding or getForeignEncoding to convert CString to String potentially runs into bugs, no?

Following that, we'd need to deprecate half of base and most of the unix String based API.


(note: I'm up for that... but it's probably out of scope of this PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that uses getLocaleEncoding, getFileSystemEncoding or getForeignEncoding to convert CString to String potentially runs into bugs, no?

No, the issue here is that we're not reading a complete string, we're reading n bytes from a file. These n bytes may well end in the middle of a multi-byte grapheme.

Converting entire file names or similar octet data (rather than n byte fragments of strings) to Strings is much safer under reasonable assumptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These n bytes may well end in the middle of a multi-byte grapheme.

Right, got it. And when you append the strings, you get garbage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These n bytes may well end in the middle of a multi-byte grapheme.

Right, got it. And when you append the strings, you get garbage.

Yes, the description of the action says it all: read n bytes. That's a ByteString operation, not a String operation. Strings don't consist of bytes.

@hs-viktor
Copy link
Contributor

I'm tempted to accept this PR, but have no idea how much breakage would ensue downstream. How widely used are these functions?

@Bodigrim
Copy link
Contributor

I think this is an acceptable level of breakage. The usage is quite limited, e. g., https://hackage-search.serokell.io/?q=%5CbfdRead%5Cb - and most of these are for non-ByteString version.

I vote to accept.

@hs-viktor
Copy link
Contributor

I will accept this PR, modulo additional documentation discouraging the String variants, if not outright deprecating them.

@Bodigrim
Copy link
Contributor

Let's deprecate String variants as unsafe then. @mmhat could you please add this to the PR?

@hasufell
Copy link
Member

Let's deprecate String variants as unsafe then. @mmhat could you please add this to the PR?

I can make a follow-up PR. I'd argue the scope of this PR is just about the ByteString variants.

@Bodigrim Bodigrim merged commit e0f4580 into haskell:master Jul 16, 2022
@mmhat mmhat deleted the issue-35-bytestring-payload-breaking branch July 16, 2022 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants