Add conversion routines for FilePath #403
Despite FilePath being a type alias for String, the type is actually quite distinct. An argument of type FilePath is expected to be encoded using the file system encoding, and can be converted to a bytestring and back exactly.
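A minimal sketch of the round trip described above, using only primitives from `base` together with `Data.ByteString`. The names `encodeFilePath` and `decodeFilePath` are hypothetical and chosen for illustration; this is not necessarily how the PR implements its functions.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: encode a FilePath to bytes using the current
-- file system encoding (not a fixed UTF-8 encoding).
encodeFilePath :: FilePath -> IO B.ByteString
encodeFilePath path = do
  enc <- getFileSystemEncoding
  -- Marshal the String with the file system encoding, then copy the
  -- resulting buffer into a ByteString before it is freed.
  GHC.withCStringLen enc path B.packCStringLen

-- Hypothetical helper: decode bytes back to a FilePath using the
-- current file system encoding.
decodeFilePath :: B.ByteString -> IO FilePath
decodeFilePath bytes = do
  enc <- getFileSystemEncoding
  B.useAsCStringLen bytes (GHC.peekCStringLen enc)
```

Both functions live in `IO` because they read the file system encoding at the time of the call.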
Makes sense to me. @sjakobi, what is your take on this?
There already exist many incorrect re-implementations of these functions, usually assuming that the bytestrings should be UTF-8 encoded. [1]
Is this [1] supposed to be a reference to something? I'd like to see it.
The addition makes sense to me. I think the documentation should explicitly mention the filesystem encoding though.
Indeed, I was going to post a list, but decided it was too hard. It's made worse by people wanting to avoid
I'll expand on the documentation.
shelly for example ends up going via
cabal-install has its own function for encoding which seems to be the replacement for
Though, I haven't followed the code all the way through, so maybe it's important still.
tar-conduit assumes UTF-8 encoding: https://github.com/snoyberg/tar-conduit/blob/master/src/Data/Conduit/Tar/Types.hs#L150-L156
(general confusion about bytestring arguments on unix systems) pcapriotti/optparse-applicative#368
I know of a few more for some closed source projects.
Cheers!
Looks good, but I cannot make my mind whether it belongs to |
I chose not to put it in |
Thanks @luke-clifton! |
* Add conversion routines for FilePath
  Despite FilePath being a type alias for String, the type is actually quite distinct. An argument of type FilePath is expected to be encoded using the file system encoding, and can be converted to a bytestring and back exactly.
* Expand on to/fromFilePath documentation
I'm late to this, but it would be more convenient if these new functions did not need IO. My own versions of these use unsafePerformIO. When I'm writing code that uses ByteString filepaths, I often need the conversion in pure code, or in places where lifting to IO would be awkward.
It's probably not too late to redesign; I assume |
Well, it's possible that someone uses setFileSystemEncoding, which would break referential transparency, in theory, I suppose.
Yeah, the very existence of setFileSystemEncoding makes this require IO. It's unfortunate, as it's something that is pretty likely to stay constant. I did consider adding an |
Let's keep it as is then. Client apps, which are in control of encoding (and presumably set it only once), may wish to wrap it into |
Perhaps we could add to the documentation that it's safe to use |
@luke-clifton a PR, improving documentation in this aspect, would be much appreciated. |
Not sure if GitHub alerts on that reference, so... I raised a PR to mention unsafePerformIO in the docs. |
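A sketch of the kind of pure wrapper discussed above, for a client application that controls the encoding itself. The names `unsafeEncodeFilePath` and `unsafeDecodeFilePath` are hypothetical, and the bodies use `base` primitives rather than the functions added by this PR so the example stays self-contained. The assumption, which the caller has to guarantee, is that the file system encoding is set at most once at startup and never changes afterwards; otherwise these functions are not referentially transparent.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical pure wrapper. Only sound if setFileSystemEncoding is
-- never called after the first use of this function.
unsafeEncodeFilePath :: FilePath -> B.ByteString
unsafeEncodeFilePath path = unsafePerformIO $ do
  enc <- getFileSystemEncoding
  GHC.withCStringLen enc path B.packCStringLen

-- Hypothetical pure wrapper for the opposite direction, under the
-- same assumption about a constant file system encoding.
unsafeDecodeFilePath :: B.ByteString -> FilePath
unsafeDecodeFilePath bytes = unsafePerformIO $ do
  enc <- getFileSystemEncoding
  B.useAsCStringLen bytes (GHC.peekCStringLen enc)
```

Keeping the library functions in `IO` and leaving this wrapping to the application puts the responsibility for the constant-encoding assumption with the code that can actually enforce it.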
@luke-clifton I've been recently running |
Ugh.
Given that there are systems that don't follow that rule, perhaps selectively disabling those tests might make sense? The other option would be to change the QuickCheck tests to just a handful of unit tests covering some fairly basic cases or modifying it to produce ASCII strings only. I don't think we really need to be testing it too much given the simplicity of the functions. We can hopefully rely on GHC to be testing the encoding more thoroughly upstream anyway. |
I filed a documentation issue upstream at https://gitlab.haskell.org/ghc/ghc/-/issues/20344 and pushed a fix for the test suite in #419.
Despite FilePath being a type alias for String, the type is actually
quite distinct. An argument of type FilePath is expected to be encoded
using the file system encoding, which, by default, can be converted to a bytestring and
back exactly.
There already exist many incorrect re-implementations of these functions, usually assuming that the bytestrings should be UTF-8 encoded. [1]
I feel like having something more discoverable here might provide some value. What do you think?