Bring back agi::fs::path to ensure UTF-8 paths
On Windows, std::filesystem::path internally stores paths in UTF-16, but constructing an std::filesystem::path from a string reads that string in Windows-1252 or some other non-UTF-8 narrow encoding. This breaks all kinds of code that previously assumed that one could simply convert between UTF-8 strings, wstrings, and paths freely.

Before the switch from boost::filesystem to std::filesystem, this was solved by using boost::filesystem::path::imbue to configure boost::filesystem to always use UTF-8. However, there is no equivalent function for std::filesystem. It seems that the encoding used can be controlled to some degree using the C and C++ locales, but changing these to UTF-8 breaks other things (and global locales are a headache in general; I won't pull a wm4 here, but you probably know what I mean).

So, there does not seem to be any easy solution to this. Aegisub also isn't the only program to have this problem, see e.g. https://www.bunkus.org/2021/03/converting-a-c-code-base-from-boostfilesystem-to-stdfilesystem/

As far as I can see, the three options are
- Somehow mess with the global locales until everything magically works. This feels risky, might not work on all systems, and could break in the future.
- Audit the entire code base and check every single conversion between strings and paths (Yeah, no)
- Reinvent the wheel and write a wrapper class that fixes std::filesystem::path by forcing all conversions from and to std::string to use UTF-8.

So, here we are. It doesn't feel great to have another reinvention of something that shouldn't be Aegisub's responsibility in the first place, and we *just* got rid of all the agi::fs wrapper code, but this seems like the only sane way to be sure that all conversions happen the way we expect. I guess since agi::fs wraps std::filesystem and not boost::filesystem this time, it's still better than before.

Incidentally, std::u8string seems to be kind of a meme too. The idea of being explicit about your string being UTF-8 is great, but how is there not even a standard function to reinterpret a string as UTF-8 or vice-versa?? Let alone support in any other string handling or I/O functions.

The changeset is pretty big, but the main changes are in fs.h/fs.cpp. The rest is just a few find&replace calls and a handful of manual fixes.

Finally, it should be noted that conversion between std::filesystem::paths and std::wstrings is broken on gcc <= 11: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048
This is what currently causes the added lagi_mru.add_entry_utf8 test to fail on the Ubuntu CI. Clang and newer versions of gcc work, though.

Fixes #219.
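To illustrate the third option, here is a minimal sketch of what such a wrapper could look like. It is not the code from fs.h/fs.cpp: the namespace `sketch`, the class name `utf8_path`, and the helpers `from_utf8` and `to_utf8` are all hypothetical names made up for this example. The only assumption about the standard library is that a char8_t source (C++20) or std::filesystem::u8path (C++17) is treated as UTF-8, regardless of locale or code page.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not agi::fs

// Decode a UTF-8 narrow string into a path without consulting the
// global locale or the Windows ANSI code page.
inline std::filesystem::path from_utf8(std::string_view utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: a char8_t source is defined to be UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path is the UTF-8 entry point (deprecated in C++20).
    return std::filesystem::u8path(std::string(utf8));
#endif
}

// Encode a path as a UTF-8 std::string.
inline std::string to_utf8(const std::filesystem::path& p) {
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying element by element works for both.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}

// Hypothetical wrapper: every conversion from or to a narrow string
// goes through the two UTF-8 helpers above.
class utf8_path {
public:
    utf8_path() = default;
    utf8_path(const std::filesystem::path& p) : inner_(p) {}
    // Exact-match overloads so that narrow strings never reach the
    // locale-dependent std::filesystem::path constructors.
    utf8_path(const std::string& utf8) : inner_(from_utf8(utf8)) {}
    utf8_path(const char* utf8) : inner_(from_utf8(utf8)) {}

    // Always UTF-8, unlike std::filesystem::path::string().
    std::string string() const { return to_utf8(inner_); }

    // Access to the wrapped path for calls into std::filesystem.
    const std::filesystem::path& native() const { return inner_; }

    friend utf8_path operator/(const utf8_path& lhs, const utf8_path& rhs) {
        return utf8_path(lhs.inner_ / rhs.inner_);
    }

private:
    std::filesystem::path inner_;
};

}  // namespace sketch
```

With something like this, `sketch::utf8_path("caf\xc3\xa9.ass").string()` is meant to give back the same UTF-8 bytes on every platform, while code that needs the real path (e.g. to open a file) goes through `native()`. A real wrapper would need to cover far more of the std::filesystem::path interface than this sketch does.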
1 parent a99092d, commit b005c20
Showing 121 changed files with 442 additions and 368 deletions.