opening subtitles will open an empty program: 3.4.0 #219

Open
AdventurerRussia opened this issue Dec 19, 2024 · 5 comments · May be fixed by #231

Comments

@AdventurerRussia

AdventurerRussia commented Dec 19, 2024

video
https://disk.yandex.ru/i/_xgeiAsF2hWIRg

I deleted the previous version's settings from the user folder.

Edition: Windows 11 Pro
Version: 24H2
Installed on: 16.06.2024
OS build: 26100.2454

AdventurerRussia changed the title from "opening subtitles will open an empty program" to "opening subtitles will open an empty program: 3.4.0" on Dec 19, 2024
@AdventurerRussia
Author

We found that the file does not open if its path contains Russian (Cyrillic) letters.

@arch1t3cht
Member

Does opening the subtitles in Aegisub via File > Open work? Does opening the subtitles via "Open With" from the Explorer work? Did dragging those subtitles into Aegisub work on 3.2.2?

@AdventurerRussia
Author

AdventurerRussia commented Dec 20, 2024

Does opening the subtitles in Aegisub via File > Open work? Does opening the subtitles via "Open With" from the Explorer work? Did dragging those subtitles into Aegisub work on 3.2.2?

In version 3.4.0, the subtitles open if I go through the File menu (File > Open Subtitles).

Dragging subtitles works fine in version 3.2.2. I downloaded the portable version, and everything works there.

Again, the failure to open subtitles directly occurs when they are in folders with Russian characters in their names.

@0tkl
Contributor

0tkl commented Dec 20, 2024

I can reproduce the bug.

  • Open in Aegisub via File > Open Subtitles: works
  • "Open With" from the Explorer: fails in 3.4.0, OK in 3.2.2
  • Drag and drop: fails in 3.4.0, OK in 3.2.2
  • Reading mru.json written by the other version: 3.4.0 cannot handle a file from 3.2.2, and 3.2.2 cannot handle a file from 3.4.0 either

@geezeng

geezeng commented Dec 21, 2024

I encountered the same problem, and loading a video also produces an error.

arch1t3cht added a commit that referenced this issue Dec 21, 2024
On Windows, std::filesystem::path internally stores paths in UTF-16,
but constructing an std::filesystem::path from a string reads that
string in Windows-1252 or some other non-UTF-8 narrow encoding. This
breaks all kinds of code that previously assumed that one could simply
convert between UTF-8 strings, wstrings, and paths freely.
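
To make the failure mode concrete, here is a minimal sketch of what happens when UTF-8 bytes are read one byte per character. It is an illustration only: `widen_bytewise` and `utf8_code_points` are hypothetical helpers, and plain Latin-1 widening stands in for the real Windows code page conversion, which this sketch does not call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for a single-byte code page: each byte becomes one
// UTF-16 code unit. This is Latin-1 decoding, not real Windows-1252.
std::u16string widen_bytewise(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Counts code points in well-formed UTF-8 by skipping continuation bytes
// (bytes of the form 10xxxxxx).
std::size_t utf8_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Usage: three Cyrillic letters occupy six bytes in UTF-8.
inline void example_misread() {
    const std::string name = "\xD1\x81\xD1\x83\xD0\xB1";
    assert(utf8_code_points(name) == 3);       // three letters were intended
    assert(widen_bytewise(name).size() == 6);  // six unrelated characters result
}
```

A path built from the six-character result names a file that does not exist, which would be consistent with the reports above that only paths containing Cyrillic letters fail.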

Before the switch from boost::filesystem to std::filesystem, this was
solved by using boost::filesystem::path::imbue to configure boost::filesystem
to always use UTF-8. However, there is no equivalent function for
std::filesystem. It seems that the encoding used can be controlled to
some degree using the C and C++ locales, but changing these to UTF-8
breaks other things (and global locales are a headache in general. I
won't pull a wm4 here but you probably know what I mean).

So, there does not seem to be any easy solution to this. Aegisub also
isn't the only program to have this problem, see e.g.
https://www.bunkus.org/2021/03/converting-a-c-code-base-from-boostfilesystem-to-stdfilesystem/

As far as I can see, the three options are
- Somehow mess with the global locales until everything magically works.
  This feels risky, might not work on all systems, and could break in
  the future.
- Audit the entire code base and check every single conversion between
  strings and paths (Yeah, no)
- Reinvent the wheel and write a wrapper class that fixes
  std::filesystem::path by forcing all conversions from and to
  std::string to use UTF-8.
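
The third option could look roughly like the sketch below. This is not the code from the changeset: `Utf8Path`, `from_native`, and `decode` are hypothetical names chosen for illustration, and the class covers only construction, joining, and conversion back to a narrow string. It assumes C++17 or later and picks the UTF-8 constructor that the language version offers.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <utility>

// Hypothetical wrapper for illustration; not Aegisub's actual agi::fs code.
class Utf8Path {
public:
    Utf8Path() = default;

    // Narrow strings are always taken to be UTF-8, never the locale encoding.
    explicit Utf8Path(const std::string& utf8) : path_(decode(utf8)) {}

    static Utf8Path from_native(std::filesystem::path p) {
        Utf8Path out;
        out.path_ = std::move(p);
        return out;
    }

    // Narrow output is always UTF-8 as well.
    std::string string() const {
        const auto u8 = path_.u8string();  // std::u8string since C++20
        return std::string(u8.begin(), u8.end());
    }

    const std::filesystem::path& native() const { return path_; }

    Utf8Path operator/(const std::string& utf8) const {
        return from_native(path_ / decode(utf8));
    }

private:
    static std::filesystem::path decode(const std::string& utf8) {
#ifdef __cpp_char8_t
        // C++20: char8_t input is defined to be UTF-8.
        return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
        // C++17: u8path interprets a narrow string as UTF-8.
        return std::filesystem::u8path(utf8);
#endif
    }

    std::filesystem::path path_;
};

// Usage: UTF-8 bytes survive a round trip through the wrapper.
inline void example_roundtrip() {
    const std::string name = "\xD1\x81\xD1\x83\xD0\xB1.ass";
    assert(Utf8Path(name).string() == name);
}
```

Because the only narrow-string entry points are the constructor, `operator/`, and `string()`, callers cannot accidentally reach the locale-dependent `std::filesystem::path(std::string)` constructor through this type.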

So, here we are. It doesn't feel great to have another reinvention of
something that shouldn't be Aegisub's responsibility in the first place,
and we *just* got rid of all the agi::fs wrapper code, but this seems
like the only sane way to be sure that all conversions happen the way we
expect. I guess since agi::fs wraps std::filesystem and not
boost::filesystem this time, it's still better than before.

Incidentally, std::u8string seems to be kind of a meme too. The idea of
being explicit about your string being UTF-8 is great, but how is there
not even a standard function to reinterpret a string as UTF-8 or
vice-versa?? Let alone support in any other string handling or I/O
functions.
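
For reference, the missing reinterpretation can be written by hand in a few lines. The sketch below is an assumption about what such helpers might look like, not code from the changeset: `utf8_string`, `as_utf8`, and `as_bytes` are hypothetical names, and the alias falls back to `std::string` when the compiler has no `char8_t`.

```cpp
#include <cassert>
#include <string>

#ifdef __cpp_char8_t
using utf8_string = std::u8string;  // C++20 and later
#else
using utf8_string = std::string;    // fallback: char8_t is unavailable
#endif

// Relabels bytes that are already known to be UTF-8.
// No transcoding and no validation takes place.
utf8_string as_utf8(const std::string& bytes) {
    return utf8_string(bytes.begin(), bytes.end());
}

// The reverse direction: exposes the UTF-8 bytes as a plain std::string.
std::string as_bytes(const utf8_string& text) {
    return std::string(text.begin(), text.end());
}

// Usage: the byte sequence is unchanged by the round trip.
inline void example_relabel() {
    const std::string ya = "\xD1\x8F";
    assert(as_bytes(as_utf8(ya)) == ya);
}
```

Both functions copy the data, since the standard offers no way to view a `std::string` as a `std::u8string` in place.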

The changeset is pretty big, but the main changes are in fs.h/fs.cpp.
The rest is just a few find&replace calls and a handful of manual fixes.

Fixes #219.
arch1t3cht added a commit that referenced this issue Dec 22, 2024
On Windows, std::filesystem::path internally stores paths in UTF-16,
but constructing an std::filesystem::path from a string reads that
string in Windows-1252 or some other non-UTF-8 narrow encoding. This
breaks all kinds of code that previously assumed that one could simply
convert between UTF-8 strings, wstrings, and paths freely.

Before the switch from boost::filesystem to std::filesystem, this was
solved by using boost::filesystem::path::imbue to configure
boost::filesystem to always use UTF-8. However, there is no equivalent
function for std::filesystem. It seems that the encoding used can be
controlled to some degree using the C and C++ locales, but changing
these to UTF-8 breaks other things (and global locales are a headache
in general. I won't pull a wm4 here but you probably know what I mean).

So, there does not seem to be any easy solution to this. Aegisub also
isn't the only program to have this problem, see e.g.
https://www.bunkus.org/2021/03/converting-a-c-code-base-from-boostfilesystem-to-stdfilesystem/

As far as I can see, the three options are
- Somehow mess with the global locales until everything magically works.
  This feels risky, might not work on all systems, and could break in
  the future.
- Audit the entire code base and check every single conversion between
  strings and paths (Yeah, no)
- Reinvent the wheel and write a wrapper class that fixes
  std::filesystem::path by forcing all conversions from and to
  std::string to use UTF-8.

So, here we are. It doesn't feel great to have another reinvention of
something that shouldn't be Aegisub's responsibility in the first place,
and we *just* got rid of all the agi::fs wrapper code, but this seems
like the only sane way to be sure that all conversions happen the way we
expect. I guess since agi::fs wraps std::filesystem and not
boost::filesystem this time, it's still better than before.

Incidentally, std::u8string seems to be kind of a meme too. The idea of
being explicit about your string being UTF-8 is great, but how is there
not even a standard function to reinterpret a string as UTF-8 or
vice-versa?? Let alone support in any other string handling or I/O
functions.

The changeset is pretty big, but the main changes are in fs.h/fs.cpp.
The rest is just a few find&replace calls and a handful of manual fixes.

Finally, it should be noted that conversion between
std::filesystem::paths and std::wstrings is broken on gcc <= 11:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048
This is what currently causes the added lagi_mru.add_entry_utf8 test
to fail on the Ubuntu CI. Clang and newer versions of gcc work, though.

Fixes #219.
arch1t3cht linked a pull request on Dec 22, 2024 that will close this issue
arch1t3cht added a commit that referenced this issue Dec 26, 2024