Reader bugs out if Japanese characters are in archive folder name #145
Comments
And here I thought I'd be done with unicode problems. Looks like the reader is extracting your archive correctly, but doesn't have the correct path to the extracted folder, leading to it not finding anything. When you open a unicode archive, does it extract in some form to your file system? |
It says Windows can't access that folder after pasting it exactly into File Explorer. What's the full path? |
The alternative is to look up the files directly in the |
After navigating to %APPDATA%\LANraragi\Distro I find a temp folder, but it just contains a bunch of 0 KB files; the whole folder is only 213 bytes. There definitely is a temp folder somewhere, because when I open an English archive the path shows up in the logs and everything opens fine. I still can't access \\wsl$\ at all for some reason; it says network error. I tried microsoft/WSL#4027 (comment) but it didn't do anything. Next step is to reinstall everything, I guess. Let me know if there are any specific logs or anything else that would help. |
Providing some useful logs. Can debug further if necessary.
|
Turns out I wasn't using the most recent version of Windows. I have now updated to 1903 and \\wsl$\ works perfectly. To answer your original question: yes, both the English and Japanese titled folders are extracted there and are perfectly readable from Explorer. I found another archive with English + kanji only and it opened just fine; the logs look just like the edit screenshot in the OP. |
So I think I figured it out. Before I used LRR I had all my works stored in plain folders. When I switched, I used a batch script and 7-Zip to zip up every folder individually. I'm just now noticing that the photos were placed in a folder inside the zip. When I upload a new archive with the photos directly in the zip, it all works fine even with Japanese characters. I don't know what could be causing it, but it seems like a good clue. Edit: |
That's a good clue and temporary fix, however it shouldn't really be needed. LANraragi seems to handle images nested in directories just fine, but it seems to bug out with Unicode characters in directory names. Definitely a bug to fix. |
Okay, I've found where the issue lies. The Archive utility always saves the extracted archive in Unicode, but the Reader model then uses File::Find and forgets that it returns byte-encoded paths, not Unicode. The solution would be:
|
Hmm, I thought I had fixed unicode in folder names for good with commit 030229d, but I guess it's worth taking another look at it. |
Yes, your solution was indeed good; however, removing File::Find::utf8 was unnecessary. File::Find expects and returns byte-encoded strings, but the code that follows treats the returned value as a decoded string, which obviously fails. Either decode the path as UTF-8 (we are sure by now that it IS UTF-8 after renaming it) or use the File::Find::utf8 variant, which does it automatically. |
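To make the bytes-versus-decoded distinction above concrete, here is a minimal sketch in Python rather than Perl (a stand-in, not LANraragi's code). It assumes the on-disk names are UTF-8, as stated in the comment above; `list_files_decoded` and `make_sample_tree` are hypothetical helper names introduced only for this illustration.

```python
import os
import tempfile


def list_files_decoded(root: str) -> list[str]:
    """Walk `root` using byte paths and decode every result as UTF-8."""
    found = []
    # Passing bytes makes os.walk yield bytes, loosely mirroring how
    # File::Find hands back byte-encoded paths in Perl.
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(root)):
        for name in filenames:
            raw = os.path.join(dirpath, name)
            # Assumption: names on disk are UTF-8. Decode once, at the
            # boundary, so later code only ever sees text strings.
            found.append(raw.decode("utf-8"))
    return sorted(found)


def make_sample_tree() -> str:
    """Create a temp dir holding one image inside a Japanese-named folder."""
    root = tempfile.mkdtemp()
    # Build the nested name from explicit UTF-8 bytes so the on-disk
    # encoding does not depend on the current locale.
    nested = os.path.join(os.fsencode(root), "(C91) 日本語フォルダ".encode("utf-8"))
    os.makedirs(nested)
    with open(os.path.join(nested, b"01.png"), "wb") as fh:
        fh.write(b"")
    return root
```

Calling `list_files_decoded(make_sample_tree())` returns text paths that can be compared or concatenated with other decoded strings without mixing encodings, which is the step the Reader was missing.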
You're right on that point, but the thing is that the extracted files should not be UTF-8 at all, but pure ASCII. Commit 030229d uses encode's coderef CHECK to automatically translate every non-ascii character to its However, you're still getting folder and files chock full of utf8 characters, which means the |
I fail to see how a simple die statement would help here. Indeed, the conversion fails and the module dies, taking the whole application down with it: subsequent requests after the failure result in Mojolicious errors that can easily send you down a rabbit hole, such as 'public/themes' not found.
|
It was not really meant to help, just to see why the moving operation fails. It's weird that the program can't open a file in the log directory as well, however. Maybe some file permissions are wrong? |
No, the logs work fine otherwise. It seems to be an issue caused by dying, a rabbit hole that's better left unexplored lest we start digging into Mojo source code. |
^ The issue is because it dies inside finddepth, which internally does chdir. |
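The failure mode described here (a traversal that changes directory and then dies, leaving the process in the wrong place) can be guarded against by saving and restoring the working directory. The following is a rough Python sketch of that pattern, not the Perl fix itself; `preserve_cwd`, `risky_walk`, and `demo` are hypothetical names, and `risky_walk` only simulates a traversal that fails mid-way.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def preserve_cwd():
    """Restore the working directory on exit, even if the body raises."""
    saved = os.getcwd()
    try:
        yield
    finally:
        os.chdir(saved)


def risky_walk(target: str) -> None:
    # Hypothetical stand-in for a traversal that changes directory
    # internally and then fails before it can change back.
    os.chdir(target)
    raise RuntimeError("conversion failed mid-traversal")


def demo() -> bool:
    """Return True if the working directory survives a failed walk."""
    before = os.getcwd()
    target = tempfile.mkdtemp()
    try:
        with preserve_cwd():
            risky_walk(target)
    except RuntimeError:
        pass  # the error is still reported; only the cwd is repaired
    return os.getcwd() == before
```

With the guard in place, later relative-path lookups keep resolving against the original directory even after the traversal fails.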
Found it. You can verify it by doing
|
Re-read my docs on perl encoding, and indeed I was mistaken: My understanding was that I still don't want to assume what we're receiving is utf-8 (I've gotten issues in the past from systems that weren't utf-8 at all), so I've added a layer of Encode::Guess with the major Japanese encodings. |
Guess can be used but you must consider that it will fail in many cases where encoding is ambiguous - https://perldoc.perl.org/Encode/Guess.html
|
Didn't see that in the perldoc indeed; I probably have to add some extra error checking here. (And in the variant I used in Edit.pm) Didn't think shift-jis and utf8 used the same symbols though - I'll probably default to utf8/ascii when they appear in the guesses. |
Technically something like this should work
decode_utf8 is not only an alias for decode("UTF-8", $_); it's also different in that it's not strict, so it won't do any harm to ASCII-encoded stuff and won't croak. There is still the open issue of finddepth not coming back to the original directory if it dies mid-way. This really has to be handled, as it crashes the whole of LRR permanently until restarted: Mojo keeps trying to access and write files in the directory the archive was extracted to (assuming it even exists). There is also an issue with your attempt to convert UTF-8 to Here is an example of an archive (with only a cover) with the encoding issue, as well as one that would trigger the filename-too-long issue: https://u.gensokyo.re/d/C03V4o2v |
I've been following this closely and I'm happy to report whatever you did in the latest nightly fixed my problem! All of my previously broken archives now work perfectly! Amazing work! |
I was writing something more complex, but in hindsight a basic guess + fallback to utf8 is going to be good for 98% of setups. I'll go with that. Saving the current working directory and I had the character limit for the U+ conversion in the back of my mind too: I thought about cutting the max string, but that could cause issues for archives that have multiple folders with slight differences in name at the end like japanese characters ch.1/2/3/etc. It's super minor and wouldn't crash the app however, so it's no big deal. |
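A "guess, then fall back to UTF-8" strategy like the one mentioned above could look roughly like the following. This is a Python sketch under stated assumptions, not what LANraragi does with Encode::Guess: it tries strict decoders in a fixed order instead of reporting ambiguity, and the candidate list and the name `decode_name` are illustrative choices.

```python
def decode_name(raw: bytes, candidates=("utf-8", "shift_jis", "euc_jp")) -> str:
    """Decode a file name by trying each candidate encoding strictly.

    UTF-8 goes first so that plain ASCII and valid UTF-8 always win,
    matching the idea of defaulting to utf8/ascii when it is plausible.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, replace undecodable bytes instead.
    return raw.decode("utf-8", errors="replace")
```

Because several Japanese encodings overlap, a byte string can decode cleanly under more than one candidate, so the order of `candidates` effectively decides ambiguous cases. That is the same ambiguity the Encode::Guess documentation warns about.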
hm, left an |
It works! I'll report if I find any that cause issue but so far looks like we managed to finally solve it. |
I want to start by saying thanks for this amazing program. It's been invaluable for organizing and tagging my collection in the wake of sadpanda dying.
I saw a couple older issues dealing with similar unicode problems but this seems different.
LRR Version and OS
0.6.0. beta 2
Latest Win 10 Pro
Whatever the powershell script installed
Bug Details
The program works great for any archive in English, but if there are any Unicode characters in the title the reader bugs out. When you open an affected archive it says no thumbnail, and if you try to advance, the loading gear comes up and freezes. I've individually opened the affected archives, and both the folder titles and the images inside seem fine and uncorrupted. The affected archives also have correct thumbnails on the main page.
Matching Logs
The only actual error message I saw was when I pressed "regenerate archive thumbnail":
[2019-07-27 20:27:41] [Hash Computation] [error] Error building hash for /home/koyomi/lanraragi/script/../public/temp/5236544cd1d197486aa09129866935ff71883bef/(C91) [������ (�����)] �森峰�辱 (����&�����)/01.png -- Open failed: No such file or directory at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Generic.pm line 93.
[2019-07-27 20:27:41] [LANraragi] [debug] Thumbnail not found at /mnt/d/pron/Hentai/Doujins/thumb/5236544cd1d197486aa09129866935ff71883bef.jpg ! (force-thumb flag = 1)
[2019-07-27 20:27:41] [LANraragi] [debug] Regenerating from /home/koyomi/lanraragi/script/../public/temp/5236544cd1d197486aa09129866935ff71883bef/(C91) [������ (�����)] �森峰�辱 (����&�����)/01.png
Error log after opening Japanese titled archive
Screenshots
Reader page of an archive with Japanese characters in it
Trying to advance the reader
Looking at all pages
Logs of opening an English archive vs a Japanese one
EDIT:
Found one archive with a kanji title that would open, logs look different.