File.absolute_path does not work correctly on Windows when dir_string is specified and contains non-ASCII characters #7750
Comments
Notes so far: 9.2 worked; 9.3+ regressed. I think we started properly transcoding to the terminal at some point, and perhaps we are doing that prematurely. If you examine the output of Dir.pwd, it does not align with the Java charset and is transcoded to '??' instead of the kanji. If you look at Pathname#absolute, it preserves the Unicode escapes but still displays them as escapes.
More notes. I can fix this by removing logic that should exist. In 9.3 we started encoding path strings (e.g. the result of Dir.pwd) using the filesystem encoding, and the specs show that is what we should be doing. Once we do that with non-7-bit characters, things get odd: we end up mixing UTF-8 strings with CP-1252 strings, that process fails, and the multibyte codepoints are replaced with '?'. I am inclined to take two steps back and change Dir.pwd and the internals of expand_path so they do not use the filesystem encoding (which should have zero effect on Unix-like systems); that will fix the issues reported here. A significant part of the problem is that when our file paths are ultimately used for real file operations from Java, they need to be in the Java default charset. If we call native methods, they need to be in the appropriate Windows code page. When it is just Ruby-to-Ruby strings it presumably shouldn't matter, since we should be capable of transcoding back and forth. This last part does appear broken, but I think it is not transcoding that is broken; rather, we are mangling these CP-1252 Java strings back into a ByteList. I am still looking and may untangle this a bit more, but the fallback is to just go back to the 9.2 behavior, which, besides returning these values as UTF-8 instead of CP-1252, will play much nicer with everything.
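The lossy step described above can be reproduced in plain Ruby, independent of JRuby's Java internals. This is a minimal sketch, not JRuby's actual code path, and the sample strings "café" and "日本" are assumptions chosen for illustration. Text whose characters all exist in CP-1252 survives a round trip from UTF-8 to CP-1252 and back, but kanji have no CP-1252 mapping, so a strict transcode raises and a replacing transcode substitutes '?' for each character.

```ruby
# Sketch only: reproduces the *kind* of loss described above in plain Ruby.
# The sample strings are illustrative assumptions, not taken from the issue.

latin = "café" # every character has a CP-1252 (Windows-1252) mapping
kanji = "日本"  # neither character has a CP-1252 mapping

# Representable text survives a UTF-8 -> CP-1252 -> UTF-8 round trip.
round_trip = latin.encode(Encoding::Windows_1252).encode(Encoding::UTF_8)
puts "round trip intact: #{round_trip == latin}"

# A strict transcode refuses characters that have no mapping...
begin
  kanji.encode(Encoding::Windows_1252)
rescue Encoding::UndefinedConversionError => e
  puts "strict transcode failed: #{e.class}"
end

# ...whereas a replacing transcode silently turns each kanji into "?".
mangled = kanji.encode(Encoding::Windows_1252, undef: :replace)
puts "replacing transcode: #{mangled}" # prints "replacing transcode: ??"
```

This is also why dropping the filesystem-encoding step should not affect Unix-like systems, where that encoding is typically already UTF-8 and no lossy conversion takes place.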
I should also note that several methods that do work, like Dir.chdir, are not converting the path to the filesystem encoding before changing the directory; they only work because they do not do what we do in methods like Dir.pwd. If I do untangle this, I hope we can audit the other filesystem-facing functions so that they all do the right thing (Dir.home also returns UTF-8 rather than the filesystem encoding).
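A starting point for that audit might be a small script that reports which encoding each path-returning method tags its result with, next to the filesystem encoding Ruby reports. This is a hypothetical sketch: the probe list is an assumption, and Dir.chdir is left out because it does not return a path.

```ruby
# Hypothetical audit helper: report the encoding each path-returning method
# tags its result with, next to the filesystem encoding Ruby reports.
fs_encoding = Encoding.find("filesystem")

# Probe list is an illustrative assumption; extend it with other methods.
probes = {
  "Dir.pwd" => -> { Dir.pwd },
  "Dir.home" => -> { Dir.home },
  "File.expand_path('.')" => -> { File.expand_path(".") },
}

report = probes.map do |label, probe|
  begin
    value = probe.call
    [label, value.encoding, value.encoding == fs_encoding]
  rescue ArgumentError
    # Dir.home can raise ArgumentError if no home directory can be determined.
    [label, nil, false]
  end
end

puts "filesystem encoding: #{fs_encoding}"
report.each do |label, enc, matches|
  puts format("%-22s %-12s %s", label, enc || "n/a", matches ? "matches" : "differs")
end
```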
Reconfirmed with JRuby master (9.4.8.0) on Windows 11 running in Parallels on an M1 Mac (whew!). Both (2) and (3) produce the following error:
Now that I have a Windows environment to test with, I may try to fix this myself, though not for 9.4.8.0.
This change was applied to CRuby in 3.0; see https://bugs.ruby-lang.org/issues/12654.
Fixes jruby#7750
As of Ruby 3.0, CRuby defaults to UTF-8 for Windows filesystem path encodings, so we should follow suit. With that change in place, all three test lines work properly on Windows.
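A regression check for the expected behavior could look like the following. It is a hypothetical sketch, not the reporter's original test case: the directory name /tmp/日本語 and the file name test.txt are assumptions. File.absolute_path does not need the directory to exist, and on Windows the result gains a drive prefix, so the check only looks at the end of the resolved path.

```ruby
# Hypothetical regression check for this issue: resolving a relative name
# against a non-ASCII dir_string must keep the characters intact.
# "/tmp/日本語" and "test.txt" are illustrative assumptions.
dir_string = "/tmp/日本語"

resolved = File.absolute_path("test.txt", dir_string)

# No "?" substitution, and the result must still be valid in its encoding.
preserved = resolved.valid_encoding? && resolved.end_with?("日本語/test.txt")

puts "filesystem encoding: #{Encoding.find('filesystem')}"
puts "resolved:            #{resolved}"
puts(preserved ? "dir_string preserved" : "dir_string mangled")
```

Under the fix described above, a check like this is expected to pass on Windows as well as on Unix-like systems.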
Nice!!
The workaround is to set the JVM property
Environment Information
Expected Behavior
File.absolute_path should correctly resolve a path if dir_string (the second argument) is specified and that string contains non-ASCII characters.
Actual Behavior
File.absolute_path returns a mangled string that is not recognized as a valid path.
Test Case
The line at (1) produces the correct behavior (since no dir_string is specified). The lines at (2) and (3) do not.
If dir_string is specified and contains non-ASCII characters, JRuby somehow mangles the path so that it is no longer recognized as a valid path on the system.
Workaround