File.absolute_path does not work correctly on Windows when dir_string is specified and contains non-ASCII characters #7750

Closed
mojavelinux opened this issue Apr 10, 2023 · 7 comments · Fixed by #8309

@mojavelinux

Environment Information

  • JRuby version: 9.4.2.0
  • JRUBY_OPTS: none
  • Operating system: Windows (works fine on *nix)

Expected Behavior

File.absolute_path should correctly resolve a path if dir_string (the second argument) is specified and that string contains non-ASCII characters.

Actual Behavior

File.absolute_path returns a mangled string that is not recognized as a valid path.

Test Case

dirname = %(\u6d4b\u8bd5)
filename = 'foo.txt'
filepath = [dirname, filename].join (File::ALT_SEPARATOR || File::SEPARATOR)
File.unlink filepath rescue nil
Dir.rmdir dirname rescue nil
Dir.mkdir dirname
Dir.chdir dirname do
  File.write filename, 'contents'
  abs_filepath = File.absolute_path filename # (1)
  #abs_filepath = File.absolute_path filename, Dir.pwd # (2)
  #abs_filepath = File.absolute_path filename, (File.absolute_path '') # (3)
  puts File.file? abs_filepath
  puts File.read abs_filepath
end

The line at (1) produces the correct behavior (since no dir_string is specified). The lines at (2) and (3) do not.

If the dir_string is specified, and it contains non-ASCII characters, JRuby is somehow mangling the path so it is not recognized as a valid path on the system.

Workaround

File.singleton_class.prepend (Module.new do
  def absolute_path path, dir = nil
    return super unless dir && !(absolute_path? path)
    super File.join dir, path
  end
end) if RUBY_ENGINE == 'jruby'
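
The workaround appears to rest on the idea that joining dir and path first, then resolving the result with the one-argument form, should give the same answer as the two-argument form while avoiding the dir_string code path. A minimal sketch of that equivalence follows; the variable names are illustrative only, and it assumes a relative path and a runtime where the two-argument form is not affected by this bug:

```ruby
# Compare the two-argument form with "join first, then resolve".
dir  = Dir.pwd        # an absolute base directory
path = 'foo.txt'      # a relative path (the file need not exist)

two_arg = File.absolute_path(path, dir)
joined  = File.absolute_path(File.join(dir, path))

# On an unaffected runtime both forms should resolve to the same string.
puts two_arg == joined
```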
@enebo enebo added this to the JRuby 9.4.3.0 milestone Apr 10, 2023
@enebo
Member

enebo commented Apr 18, 2023

Notes so far. 9.2 worked. 9.3+ regressed. I think we started properly transcoding to terminal at some point and perhaps we are doing that prematurely. If you examine the output of Dir.pwd, it does not align with the Java charset and is transcoded to '??' instead of the original characters. If you look at Pathname#absolute it is preserving the unicode escapes but still displaying them as that.

@enebo
Member

enebo commented Apr 20, 2023

More notes. I can fix this by removing logic which should exist. We started encoding path strings (e.g. Dir#pwd) in 9.3 using the file system encoding. Specs show that is what we should be doing. Once we do that with non-7-bit characters, things get odd: the code tries to mix UTF-8 strings with CP-1252 strings, fails in the process, and replaces the multibyte codepoints with '?'.
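
The '?' replacement described above can be reproduced in plain Ruby, independent of JRuby internals. This is only a sketch of the general effect, assuming a lossy UTF-8 to CP-1252 conversion with replacement enabled; it is not a trace of what JRuby does internally:

```ruby
# The directory name from the test case, as a UTF-8 string.
dirname = "\u6d4b\u8bd5"

# Windows-1252 has no mapping for these codepoints, so a strict
# conversion raises instead of producing output.
begin
  dirname.encode(Encoding::Windows_1252)
rescue Encoding::UndefinedConversionError => e
  puts "strict conversion failed: #{e.class}"
end

# With replacement enabled, each unmappable character becomes '?'.
lossy = dirname.encode(Encoding::Windows_1252, invalid: :replace, undef: :replace)
puts lossy.bytes.inspect   # => [63, 63], i.e. "??"

# The original characters cannot be recovered by transcoding back.
round_trip = lossy.encode(Encoding::UTF_8)
puts round_trip == dirname # => false
```

Once the replacement has happened, the path string no longer names the directory on disk, which matches the `??` segment seen in the mangled paths.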

I am inclined to take two steps back and fix Dir#pwd and the internals of expand_path to not use filesystem encoding (which should have zero effect on unixy systems) and will fix the reported issues here.

A significant part of the problem is that our file paths, when ultimately used for real file operations from Java, need to be in the Java default charset. If we call native methods, they need to be in the appropriate Windows code page. When it is just Ruby -> Ruby strings, it presumably shouldn't matter, since we should be capable of transcoding back and forth. This last part does appear broken, but I think it is not the transcoding that is broken but us mangling these CP-1252 Java strings back into a ByteList.

I am still looking and maybe will untangle this a bit more, but the fallback is to just go back to the 9.2 behavior, which, besides returning variables as UTF-8 instead of CP-1252, will play much nicer with everything.

@enebo
Member

enebo commented Apr 20, 2023

I will also add that a bunch of methods which are working, like Dir.chdir, are not converting the path to the file system encoding before changing the directory, and they only work because they are not doing what we do with things like Dir.pwd. If I do untangle this, then I am hoping we can audit these other file-system-facing functions so they all do the right thing (Dir#home also is UTF-8 and not file system encoding).

@headius
Member

headius commented Jun 27, 2024

Reconfirmed with JRuby master (9.4.8.0) on Windows 11 in Parallels on M1 macOS (whew!)

Both (2) and (3) produce the following error:

C:\Users\headius\jruby>bin\jruby.exe test.rb
false
Errno::ENOENT: No such file or directory - C:/Users/headius/jruby/??/foo.txt
  sysopen at org/jruby/RubyIO.java:1278
     read at org/jruby/RubyIO.java:4264
   <main> at test.rb:13
    chdir at org/jruby/RubyDir.java:472
   <main> at test.rb:7

As I now have a Windows env to test with, I may just try to fix this. But not for 9.4.8.0.

@headius headius modified the milestones: JRuby 9.4.8.0, JRuby 9.4.9.0 Jun 27, 2024
headius added a commit to headius/jruby that referenced this issue Jun 28, 2024
@headius
Member

headius commented Jun 28, 2024

As of Ruby 3.0, CRuby defaults to UTF-8 for Windows filesystem path encodings, so we should follow suit. With that change in place, all three versions work properly on Windows.

@mojavelinux
Author

Nice!!

@headius
Member

headius commented Jun 28, 2024

The workaround is to set the JVM property file.encoding to "UTF-8", since that's what we use to choose the filesystem encoding. Prior to Java 18, it defaults to Cp1252 on Windows, which can't handle multibyte characters. In Java 18+, the JDK also defaults to UTF-8.
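
To check which encodings are in effect before and after applying this workaround, a small diagnostic script along these lines may help. It is a sketch, not part of the fix; the JRuby-only branch reads the file.encoding property through Java integration:

```ruby
# Report the encodings Ruby uses for filesystem paths and external data.
fs_encoding = Encoding.find('filesystem')
puts "filesystem encoding: #{fs_encoding}"
puts "default external:    #{Encoding.default_external}"

# On JRuby, also show the JVM property that drives the choice.
if RUBY_ENGINE == 'jruby'
  require 'java'
  puts "file.encoding:       #{java.lang.System.getProperty('file.encoding')}"
end
```

One way to pass the property is JRuby's -J prefix for JVM options, for example `jruby -J-Dfile.encoding=UTF-8 test.rb`; whether that is the most convenient mechanism for a given setup is left as an assumption.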
