darwin: download_and_extract fails on archive containing file with unicode name #7055
Comments
I was able to reproduce this by adding a file with unicode filename |
It's not only macOS. I have a similar issue on Linux
WORKSPACE
Filesystem
Log
|
Update: I managed to workaround the problem by making these changes:
Ignore FileNotFound in
} else {
  Files.copy(
      tarStream, filename.getPathFile().toPath(), StandardCopyOption.REPLACE_EXISTING);
  try {
    filename.chmod(entry.getMode());
    // This can only be done on real files, not links, or it will skip the reader to
    // the next "real" file to try to find the mod time info.
    Date lastModified = entry.getLastModifiedDate();
    filename.setLastModifiedTime(lastModified.getTime());
  } catch (FileNotFoundException e) {
    // ignore
  }
}
}
|
Bazel's VFS classes assume that all filenames are encoded with latin-1. That theoretically allows roundtripping any sort of horrible byte pattern a Unix filesystem can produce through Bazel's Path class. This scheme falls apart, though, when trying to use the JDK I/O libraries. The filename encoding assumed by the JDK I/O libraries comes from the sun.jnu.encoding property, which can't be overridden with the normal -D JVM command line syntax. The Bazel client still tries quite hard to force this property to be latin-1: https://github.com/bazelbuild/bazel/blob/6641ad986f436926a75b31b47314c193a9a7e032/src/main/cpp/blaze.cc#L1467-L1473
But even a fusillade of 4 environment variables is sometimes not enough. On macOS, the JDK simply hardcodes UTF-8 as sun.jnu.encoding. Even on Linux, if the en_US.ISO-8859-1 locale isn't installed, glibc will fall back to an ASCII encoding.
Since there's no public way to create a JDK FileOutputStream from either a byte[] filename or a raw file descriptor, I conclude the only workaround is to implement open() and write() in Bazel's unix_jni. This CL does that.
We should probably implement a native file InputStream, too, for completeness. However, as merely implementing OutputStream fixes the relevant issue, I'm only doing that in this CL. Fixes bazelbuild#7055.
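To illustrate the roundtripping claim in the commit message above, here is a minimal sketch in plain Java. It is not Bazel code: the class name Latin1RoundTrip and the helper roundTrips are made up for this example. It decodes a raw filename byte sequence with a charset and re-encodes it, then checks whether the original bytes survive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Bazel.
public class Latin1RoundTrip {
  /** Returns true if decoding raw with cs and re-encoding yields the same bytes. */
  static boolean roundTrips(byte[] raw, Charset cs) {
    String decoded = new String(raw, cs);
    return Arrays.equals(raw, decoded.getBytes(cs));
  }

  public static void main(String[] args) {
    // A filename containing the lone byte 0xC4, which is not valid UTF-8 by itself.
    byte[] raw = {(byte) 0xC4, 'f', 'o', 'o', '.', 'g', 'o'};

    // latin-1 maps every byte value to exactly one char, so nothing is lost.
    System.out.println("latin-1: " + roundTrips(raw, StandardCharsets.ISO_8859_1));
    // UTF-8 and ASCII substitute a replacement for 0xC4, so the bytes change.
    System.out.println("UTF-8:   " + roundTrips(raw, StandardCharsets.UTF_8));
    System.out.println("ASCII:   " + roundTrips(raw, StandardCharsets.US_ASCII));
  }
}
```

Only the latin-1 case preserves the bytes. That is the property the VFS relies on, and it is lost as soon as a JDK API re-encodes the name with a different charset.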
That test is also causing problems on my non-corp Linux machine.
$ bazel info java-runtime
OpenJDK Runtime Environment (build 11.0.2+7-LTS) by Azul Systems, Inc.
$ bazel version
Build label: 0.23.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Mar 11 16:47:03 2019 (1552322823)
Build timestamp: 1552322823
Build timestamp as int: 1552322823
$ bazel build --config clang-asan //kube/server/...
ERROR: While resolving toolchains for target //kube/server:kube-frontproxy: invalid registered toolchain '@go_sdk//:go_darwin_386': no such package '@go_sdk//': Traceback (most recent call last):
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
ERROR: Analysis of target '//kube/server:binaries' failed; build aborted: invalid registered toolchain '@go_sdk//:go_darwin_386': no such package '@go_sdk//': Traceback (most recent call last):
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
INFO: Elapsed time: 3.790s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (10 packages loaded, 49 targets configured)
Fetching @go_sdk; fetching
$ bazel --host_jvm_args="-Dsun.jnu.encoding=en_US.UTF-8 -Dfile.encoding=en_US.UTF-8" build --config clang-asan //kube/server/...
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
ERROR: While resolving toolchains for target //kube/server:kube-frontproxy: invalid registered toolchain '@go_sdk//:go_android_386': no such package '@go_sdk//': Traceback (most recent call last):
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
ERROR: Analysis of target '//kube/server:binaries' failed; build aborted: invalid registered toolchain '@go_sdk//:go_android_386': no such package '@go_sdk//': Traceback (most recent call last):
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
INFO: Elapsed time: 2.772s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
Fetching @go_sdk; fetching
My locale is set to en_US.utf-8 and the looks fine when I poked at it. I've tried getting Bazel to use UTF-8 to no effect.
|
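When debugging a report like the one above, it can help to print what the JVM itself believes its filename encoding is, rather than relying on the shell locale. The following is a rough standalone sketch; the class name FilenameEncodingCheck and the helper canEncode are invented for this example. It only approximates the check the JDK performs when it converts a path string to bytes, and sun.jnu.encoding is a JDK-internal property that may be absent on some runtimes.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of Bazel or the JDK.
public class FilenameEncodingCheck {
  /** Returns true if every character of name can be encoded in the named charset. */
  static boolean canEncode(String charsetName, String name) {
    return Charset.forName(charsetName).newEncoder().canEncode(name);
  }

  public static void main(String[] args) {
    // The problematic file name from the Go SDK archive, written with a Unicode escape.
    String name = "\u00C4foo.go";

    // JDK-internal property; treat a null value as "unknown".
    String jnu = System.getProperty("sun.jnu.encoding");
    System.out.println("sun.jnu.encoding = " + jnu);
    System.out.println("file.encoding    = " + System.getProperty("file.encoding"));

    if (jnu != null && Charset.isSupported(jnu)) {
      System.out.println("encodable under sun.jnu.encoding: " + canEncode(jnu, name));
    }
    // Reference points: ASCII cannot represent U+00C4; latin-1 and UTF-8 can.
    System.out.println("US-ASCII:   " + canEncode("US-ASCII", name));
    System.out.println("ISO-8859-1: " + canEncode("ISO-8859-1", name));
    System.out.println("UTF-8:      " + canEncode("UTF-8", name));
  }
}
```

Running this under the same JVM that the Bazel server uses would show whether the name is unencodable under the effective sun.jnu.encoding. An unencodable name would be consistent with the "unmappable characters" message in the log.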
@mikedanese If you can, use a newer bazelbuild/rules_go. Release 0.18.1 has a workaround: bazel-contrib/rules_go@a477529 |
Thanks! I'll upgrade.
…On Sun, Mar 24, 2019, 22:52 bttk ***@***.***> wrote:
@mikedanese <https://github.com/mikedanese> If you can, use a newer
bazelbuild/rules_go. Release 0.18.1 has a workaround:
***@***.***
<bazel-contrib/rules_go@a477529>
|
Is this bug fixed? |
Description of the problem / feature request:
The Starlark repository_ctx.download_and_extract method fails when it's asked to extract an archive that contains files with unusual Unicode characters in their names. Specifically, in the Go 1.12b1 SDK, there is a test file named test/fixedbugs/issue27836.dir/Äfoo.go. When Bazel is asked to extract the SDK, it fails:
This file is not part of the build. It's simply part of the archive we need to extract.
This seems to affect macOS. I'm on an APFS file system; not sure about HFS+. On Linux, the file is extracted as ''$'\304''foo.go' (at least that's what ls spits out). Windows works, but the Windows SDK is a .zip file, so probably a different code path.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Create a WORKSPACE file like this:
Run this command:
What operating system are you running Bazel on?
macOS 10.14.2
What's the output of bazel info release?
release 0.21.0
Have you found anything relevant by searching the web?
Relevant issues:
download_and_extract specifically.
Any other information, logs, or outputs that you want to share?
This will break rules_go when Go 1.12 ships in February. We'll add a workaround to avoid calling ctx.download_and_extract on macOS.