darwin: download_and_extract fails on archive containing file with unicode name #7055

Closed
jayconrod opened this issue Jan 7, 2019 · 7 comments
Labels: P2 (We'll consider working on this in future; assignee optional), team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), type: bug

jayconrod commented Jan 7, 2019

Description of the problem / feature request:

The Starlark repository_ctx.download_and_extract method fails when asked to extract an archive containing files with non-ASCII Unicode characters in their names. Specifically, the Go 1.12beta1 SDK contains a test file named test/fixedbugs/issue27836.dir/Äfoo.go. When Bazel is asked to extract the SDK, it fails:

hello $ bazel fetch @go_sdk//...
INFO: Invocation ID: ef42f18b-4f19-4508-8e2c-268f4b2ac830
Loading: 0 packages loaded
ERROR: Traceback (most recent call last):
	File "/private/var/tmp/_bazel_jayconrod/8b040ee1d8994791263025bc89f96607/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
		_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
	File "/private/var/tmp/_bazel_jayconrod/8b040ee1d8994791263025bc89f96607/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
		ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
java.io.FileNotFoundException: /private/var/tmp/_bazel_jayconrod/8b040ee1d8994791263025bc89f96607/external/go_sdk/test/fixedbugs/issue27836.dir/A?foo.go (No such file or directory)
Loading: 0 packages loaded
Loading: 0 packages loaded

This file is not part of the build. It's simply part of the archive we need to extract.

This seems to affect macOS. I'm on an APFS file system; not sure about HFS+. On Linux, the file is extracted as ''$'\304''foo.go' (at least that's what ls spits out). Windows works, but the Windows SDK is a .zip file, so probably a different code path.
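
The `\304` that ls prints on Linux appears consistent with the name having been written as a single latin-1 byte rather than as UTF-8. A minimal sketch of the two encodings of Ä follows; the class name FilenameBytes and the helper toOctal are made up for illustration and are not part of Bazel.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class FilenameBytes {
    // Render each byte as an unsigned octal number, the notation ls uses in \304.
    static String toOctal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toOctalString(b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u00C4"; // the character Ä (U+00C4)
        // latin-1 stores U+00C4 as the single byte 0xC4, which is octal 304.
        System.out.println(toOctal(name.getBytes(StandardCharsets.ISO_8859_1))); // 304
        // UTF-8 stores the same character as the two bytes 0xC3 0x84.
        System.out.println(toOctal(name.getBytes(StandardCharsets.UTF_8))); // 303 204
    }
}
```

A name stored as the lone byte 0xC4 is not valid UTF-8, which would explain why ls falls back to an escaped form.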

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a WORKSPACE file like this:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    sha256 = "7be7dc01f1e0afdba6c8eb2b43d2fa01c743be1b9273ab1eaf6c233df078d705",
    urls = ["https://github.com/bazelbuild/rules_go/releases/download/0.16.5/rules_go-0.16.5.tar.gz"],
)

load("@io_bazel_rules_go//go:def.bzl", "go_download_sdk", "go_register_toolchains", "go_rules_dependencies")

go_download_sdk(
    name = "go_sdk",
    sdks = {
        "darwin_amd64": ("go1.12beta1.darwin-amd64.tar.gz", "e49bf83ae10b2232d2efa918f0e9df1d76f93a0c6b0ea18c11edd9ef9defa505"),
        "linux_amd64": ("go1.12beta1.linux-amd64.tar.gz", "65bfd4a99925f1f85d712f4c1109977aa24ee4c6e198162bf8e819fdde19e875"),
    },
)

go_rules_dependencies()

go_register_toolchains()

Run this command:

bazel fetch @go_sdk//...

What operating system are you running Bazel on?

macOS 10.14.2

What's the output of bazel info release?

release 0.21.0

Have you found anything relevant by searching the web?

Relevant issues:

Any other information, logs, or outputs that you want to share?

This will break rules_go when Go 1.12 ships in February. We'll add a workaround to avoid calling ctx.download_and_extract on macOS.

evie404 commented Jan 10, 2019

I was able to reproduce this by adding a file with the Unicode filename ÄfooUnicodeNamedFile to this test fixture: https://github.com/bazelbuild/bazel/blob/87943084a2fcf080804b740ac7bcbb5350b29332/src/test/java/com/google/devtools/build/lib/rules/repository/test_decompress_archive.tar.gz

jayconrod added a commit to bazel-contrib/rules_go that referenced this issue Jan 19, 2019

bttk commented Jan 21, 2019

It's not only macOS. I have a similar issue on Linux:

  • OS: Chrome OS Crostini on 11316.82.0 (Official Build) beta-channel eve
  • Bazel: 0.21.0

WORKSPACE

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "io_bazel_rules_go",
    commit = "72dff4e33d24aac4953ee794815dc569af0d8093",
    remote = "https://github.com/bazelbuild/rules_go.git",
)


load("@io_bazel_rules_go//go:def.bzl", "go_register_toolchains", "go_rules_dependencies", "go_download_sdk")

go_download_sdk(
    name = "go_sdk",
    sdks = {
        "linux_amd64": ("go1.12beta2.linux-amd64.tar.gz",
            "9e4884b46a72e0558187a8af6e8733e039432df1b755f14b361f18b63fa5a63e"),
    }
)

go_rules_dependencies()

go_register_toolchains()

Filesystem

/dev/vdb on / type btrfs (rw,relatime,discard,space_cache,user_subvol_rm_allowed,subvolid=266,subvol=/lxd/storage-pools/default/containers/penguin/rootfs)

Log

$ (bazel fetch @go_sdk//...) 2>&1 | cat
Starting local Bazel server and connecting to it...
INFO: Invocation ID: e1a6658c-bf71-4a2e-8570-99fea7a399c0
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
INFO: Rule 'io_bazel_rules_go' modified arguments {"shallow_since": "1548088207 -0500"}
Loading: 0 packages loaded
Loading: 0 packages loaded
ERROR: Traceback (most recent call last):
        File "/home/bttk/.cache/bazel/_bazel_bttk/93bb7f0c033ed05a01395d82f0271371/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
                _remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
        File "/home/bttk/.cache/bazel/_bazel_bttk/93bb7f0c033ed05a01395d82f0271371/external/io_bazel_rules_go/go/private/sdk.bzl", line 129, in _remote_sdk
                ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/bttk/.cache/bazel/_bazel_bttk/93bb7f0c033ed05a01395d82f0271371/external/go_sdk/test/fixedbugs/issue27836.dir/�foo.go
Loading: 0 packages loaded
Loading: 0 packages loaded
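
The "Malformed input or input contains unmappable characters" line reads like the JDK refusing a path it cannot encode in its filename charset. As a rough sketch of that condition, assuming the JVM's filename charset ended up as ASCII (an assumption, not something the log confirms), the check below asks whether a name is encodable at all. EncodableCheck, canEncode and lossyEncode are illustrative names, not Bazel or JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodableCheck {
    // True if every character of the name can be represented in the charset.
    static boolean canEncode(String name, Charset cs) {
        return cs.newEncoder().canEncode(name);
    }

    // Encode and decode with the same charset; String.getBytes substitutes
    // the charset's replacement byte ('?') for unmappable characters.
    static String lossyEncode(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String name = "\u00C4foo.go"; // Äfoo.go
        System.out.println(canEncode(name, StandardCharsets.US_ASCII));   // false
        System.out.println(canEncode(name, StandardCharsets.ISO_8859_1)); // true
        System.out.println(canEncode(name, StandardCharsets.UTF_8));      // true
        System.out.println(lossyEncode(name, StandardCharsets.US_ASCII)); // ?foo.go
    }
}
```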

bttk commented Jan 30, 2019

Update: I managed to work around the problem by making these changes:

$HOME/.bazelrc

startup --host_jvm_args="-Dsun.jnu.encoding=UTF-8" --host_jvm_args="-Dfile.encoding=UTF-8"

Ignore FileNotFound in CompressedTarFunction.java

          } else {
            Files.copy(
                tarStream, filename.getPathFile().toPath(), StandardCopyOption.REPLACE_EXISTING);
            try {
              filename.chmod(entry.getMode());

              // This can only be done on real files, not links, or it will skip the reader to
              // the next "real" file to try to find the mod time info.
              Date lastModified = entry.getLastModifiedDate();
              filename.setLastModifiedTime(lastModified.getTime());
            } catch (FileNotFoundException e) {
              // ignore
            }
          }
        }
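
To see which encodings a JVM actually ends up with after flags like the ones in the .bazelrc line above, a small diagnostic can print the relevant properties. This is only a sketch: the class name JnuEncodingCheck and the helper prop are made up for illustration, and a standalone JVM launched by hand is not the Bazel server, so the output is a hint rather than proof of what Bazel sees.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, for illustration only.
final class JnuEncodingCheck {
    // Return the value of a system property, or a marker if it is not set.
    static String prop(String key) {
        return System.getProperty(key, "(unset)");
    }

    public static void main(String[] args) {
        // sun.jnu.encoding is the property the flags above try to set;
        // file.encoding governs the default charset for file contents.
        for (String key : new String[] {"sun.jnu.encoding", "file.encoding"}) {
            System.out.println(key + " = " + prop(key));
        }
        System.out.println("default charset = " + Charset.defaultCharset());
    }
}
```

On Java 11 or newer, running it once as `java JnuEncodingCheck.java` and once with `-Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8` placed before the file name should show whether the override sticks on a given JDK.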

@aiuto aiuto added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed team-Bazel General Bazel product/strategy issues labels Feb 21, 2019
@dslomov dslomov added type: bug P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 5, 2019
benjaminp added a commit to benjaminp/bazel that referenced this issue Mar 19, 2019
Bazel's VFS classes make the assumption that all filenames are encoded with latin-1. That theoretically allows roundtripping any sort of horrible byte pattern a Unix filesystem can produce through Bazel's Path class. This scheme falls apart, though, when trying to use the JDK I/O libraries. The filename encoding assumed by the JDK I/O libraries comes from the sun.jnu.encoding property, which can't be overridden with the normal -D JVM command line syntax. The Bazel client still tries quite hard to force this property to be latin-1: https://github.com/bazelbuild/bazel/blob/6641ad986f436926a75b31b47314c193a9a7e032/src/main/cpp/blaze.cc#L1467-L1473 But even a fusillade of 4 environment variables is sometimes not enough. On macOS, the JDK simply hardcodes UTF-8 as sun.jnu.encoding. Even on Linux, if the en_US.ISO-8859-1 locale isn't installed, glibc will fall back to an ASCII encoding. Since there's no public way to create a JDK FileOutputStream from either a byte[] filename or a raw file descriptor, I conclude the only workaround is to implement open() and write() in Bazel's unix_jni. This CL does that.

We should probably implement a native file InputStream, too, for completeness. However, as merely implementing OutputStream fixes the relevant issue, I'm only doing that in this CL.

Fixes bazelbuild#7055.
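
The roundtripping property the commit message relies on can be illustrated with a short sketch: latin-1 maps every byte value to exactly one character, so decoding and re-encoding returns the original bytes, while UTF-8 replaces invalid sequences and loses them. The class and method names below are hypothetical and are not taken from Bazel's VFS code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class Latin1RoundTrip {
    // Decode raw filename bytes as latin-1 and encode them back.
    static byte[] roundTripLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // The same trip through UTF-8; malformed input becomes U+FFFD.
    static byte[] roundTripUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A lone 0xC4 byte followed by ASCII is not valid UTF-8.
        byte[] raw = {(byte) 0xC4, 'f', 'o', 'o', '.', 'g', 'o'};
        System.out.println(Arrays.equals(raw, roundTripLatin1(raw))); // true
        System.out.println(Arrays.equals(raw, roundTripUtf8(raw)));   // false
    }
}
```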
@mikedanese

That test is also causing problems on my non-corp linux machine.

$ bazel info java-runtime
OpenJDK Runtime Environment (build 11.0.2+7-LTS) by Azul Systems, Inc.
$ bazel version
Build label: 0.23.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Mar 11 16:47:03 2019 (1552322823)
Build timestamp: 1552322823
Build timestamp as int: 1552322823
$ bazel build --config clang-asan //kube/server/...   
ERROR: While resolving toolchains for target //kube/server:kube-frontproxy: invalid registered toolchain '@go_sdk//:go_darwin_386': no such package '@go_sdk//': Traceback (most recent call last):
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
                _remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
                ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
ERROR: Analysis of target '//kube/server:binaries' failed; build aborted: invalid registered toolchain '@go_sdk//:go_darwin_386': no such package '@go_sdk//': Traceback (most recent call last):
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
                _remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
                ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
INFO: Elapsed time: 3.790s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (10 packages loaded, 49 targets configured)
    Fetching @go_sdk; fetching
$ bazel --host_jvm_args="-Dsun.jnu.encoding=en_US.UTF-8 -Dfile.encoding=en_US.UTF-8" build --config clang-asan //kube/server/...
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
ERROR: While resolving toolchains for target //kube/server:kube-frontproxy: invalid registered toolchain '@go_sdk//:go_android_386': no such package '@go_sdk//': Traceback (most recent call last):
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
                _remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
                ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
ERROR: Analysis of target '//kube/server:binaries' failed; build aborted: invalid registered toolchain '@go_sdk//:go_android_386': no such package '@go_sdk//': Traceback (most recent call last):
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 51
                _remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
        File "/home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/io_bazel_rules_go/go/private/sdk.bzl", line 113, in _remote_sdk
                ctx.download_and_extract(url = urls, stripPrefix = strip_pr..., ...)
Malformed input or input contains unmappable characters: /home/mike/.cache/bazel/_bazel_mike/1a5c0881e2f20911e25c4b3113a4557d/external/go_sdk/test/fixedbugs/issue27836.dir/oo.go
INFO: Elapsed time: 2.772s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
    Fetching @go_sdk; fetching

My locale is set to en_US.utf-8, and it looks fine when I poked at it. I've tried getting Bazel to use UTF-8, to no effect.

bttk commented Mar 25, 2019

@mikedanese If you can, use a newer bazelbuild/rules_go. Release 0.18.1 has a workaround:
bazel-contrib/rules_go@a477529

benjaminp added a commit to benjaminp/bazel that referenced this issue Mar 25, 2019

mikedanese commented Mar 25, 2019 via email

emranbm commented Dec 7, 2019

Is this bug fixed?
Here is a TODO that depends on fixing this bug:
https://github.com/bazelbuild/rules_go/blob/3762b89ad8b1d71007a4a07b194a48d505613c15/go/private/sdk.bzl#L141

@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Nov 29, 2021