-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add openshift/installer 'bazel build tarball' test to prow #1178
add openshift/installer 'bazel build tarball' test to prow #1178
Conversation
rerun_command: "/test build-tarball" | ||
context: ci/prow/build-tarball | ||
always_run: true | ||
skip_report: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the default value, so it's not really necessary.
/lgtm |
@sallyom: Updated the
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I'm looking into it, but thought I'd post to this PR in case someone more familiar with the Prow execution environment understood what was going on without digging ;). Here's a successful run in Travis. |
The error message is coming from this block, where Bazel is finding an unset |
@wking: from the man page:
It means there's no entry for that uid in the passwd file, which makes sense That seems like a strange requirement for a build system (but Bazel never |
It looks like the container is explicitly setting a user: # podman pull quay.io/coreos/tectonic-builder:bazel-v0.3
# podman history quay.io/coreos/tectonic-builder:bazel-v0.3 | grep USER
<missing> 5 months ago /bin/sh -c #(nop) USER [bazel] 0B I'm not quite sure where that version of the image comes from, but I think the associated # podman run --rm -it quay.io/coreos/tectonic-builder:bazel-v0.3 id
uid=1000(bazel) gid=1000(bazel) groups=1000(bazel)
# podman run --rm -it quay.io/coreos/tectonic-builder:bazel-v0.3 env | grep USER |
That is ignored when the pod is assigned a uid by the security policy in |
Why would OpenShift care about the container-side UID (I can understand caring about the host-side UID)?
Grepping, it looks like the value may only be used for an output directory name. We're explicitly setting the output directory, and we aren't installing anything, so I expect we are safe to explicitly set |
Once we have a fix, I expect we'll want to update the Prow job config in this repo. But for testing fixes, I'll use local adjustments in the hack script itself. Setting
I'll keep adding command-line arguments and environment variables until I get the script to pass... |
Setting both
Python should be looking at |
@smarterclayton can comment on security implications of container-side UUID and why OpenShift cares @BenTheElder might have some comments on running |
Bazel's |
With |
k/k prow jobs do typically set |
This allows callers to set build options [1] if needed. The Prow tests need this to set HOME to avoid [2]: ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1) Traceback (most recent call last): File "/usr/lib/python2.7/site.py", line 554, in <module> main() File "/usr/lib/python2.7/site.py", line 536, in main known_paths = addusersitepackages(known_paths) File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages user_site = getusersitepackages() File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "/usr/lib/python2.7/site.py", line 237, in getuserbase USER_BASE = get_config_var('userbase') File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var return get_config_vars().get(name) File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars _CONFIG_VARS['userbase'] = _getuserbase() File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase return env_base if env_base else joinuser("~", ".local") File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser return os.path.expanduser(os.path.join(*args)) File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser userhome = pwd.getpwuid(os.getuid()).pw_dir KeyError: 'getpwuid(): uid not found: 1000130000' The chain for that crash is: 1. Bazel invokes build_tar via run_shell with use_default_shell_env=True [3], so only environment variables declared with --action_env [1] are exposed to build_tar. 2. Python sees $HOME is empty and falls back to pwd.getpwuid(os.getuid()).pw_dir [4]. 3. OpenShift Prow containers execute with arbitrary container-side UIDs [5], so the getpwuid call fails. With this commit, Prow can setup --action_env to work around its arbitrary container-side UIDs [6]. [1]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt [3]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89 [4]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262 [5]: openshift/release#1178 (comment) [6]: openshift/release#1185
…ball OpenShift Prow containers execute with arbitrary container-side UIDs [1] for some security reason I don't understand. That makes it hard for Bazel to figure out where to put things, and we'd die with [2]: + bazel --output_base=/tmp build tarball Error: $USER is not set, and unable to look up name of current user: (error: 0): Success as Bazel [3]: 1. Checked $USER and found it unset. 2. Fell back to getpwuid(getuid()) and found no entry matching the arbitrary container-side UID. Setting USER is not sufficient; it results in errors like [4]: + USER=bazel-testing bazel --output_base=/tmp build tarball Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied as Bazel tries to expand its default ~/.cache/bazel [5]. Setting HOME addresses that, but then actions like tarball creation die with [6]: ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1) Traceback (most recent call last): File "/usr/lib/python2.7/site.py", line 554, in <module> main() File "/usr/lib/python2.7/site.py", line 536, in main known_paths = addusersitepackages(known_paths) File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages user_site = getusersitepackages() File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages user_base = getuserbase() # this will also set USER_BASE File "/usr/lib/python2.7/site.py", line 237, in getuserbase USER_BASE = get_config_var('userbase') File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var return get_config_vars().get(name) File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars _CONFIG_VARS['userbase'] = _getuserbase() File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase return env_base if env_base else joinuser("~", ".local") File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser return os.path.expanduser(os.path.join(*args)) File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser userhome = pwd.getpwuid(os.getuid()).pw_dir KeyError: 'getpwuid(): uid not found: 1000130000' This is the same issue we saw for Bazel, but is now because of Python code [7]. Bazel's build_tar invocation uses run_shell with use_default_shell_env=True [8], so to get HOME set there as well we need to use --action_env [9] (I'm updating test-bazel-build-tarball.sh to pass that argument through to Bazel). [1]: openshift#1178 (comment) [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/1/build-log.txt [3]: https://github.com/bazelbuild/bazel/blob/0.16.1/src/main/cpp/blaze_util_posix.cc#L654-L664 [4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/3/build-log.txt [5]: https://docs.bazel.build/versions/master/output_directories.html#documentation-of-the-current-bazel-output-directory-layout [6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt [7]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262 [8]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89 [9]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
These have moved into Prow presubmit jobs: * Terraform lint: openshift/release@82e00346 (Prow: Add Terraform Lint to openshift/installer, 2018-08-06, openshift/release#1124). * YAML lint: openshift/release@457be2cd (Added prow yaml-lint job description for installer repo, 2018-08-02, openshift/release#1138). * ShellCheck: openshift/release@e12a7a06 (Prow: Add shellcheck to openshift/installer, 2018-08-08, openshift/release#1131). * Terraform format: openshift/release@f67e06e4 (openshift installer: add terraform fmt, 2018-08-04, openshift/release#1152). * Go vet: openshift/release@71afdcca (Added go-vet prow job, 2018-08-14, openshift/release#1181). * Building the tarball: openshift/release@42a5a0d0 (add openshift/installer 'bazel build tarball' test to prow, 2018-08-13, openshift/release#1178).
Ah, I think I found the reason over here:
That blog post is from 2016; so I'm not sure the user-namespace mainstream-ness caveat still applies. But that's probably still why OpenShift is doing this. |
Looks like it does, since Kubernetes still has an open feature for user-namespace support: kubernetes/enhancements#127. That feature has been open for a while, but there's recent related activity in kubernetes/kubernetes#64005. Maybe it will land in 1.12... |
Yeah, user namespaces are still not real (in that kube doesn't fully support them being enabled, or deal with the implications correctly when you try to run host level workloads). We could grant the ability to run root containers to the ci infra, but I'm also hesitant to do that because a) that's like giving up on security) and b) would encourage jobs that are dependent on that particular tweak. Seths team is driving the fixes as you note. |
All of the |
You should be able to run without privileged containers if you disabling the sandboxing IIRC, something like |
Adds the option to override the default path via the BAREMETAL_OPERATOR_PATH variable. This is useful when you want to modify the checkout consumed via BAREMETAL_OPERATOR_LOCAL_IMAGE, which ends up in $HOME not the regular GOPATH location.
* Make capbm path configurable Similar to openshift#1178 - this means you can use the same local checkout for local-capbm.sh and BAREMETAL_MACHINE_CONTROLLERS_LOCAL_IMAGE * Update local-capbm.sh healthaddr port When running local-bmo.sh and local-capbm.sh at the same time the result is a port conflict on tcp/9440
@bbguimaraes another installer job for prow, thanks