Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add openshift/installer 'bazel build tarball' test to prow #1178

Conversation

sallyom
Copy link
Contributor

@sallyom sallyom commented Aug 14, 2018

@bbguimaraes another installer job for prow, thanks

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 14, 2018
rerun_command: "/test build-tarball"
context: ci/prow/build-tarball
always_run: true
skip_report: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the default value, so it's not really necessary.

@bbguimaraes
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2018
@openshift-merge-robot openshift-merge-robot merged commit 7805bae into openshift:master Aug 14, 2018
@openshift-ci-robot
Copy link
Contributor

@sallyom: Updated the job-config configmap using the following files:

  • key openshift-installer-presubmits.yaml using file ci-operator/jobs/openshift/installer/openshift-installer-presubmits.yaml

In response to this:

@bbguimaraes another installer job for prow, thanks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking
Copy link
Member

wking commented Aug 14, 2018

This job is failing with:

+ bazel --output_base=/tmp build tarball
Error: $USER is not set, and unable to look up name of current user: (error: 0): Success

I'm looking into it, but thought I'd post to this PR in case someone more familiar with the Prow execution environment understood what was going on without digging ;). Here's a successful run in Travis.

@wking
Copy link
Member

wking commented Aug 14, 2018

The error message is coming from this block, where Bazel is finding an unset $USER, falling back on getpwuid(getuid()) and getting either a NULL struct pointer or pw_name. I'm not sure how that would happen in Prow but not Travis, because in both cases Bazel is running inside a container. Maybe Prow has stricter seccomp settings or some such?

@bbguimaraes
Copy link
Contributor

@wking: from the man page:

The getpwnam() and getpwuid() functions return a pointer to a passwd structure,
or NULL if the matching entry is not found or an error occurs.  If an error
occurs, errno is set appropriately.

It means there's no entry for that uid in the passwd file, which makes sense
because containers in Openshift execute with arbitrary uids.

That seems like a strange requirement for a build system (but Bazel never
ceases to amaze me).

@wking
Copy link
Member

wking commented Aug 14, 2018

It looks like the container is explicitly setting a user:

# podman pull quay.io/coreos/tectonic-builder:bazel-v0.3
# podman history quay.io/coreos/tectonic-builder:bazel-v0.3 | grep USER
<missing>      5 months ago   /bin/sh -c #(nop) USER [bazel]                  0B

I'm not quite sure where that version of the image comes from, but I think the associated Dockerfile line is here. The USER Dockerfile command does not result in the USER environment variable being set:

# podman run --rm -it quay.io/coreos/tectonic-builder:bazel-v0.3 id
uid=1000(bazel) gid=1000(bazel) groups=1000(bazel)
# podman run --rm -it quay.io/coreos/tectonic-builder:bazel-v0.3 env | grep USER

@bbguimaraes
Copy link
Contributor

That is ignored when the pod is assigned a uid by the security policy in
Openshift (not sure about podman). I'm not familiar with Bazel, but you can
always set the USER variable yourself in the job/script if its value is
irrelevant.

@wking
Copy link
Member

wking commented Aug 14, 2018

That is ignored when the pod is assigned a uid by the security policy in Openshift...

Why would OpenShift care about the container-side UID (I can understand caring about the host-side UID)?

I'm not familiar with Bazel, but you can always set the USER variable yourself in the job/script if its value is irrelevant.

Grepping, it looks like the value may only be used for an output directory name. We're explicitly setting the output directory, and we aren't installing anything, so I expect we are safe to explicitly set USER. I'll work up a PR.

@wking
Copy link
Member

wking commented Aug 14, 2018

... so I expect we are safe to explicitly set USER...

Once we have a fix, I expect we'll want to update the Prow job config in this repo. But for testing fixes, I'll use local adjustments in the hack script itself. Setting USER did get us past the earlier error. Now we're dying with:

+ USER=bazel-testing bazel --output_base=/tmp build tarball
Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied

I'll keep adding command-line arguments and environment variables until I get the script to pass...

@wking
Copy link
Member

wking commented Aug 14, 2018

Setting both USER and HOME was sufficient to get Bazel up off the ground. But then a Python invocation crashed with:

ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site.py", line 554, in <module>
    main()
  File "/usr/lib/python2.7/site.py", line 536, in main
    known_paths = addusersitepackages(known_paths)
  File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
    user_site = getusersitepackages()
  File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/usr/lib/python2.7/site.py", line 237, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
    return get_config_vars().get(name)
  File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
    _CONFIG_VARS['userbase'] = _getuserbase()
  File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
    return env_base if env_base else joinuser("~", ".local")
  File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
    return os.path.expanduser(os.path.join(*args))
  File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 1000130000'

Python should be looking at $HOME first as well, so I'm not sure yet why it's falling back to the getpwuid call. Maybe we need to do something to get that environment variable through when Bazel drops to Python.

@stevekuznetsov
Copy link
Contributor

@smarterclayton can comment on security implications of container-side UUID and why OpenShift cares

@BenTheElder might have some comments on running bazel in containers for testing -- much of upstream k8s testing does that, but I am not familiar with it

@wking
Copy link
Member

wking commented Aug 14, 2018

Maybe we need to do something to get that environment variable through when Bazel drops to Python.

Bazel's build_tar invocation sets use_default_shell_env=True. That seems (at least from the design docs) to mean that only environment variables declared with --action_env are passed through to the action executable. I'll update the script to set HOME that way as well.

@wking
Copy link
Member

wking commented Aug 14, 2018

With --action_env as well, the build succeeds. I'll have a PR in both repos for this issue shortly.

@BenTheElder
Copy link

k/k prow jobs do typically set $HOME and $USER. For local bazel containers I've also found a fake passwd entry to be useful for eg python and git.

wking added a commit to wking/openshift-installer that referenced this pull request Aug 14, 2018
This allows callers to set build options [1] if needed.  The Prow
tests need this to set HOME to avoid [2]:

  ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site.py", line 554, in <module>
      main()
    File "/usr/lib/python2.7/site.py", line 536, in main
      known_paths = addusersitepackages(known_paths)
    File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
      user_site = getusersitepackages()
    File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
      user_base = getuserbase() # this will also set USER_BASE
    File "/usr/lib/python2.7/site.py", line 237, in getuserbase
      USER_BASE = get_config_var('userbase')
    File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
      return get_config_vars().get(name)
    File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
      _CONFIG_VARS['userbase'] = _getuserbase()
    File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
      return env_base if env_base else joinuser("~", ".local")
    File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
      return os.path.expanduser(os.path.join(*args))
    File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
      userhome = pwd.getpwuid(os.getuid()).pw_dir
  KeyError: 'getpwuid(): uid not found: 1000130000'

The chain for that crash is:

1. Bazel invokes build_tar via run_shell with
   use_default_shell_env=True [3], so only environment variables
   declared with --action_env [1] are exposed to build_tar.
2. Python sees $HOME is empty and falls back to
   pwd.getpwuid(os.getuid()).pw_dir [4].
3. OpenShift Prow containers execute with arbitrary container-side
   UIDs [5], so the getpwuid call fails.

With this commit, Prow can setup --action_env to work around its
arbitrary container-side UIDs [6].

[1]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt
[3]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89
[4]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262
[5]: openshift/release#1178 (comment)
[6]: openshift/release#1185
wking added a commit to wking/openshift-release that referenced this pull request Aug 14, 2018
…ball

OpenShift Prow containers execute with arbitrary container-side UIDs
[1] for some security reason I don't understand.  That makes it hard
for Bazel to figure out where to put things, and we'd die with [2]:

  + bazel --output_base=/tmp build tarball
  Error: $USER is not set, and unable to look up name of current user: (error: 0): Success

as Bazel [3]:

1. Checked $USER and found it unset.
2. Fell back to getpwuid(getuid()) and found no entry matching the
   arbitrary container-side UID.

Setting USER is not sufficient; it results in errors like [4]:

  + USER=bazel-testing bazel --output_base=/tmp build tarball
  Error: mkdir('/.cache/bazel/_bazel_bazel-testing'): (error: 13): Permission denied

as Bazel tries to expand its default ~/.cache/bazel [5].  Setting HOME
addresses that, but then actions like tarball creation die with [6]:

  ERROR: /home/prow/go/src/github.com/openshift/installer/BUILD.bazel:106:1: error executing shell command: 'bazel-out/host/bin/external/bazel_tools/tools/build_defs/pkg/build_tar --flagfile=bazel-out/k8-fastbuild/bin/tf_bin.args' failed (Exit 1)
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site.py", line 554, in <module>
      main()
    File "/usr/lib/python2.7/site.py", line 536, in main
      known_paths = addusersitepackages(known_paths)
    File "/usr/lib/python2.7/site.py", line 272, in addusersitepackages
      user_site = getusersitepackages()
    File "/usr/lib/python2.7/site.py", line 247, in getusersitepackages
      user_base = getuserbase() # this will also set USER_BASE
    File "/usr/lib/python2.7/site.py", line 237, in getuserbase
      USER_BASE = get_config_var('userbase')
    File "/usr/lib/python2.7/sysconfig.py", line 582, in get_config_var
      return get_config_vars().get(name)
    File "/usr/lib/python2.7/sysconfig.py", line 533, in get_config_vars
      _CONFIG_VARS['userbase'] = _getuserbase()
    File "/usr/lib/python2.7/sysconfig.py", line 210, in _getuserbase
      return env_base if env_base else joinuser("~", ".local")
    File "/usr/lib/python2.7/sysconfig.py", line 196, in joinuser
      return os.path.expanduser(os.path.join(*args))
    File "/usr/lib/python2.7/posixpath.py", line 262, in expanduser
      userhome = pwd.getpwuid(os.getuid()).pw_dir
  KeyError: 'getpwuid(): uid not found: 1000130000'

This is the same issue we saw for Bazel, but is now because of Python
code [7].  Bazel's build_tar invocation uses run_shell with
use_default_shell_env=True [8], so to get HOME set there as well we
need to use --action_env [9] (I'm updating test-bazel-build-tarball.sh
to pass that argument through to Bazel).

[1]: openshift#1178 (comment)
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/1/build-log.txt
[3]: https://github.com/bazelbuild/bazel/blob/0.16.1/src/main/cpp/blaze_util_posix.cc#L654-L664
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/3/build-log.txt
[5]: https://docs.bazel.build/versions/master/output_directories.html#documentation-of-the-current-bazel-output-directory-layout
[6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/123/ci-pull-openshift-installer-bazel-build-tarball/4/build-log.txt
[7]: https://github.com/python/cpython/blob/1f34aece28d143edb94ca202e661364ca394dc8c/Lib/posixpath.py#L260-L262
[8]: https://github.com/bazelbuild/bazel/blob/0.16.1/tools/build_defs/pkg/pkg.bzl#L82-L89
[9]: https://docs.bazel.build/versions/master/command-line-reference.html#build-options
wking added a commit to wking/openshift-installer that referenced this pull request Aug 14, 2018
These have moved into Prow presubmit jobs:

* Terraform lint: openshift/release@82e00346 (Prow: Add Terraform Lint
  to openshift/installer, 2018-08-06, openshift/release#1124).
* YAML lint: openshift/release@457be2cd (Added prow yaml-lint job
  description for installer repo, 2018-08-02, openshift/release#1138).
* ShellCheck: openshift/release@e12a7a06 (Prow: Add shellcheck to
  openshift/installer, 2018-08-08, openshift/release#1131).
* Terraform format: openshift/release@f67e06e4 (openshift installer:
  add terraform fmt, 2018-08-04, openshift/release#1152).
* Go vet: openshift/release@71afdcca (Added go-vet prow job,
  2018-08-14, openshift/release#1181).
* Building the tarball: openshift/release@42a5a0d0 (add
  openshift/installer 'bazel build tarball' test to prow, 2018-08-13,
  openshift/release#1178).
@wking
Copy link
Member

wking commented Aug 22, 2018

@smarterclayton can comment on security implications of container-side UUID and why OpenShift cares

Ah, I think I found the reason over here:

By default, all containers that we try and launch within OpenShift, are set blocked from “RunAsAny” which basically means that they are not allowed to use a root user within the container. This prevents root actions such as chown or chmod from being run and is a sensible security precaution as, should a user be able to perform a local exploit to break out of the container, then they would not be running as root on the underlying container host. NB what about user-namespaces some of you are no doubt asking, these are definitely coming but the testing/hardening process is taking a while and whilst companies such as Red Hat are working hard in this space, there is still a way to go until they are ready for the mainstream.

That blog post is from 2016; so I'm not sure the user-namespace mainstream-ness caveat still applies. But that's probably still why OpenShift is doing this.

@wking
Copy link
Member

wking commented Aug 22, 2018

That blog post is from 2016; so I'm not sure the user-namespace mainstream-ness caveat still applies.

Looks like it does, since Kubernetes still has an open feature for user-namespace support: kubernetes/enhancements#127. That feature has been open for a while, but there's recent related activity in kubernetes/kubernetes#64005. Maybe it will land in 1.12...

@smarterclayton
Copy link
Contributor

Yeah, user namespaces are still not real (in that kube doesn't fully support them being enabled, or deal with the implications correctly when you try to run host level workloads). We could grant the ability to run root containers to the ci infra, but I'm also hesitant to do that because a) that's like giving up on security) and b) would encourage jobs that are dependent on that particular tweak.

Seths team is driving the fixes as you note.

@stevekuznetsov
Copy link
Contributor

All of the bazel jobs upstream use privileged containers FWIW. This gives us more reason to run a separate build cluster :)

@BenTheElder
Copy link

You should be able to run without privileged containers if you disabling the sandboxing IIRC, something like --spawn_strategy=local either as an extra flag to the build or in a .bazelrc as build --spawn_strategy=local. We don't do this because the sandboxing is important to ensuring correctness of known inputs for each step, without enforcing that we may cache incorrectly; and we use cluster-level isolation where we are concerned.

derekhiggins pushed a commit to derekhiggins/release that referenced this pull request Oct 24, 2023
Adds the option to override the default path via the
BAREMETAL_OPERATOR_PATH variable.

This is useful when you want to modify the checkout consumed
via BAREMETAL_OPERATOR_LOCAL_IMAGE, which ends up in $HOME
not the regular GOPATH location.
derekhiggins pushed a commit to derekhiggins/release that referenced this pull request Oct 24, 2023
* Make capbm path configurable

Similar to openshift#1178 - this means you can use the same local checkout
for local-capbm.sh and BAREMETAL_MACHINE_CONTROLLERS_LOCAL_IMAGE

* Update local-capbm.sh healthaddr port

When running local-bmo.sh and local-capbm.sh at the same time
the result is a port conflict on tcp/9440
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants