Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase out overwriting of image's HOME variable. Remove /tekton/home. #2013

Closed
5 of 8 tasks
ghost opened this issue Feb 6, 2020 · 29 comments · Fixed by #4587
Closed
5 of 8 tasks

Phase out overwriting of image's HOME variable. Remove /tekton/home. #2013

ghost opened this issue Feb 6, 2020 · 29 comments · Fixed by #4587
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@ghost
Copy link

ghost commented Feb 6, 2020

Expected Behavior

Images with an expectation that HOME is set to something specific should work as intended.

Actual Behavior

Those images do not work as expected - Pipelines overwrites the HOME variable of an image with /tekton/home.

See #1836 for the original report on this issue.

So the plan is as follows:

  • ASAP add an opt-out flag to prevent $HOME being set to /tekton/home. This should be configurable through a ConfigMap.
  • Perform a release, announce deprecation.
  • Wait 9 months for beta deprecation policy to expire (~4 Dec 2020)
  • Update the code to make the flag opt-in instead of opt-out.
  • Perform a release, announcing the change in flag behaviour.
  • Wait 9 months for beta deprecation policy to expire (~10 Feb 2022)
  • Remove feature flag completely.
  • Perform a release, announcing the complete removal.
@ghost ghost self-assigned this Feb 6, 2020
@ghost ghost added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 6, 2020
@ghost
Copy link
Author

ghost commented Feb 11, 2020

Adding a configmap field to allow the HOME overwrite to be disabled: https://github.com/tektoncd/pipeline/compare/master...sbwsg:opt-out-home-overwrite?expand=1

@ghost ghost mentioned this issue Feb 13, 2020
3 tasks
@chanseokoh
Copy link
Contributor

chanseokoh commented Feb 13, 2020

Thanks for doing this! This has been very annoying for Java people on Knative and GCB, because almost all the time, most Java applications get the genuine, actual user home through a platform-independent way that does not read $HOME. For example, Tekton pipelines run as root, and because the OS declares that the home of the user root is /root, the Maven build system creates a local artifact repository under /root/.m2 on Tekton (and GCB). BTW, the local Maven repository is crucial to cache in the Java world. (Gradle has the same problem for, e.g., ~/.gradle/caches.)

After all, changing the $HOME environment variable for a certain set of processes or shell sessions doesn't alter the fact that the user home of root declared by OS is /root. Some tools and programs (e.g., cd, understandably) make use of $HOME, but most Java applications don't.

Yet, Tekton auth supplies a Docker config file at /tekton/home/.docker/config.json, even if the actual home is /root. Java applications generally cannot find this kind of files based on user home.

That said, I am actually more interested in where Tekton will create the .docker/config.json file when this opt-out flag is set or not set. Will this still be at /tekton/home/.docker/config.json?

But I should say I don't like having to deal with the inconsistency between the actual home and the value of $HOME. As long as the OS says the home is /root, Maven will create /root/.m2. And Java applications expect a Docker config file at /root/.docker/config.json. As a stop-gap, I had to manually override a JVM property to make JVM believe the user home is (previously) /builder/home. And it got broken at some point, so I had to update it to /tekton/home. But these directory names are implementation details that can continue to change in the future and result in another breakage.

@chanseokoh
Copy link
Contributor

That said, I am actually more interested in where Tekton will create the .docker/config.json file when this opt-out flag is set or not set. Will this still be at /tekton/home/.docker/config.json?

What's the answer going forward, when the flag is set or not set?

@ghost
Copy link
Author

ghost commented Feb 19, 2020

Yet, Tekton auth supplies a Docker config file at /tekton/home/.docker/config.json, even if the actual home is /root

With this new flag Tekton will no longer interfere with HOME - it will be whatever you expect it to be when the container runs in a Pod.

Previously $HOME would have been set to /tekton/home but now it won't be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn't specify its own HOME.

@ghost
Copy link
Author

ghost commented Mar 2, 2020

Moving this issue over to next milestone. In the next release we plan to flip the behaviour of the feature flag so that Tekton no longer overwrites the HOME environment variable by default.

@chanseokoh
Copy link
Contributor

Previously $HOME would have been set to /tekton/home but now it won't be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn't specify its own HOME.

This doesn't seem true. I think we are not ready to flip the disable-home-env-overwrite flag. See #2165.

@ghost
Copy link
Author

ghost commented Mar 5, 2020

It sounds like the HOME env var isn't set at all by the image in that case. The go code for dockercreds simply does os.Getenv("HOME") and the absence of it here implies that this variable isn't defined at all. I'll take a look into the creds-init image and see what it defaults to.

Edit: ran the latest creds-init image locally and I do get a HOME var of /root in my env. Not sure yet what's happening. Will keep looking into it.

@chanseokoh
Copy link
Contributor

Edit: ran the latest creds-init image locally and I do get a HOME var of /root in my env. Not sure yet what's happening. Will keep looking into it.

I think creds-init should place the file at the home directories of my task images, not the HOME of the creds-init image. Multiple copies if the home directories are different.

@chanseokoh
Copy link
Contributor

chanseokoh commented Mar 5, 2020

I've given some more thoughts on this core HOME issue. Before we can actually flip disable-home-env-overwrite, I think this entire auth mechanism should be revisited and reworked.

In most contexts (including ours), I think the notion of a user home is really a runtime one. An image may suggest a certain directory as a "home" and a certain UID/user as a running user, but runtime platforms (as well as user configurations) can override the "suggested user". The home will then be "determined" by the user running the container.

For example, If I run an image as root, the home is /root, obviously. If I run it as nobody, the home is often defined as /home/nonexistent (and the directory doesn't physically exist.). If I run it as UID 12345, I actually don't know what I should call "home." (Different apps and OSes may have different notions of "home" in that case?)

This actually matters to me, because I am trying to figure out how to fix a related problem on OpenShift, which runs images as a random UID. Not only on OpenShift, but users can also change the running user on Tekton (e.g., using runAsNonRoot or runAsUser).

So, the following quotes in the Tekton auth doc becomes ambiguous.

In their native form, these secrets are unsuitable for consumption by Git and Docker. For Git, they need to be turned into (some form of) .gitconfig. For Docker, they need to be turned into a ~/.docker/config.json file.

When the Run executes, before steps execute, a ~/.ssh/config will be generated

When this Run executes, before steps execute, a ~/.gitconfig will be generated

When the Run executes, before steps execute, a ~/.docker/config.json will be generated

@chanseokoh
Copy link
Contributor

chanseokoh commented Mar 5, 2020

Another related issue?

[git-source-source-qxt27] 
{"level":"warn",
"ts":1583449549.563559,
"caller":"git/git.go:149",
"msg":"Unexpected error: creating symlink: symlink /tekton/home/.ssh /root/.ssh: file exists"}

This is a warning, so it doesn't break my task at least.

@ghost
Copy link
Author

ghost commented Mar 9, 2020

I've written up a design doc covering the solution you proposed @chanseokoh of having a fixed directory. Doc is here: https://docs.google.com/document/d/1SVuDt-SXPHymz41dveSXFSPrx5Z-Wzb9eHliJAyYg4o

I plan to present this to the tekton working group on Wednesday and will update my existing PR with any changes that get agreed on during that call.

@ghost
Copy link
Author

ghost commented Aug 14, 2020

/reopen
/lifecycle frozen

@tekton-robot
Copy link
Collaborator

@sbwsg: Reopened this issue.

In response to this:

/reopen
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Aug 14, 2020
@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 14, 2020
@vyom-soft
Copy link

Hello,
I an new to Tekton, I am facing the problem. How should I over this is there any approach? sample?

"image-digest-exporter-d6kxs] 2020/08/21 08:55:40 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied"

$ tkn version
Client version: 0.11.0
Pipeline version: v0.15.2
Triggers version: unknown
Thank you
Sanjeev

@didier-durand
Copy link

Hello,

Same here

"image-digest-exporter-d6kxs] 2020/08/21 08:55:40 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied"

How can this get fixed

Didier

@ghost
Copy link
Author

ghost commented Sep 29, 2020

Could you explain a bit about what is broken? What is the Task you're trying to run and what are its Steps? What version of k8s and tekton are you running? What are the PipelineResources you've configured? What does the Pod look like after the TaskRun finishes? What is the incorrect observed behaviour of your PipelineRun or TaskRun?

@didier-durand
Copy link

@sbwsg Scott, in the meantime since I wrote the above, I fixed my problem : see my comment in #2616 (comment). When I replaced 'docker.io' by 'index.docker.io' in the secret creation, the initial credential checks (before building) went ok. Then, I could build and push to Docker Hub. So, my problem is fixed.

@ghost
Copy link
Author

ghost commented Sep 29, 2020

OK, thanks for the update!

@ghost ghost mentioned this issue Mar 9, 2021
17 tasks
@ghost
Copy link
Author

ghost commented Apr 6, 2021

From owners meeting just now: next step is to loudly announce this upcoming change on the mailing list and slack.

tekton-robot pushed a commit that referenced this issue May 10, 2021
Prior to this commit Steps were given a default HOME
env var and a default workingDir. These defaults collide
with any value set by the Step's image Dockerfile.

This commit removes the default home and workingDir
overrides (except in those few cases where they're
still expected, like PipelineResources).

See https://groups.google.com/g/tekton-dev/c/C-PL8VYN51E/m/el5Fca_PDAAJ
for our tekton-dev announcement of this change.

See #1836 for the
original problem description and workingDir tracking issue.

See #2013 for the
HOME change tracking issue.

See https://github.com/tektoncd/pipeline/blob/main/docs/deprecations.md
for our documented dates for these deprecations.

See
https://github.com/tektoncd/pipeline/blob/main/api_compatibility_policy.md#alpha-beta-and-ga
for our beta deprecation policy.
,
@tmshn
Copy link

tmshn commented Sep 3, 2021

@sbwsg Let me ask one thing to clarify my concern; do you intended to stop mounting /tekton/home too? Or is it just about overwriting environmental variable?

The current implementation is the latter, but the the issue title "Remove /tekton/home." sounds implying former change to me.

pipeline/pkg/pod/pod.go

Lines 112 to 117 in f041ef5

if b.OverrideHomeEnv {
implicitEnvVars = append(implicitEnvVars, corev1.EnvVar{
Name: "HOME",
Value: pipeline.HomeDir,
})
}

@ghost
Copy link
Author

ghost commented Sep 3, 2021

@tmshn that's a very interesting question! I completely forgot that the issue title includes "Remove /tekton/home." And I didn't include a work item in the list at the top to remove the folder from our codebase.

I expect that we will completely remove the /tekton/home directory with the feature flag and environment variable on or after Feb 2022. However this is open to discussion - if there are critical reasons to keep the directory we should make sure we are aware of them. What factors are motivating your concerns?

@tmshn
Copy link

tmshn commented Sep 3, 2021

Simply because some of the docker images do not have home directory properly set, and we need to take extra care to examine each images (maybe not so big blocking issue, though).

For example, alpine/git image (used by official git-cli tasks) have nobody user, but its home directory is / where the user does not have write permission.

$ docker run --rm --entrypoint cat docker.io/alpine/git:v2.26.2@sha256:23618034b0be9205d9cc0846eb711b12ba4c9b468efdd8a59aac1d7b1a23363f /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin

https://github.com/tektoncd/catalog/blob/4bf19372b791f1d9cd4c764ee2055ebfa7cdbd3f/task/git-cli/0.2/git-cli.yaml#L50

@ghost
Copy link
Author

ghost commented Sep 3, 2021

Ahh I see. We will need to audit the catalog tasks that use our images to make sure we don't break any existing. For those tasks which would be broken we can manually add a volumeMount with an emptyDir at /tekton/home?

@tmshn
Copy link

tmshn commented Sep 4, 2021

That sound fine!

jimmykarily pushed a commit to epinio/epinio that referenced this issue Oct 6, 2021
because in the latest tekton pipeline the `HOME` env variable is not
overridden by default (and in the future, not at all):

tektoncd/pipeline#2013
jimmykarily pushed a commit to epinio/epinio that referenced this issue Oct 11, 2021
because in the latest tekton pipeline the `HOME` env variable is not
overridden by default (and in the future, not at all):

tektoncd/pipeline#2013
@lbernick lbernick moved this to In Progress in Pipelines V1 Jan 11, 2022
@lbernick lbernick moved this from In Progress to Blocked in Pipelines V1 Jan 11, 2022
Repository owner moved this from Blocked to Done in Pipelines V1 Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants