
PLNSRVCE-769: Add minio tenant for tekton results #443

Merged

Conversation

@gabemontero (Collaborator)

Trying to help out @AndrienkoAleksandr propagate #419 while he has electricity issues

Simply rebased off of main and squashed the commits.

Let's see how CI does here.

@adambkaplan @Roming22 FYI

@gabemontero (Collaborator Author)

I'll start working on the shellcheck / yamllint items

@gabemontero (Collaborator Author)

/assign @AndrienkoAleksandr

@gabemontero (Collaborator Author)

static checks clean

@gabemontero (Collaborator Author)

hmmm can't seem to assign @AndrienkoAleksandr nor add him as a Reviewer

@gabemontero (Collaborator Author)

OK ci failed with

task plnsvc-setup has the status "Failed":

[ERROR] Pod app=minio not found by timeout 
+ exit 1
command terminated with exit code 1

There is an issue with my github ID and the pipeline-service CI cluster, and I cannot see the full logs. @Roming22 is working with me on slack to see if we can sort it out.

Depending on the outcome, we'll then see about local repros, others looking at the logs and suggesting changes, etc.
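For readers unfamiliar with the failure mode, the "not found by timeout" message suggests a retry loop like the following; this is a hedged sketch only, with stand-in checks instead of the real kubectl call, and the function name and timings are assumptions, not the actual plnsvc-setup script.

```shell
#!/usr/bin/env bash
set -u

# Hedged sketch of a pod wait loop; names and timings are assumptions.
wait_for_pod() {
  retries=$1
  shift
  i=0
  while [ "$i" -lt "$retries" ]; do
    # In the real script this check would be something like:
    #   kubectl get pod -n tekton-results -l app=minio -o name | grep -q pod
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "[ERROR] Pod app=minio not found by timeout" >&2
  return 1
}

# Stand-in checks so the sketch is runnable without a cluster:
wait_for_pod 3 true && echo "pod found"
wait_for_pod 1 false || echo "timed out"
```

When the check never succeeds, the loop exhausts its retries, prints the error seen in the CI log, and exits non-zero, which is what fails the task.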

developer/config.yaml (outdated, resolved)
@Roming22 Roming22 changed the title Add minio tenant for tekton results [WIP] Add minio tenant for tekton results Jan 18, 2023
@gabemontero gabemontero changed the title [WIP] Add minio tenant for tekton results Add minio tenant for tekton results Jan 18, 2023
@gabemontero (Collaborator Author)

have a debug commit up based on the full logs @Roming22 provided me (he's called out to someone else to see if they can fix my ci cluster auth issues)

@gabemontero (Collaborator Author)

Well, the fact that we can't dump events from these scripts is something we should address for debugging down the road:

- pod with label app=minio: ........................................Error from server (Forbidden): events is forbidden: User "system:serviceaccount:pipeline-service:pipeline-service-manager" cannot list resource "events" in API group "" in the namespace "tekton-results"
command terminated with exit code 1

I'll see about getting the rest of the logs tomorrow, either with my hopefully-fixed ID on the test cluster, or by getting somebody to pastebin them again.
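For reference, a minimal RBAC addition that would let the CI service account list events for debugging might look like the following. The Role/RoleBinding names are hypothetical; the namespace and service account come from the error message above, and the actual fix in the PR may differ.

```yaml
# Hypothetical Role/RoleBinding granting the CI service account read access
# to events in the tekton-results namespace. Only the subject below is taken
# from the error message; the rest is illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: events-reader
  namespace: tekton-results
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: events-reader-binding
  namespace: tekton-results
subjects:
  - kind: ServiceAccount
    name: pipeline-service-manager
    namespace: pipeline-service
roleRef:
  kind: Role
  name: events-reader
  apiGroup: rbac.authorization.k8s.io
```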

@gabemontero (Collaborator Author) commented Jan 19, 2023

Looks like @AndrienkoAleksandr 's work around from #419 (comment)
did not work in CI:

  2m37s       Warning   FailedCreate           statefulset/storage-pool-0                 create Pod storage-pool-0-0 in StatefulSet storage-pool-0 failed error: pods "storage-pool-0-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.runAsUser: Invalid value: 1000610001: must be in the ranges: [1000630000, 1000639999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

I'll see about bumping the minio operator version to pick up that change that got merged upstream.

That way, we can leverage OpenShift's randomized assignment of the UID.

Various other items were also noted as forbidden in that event. It looks like we'll need to associate an SCC with some additional permissions to get that Pod's creation to pass, but I'll fix the UID first, then see.

@adambkaplan FYI ^^
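The UID problem in the event above can be sketched as follows. This is an illustrative fragment only: field names follow the minio Tenant CRD as we understand it (pools with a securityContext), values other than the security context are elided, and it is not the PR's actual manifest.

```yaml
# Illustrative fragment, not the PR's manifest.
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: storage
  namespace: tekton-results
spec:
  pools:
    - name: storage-pool
      # Hard-coding runAsUser (e.g. 1000610001) fails when the namespace's
      # SCC-assigned UID range differs, as in the FailedCreate event above.
      # Omitting the user ID and keeping only runAsNonRoot lets OpenShift
      # inject a UID from the namespace's allotted range.
      securityContext:
        runAsNonRoot: true
```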

@gabemontero (Collaborator Author)

UPDATE - only 4.5.7 is available through the normal channel. At least for now you have to pay for 4.5.8 (which has the PR 1403 change), based on what I see when I try to install from the console on my personal 4.11 cluster for the day.

@AndrienkoAleksandr's alternative minio PR has also not been merged.

IIRC the UID range itself may be somewhat random, but I'll try something within [1000630000, 1000639999] from the event and see what happens.

Otherwise, we may have to bump permissions on this thing beyond non-root.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 553366e to 1b05848 Compare January 19, 2023 18:08
@gabemontero (Collaborator Author)

OK we have passing tests @adambkaplan @Roming22 @AndrienkoAleksandr !!

I'm going to squash a couple of commits specific to minio, and then generalize my debug additions a bit, and then post here about final reviews / merging.

I suspect we'll minimally want more testing, but at first blush I suggest doing that in follow-up PRs, as this one and @AndrienkoAleksandr's #419 have been the lucky recipients of multiple rebases with other PRs merging ahead.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 1b05848 to ea760cf Compare January 19, 2023 19:07
@gabemontero (Collaborator Author)

OK this is ready for review @adambkaplan @Roming22 @AndrienkoAleksandr

@adambkaplan (Contributor) left a comment

Generally looks good. I ask that the debug commits be squashed into a single one so it's clearer that we're adding RBAC to debug failing tests.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from ea760cf to 7664797 Compare January 19, 2023 21:08
@gabemontero (Collaborator Author)

> Generally looks good. I ask that the debug commits be squashed into a single one so it's clearer that we're adding RBAC to debug failing tests.

thanks and done @adambkaplan

@AndrienkoAleksandr (Contributor) commented Jan 20, 2023

@gabemontero Hello. Thanks a lot for the help. But, sorry, I think my workaround with the security context was wrong. If I understood the docs correctly, OpenShift provides different ranges for the user ID and group ID on each cluster, so a manually provided group and user ID can work on one cluster and fail on another. It looks like we need more elaborate security context handling... but I need to read more docs and experiment with this stuff to figure out a more universal solution.
Ideally, and most simply, it would be nice if we provided an empty security context and let OpenShift generate the default one, but the minio operator broke that in minio/operator#1403. And it looks like my proposal to revert it is stuck: minio/operator#1407...

@Roming22 (Contributor)

With @AndrienkoAleksandr's latest comment, I'm going to put a "WIP" in the PR description so we know not to merge it at the moment.

@Roming22 Roming22 changed the title Add minio tenant for tekton results [WIP] Add minio tenant for tekton results Jan 20, 2023
@gabemontero (Collaborator Author)

> @gabemontero Hello. Thanks a lot for help. But, sorry, I think my workaround with security context was wrong. If I understood docs correctly, Openshift provides different range for userId and groupId. So manually provided group and user id can work on the one cluster and it won't work in the another. Looks like we need to provide more complex security

Yeah I think you may be correct @AndrienkoAleksandr, and that ties in with the suspicion I noted in #443 (comment)

So I'll engage in a couple of items for this, short term / today:

  1. Bring up the operator catalog today and confirm whether 4.5.7 is still the version in the stable channel (which does not have the upstream 1403 change), or whether it has moved up to 4.5.8 (which does).
  2. Then locally, simply comment out the ID override and confirm the behavior at the version currently in the stable channel.
  3. If need be, based on whatever the behavior / version is, look at whether escalated permissions for the tenant can get us past hacks like this; i.e. I'll create a specific serviceaccount, add more privileged SCCs to it, possibly turn off the non-root bit / add the privileged bit, etc. In other words, create a short-term workaround while we sort things out upstream.

> context stuff... But I need to read more docs and play with this staff to figure out how to provide more universal solution... Ideally and the simplier it would be nice if we provide empty security context and Openshift generate default one, but minio operator broke this stuff minio/operator#1403 . And looks like my proposal to turn it back stuck minio/operator#1407 ...

Yeah, if 1403 / 4.5.8 is in fact not addressing things, then we push on your PR 1407: give them the errors we see on 4.11 with 4.5.8, provide yamls so they can reproduce, and perhaps give them instructions for bringing up a 4.11 cluster through OKD or one of our free tiers at cloud.openshift.com if that is possible.

Now, to "officially" determine the state of 4.5.8 until it lands in the stable channel, I'm not sure how we test it in our CI here. But maybe there is some operator hub nuance I'm missing that @adambkaplan may be aware of (he has messed with that a lot more than I have).

@Roming22 FYI and based on where the findings land, if we have an acceptable work around for this pod permission item, I'll remove the WIP tag and we'll go from there.

thanks everyone

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from afe03e4 to f725e67 Compare January 20, 2023 15:58
@gabemontero (Collaborator Author)

hmmm .... now it is failing in the "- Setup working directory:" step in dev_setup.sh before we even get to the tenant verification:

[plnsvc-setup : run-plnsvc-setup]   - Generating shared manifests:
[plnsvc-setup : run-plnsvc-setup]     - tekton-chains manifest:
[plnsvc-setup : run-plnsvc-setup]       + chains_kustomize=/tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/kustomization.yaml
[plnsvc-setup : run-plnsvc-setup]       + chains_namespace=/tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/namespace.yaml
[plnsvc-setup : run-plnsvc-setup]       + chains_secret=/tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/signing-secrets.yaml
[plnsvc-setup : run-plnsvc-setup]       + '[' '!' -e /tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/kustomization.yaml ']'
[plnsvc-setup : run-plnsvc-setup]       ++ dirname /tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/kustomization.yaml
[plnsvc-setup : run-plnsvc-setup]       + chains_tmp_dir=/tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/tmp
[plnsvc-setup : run-plnsvc-setup]       + mkdir -p /tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/tmp
[plnsvc-setup : run-plnsvc-setup]       ++ head -c 12 /dev/urandom
[plnsvc-setup : run-plnsvc-setup]       ++ base64
[plnsvc-setup : run-plnsvc-setup]       + cosign_passwd=PASiWB5j53bH1mS5
[plnsvc-setup : run-plnsvc-setup]       + echo -n PASiWB5j53bH1mS5
[plnsvc-setup : run-plnsvc-setup]       + cosign_image=quay.io/redhat-appstudio/appstudio-utils:eb94f28fe2d7c182f15e659d0fdb66f87b0b3b6b
[plnsvc-setup : run-plnsvc-setup]       + podman run --rm --env COSIGN_PASSWORD=PASiWB5j53bH1mS5 --volume /tmp/tmp.9tCe0PagE2/credentials/manifests/compute/tekton-chains/tmp:/workspace:z --workdir /workspace --entrypoint /usr/bin/cosign quay.io/redhat-appstudio/appstudio-utils:eb94f28fe2d7c182f15e659d0fdb66f87b0b3b6b generate-key-pair
[plnsvc-setup : run-plnsvc-setup]       Error: crun: writing file `/proc/994/setgroups`: Operation not permitted: OCI permission denied
[plnsvc-setup : run-plnsvc-setup] command terminated with exit code 126

there has been 1 commit since yesterday, though I'm not sure if it has any bearing here

sidetracked looking into this after lunch

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from f725e67 to e17ed2c Compare January 20, 2023 18:26
@AndrienkoAleksandr (Contributor) commented Jan 25, 2023

+1 to controlling the operator version. It will save us from unexpected bugs.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 7b5feda to 784f5c2 Compare January 26, 2023 14:20
@gabemontero (Collaborator Author)

FYI: @adambkaplan, @AndrienkoAleksandr, and I finally had an opportunity for a voice-to-voice conversation. Net:

  • moving off helm, we can survive with the older minio version for now
  • this PR will not set anything security-context related, and will instead establish use of an SCC and a dedicated SA for the tenant

Assuming CI passes, I'll remove the WIP designation, make sure @Roming22 and team have no more comments, and we'll merge.

@AndrienkoAleksandr can then pursue any tuning / post-startup workarounds needed in follow-up PRs.
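The "SCC plus dedicated SA" approach agreed above can be sketched as follows. All names here are hypothetical; the PR's actual manifests may differ. The pattern of granting an SCC through RBAC's `use` verb on `securitycontextconstraints` is the standard OpenShift mechanism (equivalent to `oc adm policy add-scc-to-user`).

```yaml
# Hypothetical manifests: a dedicated service account for the minio tenant,
# granted use of the nonroot SCC via RBAC.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-tenant-sa
  namespace: tekton-results
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-nonroot-scc
  namespace: tekton-results
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["nonroot"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: use-nonroot-scc
  namespace: tekton-results
subjects:
  - kind: ServiceAccount
    name: minio-tenant-sa
    namespace: tekton-results
roleRef:
  kind: Role
  name: use-nonroot-scc
  apiGroup: rbac.authorization.k8s.io
```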

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 784f5c2 to a9c0a08 Compare January 26, 2023 14:45
@gabemontero gabemontero changed the title [WIP] Add minio tenant for tekton results Add minio tenant for tekton results Jan 26, 2023
@gabemontero (Collaborator Author)

OK all green

@Roming22 - from api team perspective we are good here per #443 (comment); do you want one more round of infra team review, or can one of us hit the merge button?

@adambkaplan (Contributor)

@gabemontero perhaps squash commits, then I'll let @Roming22 review so we're in sync.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from a9c0a08 to 473f2d5 Compare January 26, 2023 16:42
@gabemontero (Collaborator Author)

> @gabemontero perhaps squash commits, then I'll let @Roming22 review so we're in sync.

commits squashed @adambkaplan

@gabemontero gabemontero changed the title Add minio tenant for tekton results PLNSRVCE-769: Add minio tenant for tekton results Jan 26, 2023
Comment on lines +3 to +6
kind: Tenant
metadata:
name: storage
namespace: tekton-results
Contributor:

Does it make more sense to make the Tenant a dev overlay?

@Roming22 (Contributor) left a comment

It's unclear to me what the role of operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev is.

DEPENDENCIES.md (outdated, resolved)
DEPENDENCIES.md (outdated, resolved)
developer/config.yaml (outdated, resolved)
developer/openshift/dev_setup.sh (outdated, resolved)
developer/openshift/dev_setup.sh (outdated, resolved)
operator/gitops/sre/credentials/secrets/README.md (outdated, resolved)
operator/gitops/sre/credentials/secrets/README.md (outdated, resolved)
if [ ! -e "$results_kustomize" ]; then
results_dir="$(dirname "$results_kustomize")"
mkdir -p "$results_dir"
if [[ -z $TEKTON_RESULTS_DATABASE_USER || -z $TEKTON_RESULTS_DATABASE_PASSWORD ]]; then
printf "[ERROR] Tekton results database variable is not set, either set the variables using \n \
the config.yaml under tekton_results_db \n \
Or create '%s' \n" "$results_minio_secret" >&2
Contributor:

Didn't you mix up $results_minio_secret and $results_secret in this block and the following block?

Collaborator Author:

again I'll need to defer to @AndrienkoAleksandr to respond and comment on his original intent here

Contributor:

@gabemontero remove this line, please.

Collaborator Author:

removed / pushed

Or create '%s' \n" "$results_secret" >&2
exit 1
fi

kubectl create namespace tekton-results --dry-run=client -o yaml > "$results_namespace"
kubectl create secret generic -n tekton-results tekton-results-database --from-literal=DATABASE_USER="$TEKTON_RESULTS_DATABASE_USER" --from-literal=DATABASE_PASSWORD="$TEKTON_RESULTS_DATABASE_PASSWORD" --dry-run=client -o yaml > "$results_secret"

yq e -n '.resources += ["namespace.yaml", "tekton-results-secret.yaml"]' > "$results_kustomize"
echo "---
Contributor:

We avoid inlining yaml files as it prevents linting. Have a template that you modify on the fly, preferably using yq.
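To illustrate the template-instead-of-inline suggestion, here is a minimal runnable sketch. The file names are hypothetical, and where the comment mentions yq, the actual command shown is a plain append so the sketch runs without yq installed; the real script would modify a checked-in template with yq as the reviewer suggests.

```shell
#!/usr/bin/env bash
# Sketch of the reviewer's suggestion: keep a lintable YAML template in the
# repo and patch it at setup time instead of inlining YAML in the script.
# Paths and file names here are hypothetical.
set -eu

work_dir="$(mktemp -d)"
kustomization="$work_dir/kustomization.yaml"

# A template that would live in the repo, where yamllint can see it:
cat > "$kustomization" <<'EOF'
resources:
  - namespace.yaml
  - tekton-results-secret.yaml
EOF

# At setup time, add the generated manifest to the resource list. With yq
# this would be: yq e -i '.resources += ["minio-secret.yaml"]' "$kustomization"
# A plain append works here only because the list is the last key in the file.
printf '  - %s\n' "minio-secret.yaml" >> "$kustomization"

cat "$kustomization"
```

The advantage over the inline `echo "---` approach in the diff is that the template is a real file the linters can check, and the script only performs a small, auditable mutation.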

@@ -23,6 +23,11 @@ SCRIPT_DIR="$(
pwd
)"

ROOT_DIR=$(
Contributor:

If possible please call it PROJECT_DIR.

@gabemontero (Collaborator Author)

OK, I've attempted to address @adambkaplan's #443 (comment), though I'm not overly confident in the result.

But given @Roming22's additional comments, some of which get even deeper into @AndrienkoAleksandr's original changes/commits from #419 (which had received prior reviews from various folks) that I just pulled over, and with the new sprint, my new assignment there, plus the openshift work I have to pick up during this sprint's timeframe, it is beginning to feel like my offer to assist @AndrienkoAleksandr may have reached the end of its viability.

Either he takes this back over full time, assuming he has the cycles, or we apply other resources to help, or something else.

but I'll report back one more time today after the current CI run completes, and we'll go from there.

@gabemontero (Collaborator Author)

> It's unclear to me what is the role of operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev.

yeah that is a @AndrienkoAleksandr question to answer

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 5409f1d to 7d0ee47 Compare January 26, 2023 22:03
…ot security context constraint, and dumping subscription information

add access to events for debug
add access to pod/logs for debug
add warning events to deployment/pod error debug
@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from 7d0ee47 to c24b691 Compare January 26, 2023 22:14
@AndrienkoAleksandr
Copy link
Contributor

AndrienkoAleksandr commented Jan 26, 2023

> It's unclear to me what is the role of operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev

We wanted to provide minio only for the dev flow. For production we wanted to use Amazon S3 log storage, so for me it was natural to split out all minio-related changes and move them to the dev folder. I thought we would create one more overlay folder in the staging cluster project, describing the AWS S3 storage secret and the env variables the tekton-results api server needs to connect to that storage. If I misunderstood how it should be, you can simply revert the overlays for tekton-results; then we can define the aws changes in follow-up PRs...
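For readers unfamiliar with the layout being discussed, the dev-only overlay idea described above might look like the following kustomization. This file is a hypothetical sketch, not the PR's actual content.

```yaml
# Hypothetical
# operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # shared tekton-results manifests
  - minio-tenant.yaml   # minio storage, dev flow only
# A staging overlay elsewhere would instead reference an AWS S3 secret and
# the env variables the tekton-results API server needs to reach it.
```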

@gabemontero (Collaborator Author)

My attempt to move the application of overlays/dev did not work:

Name: "tekton-results", Namespace: "tekton-results"
from server for: "/source/operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev": routes.route.openshift.io "tekton-results" is forbidden: User "system:serviceaccount:pipeline-service:pipeline-service-manager" cannot get resource "routes" in API group "route.openshift.io" in the namespace "tekton-results"
command terminated with exit code 1

I'll try just reverting the folder altogether per @AndrienkoAleksandr's comment ^^ and we can see where we are at tomorrow AM.

@gabemontero gabemontero force-pushed the local-upstream-419-copy branch from c24b691 to a1635f1 Compare January 27, 2023 01:14
@gabemontero (Collaborator Author)

> My attempt to move the application of overlays/dev did not work:
>
>     Name: "tekton-results", Namespace: "tekton-results"
>     from server for: "/source/operator/gitops/argocd/pipeline-service/tekton-results/overlays/dev": routes.route.openshift.io "tekton-results" is forbidden: User "system:serviceaccount:pipeline-service:pipeline-service-manager" cannot get resource "routes" in API group "route.openshift.io" in the namespace "tekton-results"
>     command terminated with exit code 1
>
> I'll try just reverting the folder altogether per @AndrienkoAleksandr's comment ^^ and we can see where we are at tomorrow AM.

So reverting overlays/dev, along with reverting the bitwarden-related changes, still passes CI, meaning the tenant pod at least comes up.

I'm about to push some of the minor of @Roming22 comments.

On a couple of @Roming22 's questions, I have had to defer to @AndrienkoAleksandr to explain his original intent. I made comments in those threads when that was needed.

The inline yaml and the bash env var rename are a bigger lift; I'll address those if time permits today while I get back to my official assignments.

@Roming22 (Contributor)

@AndrienkoAleksandr If minio is for dev only, and is used to satisfy a prerequisite to deploy pipeline-service, then the manifests should be in developer/openshift/operators, and deployed before deploying pipeline-service. If minio is used as a fallback during deployment when another storage backend is not found, then it can stay where it is.

Your comment on adding an overlay for the staging cluster is worrisome. You are mistaking this repository for a gitops repository managing pipeline-service clusters, which it is not. This repository should not be aware of any cluster operating pipeline-service. Any configuration related to the staging cluster should go in infra-deployments.

@AndrienkoAleksandr (Contributor) commented Jan 27, 2023

> Your comment on adding an overlay for the staging cluster is worrisome. You are mistaking this repository for a gitops repository managing pipeline-service clusters, which it is not. This repository should not be aware of any cluster operating pipeline-service. Any configuration related to the staging cluster should go in infra-deployments.

@Roming22, for staging cluster configuration we have a separate issue, and we are not going to store staging configuration in the pipeline-service project. I only wanted to describe the flow in general: what we want to see in the end after some number of pull requests. For dev purposes we want minio; for staging we want to use another S3 storage. In both cases we need to overlay tekton-results to apply the desired storage.

@Roming22 Roming22 merged commit 3e3506d into openshift-pipelines:main Jan 27, 2023
@gabemontero gabemontero deleted the local-upstream-419-copy branch January 27, 2023 20:51