
Upgrade from 2.12.2 to 2.13.1 failed - mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied #1775

Open
3 tasks done
craph opened this issue Mar 15, 2024 · 23 comments

Comments

@craph
Contributor

craph commented Mar 15, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

Unable to upgrade from 2.12.2 to 2.13.1

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

CrashLoopBackOff in the Postgres container.

AWX Operator version

2.13.1

AWX version

24.0.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.25.16+rke2r1

Modifications

no

Steps to reproduce

upgrade from 2.12.2 to 2.13.1

Expected results

migration from postgres 13 to 15 should work without any permissions issues

Actual results

Unable to upgrade from 2.12.2 to 2.13.1

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

CrashLoopBackOff in the Postgres container.

Additional information

No response

Operator Logs

No response

@craph
Contributor Author

craph commented Mar 15, 2024

Looks similar to #1770

@craph
Contributor Author

craph commented Mar 15, 2024

The documentation here contains this paragraph:

Notice: When mounting a directory from the host into the container, ensure that the mounted directory has the appropriate permissions and that the owner and group of the directory match the user UID or name running inside the container.

Typically processes in container run under UID 26, so -- on GNU/Linux

Is there any reason the container image was changed from the official postgres image to the sclorg one?

Thank you very much for your help.

@craph
Contributor Author

craph commented Mar 15, 2024

In the Dockerfile here we can see:

# This image must forever use UID 26 for postgres user so our volumes are
# safe in the future. This should *never* change, the last test is there
# to make sure of that.
RUN  { yum -y module enable postgresql:15 || :; } && \
    INSTALL_PKGS="rsync tar gettext bind-utils nss_wrapper postgresql-server postgresql-contrib" && \
    INSTALL_PKGS="$INSTALL_PKGS pgaudit" && \
    yum -y --setopt=tsflags=nodocs install $INSTALL_PKGS && \
    rpm -V $INSTALL_PKGS && \
    postgres -V | grep -qe "$POSTGRESQL_VERSION\." && echo "Found VERSION $POSTGRESQL_VERSION" && \
    yum -y clean all --enablerepo='*' && \
    localedef -f UTF-8 -i en_US en_US.UTF-8 && \
    test "$(id postgres)" = "uid=26(postgres) gid=26(postgres) groups=26(postgres)" && \
    mkdir -p /var/lib/pgsql/data && \
    /usr/libexec/fix-permissions /var/lib/pgsql /var/run/postgresql

So if I understand correctly, the postgres user in the Postgres 15 container has UID 26 by default.

So I don't understand why a default installation of Postgres 15 produces this error:

mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied
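
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the ownership that the Dockerfile sets on /var/lib/pgsql/data lives in the image layer, but a mounted PVC hides that directory behind the volume's own root, whose owner and mode typically come from the storage provisioner. If that root is, for example, root-owned with mode 755, UID 26 cannot create userdata in it. The sketch below reads a directory's owner and mode and applies the classic Unix permission check for a given UID/GID. The helper name `can_create_in` is hypothetical, it assumes GNU `stat`, and it ignores ACLs, capabilities, and supplementary groups.

```shell
#!/bin/sh
# can_create_in DIR UID GID
# Hypothetical helper: returns 0 if a non-root process running as UID:GID
# could create an entry in DIR according to the owner/group/other mode bits.
# Assumes GNU coreutils `stat` (-c). Ignores ACLs and supplementary groups.
can_create_in() {
    dir=$1; uid=$2; gid=$3
    info=$(stat -c '%u %g %a' "$dir") || return 2
    set -- $info                 # $1 = owner UID, $2 = owner GID, $3 = octal mode
    m=$((0$3))                   # leading 0 makes the shell parse the mode as octal
    if [ "$uid" -eq "$1" ]; then
        bits=$(( ($m >> 6) & 7 ))    # owner permission bits
    elif [ "$gid" -eq "$2" ]; then
        bits=$(( ($m >> 3) & 7 ))    # group permission bits
    else
        bits=$(( $m & 7 ))           # other permission bits
    fi
    # Creating an entry needs both write (2) and search (1) on the directory.
    [ $(( bits & 3 )) -eq 3 ]
}

# Inside a debug pod that mounts the PVC, one might run:
#   can_create_in /var/lib/pgsql/data 26 26 || echo "UID 26 cannot create userdata here"
```

If the check fails for 26:26, the fix has to change the volume's ownership or mode; changing the image would not help under this assumption.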

To install or upgrade AWX, I use this Kustomize file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Find the latest tag here: https://github.com/ansible/awx-operator/releases
  - github.com/ansible/awx-operator/config/default?ref=2.13.1
  # Add this extra line:
  - awx-demo.yaml

# Set the image tags to match the git version from above
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.13.1

# Specify a custom namespace in which to install AWX
namespace: awx

and here is the awx-demo.yaml file:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
  namespace: awx
spec:
  ingress_type: ingress
  hostname: myawxserver
  postgres_storage_class: longhorn

And to apply:

kustomize build . | kubectl apply -f -

@craph changed the title from "Upgrade from 2.12.2 to 2.13.1 failed" to "Upgrade from 2.12.2 to 2.13.1 failed - mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied" on Mar 15, 2024
@craph
Contributor Author

craph commented Mar 15, 2024

@fosterseth, could your patch from #1770 (comment) work for my issue too?

Is it possible to initialize the postgres data path and config automatically? UID 26 appears to be a prerequisite for the sclorg postgres image, so users shouldn't have to add a custom spec to make an upgrade from 2.12.2 to 2.13.1 work, right?

Thank you very much for your help

@craph
Contributor Author

craph commented Mar 18, 2024

@kurokobo, @TheRealHaoLiu, @fosterseth, how can I solve this in my case? I have postgres_storage_class: longhorn, so the PVC is created automatically and I can't manage it myself.

How can we solve this issue when the default storage class is longhorn?

Thank you very much for your help.

@craph
Contributor Author

craph commented Mar 18, 2024

After attaching a dedicated pod to the same PVC to fix the permissions issue, I lost all my AWX data 😢 😢 Why?

Moreover, the admin user and password have been reset because my data was not migrated.

Why is that?

I still have the old postgres 13 PVC. Is it possible to redeploy awx-operator 2.12.2 and use the old PVC?

> kubectl get pvc -n awx
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-13-awx-demo-postgres-13-0   Bound    pvc-eac8a8d5-6d74-4d37-819b-ee89154cd60a   8Gi        RWO            longhorn       213d
postgres-15-awx-demo-postgres-15-0   Bound    pvc-1ebb7d11-d2f5-4f50-9bc6-ca2e045f6031   8Gi        RWO            longhorn       3d6h
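
Before rolling back, it may help to confirm which claims are still Bound. The sketch below filters `kubectl get pvc` table output on its STATUS column. The function name `bound_pvcs` is hypothetical, and it assumes the default column order shown in the listing above.

```shell
#!/bin/sh
# bound_pvcs: read `kubectl get pvc` table output on stdin and print the NAME
# of every claim whose STATUS column is "Bound".
# Hypothetical helper; assumes the default columns (NAME, STATUS, ...).
bound_pvcs() {
    # NR > 1 skips the header row; $2 is the STATUS column.
    awk 'NR > 1 && $2 == "Bound" { print $1 }'
}

# Usage (needs cluster access, so shown as a comment):
#   kubectl get pvc -n awx | bound_pvcs
```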

@mateuszdrab
Contributor

Hey @craph

I also ran into this, so I rolled back to the previous operator version.
I had to remove the Postgres 15 deployment and restore my AWX and Postgres volumes from backup.

@craph
Contributor Author

craph commented Mar 18, 2024

@mateuszdrab Thank you very much for the update.

One question: if the old PVC is still there, do we have to restore from backup? Normally all the previous data should still be inside the longhorn PVC, right?

@mateuszdrab
Contributor

I'd just use the old PVC. In my case it was gone, so I had no choice: after the rollback it was recreated with an empty database.

@craph
Contributor Author

craph commented Mar 19, 2024

@mateuszdrab How do you back up AWX? Do you use https://github.com/ansible/awx-operator/tree/devel/roles/backup ?
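
For context, the backup role linked above is driven by an AWXBackup custom resource. A minimal sketch of such a manifest follows, written out by a small shell function so it can be inspected before applying. The function name and the backup name are hypothetical; the namespace and deployment name are taken from the awx-demo example earlier in this issue.

```shell
#!/bin/sh
# write_awxbackup_manifest FILE
# Hypothetical helper: writes a minimal AWXBackup manifest to FILE.
write_awxbackup_manifest() {
    cat > "$1" <<'EOF'
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-before-upgrade   # hypothetical name
  namespace: awx
spec:
  deployment_name: awx-demo        # name of the AWX resource to back up
EOF
}

# Usage (needs cluster access, so shown as a comment):
#   write_awxbackup_manifest awxbackup.yaml && kubectl apply -f awxbackup.yaml
```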

@craph
Contributor Author

craph commented Mar 19, 2024

@fosterseth, @TheRealHaoLiu, do you have any update on how to safely upgrade AWX from 2.12.2 to a future awx-operator version without losing any data (postgres data migration, documentation)?

@mateuszdrab
Contributor

> @mateuszdrab How do you proceed to have an AWX backup ? Do you use https://github.com/ansible/awx-operator/tree/devel/roles/backup ?

No, I just had backups of the PVCs for both AWX and Postgres. I rolled back and restored those.

@rooftopcellist
Member

Could you give this PR a try and see if it solves your issue?

#1799

@Dafuznmehed

I was completely stuck with this error until I used quay.io/fosterseth/awx-operator:postgres_init and the init container commands. I tried postgres_security_context_settings first, but that alone didn't do it. I haven't tried using just the init container, without the security context settings and postgres_data_path.

I wouldn't be surprised if some of my settings are superfluous. I had to add the CRD edit and the init postgres commands after running kustomize, as the GitHub ref for 2.14 didn't include them. I just figured I'd post what worked for me. I'm appreciative of everyone's work and comments that helped get my deployments back up and running.

CustomResourceDefinition

          init_postgres_extra_commands:
            description: Extra commands for the init postgres container
            type: string

spec:
  postgres_data_path: /var/lib/pgsql/data/pgdata
  postgres_security_context_settings:
    fsGroup: 26
    fsGroupChangePolicy: Always
    runAsGroup: 26
    runAsNonRoot: true
    runAsUser: 26
    supplementalGroups:
      - 26
  init_postgres_extra_commands: |
    mkdir /var/lib/pgsql/data/userdata
    chown 26:26 /var/lib/pgsql/data/userdata
    chmod 700 /var/lib/pgsql/data/userdata
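
One caveat about the three init commands: plain `mkdir` returns an error when the directory already exists, which could matter if the init container runs again after a pod restart. A re-runnable variant might look like the sketch below. The function name and its arguments are hypothetical; it is parameterized only so that it can be tried against a scratch directory first.

```shell
#!/bin/sh
# init_userdata DIR OWNER
# Hypothetical, re-runnable variant of the init commands: create DIR if it
# is missing, then set its ownership and mode. In the init container the
# call would be: init_userdata /var/lib/pgsql/data/userdata 26:26
init_userdata() {
    mkdir -p "$1" &&      # -p: no error if DIR already exists
    chown "$2" "$1" &&    # hand the directory to the postgres UID:GID
    chmod 700 "$1"        # owner-only access
}
```

The same three commands, with `mkdir -p` in place of `mkdir`, could also be placed directly in init_postgres_extra_commands.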

@RaceFPV

RaceFPV commented May 2, 2024

This issue is still ongoing for me as well. Even when trying to spin up a fresh AWX instance, I'm still stuck with the database crash-looping on the error "mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied".

I've tried the following "fixes" mentioned in other issues with no change

@jyanesnotariado

jyanesnotariado commented May 14, 2024

Me too. I attempted a fresh install of AWX 24.3.1 with AWX operator 2.16.1:
$ kubectl logs awx-postgres-15-0
mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied

@fosterseth
Member

@RaceFPV @jyanesnotariado I'm guessing you tried this out

#1805

any idea why that method doesn't work in your case?

@RaceFPV

RaceFPV commented May 15, 2024

I think it's because I didn't upgrade the CRDs before running the upgrade for the Helm chart itself, which is why that option wasn't working for me. I'm still testing whether that's the case.

@craph
Contributor Author

craph commented May 22, 2024

Hi @rooftopcellist , @fosterseth ,

PR #1805 solved my issue, but after the migration finished, the "migration" pod is still present with status Completed.
Can I delete the "migration" pod?

Thank you very much.
Best regards,

@jyanesnotariado

> @RaceFPV @jyanesnotariado I'm guessing you tried this out
>
> #1805
>
> any idea why that method doesn't work in your case?

I did! But even though it downloaded and ran, I kept running into the same error. What fixed it for me was running chown 26:root on the mapped volume.

@serhanekicii

serhanekicii commented Jun 1, 2024

The issue still persists as of today with chart version `2.17.0`. The upgrade from Postgres 13 to 15 might be the cause.

@craph
Contributor Author

craph commented Jun 6, 2024

Hi @rooftopcellist , @fosterseth ,

PR #1805 solved my issue, but after the migration finished, the "migration" pod is still present with status Completed. Can I delete the "migration" pod?

Thank you very much. Best regards,

@edward2a

edward2a commented Sep 5, 2024

I just hit this. Checking the source for the postgres deployment, I came to this line:

This is the container security context, which does not allow setting fsGroup:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#securitycontext-v1-core

Ideally, the AWX CR should allow updating the postgres pod security context so that fsGroup can be set:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#podsecuritycontext-v1-core

While the implemented solution of the init container might sort out the issue, it can overlap with functionality in the storage provider (like the NFS CSI) where the provider can update permissions IF fsGroup is declared.
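
To make the pod-level versus container-level distinction concrete, here is a sketch of a standalone Pod manifest, written out by a small shell function. It is an illustration of where fsGroup is accepted, not a working Postgres deployment and not what the operator generates. The function name, pod name, and image reference are assumptions; the claim name is the Postgres 15 PVC listed earlier in this issue.

```shell
#!/bin/sh
# write_fsgroup_demo_pod FILE
# Hypothetical helper: writes a Pod manifest showing fsGroup at pod level.
write_fsgroup_demo_pod() {
    cat > "$1" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                     # hypothetical name
  namespace: awx
spec:
  securityContext:                       # PodSecurityContext: fsGroup is valid here
    fsGroup: 26
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: postgres
      image: quay.io/sclorg/postgresql-15-c9s:latest   # assumed image reference
      securityContext:                   # container SecurityContext: no fsGroup field
        runAsUser: 26
        runAsNonRoot: true
      volumeMounts:
        - name: data
          mountPath: /var/lib/pgsql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-15-awx-demo-postgres-15-0
EOF
}

# Usage (needs cluster access, so shown as a comment):
#   write_fsgroup_demo_pod pod.yaml && kubectl apply --dry-run=server -f pod.yaml
```

With fsGroup declared at pod level, a storage provider that supports it can adjust group ownership on the volume itself, which is the overlap with the init container approach described above.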


9 participants