awx-ee k8s container log showing warnings "Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection" #1203

Open
3 tasks done
shrutebattlestargalactica opened this issue Jan 26, 2023 · 2 comments

Comments

@shrutebattlestargalactica

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

Hello,

We have 3-node EKS cluster(s) running EKS v1.24 with an external PostgreSQL RDS database. A couple of weeks ago we updated the AMIs and upgraded Kubernetes from 1.23 to 1.24, and last week we upgraded awx-operator from 1.1.0 to 1.1.3. Alerts I receive then showed that inventory syncs were ending in error for all of my sources, both the sourced-from-project and the AWS EC2 source types. I logged into the console and could see that the jobs would finish but throw an error at the very end (the status was error, not failed, cancelled, or successful). From the conversation at https://groups.google.com/g/awx-project/c/AuL9wmPMvA4 I gathered that this could be related to the recent version updates or to the awx-ee Quay image (https://quay.io/repository/ansible/awx-ee).

I am now using a different image, taken from this related problem: ansible/awx-ee#134 (comment), which has temporarily resolved the inventory sync errors (the syncs now succeed). However, in the awx-ee logs I still consistently see this receptor error: ansible/awx-ee#134 (comment)

Thanks for your help.

AWX Operator version

1.1.4

AWX version

12.11.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

AWS EKS 1.24

Modifications

yes

Steps to reproduce

Same as described in the Bug Summary above.

Expected results

A stable quay.io/repository/ansible/awx-ee image, so that the controller can use the correct container image.

Actual results

WARNING 2023/01/26 15:33:17 Could not read in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection

WARNING 2023/01/26 15:33:17 Could not close connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection

other traceback logs from email notifications attached
errors.txt

Additional information

awx-ee container logs
awx-prod-6dcd4964bc-hd6tb.log

Operator Logs

awx-operator-controller-manager-65b875f4bf-nhbrf.log

@fosterseth
Member

fosterseth commented Feb 1, 2023

those receptor warnings are expected and probably not related to your problem

I'm wondering if you are running into this ansible/awx#13469 (comment)

basically -- on EKS the Kubernetes API server version and the kubelet version can drift apart

add this to the spec of your AWX resource yaml file and try deploying again

  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: disabled

from ansible/receptor#683

This will disable receptor's reconnect support feature, which is probably what is causing your jobs to fail

Note -- with reconnect support disabled, you may run into problems with jobs that run > 4 hours (they will fail)
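
For context, here is a minimal sketch of where that snippet would sit in a full AWX custom resource. The `metadata.name` and `namespace` values are assumptions for illustration (the name is guessed from the `awx-prod-...` pod name in the attached logs); keep whatever your existing resource already uses and only add the `ee_extra_env` block under `spec`.

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-prod        # assumption: use the name of your existing AWX resource
  namespace: awx        # assumption: use the namespace your deployment runs in
spec:
  # ... your existing spec settings stay as they are ...

  # Extra environment variables for the awx-ee container, where receptor runs.
  # Note that the value is a YAML block string (the "|"), not a native list.
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: disabled
```

After re-applying the resource, the operator should roll out a new AWX pod; the variable should then be visible in the environment of the awx-ee container.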

@emoshaya

How do I set RECEPTOR_KUBE_SUPPORT_RECONNECT to disabled for a custom pod spec?
