Isolated runs choke on rsync's .~tmp~ directories #6675

Closed
domq opened this issue Apr 13, 2020 · 4 comments

Comments

@domq

domq commented Apr 13, 2020

ISSUE TYPE
  • Bug Report
SUMMARY

In some circumstances, running an Ansible playbook as an isolated job causes a spurious failure (even though every individual Ansible task succeeds), because a stray .~tmp~ directory appears in an unexpected place: the job's artifacts/<job id>/job_events directory.

ENVIRONMENT
  • AWX version: AWX 10.0.0
  • AWX install method: openshift
  • Ansible version: 2.9.5
STEPS TO REPRODUCE
  • Set AWX_RESOURCE_PROFILING_ENABLED = True
  • Keep running isolated jobs until the problem shows up
EXPECTED RESULTS

All such jobs should succeed or fail based solely on the outcomes of the underlying Ansible tasks.

ACTUAL RESULTS

Here is an example stack trace (also found in #6280):

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1468, in run
    ident=str(self.instance.pk))
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/isolated/manager.py", line 422, in run
    status, rc = self.check()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/isolated/manager.py", line 241, in check
    self.consume_events()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/isolated/manager.py", line 293, in consume_events
    open(os.path.join(events_path, event), 'r')
IsADirectoryError: [Errno 21] Is a directory: '/tmp/awx_754_hs8ylxru/artifacts/754/job_events/.~tmp~'

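The last frame shows consume_events calling open() on an entry of the job_events directory that turned out to be a directory; .~tmp~ is the name rsync uses by default for its holding directory when --delay-updates is in effect. A minimal sketch of a defensive listing that yields only regular files is shown here; iter_event_files is a hypothetical helper for illustration, not AWX's actual code and not the fix that was merged.

```python
import os
import tempfile


def iter_event_files(events_path):
    """Yield the names of regular files in events_path, in sorted order.

    Hypothetical helper for illustration. Directories (such as rsync's
    .~tmp~ holding directory) are skipped, because open() on a directory
    raises IsADirectoryError on Linux.
    """
    for name in sorted(os.listdir(events_path)):
        if os.path.isfile(os.path.join(events_path, name)):
            yield name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as events_path:
        # Simulate one finished event file (placeholder name) plus a
        # leftover rsync holding directory.
        open(os.path.join(events_path, "1-example.json"), "w").close()
        os.mkdir(os.path.join(events_path, ".~tmp~"))
        print(list(iter_event_files(events_path)))  # prints ['1-example.json']
```

Filtering with os.path.isfile rather than matching the literal name .~tmp~ also skips any other directory that might show up in the events directory.
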
@domq
Author

domq commented Apr 13, 2020

Analysis:

Suggested fix: add .~tmp~ to the rsync_exclude list.
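If rsync_exclude ends up as rsync --exclude options (an assumption; how AWX passes that list to rsync is not shown in this thread), the suggested fix amounts to one more pattern in the list. The sketch uses a hypothetical build_rsync_command helper and only builds the argument list; it never runs rsync.

```python
# Hypothetical pattern list; AWX's real rsync_exclude contents may differ.
RSYNC_EXCLUDE = [".~tmp~"]


def build_rsync_command(src, dest, excludes=()):
    """Return an rsync argv list that skips every pattern in `excludes`.

    Hypothetical helper for illustration; it only builds the argument
    list and never invokes rsync.
    """
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keep metadata)
    for pattern in excludes:
        # A pattern without a slash matches that name at any depth.
        cmd.append("--exclude=" + pattern)
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    print(build_rsync_command("artifacts/", "/tmp/artifacts/", RSYNC_EXCLUDE))
    # prints ['rsync', '-a', '--exclude=.~tmp~', 'artifacts/', '/tmp/artifacts/']
```
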

@domq
Author

domq commented Apr 13, 2020

On further inspection, I believe I have the root cause wrong; I now suspect an rsync deadlock. According to the logs of my awx-task container on Kubernetes, the error appears exactly 10 minutes after rsync starts.

I shall close this issue and open a new one if I manage to get to the bottom of this.

@domq domq closed this as completed Apr 13, 2020
@ryanpetrello
Contributor

Thanks @domq - the traceback you reported does look legitimate, though, so I merged the PR I opened.

@domq
Author

domq commented Apr 14, 2020

I wrote up the proper root cause analysis of my issue in #6692.

4 participants