Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add test that verifies job fails if attachment is accidentally deleted from bucket #436

Merged
merged 5 commits into from
Oct 10, 2024

Conversation

YutongLi291
Copy link
Contributor

What was the problem/requirement? (What/Why)

If in the case that an user accidentally deletes a job attachment from the job attachments bucket (or clears the job attachments bucket completely), if a job is running that depends on any of the deleted job attachments, the job should fail at the syncInputJobAttachments stage since the job attachment is not able to be synced from the bucket to the worker.

We should test that this is indeed what happens, and that the worker is still able to process other jobs after this failure.

What was the solution? (How)

Add a test that

  1. Submits a job with input job attachments in SUSPENDED status, this will ensure that the input job attachments are uploaded to the Job Attachments bucket but not synced to the worker yet, as the job has yet to start.
  2. Using the manifest, find any of the hashed input job attachment files in the job attachments bucket, and delete it
  3. Set the job status to READY, ensuring that the worker will start working on the job
  4. Wait for the job to finish, in which case it should be in FAILED status since the syncInputJobAttachments step failed
  5. Check that syncInputJobAttachments session action failed using ListSessionActions
  6. Submit another test job that sleeps, ensuring it passes without fail to verify that the worker is still running as expected.

What is the impact of this change?

Better testing for worker agent code and handling failures

How was this change tested?

# Linux
source .e2e_linux_infra.sh
hatch run e2e-test

# Windows
source .e2e_windows_infra.sh
hatch run e2e-test

Was this change documented?

No

Is this a breaking change?

No

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@YutongLi291 YutongLi291 requested a review from a team as a code owner October 9, 2024 20:23
Copy link

sonarqubecloud bot commented Oct 9, 2024

@YutongLi291 YutongLi291 enabled auto-merge (squash) October 9, 2024 21:10
Comment on lines +2050 to +2052
"parameterValues": [
{"name": "deadline:targetTaskRunStatus", "value": "SUSPENDED"},
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicated above in job_parameters variable?


# Delete one of the input files from the Job Attachments bucket after confirming that it exists

s3_client.get_object(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HeadObject is a lighter API call if we just want to check for existence, but if the role doesn't have permission to call it already, don't worry about it.

sessionId=session["sessionId"],
).get("sessionActions")

logging.info(f"Session actions: {session_actions}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logging.info(f"Session actions: {session_actions}")
LOG.info(f"Session actions: {session_actions}")

Should use same logger as above for consistency. I believe this method goes to root logger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address in next PR

@YutongLi291 YutongLi291 merged commit 22934f1 into aws-deadline:mainline Oct 10, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants