fix: handle non-existent queue jobRunAsUser on worker host (#176)
Problem:
The worker crashed when trying to run as a non-valid user (or group).

Solution:
- Handle exceptions when trying to resolve the user's home directory.
- Fail the session action(s) with the error message amended to
  suggest the possibility of a non-existent user on the worker host.

Signed-off-by: Matt Authement <[email protected]>
matta-aws authored Feb 26, 2024
1 parent 28d7364 commit 1049a48
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions src/deadline_worker_agent/scheduler/scheduler.py
@@ -23,7 +23,6 @@
 from openjd.sessions import ActionState, ActionStatus
 from deadline.job_attachments.asset_sync import AssetSync
 
-
 from ..aws.deadline import update_worker
 from ..aws_credentials import QueueBoto3Session, AwsCredentialsRefresher
 from ..boto import DeadlineClient, Session as BotoSession
@@ -714,13 +713,17 @@ def _create_new_sessions(
                     new_session_id,
                     os_user,
                 )
-            except (DeadlineRequestWorkerOfflineError, DeadlineRequestUnrecoverableError) as e:
+            except (
+                DeadlineRequestWorkerOfflineError,
+                DeadlineRequestUnrecoverableError,
+                RuntimeError,
+            ) as e:
                 # Terminal error. We need to fail the Session.
-                message = (
-                    "Unrecoverable error trying to obtain AWS Credentials for the Queue Role."
-                )
+                message = f"Unrecoverable error trying to obtain AWS Credentials for the Queue Role: {e}"
+                if str(e).startswith("Can't determine home directory"):
+                    message += ". Possible non-valid username."
                 self._fail_all_actions(session_spec, message)
-                logger.warning("[%s] %s: %s", new_session_id, message, str(e))
+                logger.warning("[%s] %s", new_session_id, message)
                 # Force an immediate UpdateWorkerSchedule request
                 self._wakeup.set()
                 continue
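
The error-handling pattern in this change can be illustrated in isolation. The following is a minimal sketch, not the worker agent's actual code: `resolve_home_dir`, `build_failure_message`, `try_start_session`, and the `known_users` mapping are hypothetical names introduced here for illustration. Only the message prefix and the two message strings are taken from the diff. It assumes the home-directory lookup raises a `RuntimeError` whose text starts with "Can't determine home directory" when the user does not exist.

```python
from __future__ import annotations

from typing import Optional

# Prefix that the diff checks for with str(e).startswith(...)
HOME_DIR_ERROR_PREFIX = "Can't determine home directory"


def resolve_home_dir(user: str, known_users: dict[str, str]) -> str:
    """Hypothetical stand-in for the real home-directory lookup.

    Assumption: a missing user surfaces as a RuntimeError with the
    prefix above, which is what the changed except clause now catches.
    """
    try:
        return known_users[user]
    except KeyError:
        raise RuntimeError(f"{HOME_DIR_ERROR_PREFIX} for {user!r}") from None


def build_failure_message(error: Exception) -> str:
    """Build the failure message the same way the diff does."""
    message = (
        "Unrecoverable error trying to obtain AWS Credentials "
        f"for the Queue Role: {error}"
    )
    if str(error).startswith(HOME_DIR_ERROR_PREFIX):
        # Hint that the configured user may not exist on this host
        message += ". Possible non-valid username."
    return message


def try_start_session(user: str, known_users: dict[str, str]) -> Optional[str]:
    """Return None on success, or a failure message instead of crashing."""
    try:
        resolve_home_dir(user, known_users)
    except RuntimeError as e:
        return build_failure_message(e)
    return None
```

For example, with `known_users = {"jobuser": "/home/jobuser"}`, calling `try_start_session("jobuser", known_users)` returns `None`, while `try_start_session("ghost", known_users)` returns a message ending in "Possible non-valid username." rather than raising. Matching on a message prefix is fragile if the underlying error text changes, so this sketch keeps the prefix in a single constant.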