What you are pointing out is one of the (several) conditions under which we can create jobs that stay in "Waiting" status for a potentially long time, maybe "forever". There are at least two related, unavoidable cases:
- At time x the job goes to "Waiting" after getting its replicas. At time x+y (before any matching attempt) the site's `RunningLimit` is set to 0, and all further matching attempts will fail.
- Jobs without input data would not check their replicas. A user/bot can ask to run at a specific site whose `RunningLimit` is 0, with or without implementing your proposal.
The list can go on, but long story short, there is no way to fully avoid creating jobs that will wait for a "long" time.
I also do not much like the `getReplicasForJobs` checks.
One other possibility is to reset jobs that have been in "Waiting" for a long time (because conditions, e.g. the allowed replicas, might have changed in the meantime; that is why the `JobWrapper` calls `getReplicasForJobs` again). Would that be a bad idea? Did we by chance already think of that in the past? -- cc @atsareg
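To make the reset idea concrete, here is a minimal sketch of the selection step it would need: picking out jobs that have sat in "Waiting" longer than some threshold. Everything in it is hypothetical and for illustration only (the `Job` record, `select_stale_waiting_jobs`, and the 7-day `max_wait` default); it does not use any DIRAC API, and what "reset" should actually do to the selected jobs is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical, simplified job record (not a DIRAC class)."""
    job_id: int
    status: str
    last_update: datetime  # when the job last changed status


def select_stale_waiting_jobs(jobs, now, max_wait=timedelta(days=7)):
    """Return IDs of jobs that have been "Waiting" for longer than max_wait.

    The 7-day default is an arbitrary placeholder, not a recommended value.
    """
    return [
        job.job_id
        for job in jobs
        if job.status == "Waiting" and now - job.last_update > max_wait
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    jobs = [
        Job(1, "Waiting", datetime(2024, 1, 1)),   # waiting for 9 days
        Job(2, "Waiting", datetime(2024, 1, 9)),   # waiting for 1 day
        Job(3, "Running", datetime(2023, 12, 1)),  # not waiting at all
    ]
    print(select_stale_waiting_jobs(jobs, now))
```

The selected jobs would then go back through whatever checks have become stale (for instance the replica lookup), so that changed conditions are picked up.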
Originally posted by @fstagni in #7735 (comment)