Enable retry on network failures #14
Comments
Migrated from mlcommons/mlperf-automations_archived#11 (comment) git clone failures
Migrated from mlcommons/mlperf-automations_archived#11 (comment) The failure below is seen many times in our GitHub Actions runs. Trying the fix
Migrated from mlcommons/mlperf-automations_archived#11 (comment) @anandhu-eng I think we should enable it by default and let users set an environment variable to turn it off for any reason. But first we need to list the places where we need this. Below are some of them. We should probably try it on one and, if it works as expected, move on to the remaining places.
Migrated from mlcommons/mlperf-automations_archived#11 (comment) Hi @arjunsuresh, this would be useful. Should this be on by default, or should it be controlled through an environment variable? I'm wondering if there is a case where a user wants to turn it off.
Migrated from mlcommons/mlperf-automations_archived#11
Originally created by @arjunsuresh on Fri, 01 Nov 2024 10:44:17 GMT
We often see CM script runs failing due to network failures like this. It would be good to add a retry mechanism for such failures, both to improve the user experience and to reduce failures in automated runs.
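
For illustration only, here is a minimal sketch of what such a retry wrapper could look like. It is not the CM implementation. The function name `run_with_retry`, the variable name `EXAMPLE_DISABLE_NETWORK_RETRY`, and the default attempt count and delays are all assumptions made for this example. Retrying is on by default and can be switched off through the environment variable, as suggested in the comments. Note that this sketch retries on any non-zero exit code; telling network failures apart from other errors would need extra checks on the command's output.

```python
import os
import subprocess
import time

# Hypothetical variable name: the thread does not specify one.
RETRY_DISABLE_ENV = "EXAMPLE_DISABLE_NETWORK_RETRY"


def run_with_retry(cmd, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Returns the subprocess.CompletedProcess of the last attempt.
    """
    attempts = max(1, attempts)

    # Retry is enabled by default; a truthy value in the variable disables it.
    if os.environ.get(RETRY_DISABLE_ENV, "").lower() in ("1", "true", "yes"):
        attempts = 1

    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return result
```

A call such as `run_with_retry(["git", "clone", "https://example.com/repo.git"])` (placeholder URL) would try the clone up to three times before giving up. Trying this on a single call site first, as proposed above, keeps the change easy to revert.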