corefile uploader: Updates per review comments offline #3915
1) The core_uploader service waits for syslog.service.
2) The core_uploader service is enabled for restart on failure.
3) Use mtime instead of file size, plus ample time, to be robust.
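The mtime-based check in item 3 could look roughly like the sketch below. It is not the code from this PR: the function name `is_core_complete` and the `QUIET_PERIOD_SECS` value are assumptions made for illustration. The idea is that a core file is treated as fully written once it has gone unmodified for an ample period.

```python
import os
import time

# Hypothetical threshold: how long a file must stay unmodified before it
# is treated as completely written. This value is not taken from the PR.
QUIET_PERIOD_SECS = 60


def is_core_complete(path, quiet_period=QUIET_PERIOD_SECS, now=None):
    """Return True if `path` has not been modified for `quiet_period` seconds."""
    if now is None:
        now = time.time()
    # os.path.getmtime returns the last-modification time in epoch seconds.
    return (now - os.path.getmtime(path)) >= quiet_period
```

Compared with polling the file size, this needs only one stat call per check and does not depend on catching two consecutive samples of equal size.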
I have not tested the code yet. Can I get your comments, please, before I do my last round of testing?
1) If the rc file is missing or required data is missing, it periodically logs an error in a forever loop.
2) If the upload fails, retry every hour with an error log, forever.
Changes:
  daemonname = fname.split(".")[0]
  i = 0
  fail_msg = ""

- while i <= MAX_RETRIES:
+ while True:
I think you can do the retry in the big loop: if it fails, retry in the big loop.
I guess not. The bigger loop either scans for cores or waits for one. In either case, the next core would suffer the same fate. So I would rather spew logs and retry until either I succeed or the service is restarted by someone alerted by these log messages.
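A minimal sketch of the retry-in-place approach described in this reply, assuming a hypothetical `upload` callable that raises on failure. The one-hour wait comes from the PR description; the function name, parameters, and log wording are illustrative only.

```python
import logging
import time

RETRY_WAIT_SECS = 3600  # one hour, as stated in the PR description


def upload_until_success(upload, path, wait_secs=RETRY_WAIT_SECS,
                         sleep=time.sleep):
    """Call upload(path) until it succeeds; log an error after each failure.

    `upload` is a hypothetical callable that raises on failure.
    Returns the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            upload(path)
            return attempts
        except Exception as err:
            # Keep logging so that someone watching syslog is alerted.
            logging.error("Upload of %s failed (attempt %d): %s; "
                          "retrying in %d s", path, attempts, err, wait_secs)
            sleep(wait_secs)
```

Passing `sleep` as a parameter is only a convenience for testing the loop without waiting an hour per attempt.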
[Service]
Type=simple
ExecStart=/usr/bin/core_uploader.py
StandardOutput=null
Restart=on-failure
Do we need back-off here, in case core_uploader is constantly restarting?
No need. The only failures that crash the service are fatal ones, such as sonic_version.yaml lacking expected attributes, some fatal system-related failure (which should crash many more ...), or running out of disk space.
The one thing I would need to take care of is running out of disk space. This I can handle inside the script.
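One way the script could guard against running out of disk space is a free-space check before it writes anything. This is only a sketch: the threshold and the function name are assumptions, and the PR does not say how the check is actually implemented.

```python
import shutil

# Hypothetical minimum free space required before proceeding (100 MiB).
MIN_FREE_BYTES = 100 * 1024 * 1024


def has_enough_space(path, min_free=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has at least `min_free` bytes free."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    return shutil.disk_usage(path).free >= min_free
```

A caller could log an error and wait, rather than crash, whenever this returns False.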
The vs and vsimage build failures are unrelated to this PR. Console output:
* Updates per review comments
  1) core_uploader service waits for syslog.service
  2) core_uploader service enabled for restart on failure
  3) Use mtime instead of file size + ample time to be robust.
* Avoid reloading an already uploaded file, by marking the names with a prefix.
* Updated failing path.
  1) If the rc file is missing or required data is missing, it periodically logs an error in a forever loop.
  2) If upload fails, retry every hour with an error log, forever.
* Fix a few bugs
* The binary update_json.py will come from sonic-utilities.
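The "marking the names with a prefix" item might be sketched as follows. The prefix string and helper names are hypothetical; the commit message does not state the actual prefix used.

```python
import os

# Hypothetical marker; the actual prefix used by the PR is not shown here.
UPLOAD_PREFIX = "UPLOADED_"


def is_uploaded(path):
    """Return True if the file name already carries the upload marker."""
    return os.path.basename(path).startswith(UPLOAD_PREFIX)


def mark_uploaded(path):
    """Rename `path` so its file name starts with the marker; return the new path."""
    dirname, fname = os.path.split(path)
    new_path = os.path.join(dirname, UPLOAD_PREFIX + fname)
    os.rename(path, new_path)
    return new_path
```

A directory scan can then skip any file for which `is_uploaded` is true, so a restarted service does not upload the same core again.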