lock/query: make robust against paddles errors #1642
Retry paddles requests, and for get_status() return an empty dict rather than None so callers behave.
get_status() failing in particular has caused the dispatcher and jobs to fail several times over the past few weeks. With this change, we should be able to run multiple paddles workers again, since all the common callers will retry on error.
Signed-off-by: Josh Durgin <[email protected]>
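As a rough illustration of the behaviour described above, here is a minimal sketch of a retry wrapper whose status lookup falls back to an empty dict. It is not the code from this PR: `with_retries`, `get_status_safe`, and the `fetch_status` callable are hypothetical names, and the broad `except Exception` stands in for whatever request errors the real code catches.

```python
import time


def with_retries(fetch, tries=3, delay=0.0, default=None):
    """Call fetch() up to `tries` times; return `default` if every attempt raises.

    Hypothetical helper for illustration only.
    """
    for attempt in range(tries):
        try:
            return fetch()
        except Exception:
            # Assumption: any exception is treated as a transient paddles error.
            if attempt < tries - 1:
                time.sleep(delay)
    return default


def get_status_safe(fetch_status, tries=3, delay=0.0):
    """Return the status dict, or {} when it cannot be fetched.

    Returning {} instead of None lets callers use .get() without a None check.
    """
    result = with_retries(fetch_status, tries=tries, delay=delay, default=None)
    return result if result is not None else {}
```

With this shape, a caller can write `get_status_safe(fetch).get('locked')` and get None on failure instead of an AttributeError.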
@susebot run deploy
Commit 179edf2 is OK.
This actually failed. It needs to be retested.
I guess there are no longer any centos builds for octopus?
@susebot run deploy
Commit 179edf2 is NOT OK.
There should be octopus centos builds. If you're seeing something missing, @djgalloway may be able to help.
@susebot run deploy
Commit 179edf2 is NOT OK.
One problem I see is that http://git.ceph.com:8080/ceph.git/history/ returns JSON with 'err' instead of 'error'.
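One way a client could tolerate that key mismatch is to check both spellings. This is only a sketch: the response shape is an assumption (the thread does not show the actual payload), and `extract_error` is a hypothetical helper, not a teuthology function.

```python
def extract_error(payload):
    """Return the error message from a decoded JSON response, or None.

    Assumption: the service reports failures under either 'error' or 'err'.
    """
    for key in ("error", "err"):
        if key in payload:
            return payload[key]
    return None
```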
What is that service, and when was it updated such that teuthology can no longer handle its responses correctly?
Scheduling fails because the arm64 build failed, so build_complete returns False:
{
"status": "failed",
"sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
"distro_arch": "arm64",
"started": "2021-04-20 19:06:00.620116",
"distro_codename": null,
"completed": null,
"extra": {
"node_name": "172.21.4.63+confusa01",
"version": "",
"build_user": "",
"root_build_cause": "SCMTRIGGER",
"job_name": "ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
},
"modified": "2021-04-20 20:29:55.141954",
"distro_version": "8",
"project": "ceph",
"url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
"log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
"flavor": "default",
"ref": "octopus",
"distro": "centos"
}
{
"status": "completed",
"sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
"distro_arch": "x86_64",
"started": "2021-04-20 17:43:16.257781",
"distro_codename": null,
"completed": "2021-04-20 18:34:27.490982",
"extra": {
"node_name": "172.21.2.4+braggi04",
"version": "15.2.11-166-ge647a64c",
"build_user": "",
"root_build_cause": "SCMTRIGGER",
"job_name": "ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
},
"modified": "2021-04-20 18:34:27.492338",
"distro_version": "8",
"project": "ceph",
"url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
"log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
"flavor": "default",
"ref": "octopus",
"distro": "centos"
}
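To make the failure mode concrete, here is a small sketch of an "all builds completed" check over records shaped like the two above. It is not teuthology's actual build_complete implementation; `all_builds_completed` and `failed_archs` are hypothetical names, and only the `status` and `distro_arch` fields from the records are used.

```python
def all_builds_completed(builds):
    """True only if there is at least one build and every build is 'completed'."""
    return bool(builds) and all(b.get("status") == "completed" for b in builds)


def failed_archs(builds):
    """List the architectures whose build did not complete."""
    return [b.get("distro_arch") for b in builds if b.get("status") != "completed"]


# Trimmed copies of the two records quoted in this thread.
builds = [
    {"status": "failed", "distro_arch": "arm64", "ref": "octopus", "distro": "centos"},
    {"status": "completed", "distro_arch": "x86_64", "ref": "octopus", "distro": "centos"},
]

print(all_builds_completed(builds))
print(failed_archs(builds))
```

Under this reading, a single failed architecture is enough to block scheduling even though the x86_64 build succeeded.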
I tried to address the issue with #1643, but a test run still cannot be scheduled because the octopus arm build failed for centos/8.
@susebot run deploy
Commit 179edf2 is OK.