Fix Nova server diagnostic tests #338

Merged
merged 1 commit into from
Mar 19, 2024

Conversation

bogdando
Contributor

@bogdando bogdando commented Mar 8, 2024

No description provided.

@bogdando
Contributor Author

bogdando commented Mar 8, 2024

This fixes the following mishap, where the test prints PASS even though the diagnostics request failed with a 504:

"+ oc exec -t openstackclient -- openstack server list",
"+ grep -qF '| test | ACTIVE |'",
"+ oc exec -t openstackclient -- openstack server stop test",
"+ oc exec -t openstackclient -- openstack server list",
"+ grep -qF '| test | SHUTOFF |'",
"+ oc exec -t openstackclient -- openstack server --os-compute-api-version 2.48 show --diagnostics test",
"+ grep 'it is in power state shutdown'",
"No ServerDiagnostics found for None: Server Error for url: https://nova-public-openstack.apps-crc.testing/v2.1/servers/0c9c00a0-95ac-4b07-a546-46f5aa59b251/diagnostics, The server didn't respond in time.: 504 Gateway Time-out",
"command terminated with exit code 1",
"+ echo PASS"], "stdout": "PASS",
"stdout_lines": ["PASS"]}

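In the log above, `echo PASS` still runs after the diagnostics request fails with a 504. One way to avoid that kind of masking, as a minimal sketch only: run each check on its own and print PASS only when every check exits with 0. `run_checks` is a hypothetical helper name and is not taken from this PR; the actual fix may look different.

```shell
# Hypothetical helper, not part of the PR.
# Runs every argument as a separate shell command line (with pipefail, so a
# failure on the left side of a pipe is not hidden) and prints PASS only if
# all of them succeed. Prints FAIL and returns 1 otherwise.
run_checks() {
    local failed=0
    local check
    for check in "$@"; do
        if ! bash -o pipefail -c "$check" >/dev/null 2>&1; then
            echo "check failed: $check" >&2
            failed=1
        fi
    done
    if [ "$failed" -eq 0 ]; then
        echo PASS
    else
        echo FAIL
        return 1
    fi
}
```

Each check would be passed as one quoted string, for example `run_checks "oc exec -t openstackclient -- openstack server list | grep -qF '| test | SHUTOFF |'"`.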

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f5c138f2744b4b79b42ff16da58aa3d3

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 2h 54m 55s
adoption-docs-preview RETRY_LIMIT in 30m 56s

@bogdando bogdando force-pushed the fix_diag_nova_tests branch from a1942f7 to 68ba66d Compare March 11, 2024 12:51

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8e6da5703b5a4a23a4738ebe31a1c088

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 2h 10m 13s
✔️ adoption-docs-preview SUCCESS in 2m 00s

@bogdando
Contributor Author

bogdando commented Mar 12, 2024

This fails because the VM workload appears to be already stopped right after adoption (see "Cannot 'stop' instance d0c0def7-6f63-4d1c-b46b-744f1c077e84 while it is in vm_state stopped" below). Is this a regression? It used to work with local storage.

"stderr": "+ alias 'openstack=oc exec -t openstackclient -- openstack'
+ FIP=192.168.122.20
+ grep -qF '| test | ACTIVE |'
+ oc exec -t openstackclient -- openstack server list
+ echo FAIL
+ oc exec -t openstackclient -- openstack server stop test
Cannot 'stop' instance d0c0def7-6f63-4d1c-b46b-744f1c077e84 while it is in vm_state stopped (HTTP 409) (Request-ID: req-eda3aa87-15e4-4705-8c74-5f2ad9e649e6)
command terminated with exit code 1
+ echo FAIL
+ oc exec -t openstackclient -- openstack server list
+ grep -qF '| test | SHUTOFF |'
+ oc exec -t openstackclient -- openstack server --os-compute-api-version 2.48 show --diagnostics test
+ grep -q 'it is in power state shutdown'
+ echo FAIL",
"stderr_lines": ["+ alias 'openstack=oc exec -t openstackclient -- openstack'",
"+ FIP=192.168.122.20",
"+ grep -qF '| test | ACTIVE |'",
"+ oc exec -t openstackclient -- openstack server list",
"+ echo FAIL",
"+ oc exec -t openstackclient -- openstack server stop test",
"Cannot 'stop' instance d0c0def7-6f63-4d1c-b46b-744f1c077e84 while it is in vm_state stopped (HTTP 409) (Request-ID: req-eda3aa87-15e4-4705-8c74-5f2ad9e649e6)",
"command terminated with exit code 1",
"+ echo FAIL",
"+ oc exec -t openstackclient -- openstack server list",
"+ grep -qF '| test | SHUTOFF |'",
"+ oc exec -t openstackclient -- openstack server --os-compute-api-version 2.48 show --diagnostics test",
"+ grep -q 'it is in power state shutdown'",
"+ echo FAIL"], "stdout": "FAIL
FAIL
FAIL",
"stdout_lines": ["FAIL",
"FAIL",
"FAIL"]}

@bogdando
Contributor Author

recheck autohold

@bogdando
Contributor Author

The root cause:
standalone.localdomain virtqemud[84623]: Cannot create daemon common directory '/run/libvirt/common': Not a directory

@bogdando
Contributor Author

recheck dep


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/e3d0ad236c8d4643869f20f15f0c81eb

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 2h 20m 03s
✔️ adoption-docs-preview SUCCESS in 1m 42s

@bogdando
Contributor Author

recheck dep


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/af6f553c35674784bab073261fca4796

data-plane-adoption-osp-17-to-extracted-crc FAILURE in 2h 46m 25s
✔️ adoption-docs-preview SUCCESS in 2m 15s

@bogdando

This comment was marked as outdated.

@bogdando
Contributor Author

recheck

@bogdando
Contributor Author

bogdando commented Mar 14, 2024

I believe the SELinux relabeling fix doesn't help, because we have a set of other fundamental problems related to the edpm role that reboots standalone VMs during adoption. The reboot leaves the test VM in the SHUTOFF state, so the power diagnostics steps fail with a different report (a node that is already shut off cannot be powered off):

  • we need to avoid rebooting nodes at all during adoption
  • we need to instantly recover VMs, if reboot is inevitable
  • we need to fix the normal VM power state recovery process, which doesn't work either
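
As an illustration of the second point only, a test could bring the VM back to ACTIVE before it exercises `server stop`. This is a minimal sketch under assumptions: `ensure_active` and the `OPENSTACK` variable are hypothetical names that do not come from this PR, and the default command simply mirrors the alias seen in the job logs.

```shell
# Hypothetical helper, not part of the PR.
# OPENSTACK holds the client command; the default mirrors the alias used in
# the job logs. It is expanded unquoted on purpose so the words split.
OPENSTACK=${OPENSTACK:-"oc exec -t openstackclient -- openstack"}

# ensure_active NAME
# Does nothing if the server is ACTIVE, starts it if it is SHUTOFF,
# and returns 1 for any other status or if the status cannot be read.
ensure_active() {
    local name="$1"
    local status
    status=$($OPENSTACK server show "$name" -f value -c status) || return 1
    case "$status" in
        ACTIVE)
            return 0
            ;;
        SHUTOFF)
            $OPENSTACK server start "$name"
            ;;
        *)
            echo "unexpected status for $name: $status" >&2
            return 1
            ;;
    esac
}
```

Calling `ensure_active test` before the stop step would make that step independent of whether the reboot left the VM shut off. It does not address the reboot itself or the power state recovery problem.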

@gibizer @SeanMooney @jistr

@bogdando
Contributor Author

recheck dep

@bogdando bogdando force-pushed the fix_diag_nova_tests branch 2 times, most recently from 6e33112 to eebfd1a Compare March 14, 2024 15:57
@bogdando
Contributor Author

/hold testing it

@bogdando
Contributor Author

tested, should be OK now!

@bogdando bogdando force-pushed the fix_diag_nova_tests branch from ecc54f4 to fa9a111 Compare March 14, 2024 16:40

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/208749a0ef344ccebf71c5a69447970a

✔️ data-plane-adoption-osp-17-to-extracted-crc SUCCESS in 2h 10m 54s
adoption-docs-preview FAILURE in 2m 06s

@bogdando
Contributor Author

bogdando commented Mar 15, 2024

UPDATE: I was looking at old exec logs; the new one looks OK-ish: https://logserver.rdoproject.org/38/338/fa9a111e86ebe89a108d7d14004bf251b6545cc0/github-check/data-plane-adoption-osp-17-to-extracted-crc/b245966/controller/data-plane-adoption-tests-repo/data-plane-adoption/tests/logs/test_with_ceph_out_2024-03-14T13:50:01EDT.log

The change under test in this PR wasn't included in the CI build; the old code was running:

TASK [dataplane_adoption : verify if Nova services can stop the existing test VM instance] ***
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (10 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (9 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (8 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (7 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (6 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (5 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (4 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (3 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (2 retries left).
FAILED - RETRYING: [localhost]: verify if Nova services can stop the existing test VM instance (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 10, "changed": true, "cmd": "set -euxo pipefail

alias openstack=\"oc exec -t openstackclient -- openstack\"
FIP=192.168.122.20

${BASH_ALIASES[openstack]} server list | grep -qF '| test | ACTIVE |' || echo FAIL
${BASH_ALIASES[openstack]} server stop test || echo FAIL
${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |' || echo FAIL
${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test | grep -q \"it is in power state shutdown\" || echo FAIL
",

The depends-on didn't work either: see the reboot os service in the job logs, which should not have been included.
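
The old script in the log above checks for SHUTOFF right after `server stop` and relies on the Ansible task retries to repeat the whole script. A possible alternative, shown here only as a sketch under assumptions, is to poll the server status a bounded number of times after a single stop request. `wait_for_status` and the `OPENSTACK` variable are hypothetical names that are not taken from this PR.

```shell
# Hypothetical helper, not part of the PR.
# OPENSTACK holds the client command; the default mirrors the alias in the
# logged script. It is expanded unquoted on purpose so the words split.
OPENSTACK=${OPENSTACK:-"oc exec -t openstackclient -- openstack"}

# wait_for_status NAME WANTED [TRIES] [DELAY_SECONDS]
# Polls the server status until it equals WANTED or the tries run out.
wait_for_status() {
    local name="$1"
    local wanted="$2"
    local tries="${3:-10}"
    local delay="${4:-5}"
    local attempt=1
    local status=""
    while [ "$attempt" -le "$tries" ]; do
        # A failed query counts as "status unknown" and is retried.
        status=$($OPENSTACK server show "$name" -f value -c status) || status=""
        if [ "$status" = "$wanted" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "server $name did not reach $wanted (last status: ${status:-unknown})" >&2
    return 1
}
```

With this helper, the stop check could read `$OPENSTACK server stop test && wait_for_status test SHUTOFF`, so a retry would not issue a second stop against an already stopped instance.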

@bogdando
Contributor Author

pj-rehearse adoption-docs-preview

@bogdando
Contributor Author

/retest adoption-docs-preview

@jistr
Contributor

jistr commented Mar 18, 2024

I'll press the "update with rebase" button here to get the CI going after merging the deps.

@jistr jistr force-pushed the fix_diag_nova_tests branch from fa9a111 to 2963743 Compare March 18, 2024 10:04

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/ef8ca67714b344b884c13699e579cc84

data-plane-adoption-osp-17-to-extracted-crc RETRY_LIMIT in 1h 07m 53s
adoption-docs-preview FAILURE in 2m 06s

@bogdando bogdando force-pushed the fix_diag_nova_tests branch from 2963743 to 3a60f8e Compare March 18, 2024 14:17
Signed-off-by: Bohdan Dobrelia <[email protected]>
@jistr jistr merged commit 247eaaf into openstack-k8s-operators:main Mar 19, 2024
3 checks passed
@bogdando bogdando deleted the fix_diag_nova_tests branch March 19, 2024 14:21