Adding back the dataplane services #171

Merged
merged 1 commit into from
Oct 5, 2023

Conversation

fao89
Contributor

@fao89 fao89 commented Sep 27, 2023

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/797f0f5b90124f61b35bcbde4d75988b

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 48m 42s

@fao89
Contributor Author

fao89 commented Sep 28, 2023

TASK [osp.edpm.edpm_iscsid : Download needed container] ************************
Thursday 28 September 2023  01:16:46 +0000 (0:00:00.194)       0:00:43.069 **** 
FAILED - RETRYING: [standalone]: Download needed container (5 retries left).
FAILED - RETRYING: [standalone]: Download needed container (4 retries left).
FAILED - RETRYING: [standalone]: Download needed container (3 retries left).
FAILED - RETRYING: [standalone]: Download needed container (2 retries left).
FAILED - RETRYING: [standalone]: Download needed container (1 retries left).
fatal: [standalone]: FAILED! => {"attempts": 5, "changed": false, "msg": "Failed to pull image quay.io/podified-antelope-centos9/openstack-iscsid:current-podified"}

https://logserver.rdoproject.org/71/171/1dc75617ef47d1f77c16a719b51120d928601f27/github-check/data-plane-adoption-github-rdo-centos-9-crc-single-node/0e8168e/controller/pod/dataplane-deployment-download-cache-openstack-m458k-logs.txt

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/b9fe9105103d499c907e247500d2ece0

data-plane-adoption-github-rdo-centos-9-crc-single-node RETRY_LIMIT in 11m 07s

@fao89
Contributor Author

fao89 commented Sep 28, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5a9f5180351449f79994cec2069958f2

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 46m 37s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/6193cc4b8bd34dd885ed2d7e16e3e93b

data-plane-adoption-github-rdo-centos-9-crc-single-node RETRY_LIMIT in 12m 02s

@fao89
Contributor Author

fao89 commented Sep 29, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/9fb6969edc234ad7b9fd69498733c0f6

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 54m 40s

@fao89
Contributor Author

fao89 commented Sep 29, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/c1720a3d9a3743febbe8d8cb53f02f64

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 47m 06s

@fao89 fao89 requested a review from stuggi September 29, 2023 18:20
@fao89
Contributor Author

fao89 commented Sep 29, 2023

@stuggi could you please review if I got the routes right?

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/075e10fecdea46b481b2fd6a3f8bdb3b

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 52m 25s

@fao89
Contributor Author

fao89 commented Sep 29, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/6686dc79b7b54612abd97307216768ac

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 01m 25s

@fao89 fao89 force-pushed the dpsvc branch 2 times, most recently from d56f96e to c064b38 Compare September 29, 2023 23:30
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/41442aac784f461ab1cac8abc413d3da

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 54m 40s

@fao89
Contributor Author

fao89 commented Sep 30, 2023

it is getting stuck at:

TASK [osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list] ***
Saturday 30 September 2023  01:27:41 +0000 (0:00:02.531)       0:02:47.775 **** 

https://logserver.rdoproject.org/71/171/c064b38f5bf8eed510c691cf80d8b30fb53d5345/github-check/data-plane-adoption-github-rdo-centos-9-crc-single-node/2a32835/controller/pod/dataplane-deployment-run-os-openstack-p5lt4-logs.txt

@fultonj could you please take a look?

@fao89 fao89 requested a review from fultonj September 30, 2023 01:36
fultonj added a commit to fultonj/edpm-ansible that referenced this pull request Oct 2, 2023
Patch 0e6058c resulted in a
lot of output in the Ansible logs which made a CI job for the
following PR get stuck. The output is unnecessary so do not
log it in an attempt to relieve IO pressure.

openstack-k8s-operators/data-plane-adoption#171

Signed-off-by: John Fulton <[email protected]>
@fultonj
Contributor

fultonj commented Oct 2, 2023

it is getting stuck at:

TASK [osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list] ***
Saturday 30 September 2023  01:27:41 +0000 (0:00:02.531)       0:02:47.775 **** 

https://logserver.rdoproject.org/71/171/c064b38f5bf8eed510c691cf80d8b30fb53d5345/github-check/data-plane-adoption-github-rdo-centos-9-crc-single-node/2a32835/controller/pod/dataplane-deployment-run-os-openstack-p5lt4-logs.txt

@fultonj could you please take a look?

Please try again with this:

openstack-k8s-operators/edpm-ansible#385

If it doesn't help, then let me know.

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/edpm-ansible#385 is needed.

@fao89
Contributor Author

fao89 commented Oct 2, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/e3a277b77c6f4f10bf9f23e5a95ed35a

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 04m 58s

@fao89
Contributor Author

fao89 commented Oct 3, 2023

I tested locally: the task "Exclude Ceph containers from podman container list" is taking a long time, which is causing the CI to time out.

osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 695.17s
osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 414.45s
osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 144.40s

full logs: https://paste.centos.org/view/e45553fe
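
For scale, the three timings above add up to about 1254 seconds, or roughly 21 minutes spent in this one task. A minimal sketch of that arithmetic, assuming the profile lines keep the `<task name> - <seconds>s` shape shown above (the helper name `total_task_seconds` is hypothetical, not part of any existing tooling):

```python
import re

# Profile lines copied verbatim from the comment above.
PROFILE_LINES = [
    "osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 695.17s",
    "osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 414.45s",
    "osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 144.40s",
]

# Assumption: each line ends with "- <seconds>s".
_DURATION = re.compile(r"- (\d+(?:\.\d+)?)s\s*$")


def total_task_seconds(lines):
    """Sum the trailing durations of profile lines; lines without one are skipped."""
    total = 0.0
    for line in lines:
        match = _DURATION.search(line)
        if match:
            total += float(match.group(1))
    return total


total = total_task_seconds(PROFILE_LINES)
print(f"{total:.2f}s (~{total / 60:.1f} min)")
```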

fultonj added a commit to fultonj/edpm-ansible that referenced this pull request Oct 3, 2023
This reverts commit 0e6058c.

This breaks HCI but HCI is not yet in CI so a follow up patch
will fix what this patch breaks for people deploying HCI.

The commit being reverted introduced a task whose functionality
we still need; however, its implementation was not efficient
and takes too long to run in the CI, as documented in:

openstack-k8s-operators/data-plane-adoption#171

We still need to exclude the ceph containers but will need
to do it with less computational resources (e.g. maybe pure
jinja will be faster or we might need a custom filter).

Conflicts:
	roles/edpm_container_manage/tasks/delete_orphan.yml
@fultonj
Contributor

fultonj commented Oct 3, 2023

I tested locally: the task "Exclude Ceph containers from podman container list" is taking a long time, which is causing the CI to time out.

osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 695.17s
osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 414.45s
osp.edpm.edpm_container_manage : Exclude Ceph containers from podman container list - 144.40s

full logs: https://paste.centos.org/view/e45553fe

This should unblock you:

openstack-k8s-operators/edpm-ansible#387

It breaks HCI though so I'll need to reintroduce excluding the ceph containers but more efficiently.
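
One possible shape for the "custom filter" idea raised in the revert commit message is sketched below. This is an illustration only, not the code from edpm-ansible. It assumes each container entry is a dict with a `Names` key, as in `podman ps --format json` output, and that Ceph containers can be recognized by a `ceph-` name prefix; both the prefix and the filter name `exclude_ceph_containers` are assumptions made for the example.

```python
def exclude_ceph_containers(containers, prefix="ceph-"):
    """Return the containers whose names do not start with `prefix`.

    Assumption: each entry is a dict with a "Names" key holding either a
    list of names or a single string.
    """
    kept = []
    for container in containers:
        names = container.get("Names") or []
        if isinstance(names, str):
            names = [names]
        # Drop the container if any of its names looks like a Ceph container.
        if any(name.startswith(prefix) for name in names):
            continue
        kept.append(container)
    return kept


class FilterModule(object):
    """Standard Ansible filter-plugin entry point exposing the function above."""

    def filters(self):
        return {"exclude_ceph_containers": exclude_ceph_containers}


if __name__ == "__main__":
    # Hypothetical sample data for a quick manual check.
    sample = [
        {"Names": ["nova_compute"]},
        {"Names": ["ceph-mon-standalone"]},
        {"Names": ["iscsid"]},
    ]
    print([c["Names"][0] for c in exclude_ceph_containers(sample)])
```

Placed in a role's `filter_plugins/` directory, such a filter could be applied in a single expression, for example `{{ podman_containers | exclude_ceph_containers }}` (the variable name is hypothetical), so the whole list is filtered in one Python call rather than item by item.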

@fao89
Contributor Author

fao89 commented Oct 3, 2023

I've started this: openstack-k8s-operators/edpm-ansible#386

@fultonj
Contributor

fultonj commented Oct 3, 2023

I've started this: openstack-k8s-operators/edpm-ansible#386

I've approved the above since it passed my tests in the HCI environment.

  1. reproduce bug by doing revert
  2. test your patch and see it not hit the same bug
  3. log the filtered_containers generated by your patch and read that they have openstack containers but not ceph containers (as they should).

So hopefully you resolved your issue.

@fao89
Contributor Author

fao89 commented Oct 3, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f4a575364d714113b8838a139d4c85cb

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 1h 09m 41s

@cescgina
Contributor

cescgina commented Oct 4, 2023

recheck looks like a crc hiccup while creating the openstack namespace https://logserver.rdoproject.org/71/171/1f33c7ac59c8e996f8cf8b1f9ed6ae22531f586d/github-check/data-plane-adoption-github-rdo-centos-9-crc-single-node/6caedf3/controller/data-plane-adoption-tests-repo/data-plane-adoption/tests/logs/test_minimal_out_2023-10-03T20:34:58EDT.log :

TASK [prelude_local : set up and use openstack namespace] **********************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euxo pipefail\n\n\ncd /home/zuul/src/github.com/openstack-k8s-operators/install_yamls/\nmake namespace\n", "delta": "0:00:07.218662", "end": "2023-10-03 20:35:11.270343", "msg": "non-zero return code", "rc": 2, "start": "2023-10-03 20:35:04.051681", "stderr": "+ cd /home/zuul/src/github.com/openstack-k8s-operators/install_yamls/\n+ make namespace\n+ '[' -z /home/zuul/src/github.com/openstack-k8s-operators/install_yamls/out ']'\n+ '[' -z openstack ']'\n+ OUT_DIR=/home/zuul/src/github.com/openstack-k8s-operators/install_yamls/out/openstack\n+ '[' '!' -d /home/zuul/src/github.com/openstack-k8s-operators/install_yamls/out/openstack ']'\n+ mkdir -p /home/zuul/src/github.com/openstack-k8s-operators/install_yamls/out/openstack\n+ cat\nerror: unable to default to a user name: the server is currently unable to handle the request (get users.user.openshift.io ~)

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/83df0d3837a848ada90689b64a494161

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 13m 38s

@fao89
Contributor Author

fao89 commented Oct 4, 2023

Build failed (check pipeline). Post recheck (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.

https://review.rdoproject.org/zuul/buildset/83df0d3837a848ada90689b64a494161

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 13m 38s

ansible-runner image didn't get updated yet 😕

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/7d073d7efe8c4a478fa4fcca3f6833c5

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 07m 00s

@fao89
Contributor Author

fao89 commented Oct 4, 2023

@cescgina does CI cache quay.io/openstack-k8s-operators/openstack-ansibleee-runner?
My PR was merged yesterday, but the image doesn't seem to be updated.

@cescgina
Contributor

cescgina commented Oct 4, 2023

@cescgina does CI cache quay.io/openstack-k8s-operators/openstack-ansibleee-runner? My PR was merged yesterday, but the image doesn't seem to be updated.

I don't think so. Maybe the operator image needs to be updated to pull these new changes? At the moment in CI we install the operators by running make openstack from install_yamls, and as far as I can see there is nothing that would prevent it from pulling the latest image.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/41413d65e4974c5085e33d897e9ce125

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 06m 39s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/996fb9c8c56140b0976a69dca5f75069

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 10m 11s

@fao89
Contributor Author

fao89 commented Oct 4, 2023

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/1d7d585c2149411baddf8c729d8067a5

data-plane-adoption-github-rdo-centos-9-crc-single-node FAILURE in 2h 23m 38s

Signed-off-by: Fabricio Aguiar <[email protected]>
@jistr
Contributor

jistr commented Oct 5, 2023

🟢🟢🟢
🚀

@jistr jistr merged commit 4f3a019 into openstack-k8s-operators:main Oct 5, 2023
1 check passed
{{ shell_header }}
{{ oc_header }}
{{ oc_login_command }}
oc get csv openstack-ansibleee-operator.v0.0.1 -o yaml -n openstack-operators | sed "s/RELATED_IMAGE_ANSIBLEEE_IMAGE_URL_DEFAULT/AEE_IMAGE/" | oc apply -f -
Contributor Author

@fao89 fao89 Oct 9, 2023

hack for using the latest ansible-runner image
