
feat(deploy): separate -storage and -db pods #923

Open: wants to merge 19 commits into base: split-deployment

@mwangggg (Member) commented Jul 23, 2024

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your PR branch on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Fixes: #814
Depends on #968

@andrewazores (Member)

/build_test

github-actions bot commented Nov 1, 2024

/build_test : At least one test failed ❌.

@andrewazores (Member)

@ebaron every scorecard test in the run seems to have state: pass, but the log ends like this:

		The heap is around 0.227 % full. There is likely no big benefit from enabling string deduplication.] name:String Deduplication score:12.901007290315437 topic:heap] SystemGc:map[evaluation:map[explanation:<nil> solution:<nil> suggestions:[] summary:No garbage collections were caused by System.gc().] name:GCs Caused by System.gc() score:0 topic:garbage_collection] TlabAllocationRatio:map[evaluation:map[explanation:<nil> solution:Allocating objects outside of Thread Local Allocation Buffers (TLABs) is more expensive than allocating inside TLABs. This may be acceptable if the individual allocations are intended to be larger than a reasonable TLAB. It may be possible to avoid this by decreasing the size of the individual allocations. There are some TLAB related JVM flags that you can experiment with, but it is usually better to let the JVM manage TLAB sizes automatically. suggestions:[] summary:The program allocated 48.4 % of the memory outside of TLABs.] name:TLAB Allocation Ratio score:46.712468530433846 topic
namespace "cryostat-operator-scorecard" deleted
make: *** [Makefile:199: test-scorecard] Error 1
Error: Process completed with exit code 2.

(The first part is a snippet of an automated analysis rule result.)

Any idea what that's about? Is it a timeout?

@andrewazores (Member)

/build_test

github-actions bot commented Nov 4, 2024

/build_test : At least one test failed ❌.

@ebaron (Member) commented Nov 4, 2024

Looks like this one failed due to a timeout in the most recent run:

2024-11-04T15:11:20.0968904Z Image:      ghcr.io/cryostatio/cryostat-operator-scorecard:pr-923-0ec9a84ffded9865671402790379c2884097f456
2024-11-04T15:11:20.0969353Z Entrypoint: [cryostat-scorecard-tests cryostat-multi-namespace]
2024-11-04T15:11:20.0969479Z Labels:
2024-11-04T15:11:20.0969613Z 	"suite":"cryostat"
2024-11-04T15:11:20.0969841Z 	"test":"cryostat-multi-namespace"
2024-11-04T15:11:20.0969969Z Results:
2024-11-04T15:11:20.0970194Z 	Name: cryostat-multi-namespace
2024-11-04T15:11:20.0970323Z 	State: fail
2024-11-04T15:11:20.0970330Z 
2024-11-04T15:11:20.0970456Z 	Errors:
2024-11-04T15:11:20.0970899Z 		Cryostat main deployment did not become available: context deadline exceeded
2024-11-04T15:11:20.0971548Z 		cryostat-multi-namespace test failed: context deadline exceeded

@andrewazores (Member)

Weird. I can reproduce the failure locally with make test-scorecard, but make test-scorecard-local SCORECARD_TEST_SELECTION=operator-install,cryostat-cr,cryostat-multi-namespace,cryostat-recording,cryostat-config-change,cryostat-report seems to succeed.

@andrewazores (Member)

NAME                                                                  READY   STATUS      RESTARTS   AGE
pod/andrewazores-cryostat-operator-bundle-4-0-0-split-deployments-4   1/1     Running     0          7m1s
pod/cryostat-operator-controller-5559bcd8b6-vm6k4                     1/1     Running     0          6m42s
pod/fa200ffb618b00c1795a2635f60894bdc36cfbb76a6db4f8c2748a6056gqnvm   0/1     Completed   0          6m57s
pod/scorecard-test-5n99                                               0/1     Completed   0          6m29s
pod/scorecard-test-7kxq                                               0/1     Completed   0          6m29s
pod/scorecard-test-f7lw                                               0/1     Completed   0          6m29s
pod/scorecard-test-hmxx                                               1/1     Running     0          5m48s
pod/scorecard-test-jq6r                                               0/1     Completed   0          6m22s
pod/scorecard-test-kjb2                                               0/1     Completed   0          6m29s
pod/scorecard-test-r7jb                                               0/1     Completed   0          6m25s
pod/scorecard-test-vtkt                                               0/1     Completed   0          6m29s
pod/scorecard-test-xbmc                                               0/1     Completed   0          6m29s

NAME                                           TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/cryostat-operator-controller-service   ClusterIP   10.217.5.60   <none>        443/TCP   6m42s
service/cryostat-operator-webhook-service      ClusterIP   10.217.5.23   <none>        443/TCP   6m45s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cryostat-operator-controller   1/1     1            1           6m42s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/cryostat-operator-controller-5559bcd8b6   1         1         1       6m42s

NAME                                                                        STATUS     COMPLETIONS   DURATION   AGE
job.batch/fa200ffb618b00c1795a2635f60894bdc36cfbb76a6db4f8c2748a60563e689   Complete   1/1           8s         6m57s

function cleanup { ( set +e; /home/work/bin/operator-sdk cleanup -n cryostat-operator-scorecard cryostat-operator; /home/work/workspace/cryostat-operator/bin/kustomize build internal/images/custom-scorecard-tests/rbac/ | oc delete --ignore-not-found=false -f -; oc delete --ignore-not-found=false -n cryostat-operator-scorecard secret registry-key; oc delete --ignore-not-found=false namespace cryostat-operator-scorecard; ) }; cleanup
WARN[0000] Cleanup operator: package "cryostat-operator" not found 
Error from server (NotFound): error when deleting "STDIN": serviceaccounts "cryostat-scorecard" not found
Error from server (NotFound): error when deleting "STDIN": roles.rbac.authorization.k8s.io "cryostat-scorecard" not found
Error from server (NotFound): error when deleting "STDIN": clusterroles.rbac.authorization.k8s.io "cryostat-scorecard" not found
Error from server (NotFound): error when deleting "STDIN": rolebindings.rbac.authorization.k8s.io "cryostat-scorecard" not found
Error from server (NotFound): error when deleting "STDIN": clusterrolebindings.rbac.authorization.k8s.io "cryostat-scorecard" not found
Error from server (NotFound): secrets "registry-key" not found
Error from server (NotFound): namespaces "cryostat-operator-scorecard" not found
make: [Makefile:221: clean-scorecard] Error 1 (ignored)
mkdir -p /home/work/workspace/cryostat-operator/bin
test -s /home/work/workspace/cryostat-operator/bin/kustomize || { curl -Ss "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash -s -- 4.5.7 /home/work/workspace/cryostat-operator/bin; }
oc create namespace cryostat-operator-scorecard && oc label --overwrite namespace cryostat-operator-scorecard pod-security.kubernetes.io/warn=restricted pod-security.kubernetes.io/audit=restricted
namespace/cryostat-operator-scorecard created
namespace/cryostat-operator-scorecard labeled
cd internal/images/custom-scorecard-tests/rbac/ && /home/work/workspace/cryostat-operator/bin/kustomize edit set namespace cryostat-operator-scorecard
/home/work/workspace/cryostat-operator/bin/kustomize build internal/images/custom-scorecard-tests/rbac/ | oc apply -f -
serviceaccount/cryostat-scorecard created
role.rbac.authorization.k8s.io/cryostat-scorecard created
clusterrole.rbac.authorization.k8s.io/cryostat-scorecard created
rolebinding.rbac.authorization.k8s.io/cryostat-scorecard created
clusterrolebinding.rbac.authorization.k8s.io/cryostat-scorecard created
/home/work/bin/operator-sdk run bundle -n cryostat-operator-scorecard --timeout 20m quay.io/andrewazores/cryostat-operator-bundle:4.0.0-split-deployments-4 --security-context-config=restricted 
INFO[0016] Creating a File-Based Catalog of the bundle "quay.io/andrewazores/cryostat-operator-bundle:4.0.0-split-deployments-4" 
INFO[0016] Generated a valid File-Based Catalog         
INFO[0019] Created registry pod: andrewazores-cryostat-operator-bundle-4-0-0-split-deployments-4 
INFO[0019] Created CatalogSource: cryostat-operator-catalog 
INFO[0019] OperatorGroup "operator-sdk-og" created      
INFO[0019] Created Subscription: cryostat-operator-v4-0-0-split-deployments-4-sub 
INFO[0030] Approved InstallPlan install-qzqrx for the Subscription: cryostat-operator-v4-0-0-split-deployments-4-sub 
INFO[0030] Waiting for ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" to reach 'Succeeded' phase 
INFO[0031]   Waiting for ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" to appear 
INFO[0033]   Found ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" phase: Pending 
INFO[0036]   Found ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" phase: Installing 
INFO[0037]   Found ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" phase: InstallReady 
INFO[0038]   Found ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" phase: Installing 
INFO[0047]   Found ClusterServiceVersion "cryostat-operator-scorecard/cryostat-operator.v4.0.0-split-deployments-4" phase: Succeeded 
INFO[0047] OLM has successfully installed "cryostat-operator.v4.0.0-split-deployments-4" 
function cleanup { ( set +e; /home/work/bin/operator-sdk cleanup -n cryostat-operator-scorecard cryostat-operator; /home/work/workspace/cryostat-operator/bin/kustomize build internal/images/custom-scorecard-tests/rbac/ | oc delete --ignore-not-found=false -f -; oc delete --ignore-not-found=false -n cryostat-operator-scorecard secret registry-key; oc delete --ignore-not-found=false namespace cryostat-operator-scorecard; ) } ; \
trap cleanup EXIT ; \
/home/work/bin/operator-sdk scorecard -n cryostat-operator-scorecard -s cryostat-scorecard -w 20m quay.io/andrewazores/cryostat-operator-bundle:4.0.0-split-deployments-4 --pod-security=restricted 

$ oc logs -n cryostat-operator-scorecard -f pod/cryostat-operator-controller-5559bcd8b6-vm6k4

2024-11-04T21:17:55Z	ERROR	controllers.Cryostat	Failed to set up TLS for Cryostat	{"Request.Namespace": "cryostat-operator-scorecard", "Request.Name": "cryostat-multi-namespace", "error": "admission webhook \"webhook.cert-manager.io\" denied the request: spec.commonName: Too long: must have at most 64 bytes"}
github.com/cryostatio/cryostat-operator/internal/controllers.(*Reconciler).reconcileCryostat
	/workspace/internal/controllers/reconciler.go:249
github.com/cryostatio/cryostat-operator/internal/controllers.(*CryostatReconciler).Reconcile
	/workspace/internal/controllers/cryostat_controller.go:102
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:227
2024-11-04T21:17:55Z	ERROR	Reconciler error	{"controller": "cryostat", "controllerGroup": "operator.cryostat.io", "controllerKind": "Cryostat", "Cryostat": {"name":"cryostat-multi-namespace","namespace":"cryostat-operator-scorecard"}, "namespace": "cryostat-operator-scorecard", "name": "cryostat-multi-namespace", "reconcileID": "4d033dc6-013b-408e-8fdf-92725ab727ae", "error": "admission webhook \"webhook.cert-manager.io\" denied the request: spec.commonName: Too long: must have at most 64 bytes"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:227
2024-11-04T21:17:59Z	INFO	controllers.Cryostat	Reconciling Cryostat	{"Request.Namespace": "cryostat-operator-scorecard", "Request.Name": "cryostat-cr"}
2024-11-04T21:17:59Z	INFO	controllers.Cryostat	Cryostat instance not found	{"Request.Namespace": "cryostat-operator-scorecard", "Request.Name": "cryostat-cr"}
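The real failure is in the operator log. cert-manager's admission webhook rejects the Certificate because spec.commonName is longer than 64 bytes, so the reconciler stops at "Failed to set up TLS". Presumably the main Deployment then never becomes available, and the scorecard wait times out. A pre-flight length check would surface this earlier. This is a minimal sketch using a hypothetical checkCommonName helper. Only the 64-byte limit comes from the webhook message.

```go
package main

import "fmt"

// maxCommonNameBytes mirrors the limit quoted in the webhook rejection:
// "spec.commonName: Too long: must have at most 64 bytes".
const maxCommonNameBytes = 64

// checkCommonName (hypothetical) returns an error if cn would be rejected for
// length. len(cn) counts bytes, not runes, matching "bytes" in the message.
func checkCommonName(cn string) error {
	if len(cn) > maxCommonNameBytes {
		return fmt.Errorf("commonName %q is %d bytes; must be at most %d", cn, len(cn), maxCommonNameBytes)
	}
	return nil
}
```

For illustration only, assume a CN of the form <name>.<namespace>.svc. Under that assumption, cryostat-multi-namespace-database.cryostat-operator-scorecard.svc is 65 bytes and would be rejected, while cryostat-multi-namespace.cryostat-operator-scorecard.svc is 56 bytes and would pass. The thread does not show which CN format the operator actually used.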

@andrewazores (Member)

#964 ^

@andrewazores added the feat (New feature or request) and blocked labels Nov 4, 2024
@andrewazores (Member)

Last two commits include #968 and an accompanying adjustment.

@andrewazores (Member)

/build_test

github-actions bot commented Nov 5, 2024

/build_test completed successfully ✅.

@andrewazores changed the base branch from main to split-deployment December 20, 2024 16:04
andrewazores and others added 2 commits December 20, 2024 11:10
* fix(tls): use fixed-length cert CommonNames

* certificate change recreation

* delete cert secret so they are also recreated
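The first commit switches to fixed-length certificate CommonNames, but the thread does not show how. One common way to get a CN whose length does not depend on the CR name or namespace is to hash those inputs. The sketch below uses a truncated SHA-256 hex digest. That technique is an assumption for illustration, not necessarily what the commit does, and fixedLengthCommonName is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// fixedLengthCommonName (hypothetical) hashes the identifying inputs and
// truncates the hex digest to n characters. A hex SHA-256 digest is always
// 64 characters, so the result length depends only on n (capped at 64),
// never on how long the namespace or CR name is.
func fixedLengthCommonName(namespace, name, component string, n int) string {
	// "/" cannot appear in Kubernetes namespace or object names, so distinct
	// (namespace, name, component) triples produce distinct hash inputs.
	sum := sha256.Sum256([]byte(namespace + "/" + name + "/" + component))
	digest := hex.EncodeToString(sum[:])
	if n < 0 {
		n = 0
	}
	if n > len(digest) {
		n = len(digest)
	}
	return digest[:n]
}
```

Truncating to 32 hex characters keeps 128 bits of the digest. It also stays well under cert-manager's 64-byte commonName limit, whatever the CR and namespace names are.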

@andrewazores (Member)

/build_test

/build_test completed successfully ✅.

@andrewazores (Member)

Rebased since #968 was merged to main, along with other recent changes.

Retargeted to a new upstream split-deployment feature branch, which will eventually merge into main, hopefully in time for the 4.0 release. Other changes that need to go into that branch, building on top of this one:

  • ensure auth between components
  • TLS between components
  • NetworkPolicies for ingress?

@andrewazores marked this pull request as ready for review December 20, 2024 17:05
@andrewazores requested a review from a team December 20, 2024 17:05
@andrewazores (Member)

To test:

  1. Install cert-manager: make cert_manager
  2. Check out the branch, then build and deploy: BUNDLE_IMG=quay.io/andrewazores/cryostat-operator-bundle:4.0.0-split-deployment-1 make deploy_bundle
  3. Create a namespace: oc new-project cryostat
  4. Create a Cryostat CR: TARGET_NAMESPACES=$(oc project -q) make create_cryostat_cr
  5. Wait for the installation to complete. The resulting resources should look like this:

NAME                                                                  READY   STATUS      RESTARTS   AGE
pod/andrewazores-cryostat-operator-bundle-4-0-0-split-deployment-1    1/1     Running     0          113m
pod/ba40d32483b3886b5308f8906f3e903cb1590653d501d129a3dd7fc90exszxh   0/1     Completed   0          113m
pod/cryostat-operator-controller-7697b8c757-7vshd                     1/1     Running     0          112m
pod/cryostat-sample-5b7d986897-7b7v4                                  5/5     Running     0          103m
pod/cryostat-sample-database-79bc5cf6c-9m6kt                          1/1     Running     0          103m
pod/cryostat-sample-storage-546cb7cb95-4wh46                          1/1     Running     0          103m

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
service/cryostat-operator-controller-service   ClusterIP   172.30.183.175   <none>        443/TCP                       112m
service/cryostat-operator-webhook-service      ClusterIP   172.30.151.71    <none>        443/TCP                       112m
service/cryostat-sample                        ClusterIP   172.30.90.129    <none>        4180/TCP                      103m
service/cryostat-sample-agent                  ClusterIP   172.30.155.75    <none>        8282/TCP                      103m
service/cryostat-sample-database               ClusterIP   172.30.20.91     <none>        5432/TCP                      103m
service/cryostat-sample-storage                ClusterIP   172.30.52.206    <none>        8333/TCP                      103m

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cryostat-operator-controller   1/1     1            1           112m
deployment.apps/cryostat-sample                1/1     1            1           103m
deployment.apps/cryostat-sample-database       1/1     1            1           103m
deployment.apps/cryostat-sample-storage        1/1     1            1           103m

NAME                                       HOST/PORT                                                                      PATH   SERVICES          PORT   TERMINATION          WILDCARD
route.route.openshift.io/cryostat-sample   cryostat-sample-cryostat.redacted.com          cryostat-sample   4180   reencrypt/Redirect   None

That is, the database and storage run as separate Deployments, each with its own Service.
  6. Open the web UI (the Route's host/port).
  7. Deploy some sample applications: make sample_app_agent ; oc scale deployment quarkus-cryostat-agent --replicas=4
  8. Ensure everything works as expected: create some automated rules, create and archive some recordings, etc. Also try updating the CR to add reports replicas, and confirm that report generation works both before and after the change.
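The layout check in step 5 can be done by comparing expected names against what the cluster reports, instead of reading the output by eye. This is a minimal sketch. It assumes the naming pattern visible in the sample output (<cr>, <cr>-database, <cr>-storage, plus the <cr>-agent Service) holds for any CR name. The helpers are hypothetical. The observed names could come from something like oc get deployments -o name, with the deployment.apps/ prefix stripped.

```go
package main

import "sort"

// expectedDeployments (hypothetical) returns the Deployment names the split
// layout should create for a Cryostat CR named crName. The pattern is
// inferred from the sample output in step 5, not from the operator code.
func expectedDeployments(crName string) []string {
	return []string{crName, crName + "-database", crName + "-storage"}
}

// expectedServicePorts (hypothetical) maps each Service name seen in the
// sample output to its port.
func expectedServicePorts(crName string) map[string]int {
	return map[string]int{
		crName:               4180,
		crName + "-agent":    8282,
		crName + "-database": 5432,
		crName + "-storage":  8333,
	}
}

// missing returns the expected names that do not appear in observed, sorted.
func missing(expected, observed []string) []string {
	seen := make(map[string]bool, len(observed))
	for _, o := range observed {
		seen[o] = true
	}
	var out []string
	for _, e := range expected {
		if !seen[e] {
			out = append(out, e)
		}
	}
	sort.Strings(out)
	return out
}
```

An empty result from missing means every expected Deployment exists. A non-empty result names the components that did not come up.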

Labels: blocked, feat (New feature or request), safe-to-test
Projects: Status: In progress
Development: successfully merging this pull request may close [Story] Deploy database and storage containers separately
3 participants