-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable CDC on DSE clusters #478
Conversation
3e3f772
to
eb0c3c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave approval to @burmanm, but I did have the one comment.
README.md
Outdated
@@ -158,7 +158,7 @@ defaults: | |||
cassandra: | |||
repository: "k8ssandra/cass-management-api" | |||
dse: | |||
repository: "datastax/dse-server" | |||
repository: "datastax/dse-mgmtapi-6_8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires upgrade test to verify that existing clusters do not break, lets say 6.8.9 installations.
@@ -14,7 +14,7 @@ When leveraging a single endpoint ingress / load balancer we lose the ability to | |||
k3d load image --cluster k3s-default \ | |||
datastax/cass-operator:1.3.0 \ | |||
datastax/cass-config-builder:1.0-ubi7 \ | |||
datastax/dse-server:6.8.4-ubi7 \ | |||
datastax/dse-mgmtapi-6_8:6.8.4-ubi7 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This image does not exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this so we're talking about a recent version. The UBI will exist tomorrow :)
cdc_successful_test must be modified to test with DSE also (and updated in the .github configs). |
And of course CHANGELOG ;) |
Thanks @burmanm, I've made the requested changes. |
No, not yet. The same integration test must work in DSE & OSS (just like we do for most tests), not separate one. There's a method that takes input the version and that should be used. No separate testfiles needed or maintaining of multiple tests per version. |
I don't know about that. The problem is, we need to delete Pulsar to ensure the two tests don't interfere with each other. Otherwise Pulsar will still have the data from the previous test which might result in inaccurate test results. Also, if we do the tests serially, they will be very slow due to Pulsar taking a while to roll out (although image pulls might be reduced for Pulsar components, which could make things faster.. IDK.) |
This is not what I meant. The test code must be the same (cdc_successful_test), the execution in different VM is done by the GHA. Just like all the other tests behave, there's a separate code that modifies the input file to match the correct serverVersion and serverType. |
I'm not following what you mean sorry. The parallelisation happens at the level of the test name doesn't it? And that means that we need to have two test files? Or am I wrong? Maybe you can show me an example of the way you'd like it structured from another test. |
You are, the parallelisation between different settings happens in the GHA. Most of the other tests are all working this way, just look at for example https://github.com/k8ssandra/cass-operator/blob/master/tests/test_all_the_things/test_all_the_things_suite_test.go#L62 That method will return the deployment file as DSE / Cass (3.11/4.0.x/4.1.x ..). Those are then different tests to be run. |
And separate note, this PR can't be merged as is, the image_config.yaml can't change in the master branch. Otherwise it will break clusters that are running older versions that our images do not support. |
@Miles-Garnsey I wanted to try to clarify a few things about the tests. You added a new test, It looks like you cloned the existing The environment variables that can be set are M_SERVER_VERSION, M_SERVER_TYPE and M_SERVER_IMAGE. If any of these are set when the test runs, calls to CreateTestFile will replace the test fixture data with those overrides. So you don't need to add that test, you need to put it in one of the GitHub Actions jobs that has DSE in the test matrix. That said, there isn't a spot where you can add Also, to clarify a little of what @burmanm is talking about:
The main issue here is that changing the default DSE image repo from A possible solution to this last point would be to publish older DSE images (all that cass-operator supports) to the |
@emerkle826 and I have had a chat about this and he's walked me through some of your comments @burmanm. Test versions/serverTypes @emerkle826, thanks for explaining this, it looks like a bunch of changes were made a few months ago and the CDC test no longer applies the manifests verbatim. I'll do as you suggested and update the workflow instead of duplicating the tests.
Image registry change Erik has provided some explanation of this above. It seems that the issue is that users may not have the image set, and (given we only publish 6.8.25+ via the management api image registry) a customer who has no image set but has the serverVersion set at e.g. 6.8.5 would experience problems due to the change in repo that we're implementing. There are two solutions to this problem in my view:
I think that there is a consensus that we need to be building our own Docker images for DSE if we're going to be able to implement features like this. |
I'm not crazy about the former, we can't assume that folks will read the release notes before upgrading. |
That can be done in the images.go, but I'd say it should almost be 6.8.31 only automatic, etc. Whenever we do the decided cut. Until then users might have deployed clusters with older versions, which we would then have to test for compatibility. In any case, not for next release (1.14.0). |
We do now have Management API images for DSE based on RedHat UBI images: |
As discussed today, the approach we will take is documented in #479. This will be a separate PR (target release 1.15.0). From this PR, the image_config.yaml as well as image.go changes should be removed. |
+1, let's do this |
Use k8ssandra repository for DSE images going forward.
This reverts commit 40dc8e6.
8e4cd87
to
ebed106
Compare
@@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1 | |||
kind: CustomResourceDefinition | |||
metadata: | |||
annotations: | |||
controller-gen.kubebuilder.io/version: v0.10.0 | |||
controller-gen.kubebuilder.io/version: v0.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong controller-gen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR updated this to v0.10.0 dude: https://github.com/k8ssandra/cass-operator/pull/452/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R247
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's supposed to be 0.10.0 (that PR was done 3 months ago). You're recreating these with 0.8.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, sorry! Fixed.
Those files get generated via kubebuilder don't they? I ran make controller-gen because I know that is local to the repo, but we use a system level kubebuilder in this repo don't we? Maybe we should be using a local one to avoid this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's because the make targets use a local version of controller-gen which is under ./bin
and is only re-downloaded if it's not already in that folder. You may have had an older version there, which can be fixed by just deleting the binary and re-running the make target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it turns out I'd only updated the one in k8ssandra-operator, not cass-operator. My bad, apologies.
@@ -91,9 +91,12 @@ func updateAdditionalJVMOpts(optsSlice []string, CDCConfig *cassdcapi.CDCConfigu | |||
} | |||
} | |||
CDCOpt := fmt.Sprintf("-javaagent:%s=%s", "/opt/cdc_agent/cdc-agent.jar", strings.Join(optsSlice, ",")) | |||
// We want to disable MCAC when the server being deployed is DSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also want to remove MCAC agent when MGMT_API_DISABLE_MCAC is set (like done in https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/scripts/docker-entrypoint.sh#L43)
Does this process mess up with that logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but that's a problem to solve under that PR, not this one IMO. That one is still under development anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #481 to track this. We need it also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(separate PR)
- upgrade_operator # Test is not setup to run against 4.0 | ||
- additional_seeds #TODO: Fails against C* 4.0, fix in https://github.com/k8ssandra/cass-operator/issues/459 | ||
- scale_down_unbalanced_racks #TODO: Fails against C* 4.0 and DSE 6.8, fix in https://github.com/k8ssandra/cass-operator/issues/459 | ||
- upgrade_operator # Test is not setup to run against 4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with these mass-reindents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation has been pretty inconsistent so I've turned on Trim Trailing Whitespace
and Yaml › Format: Enable
in vs code, I think everyone should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also reindented all our CRDs. And those are now being regenerated by the make manifests. So we have 10k line commits doing reindents and I have no idea if anything was changed in them (I just assumed they were re-created after changing the controller-gen version). We can't reindent stuff with these tools if they don't understand what's generated.
@@ -299,7 +299,7 @@ jobs: | |||
- name: set environment for DSE | |||
if: ${{ startsWith( matrix.version, '6.8' ) }} | |||
run: | | |||
echo "M_SERVER_IMAGE=datastax/dse-mgmtapi-6_8:${{ matrix.version }}-jdk8" >> $GITHUB_ENV | |||
echo "M_SERVER_IMAGE=datastax/dse-server:${{ matrix.version }}-jdk8" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. this isn't right. It should be in the mgmtapi-6_8 still.
What this PR does:
Ensures MCAC doesn't get enabled for DSE clusters.
Ensures the management API images come from the management API docker repository. These contain the CDC agent.
Which issue(s) this PR fixes:
Fixes #476
Checklist