[DPE-5218] Implement restore flow #162
I had a look and the code looks very well structured and clear to me. I don't have any major concerns with this PR. Actually kudos for the work here, and for the very detailed PR description (even with a demo!!!)
I would only have a small request (which is outside the scope of the ticket), and we could maybe do this in the next pulse: to have some integration tests for Kafka, where we actually test restoring a backup both with and without an existing relation, and make sure that the credentials of a given client do not need to be changed (meaning Kafka still retrieves the correct credentials/ACLs/lags from ZooKeeper). I have already seen that there is a PR open with the required changes; adding the tests to that PR as well would probably be very appropriate.
This was an excellent PR 👍🏾. I mentioned it in the comments, but I really appreciate how readable the 'relation-changed-restore' flow was. Overall I'm fine to approve now, but will hold off until the comments have been addressed. They're non-blocking mostly, but I expect a few conversations.
LGTM! The `RestoreStep` is a neat abstraction for handling the rolling databag keys for units/app.
This PR implements the flow for restoring a ZooKeeper snapshot.
Other changes
- `client-jaas.cfg` file to streamline a little bit the process of accessing the ZK cluster within the machine.
Use cases
This flow can be used to bootstrap a new cluster using seeded data. Upon relating with a new client application, let's say, Kafka, here is what is going to happen:
- `/kafka` and its subnodes will be updated to the new relation, if it already existed
- `admin` and `sync` users, but we will keep other pre-existing users (and other sub zNodes) around.
With a minimal change in the Kafka charm, we can also restore a snapshot on an already related ZK application. The chain of events will be as follows:
- `endpoints` field from the databag. On Kafka, this has the effect of triggering the `relation-changed` event and setting the application's status to `ZK_NO_DATA`
- `relation-changed` event. The Kafka relation flow will then rotate the `admin` and `sync` user passwords
Here is the patch for Kafka:
About the restore flow
Since we need to operate the workload on all units, we use a chain of events instead of staying within the context of the action execution.
We want to synchronize the units while they follow these steps:
Therefore, we do not use the rolling ops lib, and use a different peer relation to manage this flow, making sure all units are done with a step before moving on to the next.
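The "all units finish a step before anyone moves on" idea can be sketched as a small barrier over databags. This is only an illustration, not the PR's code: the step names, the `restore-step` key, and the helper functions are hypothetical placeholders, and plain dicts stand in for the app and unit databags of the peer relation.

```python
from enum import Enum


class RestoreStep(Enum):
    """Placeholder steps; the PR's real RestoreStep members are not shown here."""

    NOT_STARTED = ""
    STEP_ONE = "step-one"
    STEP_TWO = "step-two"
    DONE = "done"

    def next_step(self) -> "RestoreStep":
        # Members iterate in definition order; the last step maps to itself.
        steps = list(RestoreStep)
        return steps[min(steps.index(self) + 1, len(steps) - 1)]


KEY = "restore-step"  # hypothetical databag key


def all_units_reached(step: RestoreStep, unit_databags: dict) -> bool:
    """True when every unit has acknowledged `step` in its own databag."""
    return all(bag.get(KEY, "") == step.value for bag in unit_databags.values())


def leader_advance(app_databag: dict, unit_databags: dict) -> RestoreStep:
    """Leader side: bump the app-level step only once all units caught up."""
    current = RestoreStep(app_databag.get(KEY, ""))
    if current is RestoreStep.DONE or not all_units_reached(current, unit_databags):
        return current
    following = current.next_step()
    app_databag[KEY] = following.value
    return following


def unit_follow(app_databag: dict, unit_databag: dict, run_step) -> None:
    """Unit side: run the work for the app-level step, then acknowledge it."""
    target = RestoreStep(app_databag.get(KEY, ""))
    if unit_databag.get(KEY, "") != target.value:
        run_step(target)  # no try/except: a failure here stops the flow
        unit_databag[KEY] = target.value
```

In a charm, `leader_advance` and `unit_follow` would both be driven by the peer relation's changed event, so each databag write triggers the next link in the chain. Leaving `run_step` unguarded mirrors the choice described in this PR of letting a failure surface so that it can be fixed on the machine before resuming.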
- Instead of using the peer `cluster` relation, we now have a new `restore` one. There are two main reasons why:
  - it makes following the chain of events way easier because the dozens of events fired during the flow do not trigger the "something changed" `cluster-relation-changed`
  - this separation of concerns guarantees that if anything goes wrong, we can ssh into the machine to solve this issue, and then continue with the flow using `juju resolve`. This is also why we do not have defers or try/except for the `exec` calls
NEW: we use the peer `cluster` relation instead of the dedicated `restore` from earlier commits.
TODO
Demo
demo_restore.mp4