-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(upgrade): add support to migrate jiva volumes resources to openebs namespace #1646
feat(upgrade): add support to migrate jiva volumes resources to openebs namespace #1646
Conversation
Signed-off-by: shubham <[email protected]>
How does the code handle the following cases:
|
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
@@ -411,6 +409,20 @@ func (j *jivaVolumeOptions) preupgrade(pvName, openebsNamespace string) error { | |||
return err | |||
} | |||
|
|||
if currentVersion < "1.9.0" { | |||
klog.Infof("Scaling down old deployments") | |||
err = j.preMigration(openebsNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name preMigration as something to say if less than 1.9.0
if currentVersion < "1.9.0" { | ||
klog.Infof("Scaling down old deployments") | ||
err = j.preMigration(openebsNamespace) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case of failures, do we need to revert the partial upgrade that has been done so that operations go fine? or would that be out of scope of upgrade jobs?
Did we handled such things like reverting partial upgrade in any previous failures?
is this scope for higher level operator (or) need to follow blue-green deployments?
return nil | ||
} | ||
|
||
func (j *jivaVolumeOptions) migrateReplica(openebsNamespace string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this function name as well to reflect version < 1.9.0
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
// Scaling down controller deployment before patching replica deployment | ||
// if the replica is not upgraded already. | ||
if j.replicaObj.version == currentVersion { | ||
err = scaleDeploy(j.controllerObj.name, j.ns, ctrlDeployLabel, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be moved to part of preUpgarde rather than being part of replicaUpgrade?
func jivaUpgrade(pvName, openebsNamespace string, utaskObj *utask.UpgradeTask) (*utask.UpgradeTask, error) { | ||
|
||
var ( | ||
pvLabel = "openebs.io/persistent-volume=" + pvName | ||
err error | ||
pvLabel = "openebs.io/persistent-volume=" + pvName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert these hardcoded strings as constants like apis.PersistentVolumeCPK
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
return err | ||
} | ||
nodeNames := deployObj.Spec.Template.Spec.Affinity.NodeAffinity. | ||
RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions[0].Values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be sure that first element is the one that we always want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see if-else in CASTemplates
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions[0].Values | ||
var one int32 = 1 | ||
for i, node := range nodeNames { | ||
replicaDeploy := deployObj.DeepCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add validation for each nodeName to verify that it exists
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
// updating the replica deployments image and version before | ||
// creating them in openebs namespace. | ||
lastIndex := strings.LastIndex(replicaDeploy.Spec.Template.Spec.Containers[0].Image, ":") | ||
replicaDeploy.Spec.Template.Spec.Containers[0].Image = replicaDeploy.Spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be avoided and let regular patch takes care of updating image name, version etc?
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
replicaDeploy.Name = replicaDeploy.Name + "-" + strconv.Itoa(i+1) | ||
replicaDeploy.Namespace = openebsNamespace | ||
replicaDeploy.ResourceVersion = "" | ||
replicaDeploy.Spec.Replicas = &one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as part of postMigration i.e., after replicaPatch is done, scale up the deployments
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
return err | ||
} | ||
var one int32 = 1 | ||
deployObj.Spec.Replicas = &one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as in migrateReplica applies here
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
if err != nil { | ||
return err | ||
} | ||
svcList, err := serviceClient.WithNamespace(j.ns).List(metav1.ListOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to migrateService
if err != nil { | ||
return err | ||
} | ||
klog.Infof("creating controller service %s in %s namespace", svcObj.Name, openebsNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if upgrade job dies after deleting service? we will be losing the IP..
is it possible to create service with same IP but in disabled mode?
If so, first create in disabled mode, delete old one and then enable new one?
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code structure doesn't look neat..
better to allow old code to get executed by reducing code in new path as provided in comments there
Same applies for target also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its also required to allow executing existing code path as 'patch' may keep changing as we make new releases..
By combining upgrade code into migration path, we will miss the patches that need to be done in new releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other suggestion is:
- As backward compatibility is already there, document the steps for migration and let user perform the steps
- segregate migration from upgarde Signed-off-by: shubham <[email protected]>
return nil, err | ||
} | ||
svcObj := &corev1.Service{} | ||
svcObj.ObjectMeta = metav1.ObjectMeta{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about getting the existing service object and then doing a deep copy, patching the namespace. I guess this logic is written to account for the case where the previous service object is already deleted and then possible job also is restarted and will have to now go ahead and create the new service.
Can there be an alternate solution like:
- create a temporary service - with a new name and deleting the older cluster ip.
- delete orig service
- create new service, by copying from temp service and using the target ip (from PV) as the cluster ip
- as part of cleanup, delete the temporary service.
…er verifying upgrade Signed-off-by: shubham <[email protected]>
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
} | ||
if err == nil { | ||
klog.Info("getting old deployment") | ||
if deployObj.Name == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can never happen
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
if j.ns == openebsNamespace { | ||
replicasLeft = replicasLeft - 1 | ||
} | ||
if len(deployList.Items) != replicaCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a chance where old replica deployment got scaled down and upgrade process got restarted without completing upgrade?
If such thing happens, replicaCount will be 0 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of depending on spec.Replicas to get replicationFactor, better to depend on len of nodeNames in NodeAffinity..
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions[0].Values = []string{nodeNames[i]} | ||
klog.Infof("creating replica deployment %s in %s namespace", replicaDeploy.Name, openebsNamespace) | ||
replicaDeploy, err := deployClient.WithNamespace(openebsNamespace).Create(replicaDeploy) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by handling 'IsAlreadyExists' error, we can just keep creating on every iteration without much logic in getting the count.
Current code of getting count of replicas to be created is also fine.
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
return err | ||
} | ||
replicaCount := int(*oldDeployObj.Spec.Replicas) | ||
replicasLeft := len(deployList.Items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable name should be replicasCreated?
|
||
func (j *jivaVolumeOptions) migrateTarget(pvName, openebsNamespace string) error { | ||
// get the controller deployment in openebs namespace | ||
deployObj, err := deployClient.WithNamespace(openebsNamespace).Get(j.controllerObj.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of doing Get, by just checking 'j.ns' for openebsNamespace and returning should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.. this Get call helps in case of upgrade restart also..
func (j *jivaVolumeOptions) migrateTarget(pvName, openebsNamespace string) error { | ||
// get the controller deployment in openebs namespace | ||
deployObj, err := deployClient.WithNamespace(openebsNamespace).Get(j.controllerObj.name) | ||
if err != nil && !k8serror.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can return for err as nil also, and can avoid next 'if' condition
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
if j.replicaObj.version == currentVersion { | ||
err = scaleTargetDeploy(j.controllerObj.name, j.ns, 0) | ||
if currentVersion < "1.9.0" { | ||
err = scaleDeploy(j.controllerObj.name, j.ns, ctrlDeployLabel, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want to scale down controller post 1.9 also?
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
|
||
func (j *jivaVolumeOptions) migrateTargetSVC(pvName, openebsNamespace string) error { | ||
// migrate service only if service not in openebs namespace | ||
if j.ns != openebsNamespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do reversing the condition
return nil | ||
} | ||
|
||
func getTargetSVC(pvName, openebsNamespace string) (*corev1.Service, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs validation with CASTemplate..
@utkarshmani1997 can you do this review?
if j.replicaObj.version == currentVersion && j.replicaObj.name != "" { | ||
klog.Info("cleaning old replica deployment") | ||
err = deployClient.WithNamespace(j.ns).Delete(j.replicaObj.name, &metav1.DeleteOptions{}) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle isNotFound error as not error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the job restarts and the replica deployment is missing the getReplica
function will not populate replicaObj.name
and the check j.replicaObj.name != ""
will skip the deletion.
return err | ||
} | ||
} | ||
if j.controllerObj.version == currentVersion && j.ns != openebsNamespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if upgrade job gets restarted at this point, replicaObj.version will not be set in getReplica call.. and all the upgrade process is dependent on this field replicaObj.version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same problem with replicaObj.name also
Signed-off-by: shubham <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if err != nil { | ||
return nil, err | ||
} | ||
replicaObj.replicas[deployObj.Name] = version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding logs at required places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logs like those in patchReplica would be good to be there in new functions
pkg/upgrade/upgrader/helper.go
Outdated
@@ -59,6 +60,23 @@ func getOpenEBSVersion(d *appsv1.Deployment) (string, error) { | |||
return d.Labels["openebs.io/version"], nil | |||
} | |||
|
|||
func checkOpenEBSVersion(d *appsv1.Deployment) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be called isValidVersion
? Also there is a message in this function about replica.
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
return patchDetails, nil | ||
} | ||
|
||
func getReplica(pvName, replicaLabel, namespace, openebsNamespace string) (*replicaDetails, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_: just to be clear, the variable namespace
can be renamed as volumeNamespace
metav1.ListOptions{ | ||
LabelSelector: controllerLabel, | ||
}) | ||
func (j *jivaVolumeOptions) preReplicaUpgradeLessThan190(pvName, openebsNamespace string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments for the functions
"openebs.io/persistent-volume": pvName, | ||
"openebs.io/persistent-volume-claim": pvObj.Spec.ClaimRef.Name, | ||
"pvc": pvObj.Spec.ClaimRef.Name, | ||
"openebs.io/version": currentVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this supposed to be UpgradeVersion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.. we are updating this in patchService fn
}) | ||
func (j *jivaVolumeOptions) preReplicaUpgradeLessThan190(pvName, openebsNamespace string) (string, error) { | ||
if currentVersion < "1.9.0" { | ||
err := scaleDeploy(j.replicaObj.name, j.ns, replicaDeployLabel, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can replicaObj.name be nil here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Added appropriate checks.
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
} | ||
replicaObj.name = deployObj.Name | ||
version, err := getOpenEBSVersion(deployObj) | ||
func getPatchDetailsForDeploy(pvName string, deployObj *appsv1.Deployment) (*replicaPatchDetails, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_: As this is specific to Replica Patch. Can we change this to getPatchDetailsForReplicaDeploy
// check if old replica is present for currentVersion < 1.9.0 | ||
// if present then migration is not complete and store the old | ||
// replica details | ||
if currentVersion < "1.9.0" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above comments are good..
add few more like
'replicaObj.name and replicaObj.version would be empty if old replica got deleted as part of upgrade.
So, later on code uses replicaObj.name to perform replica related migration.'
|
||
klog.Infof("splitting replica deployment") | ||
var zero int32 | ||
for i := replicasCreated; i < replicaCount; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment above this for loop to say that:
replica deployment pv-name-rep
will be split into multiple replicas like pv-name-rep-1
, pv-name-rep-2
,... pv-name-rep-n
, where n is the replica count for this volume.
|
||
func (j *jivaVolumeOptions) cleanup(openebsNamespace string) error { | ||
var err error | ||
if j.replicaObj.version == currentVersion && j.replicaObj.name != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also add a check on the namespace here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace check is not required as irrespective of namespace either openebs or pvc, the old deployment pvName-rep
is deleted after splitting it.
pkg/upgrade/upgrader/jiva_upgrade.go
Outdated
} | ||
} | ||
if j.controllerObj.version == currentVersion && j.ns != openebsNamespace { | ||
klog.Info("cleaning old replica deployment") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_: this should be deleting controller(or target) deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -584,6 +611,18 @@ func (j *jivaVolumeOptions) verify(pvLabel, openebsNamespace string) error { | |||
} | |||
return err | |||
} | |||
if currentVersion < "1.9.0" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be possible that the job dies after cleaning up old replica deployments. So when the job restarts, will it still know that it has to go and cleanup the pending target deployment. One thought is -- just call the j.cleanup anyway without version check, and that cleanup will only delete stale entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If job dies after cleanup is successful it will not attempt to delete the deployment as it will not be able to find old deployment and the replicaObj.name
variable will be empty and we check on that before deleting the deployment.
Signed-off-by: shubham <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes are good
Signed-off-by: shubham [email protected]
What this PR does / why we need it:
This PR adds support to migrate the old jiva volume resources to openebs namespace.
Test job logs:
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Special notes for your reviewer:
Checklist:
documentation
tagbreaking-changes
tagrequires-upgrade
tag