-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db upgrage: upgrade postgres 15 to postgres 16 #1415
Conversation
@vh05, can you please improve the PR description? What are the changes, technical details on the env used, etc. |
sure |
ec79248
to
009642f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that instead of deciding on upgrade and applying changes inside SetDesiredNooBaaDB
it will be more manageable, easy to follow, and less error-prone if you implement an UpgradeDB
function that will be performed before the DB reconciliation. This method will ensure that everything is upgraded correctly, and will only return nil if the reconciliation is ready to proceed.
Sure Danny. Let me check and do the changes |
999ccc6
to
e0c629d
Compare
5307eec
to
7873068
Compare
88d449c
to
4526d87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than my comments, I think the main issue is to check for issues in the middle of the upgrade. I would recommend testing the flows with failing in the middle of every phase. try to delete the pod in the middle of upgrade/cleanning and so on, and make sure we recover from it in a good manner.
pkg/system/phase2_creating.go
Outdated
r.NooBaa.Status.PostgresUpdatePhase = nbv1.UpgradePhasePrepare | ||
} else { | ||
if r.NooBaa.Status.PostgresMajorVersion == nbv1.PostgresMajorVersionV16 { | ||
r.NooBaa.Status.PostgresMajorVersion = nbv1.PostgresMajorVersionV16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow this... didn't we check this in the if statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for this condition, If the desired_image is still pg_15
and current_image = pg15
but major version is not 16
then we come in the else case. Then in this case we, assign phaseNone
, so that we go to phaseNone
case and check for desired image and do upgrade reqd
This is to distinguish, desired image pg_16_SOME_HASH
and current_image = pg16
but major version is 16
then we need not upgrade but simply assign phaseFinished
flag
pkg/system/phase2_creating.go
Outdated
return nil | ||
} | ||
// make sure previous step has finished | ||
if dbPod.ObjectMeta.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this check is enough to make sure the upgrade finished successfully? I think we better check the pod status, check it's not it CLBO, other error, still initializing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here. When the pod goes into ERROR or CLBO, most oberserved reason is either
- POSTGRES_COPY env removed in between upgrade or
- POSTGRES_COPY env is still present after upgrade.
I am not sure how to distinguish using the status. The pod logs gives clue about this but not sure how and whether we should read the logs and come to conclusion
We can't really revert the situation as the db upgrade is not in our control.
Ex: case 1: During upgrade to PG16, PG upgrade process changes the directory where the config etc present to POSTGRES_16 dir. If POSTGRES_COPY flag is removed at this stage it goes into CLBO. We have to place that flag at this moment to get the pod into running status. We can't really revert the upgrade state.
Likewise, case 2: Once after upgrade, we got remove the flag and not sure we can revert the status.
These are some 2 cases where we dont really decide upon pod status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think we should check as much as we can before moving to next phase. you set the env POSTGRES_COPY before the upgrade and the pod doen't come up, you will try this phase again and set it again, only when the pod is running with no errors we will assume the phase was successful. moving to the next phase will make us finish the upgrade and mark it as done while the upgrade completely failed.
pkg/system/phase2_creating.go
Outdated
r.Logger.Errorf("UpgradePostgresDB::got error on endpoints deployment reconcile %v", err) | ||
return err | ||
} | ||
r.NooBaa.Status.PostgresMajorVersion = nbv1.PostgresMajorVersionV16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here maybe we should double-check the db pod is up and running before setting the upgrade to finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is CLBO or Error, Its not clear how to handle the errors as explained above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should at least fail the upgrade no? add an event? Finished should be set when we tried starting with v16 and to pod starts with no issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot fail the upgrade. Once the env to do the upgrade is set, the postgres will take the responsibility of upgrading. For any further errors we need to track the logs of the db pod
pkg/system/phase2_creating.go
Outdated
if r.NooBaa.Spec.DBResources != nil { | ||
c.Resources = *r.NooBaa.Spec.DBResources | ||
} | ||
|
||
c.Lifecycle = &corev1.Lifecycle{ | ||
PreStop: &corev1.LifecycleHandler{ | ||
Exec: &corev1.ExecAction{ | ||
Command: []string{"/bin/sh", "-c", "pg_ctl -D /var/lib/pgsql/data/userdata/ -w -t 60 -m fast stop"}, | ||
}, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure of this. PG upgrade takes care of this. Just doing it explicitly that db upgrade can start safely. Your opinion ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to reconcile only the parts we want to change. and these parts are supposed to be already set. but verify me if it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this part
pkg/system/phase2_creating.go
Outdated
sts.Namespace = r.NooBaa.Namespace | ||
stsObj, err := util.KubeGetObject(sts) | ||
// Postgres database doesn't exists, no need to upgrade. | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to perform this check outside of the switch
, as this is not part of the upgrade.
you should perform all gating checks before starting the upgrade process: are we already in the desired version? are we on a new installation. did we already perform the upgrade?, etc.
Then, when we realize that we should start the upgrade, set the phase accordingly (preparing?) and then go into the switch.
Try not to use an empty phase as an expected value for the upgrade init. As mentioned before, we can have other values from previous upgrades
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty phrase is used for the cases when we move from older version. On older version upgradephase
flag was not used/updated.
sure. I will move this case and phaseNone case outside
since both are doing the samething and start from phaseprepate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this part out of switch
@vh05 @vh05, it will be helpful to add a markdown document in the PR (under I believe that such a document will help us both with the implementation and as a reference for debugging later. |
Here is the PR for doc: |
The Postgres upgrade from 15 to 16 or any later versions can be done by adding the env variable named
POSTGRES_UPGRADE = copy
in the pg db pod.More about upgrade process is here: https://www.postgresql.org/docs/current/pgupgrade.html
In our case, we achieve the upgrade by setting this env var in
noobaa-db-pg
STS.Once after upgrade is done, we have to remove the
POSTGRES_UPGRADE
env from the STS. Otherwise pg db pod fails to come up because the db pod check PG directory version against upgrade image version and exits if there is no upgrade required andPOSTGRES_UPGRADE
is still provided.To remove this env, we store the upgrade status in
noobaa-core
CR after the PG upgrade and refer the same status during db reconcile. If the upgrade is already done, we remove this env from desired sts status.This will help to upgrade the db smoothly