Last week I did a major version upgrade with the playbook and encountered a few issues which I want to share.
1. During the upgrade, the `maintenance_enable` and `maintenance_disable` roles are used, but they are not symmetric. The enable role disables confd, deploys a temporary haproxy configuration that disables health checks, and also stops the patroni cluster (it handles vip-manager tasks too, but I don't use that). The disable role only deals with confd/haproxy/vip-manager, not patroni. These tasks are executed on the database nodes, while confd/haproxy are deployed to the balancers host group from the inventory. As a result, when you run an upgrade, the playbook tries to stop confd on cluster nodes that don't have it, and fails.
2. After I initially deployed my cluster, I tried various settings to adjust my setup and at one point set an invalid value for the `log_timezone` parameter. I fixed that long ago, but during the upgrade patroni picked up this old config from somewhere and tried to start the new postgres version with the incorrect value, which caused a failure loop. I couldn't figure out where it was coming from for a while, but then I found a `patroni.dynamic.json` file in my data directory, which was used to generate the settings for the new version. I think the best course of action would be to start from the latest DCS config rather than from that file, which had somehow been persisted in the data directory.
So item 1 definitely looks like a bug to me, while item 2 is mostly my own mistake and lack of understanding of patroni configuration, but I think it should be highlighted so that others are aware of it during the upgrade.
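For anyone who wants to check for the second issue before upgrading, here is a minimal sketch (not part of the playbook) that compares the `patroni.dynamic.json` file in the data directory with the dynamic configuration currently stored in the DCS. It assumes that the cached file uses the same JSON layout as the DCS config, and that Patroni is still running so its REST API can return the live config via `GET /config`. The function names are made up for illustration.

```python
import json
from pathlib import Path
from urllib.request import urlopen

MISSING = "<missing>"  # placeholder shown when a setting exists on one side only


def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs so they are easy to compare."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(cached, live):
    """Return {setting: (cached_value, live_value)} for every setting that differs."""
    flat_cached, flat_live = flatten(cached), flatten(live)
    return {
        key: (flat_cached.get(key, MISSING), flat_live.get(key, MISSING))
        for key in sorted(flat_cached.keys() | flat_live.keys())
        if flat_cached.get(key, MISSING) != flat_live.get(key, MISSING)
    }


def report_stale_settings(data_dir, api_url):
    """Print settings where the cached file disagrees with the live DCS config.

    data_dir and api_url are supplied by the caller; nothing here is
    specific to any particular cluster layout.
    """
    cached = json.loads((Path(data_dir) / "patroni.dynamic.json").read_text())
    # Assumption: the Patroni REST API is reachable at api_url.
    with urlopen(f"{api_url}/config") as response:
        live = json.load(response)
    for key, (old, new) in diff_configs(cached, live).items():
        print(f"{key}: cached={old!r} live={new!r}")
```

It could be called as `report_stale_settings("/path/to/pgdata", "http://127.0.0.1:8008")`, where both the path and the address are placeholders for your own setup. Any line mentioning `postgresql.parameters.log_timezone` (or another parameter you have since corrected) would indicate that the cached file is out of date.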
Move stopping the patroni cluster from the `maintenance_enable` role to the `stop_services` role, leaving `maintenance_enable` and `maintenance_disable` to take care of confd/haproxy/vip-manager only.
Extract the `maintenance_enable` and `maintenance_disable` tasks from the "(5/6) UPGRADE: Upgrade PostgreSQL" group so that they are executed before and after it, respectively, on the balancers hosts.