When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

possebon · 2024-05-29T14:22:16Z

Issues with recovery_2nd_stage Execution in PostgreSQL 14 and pgpool 4.5.2

Configuration:

PostgreSQL: 14
pgpool: 4.5.2

Objective: Use load balancing for all SELECT queries

Documentation Reference: pgpool Documentation

Issue Description:
I am configuring PostgreSQL 14 with pgpool 4.5.2 to load balance all SELECT queries. Following the documentation, I encountered an issue where the recovery_2nd_stage fails to execute correctly.

Observed Behavior:
While debugging the configuration and logs, I found the following error:

+ /usr/pgsql-14/bin/psql -h pgcluster01 -p 5433 -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1
+ read i
+ '[' postgres '!=' '' ']'
+ psql -h pgcluster01 -p 5433 -c 'SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = '\''S'\''' postgres
 setval
--------
(0 rows)

+ read i
+ '[' pgtestdb '!=' '' ']'
+ psql -h pgcluster01 -p 5433 -c 'SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = '\''S'\''' pgtestdb
Password for user postgres:
psql: error: connection to server at "pgcluster01" (10.28.141.21), port 5433 failed: fe_sendauth: no password supplied
+ read i
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_2nd_stage: failed.
ERROR: recovery_2nd_stage: failed.
+ exit 1
2024-05-29 10:41:53.305 -03 [82771] ERROR:  pgpool_recovery failed
2024-05-29 10:41:53.305 -03 [82771] STATEMENT:  SELECT pgpool_recovery('recovery_2nd_stage', 'pgcluster02', '/var/lib/pgsql/14/data', '5433', 1, '5433', 'pgcluster01')
[root@pgcluster01 data]#

Diagnosis:
I created the pgtestdb to test using pgbench, but this database was not listed in .pgpass. The documentation states that the postgres user only needs to connect to the postgres database (I also added template1 to resolve the original error).

When recovery_2nd_stage fails, it does not execute the following promotion command:

# Promote the target server to a primary server
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@$DEST_NODE_HOST "
    $PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA promote"

Workaround:
To avoid further errors when user databases are present, I modified the .pgpass file to use * instead of specifying databases manually. This allowed pcp_recovery_node to run smoothly (as recovery_2nd_stage completes successfully), but no replication slot appears when checking with SELECT * FROM pg_stat_replication.

Logs:

The PostgreSQL logs for the pcp_recovery_node process show no errors, indicating that everything ran successfully.

Concerns:

It seems recovery_2nd_stage should not execute the promote command, as this causes the target node to assume it is the primary, leading to the disappearance of the replication slot.
If this is a characteristic of snapshot isolation mode, I need to understand how data is replicated between nodes without a replication slot.

Request for Clarification:

Is there a configuration detail I might be missing?
How should data replication be managed in this scenario to maintain consistency across nodes?

The text was updated successfully, but these errors were encountered:

pengbo0328 · 2024-06-17T07:46:43Z

@possebon Sorry for the late response.

Diagnosis:
I created the pgtestdb to test using pgbench, but this database was not listed in .pgpass. The documentation states that the postgres user only needs to connect to the postgres database (I also added template1 to resolve the original error).

If you use other users in your scripts, you need to add those users in .pgpass.

Request for Clarification:

Is there a configuration detail I might be missing?
How should data replication be managed in this scenario to maintain consistency across nodes?

In snapshot isolation mode, streaming replication is not used.
Pgpool-II will replicate data to each node. (Pgpool-II executes the same queries on each PostgreSQL node.)

https://www.pgpool.net/docs/latest/en/html/runtime-config-running-mode.html

If you want to use streaming replication for data consistency, you can use Pgpool-II's streaming replication mode by setting backend_clustering_mode = 'streaming_replication'

Below is the configuration example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html

pengbo0328 self-assigned this Jun 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

possebon commented May 29, 2024 •

edited

Loading

pengbo0328 commented Jun 17, 2024

When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

Comments

possebon commented May 29, 2024 • edited Loading

Issues with recovery_2nd_stage Execution in PostgreSQL 14 and pgpool 4.5.2

pengbo0328 commented Jun 17, 2024

possebon commented May 29, 2024 •

edited

Loading