Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using snapshot isolation doing pcp_recovery_node will execute recovery_2nd_stage and there is a promote instruction there #53

Open
possebon opened this issue May 29, 2024 · 1 comment
Assignees

Comments

@possebon
Copy link

possebon commented May 29, 2024

Issues with recovery_2nd_stage Execution in PostgreSQL 14 and pgpool 4.5.2

Configuration:

  • PostgreSQL: 14
  • pgpool: 4.5.2

Objective: Use load balancing for all SELECT queries

Documentation Reference: pgpool Documentation

Issue Description:
I am configuring PostgreSQL 14 with pgpool 4.5.2 to load balance all SELECT queries. Following the documentation, I encountered an issue where the recovery_2nd_stage fails to execute correctly.

Observed Behavior:
While debugging the configuration and logs, I found the following error:

+ /usr/pgsql-14/bin/psql -h pgcluster01 -p 5433 -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1
+ read i
+ '[' postgres '!=' '' ']'
+ psql -h pgcluster01 -p 5433 -c 'SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = '\''S'\''' postgres
 setval
--------
(0 rows)

+ read i
+ '[' pgtestdb '!=' '' ']'
+ psql -h pgcluster01 -p 5433 -c 'SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = '\''S'\''' pgtestdb
Password for user postgres:
psql: error: connection to server at "pgcluster01" (10.28.141.21), port 5433 failed: fe_sendauth: no password supplied
+ read i
+ '[' 2 -ne 0 ']'
+ echo ERROR: recovery_2nd_stage: failed.
ERROR: recovery_2nd_stage: failed.
+ exit 1
2024-05-29 10:41:53.305 -03 [82771] ERROR:  pgpool_recovery failed
2024-05-29 10:41:53.305 -03 [82771] STATEMENT:  SELECT pgpool_recovery('recovery_2nd_stage', 'pgcluster02', '/var/lib/pgsql/14/data', '5433', 1, '5433', 'pgcluster01')
[root@pgcluster01 data]#

Diagnosis:
I created the pgtestdb to test using pgbench, but this database was not listed in .pgpass. The documentation states that the postgres user only needs to connect to the postgres database (I also added template1 to resolve the original error).

When recovery_2nd_stage fails, it does not execute the following promotion command:

# Promote the target server to a primary server
ssh -T ${SSH_OPTIONS} ${POSTGRESQL_STARTUP_USER}@$DEST_NODE_HOST "
    $PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA promote"

Workaround:
To avoid further errors when user databases are present, I modified the .pgpass file to use * instead of specifying databases manually. This allowed pcp_recovery_node to run smoothly (as recovery_2nd_stage completes successfully), but no replication slot appears when checking with SELECT * FROM pg_stat_replication.

Logs:

The PostgreSQL logs for the pcp_recovery_node process show no errors, indicating that everything ran successfully.

Concerns:

  • It seems recovery_2nd_stage should not execute the promote command, as this causes the target node to assume it is the primary, leading to the disappearance of the replication slot.
  • If this is a characteristic of snapshot isolation mode, I need to understand how data is replicated between nodes without a replication slot.

Request for Clarification:

  • Is there a configuration detail I might be missing?
  • How should data replication be managed in this scenario to maintain consistency across nodes?
@pengbo0328 pengbo0328 self-assigned this Jun 3, 2024
@pengbo0328
Copy link
Collaborator

@possebon Sorry for the late response.

Diagnosis:
I created the pgtestdb to test using pgbench, but this database was not listed in .pgpass. The documentation states that the postgres user only needs to connect to the postgres database (I also added template1 to resolve the original error).

If you use other users in your scripts, you need to add those users in .pgpass.

Request for Clarification:

Is there a configuration detail I might be missing?
How should data replication be managed in this scenario to maintain consistency across nodes?

In snapshot isolation mode, streaming replication is not used.
Pgpool-II will replicate data to each node. (Pgpool-II executes the same queries on each PostgreSQL node.)

https://www.pgpool.net/docs/latest/en/html/runtime-config-running-mode.html

If you want to use streaming replication for data consistency, you can use Pgpool-II's streaming replication mode by setting backend_clustering_mode = 'streaming_replication'

Below is the configuration example:
https://www.pgpool.net/docs/latest/en/html/example-cluster.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants