-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle multiple hosts in the connection string, where only one host h… #476
Conversation
…as writable session. Will skip read only hosts
Thank you for working on this issue! |
lib/pg/connection.rb
Outdated
@@ -764,6 +764,10 @@ def new(*args) | |||
# Seems to be no authentication error -> try next host | |||
errors << err | |||
return nil | |||
elsif err.to_s =~ /.*session is read-only.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is language dependant. If the server is configured for a different language this condition will always fail.
Not sure how to solve is better especially that I'm on vacation currently, but I would read the libpq sources and the wireshark output to get an idea. Will do this when I'm back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked deeper at the code, and the libpq code as well. I still can't find any flag or parameter at the connection that will tell us this is the case for the failure.
I do have a different idea, and would love your view on it: libpq knows how to handle this situation internally. It means that if we pass the whole URI to libpq connect, it will parse and handle it itself. Is this something you would consider as a solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I committed new code that does what I proposed before (send the URI based strings directly to libpq) to allow you to see how it can work. I tested and this works fine, as libpq knows how to handle multiple hosts with read-write requirements.
Would love to get your feedback on that, and extend it as a general way to connect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small change to remove warnings from the tests.
Please note that many of the tests fail now, as I first want to make sure this change is accepted in principle, as it is a major change to how we call libpq. If it is, I will work on cleaning up the tests, and add deeper tests for the reda-only case in a cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for handling multiple hosts one after another is the connect_timeout
parameter. That parameter is not handled by libpq, when doing non-blocking connection, which is necessary for Fiber.scheduler
compatibility. Also the address resolution is done in this step, which would be a blocking operation otherwise. So #459 fixed these two issues in pg-1.4.0.
I don't think it makes sense to revert to the behaviour of pg-1.3.x, that doesn't respect connect_timeout
or losing Fiber.scheduler
compatibility as in pg-1.2.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@larskanis this latest version restored connection.rb to what it was, and now just has the additional tests for a read only and read-write sessions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guynaor I also created a test case that checks the target_session_attrs
parameter properly some days ago, but didn't have the time to push it. Will try to work on this today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried again on my actual DB cluster, and it fails when the read-only server is the first on the list. Here is the output:
irb(main):003:0> c = PG::Connection.new "postgres://[email protected]:5432,172.63.80.8:5432/liba_app?target_session_attrs=read-write&sslmode=require"
Traceback (most recent call last):
14: from /home/bc-runner/.rbenv/versions/2.6.10/bin/irb:23:in `<main>'
13: from /home/bc-runner/.rbenv/versions/2.6.10/bin/irb:23:in `load'
12: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
11: from (irb):3
10: from (irb):3:in `rescue in irb_binding'
9: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:672:in `new'
8: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:721:in `connect_to_hosts'
7: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:721:in `each_with_index'
6: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:721:in `each'
5: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:735:in `block in connect_to_hosts'
4: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:735:in `each'
3: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:737:in `block (2 levels) in connect_to_hosts'
2: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:761:in `connect_internal'
1: from /home/bc-runner/.rbenv/versions/2.6.10/lib/ruby/gems/2.6.0/gems/pg-1.4.3/lib/pg/connection.rb:609:in `async_connect_or_reset'
PG::ConnectionBad (connection to server at "172.63.107.80", port 5432 failed: session is read-only)
As you can see, it fails as the first IP is the replica IP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@larskanis I think I know where the problem is coming from on my real cluster, compared to the test DB. The real cluster machine will not respond with a connection OK, and then let us query it for read-write. It will fail to even create the connection. So the fix might be to actually do the test manually as I did in one of my revisions. But I'd like to hear your opinion on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@larskanis after digging into the connection code, libpq source, and protocol documentation, I finally understood the root cause of the issue. When a connection is established, the server sends a set of params to the client, which can be retrieved with param_status(). Two params are relevant for the read-only mode: in_hot_standby and default_transaction_read_only. In both cases, the connection is considered read-only and will fail to establish. At that stage, we can't check the param_status, as the connection was issued the finish call. In order to work around this, I added an instance variable which I initialize before we finish the connection. Similar to how @last_status is handled.
Let me know if this solution is in the right direction.
…hat we were called with. Undo previous changes
…d-only DB for testing, and check if we correctly connect to the host based on the target_session_params
…he next host in case it is. Handles a pg cluster with read replicas correctly
Hi @guynaor! After thinking a bit more about this issue, I propose to solve it per #485. That means to rely on the connection startup handling of libpq instead of reinventing more of it's logic in ruby. The |
Hi @larskanis, Thanks for looking at it thoroughly! I agree that the solution of letting pqlib handle it is best. I think this will solve the issues I had, and will let the Ruby gem, follow changes in libpq automatically. THanks again, Guy. |
…as writable session. Will skip read-only hosts
This small change adds a test to see if the failure of the connection is because the host is read-only, and the session has a read-write requirement.
With this change a connection string of the format: postgres://192.168.1.20:5432,192.168.2.30:5432/liba_app?target_session_attrs=read-write&sslmode=require will work correctly, even if the first host (192.168.1.20) is the read only host in a primary/secondary style cluster