-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pgwire: jasync client encounters "SimpleQuery not allowed while in extended protocol mode" #33693
Comments
Thank you for your report! @andreimatei I think this is addressed by your ongoing change, can you confirm? |
I don't think I'm addressing this in any way. I think the error says that the client is mixing the simple protocol and the extended protocol in ways we don't expect. |
@andreimatei |
@mjibson do you reckon you could try out their test case during your current pgwire investigations? maybe this meshes well with the overarching "improve pgwire" story arc. |
Will investigate. |
I was able to repro this with the above code. Here's a proxy log from the test:
The interesting line is Our problem appears to be that we mishandle the Close message. Will fix. Thanks for the report.
|
Having looked into this, I don't think this is our mishandling of the Close message, rather we don't quite get the semantics between the extended protocol, sync, and readyForQuery. For example, both us and postgres have a cockroach/pkg/sql/pgwire/conn.go Line 346 in 734fb0e
cockroach/pkg/sql/pgwire/conn.go Line 388 in 734fb0e
Outstanding questions:
One idea to test if our state machine here is correct is to do more goo similar to https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/generate-binary/main.go which stuffs messages to a postgres server and records the responses, then writes a file of bytes that we can test cockroach against. It seems like it could be beneficial to have a proxy between, say, this java client and the real postgres server, record all the message types, and output a test file that cockroach can use to verify its impl. This proxy will make it very fast to debug other problems like this. I already have a proxy program for general TCP connections. I think it's worth it to enhance that by teaching it about pgwire and give it some testdata generation support. |
FWIW, I've been testing Postgres by writing tests in the pgx package,
hijacking the underlying conn from the pgx Conn.
…On Thu, Jan 17, 2019, 2:57 PM Matt Jibson ***@***.*** wrote:
Having looked into this, I don't think this is our mishandling of the
Close message, rather we don't quite get the semantics between the extended
protocol, sync, and readyForQuery. For example, both us and postgres have a doing
extended query message variable. pg sets it to true when an extended
query message comes in (
https://github.com/postgres/postgres/blob/c0d0e540847d24d9dfe374549fb4cfd5ca40d050/src/backend/tcop/postgres.c#L440).
But it is only used in pg to set ignore_till_sync
https://github.com/postgres/postgres/blob/c0d0e540847d24d9dfe374549fb4cfd5ca40d050/src/backend/tcop/postgres.c#L440.
I can't see where in the C the ignore_till_sync variable can even be set
since it appears before the for loop, so I'm not sure what's going on
there. Cockroach uses the doing extended var to just error out of a simple
query (which is what is happening in this issue above)
https://github.com/cockroachdb/cockroach/blob/734fb0e1bf4a0aee8a125a4d2d9dfa9e36d9a029/pkg/sql/pgwire/conn.go#L346.
We also say that extended messages should be skipped (
https://github.com/cockroachdb/cockroach/blob/734fb0e1bf4a0aee8a125a4d2d9dfa9e36d9a029/pkg/sql/pgwire/conn.go#L388)
if an error occurs and wait until the next sync. I don't see this happening
explicitly either.
Outstanding questions:
1. How can the postgres ignore_till_sync var be set? Is it actually in
some loop I don't see? Otherwise doing_extended_query_message has no
discernible purpose. @knz <https://github.com/knz> can you take a look
at this? I've been looking over the C code but still can't see where
ignore_till_sync can even possibly be set in the normal message processing
for loop.
2. What does postgres do when it receives a sync message (which closes
an extended protocol statement or implicit transaction) then a close
message followed by a simple query? Or with a twist: what happens if the
simple query comes before the close, or even before the sync?
3. Postgres says "If we were handling an extended-query-protocol
message, initiate skip till next Sync. This also causes us not to issue
ReadyForQuery (until we get Sync)". Are we doing this identically,
including not issuing ReadForQuery until we get a sync?
One idea to test if our state machine here is correct is to do more goo
similar to
https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/generate-binary/main.go
which stuffs messages to a postgres server and records the responses, then
writes a file of bytes that we can test cockroach against. It seems like it
could be beneficial to have a proxy between, say, this java client and the
real postgres server, record all the message types, and output a test file
that cockroach can use to verify its impl. This proxy will make it very
fast to debug other problems like this. I already have a proxy program for
general TCP connections. I think it's worth it to enhance that by teaching
it about pgwire and give it some testdata generation support.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#33693 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAXBcYSMMneoM8MgBfqHQo8NkJCiOKMlks5vENWTgaJpZM4Z8_Bs>
.
|
Answers to your question:
if (sigsetjmp(local_sigjmp_buf, 1) != 0)
{
...
if (doing_extended_query_message)
ignore_till_sync = true; This block of code is auto-magically executed (jumped back into, ie not called) when a SQL exception is encountered. The block does error processing, then all the code below the
case 'S': /* sync */
pq_getmsgend(&input_message);
finish_xact_command();
send_ready_for_query = true;
break;
|
@mjibson did anything more come from your investigations on this issue? It would be good to close the loop, since we've already put a decent amount of work into this. |
I tried to make a DSL for testing this kind of stuff but wasn't successful. There's definitely still a bug in our pgwire state machine here. I also didn't understand it well enough to know what the exact bug or fix was. Unclear where this should fall on our priority list. But hey if someone wants to learn about pgwire, this is a place to start. |
Just updating the issue to say that it seems to affect PGJDBC as well: #41511. 136 tests in the PGJDBC test suite fail because of this. |
I have also run into this using Elixir/Postgrex in a unit testing context. |
An update to my previous comment: we now have a pgtest DSL that would be able to reproduce this easily. It should be used when addressing this issue. |
Awesome! What release should I be looking for in? |
We are going to backport this, so probably 20.1.4 and 19.2.10 which are scheduled for early-mid August. |
Any chance this could be pushed to cockroach-unstable ? |
Yes, it will be in an upcoming 20.2.0 alpha, which is pushed to cockroach-unstable. Unsure of the date of that though. If you want it before that you will need to build it yourself. |
Hi @mjibson , I just tried the 20.2.0 alpha pushed to docker 13 days ago and the bug is still there. Thanks |
Hi Christian, The commit that interests you was merged on July 9. So the next release should be the good one. |
Ha thanks for the explanation @knz , ended up building my own docker image using the 20.1 branch 😄 |
Describe the problem
async connect and query got error:
To Reproduce
What did you do? Describe in your own words.
singleton.kt(singleton connection)
query kotlin code
same code and work is fine
Expected behavior
Environment:
The text was updated successfully, but these errors were encountered: