Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgwire: jasync client encounters "SimpleQuery not allowed while in extended protocol mode" #33693

Closed
leenux opened this issue Jan 13, 2019 · 21 comments · Fixed by #51194
Closed
Assignees
Labels
A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community

Comments

@leenux
Copy link

leenux commented Jan 13, 2019

Describe the problem

async connect and query got error:

ErrorMessage(fields=[(Severity, ERROR), (SQLSTATE, 08P01), (File, sql/pgwire/pgwirebase/encoding.go), (Line, 189), (Routine, NewProtocolViolationErrorf), (Message, SimpleQuery not allowed while in extended protocol mode)])

To Reproduce

What did you do? Describe in your own words.

  • start cockroach 3-clusters
  • kotlin async connect and query
    singleton.kt(singleton connection)
object DbOne {
    val Pool: ConnectionPool<PostgreSQLConnection> by lazy {
        PostgreSQLConnectionBuilder.createConnectionPool(
            url = "postgres://cherry@localhost:26257/cherry?user=***&password=***&sslmode=verify-full&sslrootcert=ca.crt&sslcert=client.cherry.crt&sslkey=client.cherry.key"
        )
    }
}

query kotlin code

GlobalScope.async {
        val queryResult = DbOne.Pool.sendPreparedStatement("select 0").await()
}
  • error log
ErrorMessage(fields=[(Severity, ERROR), (SQLSTATE, 08P01), (File, sql/pgwire/pgwirebase/encoding.go), (Line, 189), (Routine, NewProtocolViolationErrorf), (Message, SimpleQuery not allowed while in extended protocol mode)])
  • start postgre 3-cluster
    same code and work is fine

Expected behavior
Environment:

  • CockroachDB version [e.g. 2.1.3]
  • Server OS: [ubuntu 18.04]
  • Client app [com.github.jasync-sql:jasync-postgresql:0.8.64]
  • postgre version 11
@knz
Copy link
Contributor

knz commented Jan 13, 2019

Thank you for your report!

@andreimatei I think this is addressed by your ongoing change, can you confirm?

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community A-sql-pgwire pgwire protocol issues. labels Jan 13, 2019
@andreimatei
Copy link
Contributor

I don't think I'm addressing this in any way. I think the error says that the client is mixing the simple protocol and the extended protocol in ways we don't expect.
@leenux I think we don't have much Kotlin experience. If you could give us a binary that reproduces this error, it'd be of great help.

@leenux
Copy link
Author

leenux commented Jan 14, 2019

@andreimatei
This is a special project for you.
https://github.com/leenux/async-test

@knz
Copy link
Contributor

knz commented Jan 14, 2019

@mjibson do you reckon you could try out their test case during your current pgwire investigations? maybe this meshes well with the overarching "improve pgwire" story arc.

@maddyblue
Copy link
Contributor

Will investigate.

@maddyblue
Copy link
Contributor

I was able to repro this with the above code. Here's a proxy log from the test:

2019/01/16 16:54:23 listening on localhost:26257
2019/01/16 16:54:23 relay to localhost:26258
127.0.0.1:36092 -> 127.0.0.1:26258: "\x00\x00\x00^\x00\x03\x00\x00user\x00cherry\x00database\x00cherry\x00client_encoding\x00UTF-8\x00DateStyle\x00ISO\x00extra_float_digits\x002\x00\x00"
	[0 0 0 94 0 3 0 0 117 115 101 114 0 99 104 101 114 114 121 0 100 97 116 97 98 97 115 101 0 99 104 101 114 114 121 0 99 108 105 101 110 116 95 101 110 99 111 100 105 110 103 0 85 84 70 45 56 0 68 97 116 101 83 116 121 108 101 0 73 83 79 0 101 120 116 114 97 95 102 108 111 97 116 95 100 105 103 105 116 115 0 50 0 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "R\x00\x00\x00\b\x00\x00\x00\x00"
	[82 0 0 0 8 0 0 0 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00\x19server_version\x009.5.0\x00S\x00\x00\x00\x19server_encoding\x00UTF8\x00S\x00\x00\x00\x1aclient_encoding\x00UTF-8\x00S\x00\x00\x00\x16application_name\x00\x00"
	[83 0 0 0 25 115 101 114 118 101 114 95 118 101 114 115 105 111 110 0 57 46 53 46 48 0 83 0 0 0 25 115 101 114 118 101 114 95 101 110 99 111 100 105 110 103 0 85 84 70 56 0 83 0 0 0 26 99 108 105 101 110 116 95 101 110 99 111 100 105 110 103 0 85 84 70 45 56 0 83 0 0 0 22 97 112 112 108 105 99 97 116 105 111 110 95 110 97 109 101 0 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00\x12DateStyle\x00ISO\x00S\x00\x00\x00\x1bIntervalStyle\x00postgres\x00S\x00\x00\x00\x11TimeZone\x00UTC\x00S\x00\x00\x00\x19integer_datetimes\x00on\x00"
	[83 0 0 0 18 68 97 116 101 83 116 121 108 101 0 73 83 79 0 83 0 0 0 27 73 110 116 101 114 118 97 108 83 116 121 108 101 0 112 111 115 116 103 114 101 115 0 83 0 0 0 17 84 105 109 101 90 111 110 101 0 85 84 67 0 83 0 0 0 25 105 110 116 101 103 101 114 95 100 97 116 101 116 105 109 101 115 0 111 110 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00#standard_conforming_strings\x00on\x00"
	[83 0 0 0 35 115 116 97 110 100 97 114 100 95 99 111 110 102 111 114 109 105 110 103 95 115 116 114 105 110 103 115 0 111 110 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00hcrdb_version\x00CockroachDB OSS v2.1.3 (x86_64-unknown-linux-gnu, built 2018/12/17 19:15:31, go1.10.3)\x00"
	[83 0 0 0 104 99 114 100 98 95 118 101 114 115 105 111 110 0 67 111 99 107 114 111 97 99 104 68 66 32 79 83 83 32 118 50 46 49 46 51 32 40 120 56 54 95 54 52 45 117 110 107 110 111 119 110 45 108 105 110 117 120 45 103 110 117 44 32 98 117 105 108 116 32 50 48 49 56 47 49 50 47 49 55 32 49 57 58 49 53 58 51 49 44 32 103 111 49 46 49 48 46 51 41 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00!session_authorization\x00cherry\x00"
	[83 0 0 0 33 115 101 115 115 105 111 110 95 97 117 116 104 111 114 105 122 97 116 105 111 110 0 99 104 101 114 114 121 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "S\x00\x00\x00\x15is_superuser\x00off\x00Z\x00\x00\x00\x05I"
	[83 0 0 0 21 105 115 95 115 117 112 101 114 117 115 101 114 0 111 102 102 0 90 0 0 0 5 73]
127.0.0.1:36092 -> 127.0.0.1:26258: "P\x00\x00\x00\x1b1\x00select * from user\x00\x00\x00B\x00\x00\x00\x0e1\x001\x00\x00\x00\x00\x00\x00\x00D\x00\x00\x00\aP1\x00E\x00\x00\x00\n1\x00\x00\x00\x00\x00S\x00\x00\x00\x04C\x00\x00\x00\aP1\x00"
	[80 0 0 0 27 49 0 115 101 108 101 99 116 32 42 32 102 114 111 109 32 117 115 101 114 0 0 0 66 0 0 0 14 49 0 49 0 0 0 0 0 0 0 68 0 0 0 7 80 49 0 69 0 0 0 10 49 0 0 0 0 0 83 0 0 0 4 67 0 0 0 7 80 49 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "1\x00\x00\x00\x042\x00\x00\x00\x04T\x00\x00\x00%\x00\x01current_user\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x19\xff\xff\xff\xff\xff\xff\x00\x00D\x00\x00\x00\x10\x00\x01\x00\x00\x00\x06cherryC\x00\x00\x00\rSELECT 1\x00Z\x00\x00\x00\x05I"
	[49 0 0 0 4 50 0 0 0 4 84 0 0 0 37 0 1 99 117 114 114 101 110 116 95 117 115 101 114 0 0 0 0 0 0 0 0 0 0 25 255 255 255 255 255 255 0 0 68 0 0 0 16 0 1 0 0 0 6 99 104 101 114 114 121 67 0 0 0 13 83 69 76 69 67 84 32 49 0 90 0 0 0 5 73]
127.0.0.1:36092 -> 127.0.0.1:26258: "Q\x00\x00\x00\rSELECT 0\x00"
	[81 0 0 0 13 83 69 76 69 67 84 32 48 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "3\x00\x00\x00\x04E\x00\x00\x00\x90SERROR\x00C08P01\x00Fsql/pgwire/pgwirebase/encoding.go\x00L189\x00RNewProtocolViolationErrorf\x00MSimpleQuery not allowed while in extended protocol mode\x00\x00"
	[51 0 0 0 4 69 0 0 0 144 83 69 82 82 79 82 0 67 48 56 80 48 49 0 70 115 113 108 47 112 103 119 105 114 101 47 112 103 119 105 114 101 98 97 115 101 47 101 110 99 111 100 105 110 103 46 103 111 0 76 49 56 57 0 82 78 101 119 80 114 111 116 111 99 111 108 86 105 111 108 97 116 105 111 110 69 114 114 111 114 102 0 77 83 105 109 112 108 101 81 117 101 114 121 32 110 111 116 32 97 108 108 111 119 101 100 32 119 104 105 108 101 32 105 110 32 101 120 116 101 110 100 101 100 32 112 114 111 116 111 99 111 108 32 109 111 100 101 0 0]
127.0.0.1:26258 -> 127.0.0.1:36092: "Z\x00\x00\x00\x05I"
	[90 0 0 0 5 73]
127.0.0.1:36092 -> 127.0.0.1:26258: "X\x00\x00\x00\x04"
	[88 0 0 0 4]

The interesting line is "P\x00\x00\x00\x1b1\x00select * from user\x00\x00\x00B\x00\x00\x00\x0e1\x001\x00\x00\x00\x00\x00\x00\x00D\x00\x00\x00\aP1\x00E\x00\x00\x00\n1\x00\x00\x00\x00\x00S\x00\x00\x00\x04C\x00\x00\x00\aP1\x00". These are a series of messages sent as a single string to cockroach. They translate to Parse, Bind, Describe, Execute, Sync, Close (see below program for pgx debugging).

Our problem appears to be that we mishandle the Close message. Will fix. Thanks for the report.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"strings"

	"github.com/jackc/pgx/pgproto3"
)

func main() {
	const s = "P\x00\x00\x00\x1b1\x00select * from user\x00\x00\x00B\x00\x00\x00\x0e1\x001\x00\x00\x00\x00\x00\x00\x00D\x00\x00\x00\aP1\x00E\x00\x00\x00\n1\x00\x00\x00\x00\x00S\x00\x00\x00\x04C\x00\x00\x00\aP1\x00"
	b, err := pgproto3.NewBackend(strings.NewReader(s), ioutil.Discard)
	if err != nil {
		log.Fatal(err)
	}
	for {
		msg, err := b.Receive()
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%T: %v\n", msg, msg)
	}
}

@maddyblue
Copy link
Contributor

Having looked into this, I don't think this is our mishandling of the Close message, rather we don't quite get the semantics between the extended protocol, sync, and readyForQuery. For example, both us and postgres have a doing extended query message variable. pg sets it to true when an extended query message comes in (https://github.com/postgres/postgres/blob/c0d0e540847d24d9dfe374549fb4cfd5ca40d050/src/backend/tcop/postgres.c#L440). But it is only used in pg to set ignore_till_sync https://github.com/postgres/postgres/blob/c0d0e540847d24d9dfe374549fb4cfd5ca40d050/src/backend/tcop/postgres.c#L440. I can't see where in the C the ignore_till_sync variable can even be set since it appears before the for loop, so I'm not sure what's going on there. Cockroach uses the doing extended var to just error out of a simple query (which is what is happening in this issue above)

if doingExtendedQueryMessage {
. We also say that extended messages should be skipped (
// We're starting a batch here. If the client continues using the extended
) if an error occurs and wait until the next sync. I don't see this happening explicitly either.

Outstanding questions:

  1. How can the postgres ignore_till_sync var be set? Is it actually in some loop I don't see? Otherwise doing_extended_query_message has no discernible purpose. @knz can you take a look at this? I've been looking over the C code but still can't see where ignore_till_sync can even possibly be set in the normal message processing for loop.
  2. What does postgres do when it receives a sync message (which closes an extended protocol statement or implicit transaction) then a close message followed by a simple query? Or with a twist: what happens if the simple query comes before the close, or even before the sync?
  3. Postgres says "If we were handling an extended-query-protocol message, initiate skip till next Sync. This also causes us not to issue ReadyForQuery (until we get Sync)". Are we doing this identically, including not issuing ReadForQuery until we get a sync?

One idea to test if our state machine here is correct is to do more goo similar to https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/generate-binary/main.go which stuffs messages to a postgres server and records the responses, then writes a file of bytes that we can test cockroach against. It seems like it could be beneficial to have a proxy between, say, this java client and the real postgres server, record all the message types, and output a test file that cockroach can use to verify its impl. This proxy will make it very fast to debug other problems like this. I already have a proxy program for general TCP connections. I think it's worth it to enhance that by teaching it about pgwire and give it some testdata generation support.

@andreimatei
Copy link
Contributor

andreimatei commented Jan 19, 2019 via email

@knz
Copy link
Contributor

knz commented Jan 21, 2019

Answers to your question:

  1. ignore_till_sync I see the following:
    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
    {
     ...
        if (doing_extended_query_message)
            ignore_till_sync = true;

This block of code is auto-magically executed (jumped back into, ie not called) when a SQL exception is encountered. The block does error processing, then all the code below the if (){} gets executed again (the client session becomes ready again to accept a new stmt / txn).
This is possible because C allows inter-procedural "goto"s: when there is a SQL error somewhere deep down which cannot be recovered, the control is transferred to that location and the client session simply aborts what it's currently doing.
(It's a bit more complicated than that -- there are multiple stages of recovery -- but in any case the control is transferred there ultimately on the error path)

  1. What does postgres do when it receives a sync message
            case 'S':           /* sync */
                pq_getmsgend(&input_message);
                finish_xact_command();
                send_ready_for_query = true;
                break;
  • finish_xact_command cancels the current statement, then commits the current txn if any
  • send_ready_for_query ensures that the next iteration of the loop sends a "ready" message to the client
  1. I don't understand your question

@jordanlewis
Copy link
Member

@mjibson did anything more come from your investigations on this issue? It would be good to close the loop, since we've already put a decent amount of work into this.

@maddyblue
Copy link
Contributor

I tried to make a DSL for testing this kind of stuff but wasn't successful. There's definitely still a bug in our pgwire state machine here. I also didn't understand it well enough to know what the exact bug or fix was. Unclear where this should fall on our priority list. But hey if someone wants to learn about pgwire, this is a place to start.

@rafiss
Copy link
Collaborator

rafiss commented Nov 18, 2019

Just updating the issue to say that it seems to affect PGJDBC as well: #41511. 136 tests in the PGJDBC test suite fail because of this.

@perezd
Copy link

perezd commented Jul 1, 2020

I have also run into this using Elixir/Postgrex in a unit testing context.

@maddyblue
Copy link
Contributor

An update to my previous comment: we now have a pgtest DSL that would be able to reproduce this easily. It should be used when addressing this issue.

@perezd
Copy link

perezd commented Jul 9, 2020

Awesome! What release should I be looking for in?

@maddyblue
Copy link
Contributor

We are going to backport this, so probably 20.1.4 and 19.2.10 which are scheduled for early-mid August.

@tlvenn
Copy link
Contributor

tlvenn commented Jul 10, 2020

Any chance this could be pushed to cockroach-unstable ?

@maddyblue
Copy link
Contributor

Yes, it will be in an upcoming 20.2.0 alpha, which is pushed to cockroach-unstable. Unsure of the date of that though. If you want it before that you will need to build it yourself.

@tlvenn
Copy link
Contributor

tlvenn commented Aug 10, 2020

Hi @mjibson , I just tried the 20.2.0 alpha pushed to docker 13 days ago and the bug is still there.
Any idea why the fix was not part of that alpha ?

Thanks

@knz
Copy link
Contributor

knz commented Aug 10, 2020

Hi Christian,
the last alpha 20.2.0-alpha.2 corresponds to the source code as of July 7. It took a long while to validate the release after that, which is why it only showed up at the end of the month.

The commit that interests you was merged on July 9. So the next release should be the good one.

@tlvenn
Copy link
Contributor

tlvenn commented Aug 10, 2020

Ha thanks for the explanation @knz , ended up building my own docker image using the 20.1 branch 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants