
Unable to write large objects #1865

Closed
mitar opened this issue Jan 4, 2024 · 5 comments · Fixed by #1875

@mitar
Contributor

mitar commented Jan 4, 2024

Describe the bug

I am unable to write a 3 GB large object into the database.

To Reproduce

See this repository. Run:

$ docker run --name pgsql --rm -d --network=host -e LOG_TO_STDOUT=1 -e PGSQL_ROLE_1_USERNAME=test -e PGSQL_ROLE_1_PASSWORD=test -e PGSQL_DB_1_NAME=test -e PGSQL_DB_1_OWNER=test registry.gitlab.com/tozd/docker/postgresql:16
$ go run ./...

Expected behavior

Large objects are stored correctly.

Actual behavior

The first (small) value is stored. The second (3 GB large value) is not. I get the error:

2024/01/04 21:42:04 write failed: write tcp 127.0.0.1:45636->127.0.0.1:5432: write: connection reset by peer

In PostgreSQL logs I can also see:

{"database":"test","level":"LOG","msg":"invalid message length","pid":44,"time":"2024-01-04T20:42:04.533Z","user":"test","service":"postgresql","stage":"run","logged":"2024-01-04T20:42:04.533Z"}

Version

  • Go: go version go1.21.3 linux/amd64
  • PostgreSQL: PostgreSQL 16.1 (Ubuntu 16.1-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
  • pgx: v5.5.1

Additional context

Interestingly, there is another error:

{"database":"[unknown]","level":"LOG","msg":"PID 44 in cancel request did not match any process","pid":45,"time":"2024-01-04T20:42:04.537Z","user":"[unknown]","service":"postgresql","stage":"run","logged":"2024-01-04T20:42:04.537Z"}

This happens because the code calls e.dbpool.Close() unconditionally, even on error. But I am not sure why pgx tries to close a connection which, it seems, no longer exists. At least that is how I read this log message.

@mitar mitar added the bug label Jan 4, 2024
@jackc
Owner

jackc commented Jan 13, 2024

The current large objects Write method sends the entire buffer in a single call to lowrite. There appears to be a PostgreSQL limit that one message cannot be greater than 1GB. I was able to determine that the largest message that will work is 1024*1024*1024-90.

We can either document the limitation or internally make multiple requests.
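The second option (making multiple requests internally) can be sketched as a plain chunking loop: split the buffer into pieces below the limit and issue one lowrite-style call per piece. This is a minimal illustration, not pgx's actual implementation; the names `writeChunked` and the tiny chunk size are hypothetical, chosen so the chunking is visible in a small example (a real implementation would use a limit just under 1 GB, e.g. 1024*1024*1024-90 as measured above).

```go
package main

import "fmt"

// writeChunked splits buf into pieces no larger than max and passes each
// piece to write (standing in for a single lowrite call). It returns the
// total number of bytes written, mirroring io.Writer semantics: on error,
// the count of bytes written so far is still reported.
func writeChunked(buf []byte, max int, write func([]byte) (int, error)) (int, error) {
	total := 0
	for len(buf) > 0 {
		chunk := buf
		if len(chunk) > max {
			chunk = chunk[:max]
		}
		n, err := write(chunk)
		total += n
		if err != nil {
			return total, err
		}
		buf = buf[n:]
	}
	return total, nil
}

func main() {
	var calls int
	// 10 bytes with a 4-byte limit should produce 3 underlying calls (4+4+2).
	n, err := writeChunked([]byte("0123456789"), 4, func(p []byte) (int, error) {
		calls++
		return len(p), nil
	})
	fmt.Println(n, err, calls) // 10 <nil> 3
}
```

The caller still sees a single Write of the whole buffer; only the wire traffic is split.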

As far as the odd errors regarding closing the connection, I believe that PostgreSQL does a hard shutdown of the connection because it got what it considers an invalid message. So probably pgx doesn't know the connection is closed.

@jackc
Owner

jackc commented Jan 13, 2024

I've added documentation to the Write and Read methods. I investigated the PostgreSQL source code and found there is a hard limit of ~1GB on the size of messages sent to or from the PostgreSQL server.

@mitar
Contributor Author

mitar commented Jan 13, 2024

I do not think we can really just document this. The Write and Read methods have a fixed signature dictated by the standard Go interfaces. I think we should make multiple requests internally. Otherwise it is really tricky, because the caller of this interface has no way to control for the limit. Maybe somebody else using some other library stored an object larger than 1 GB into the database, and I am reading it with this library.

I can do a PR to split reading and writing here if you would be OK with that?

As far as the odd errors regarding closing the connection, I believe that PostgreSQL does a hard shutdown of the connection because it got what it considers an invalid message. So probably pgx doesn't know the connection is closed.

I think it is a bit strange that messages are:

{"database":"test","level":"LOG","msg":"invalid message length","pid":44,"time":"2024-01-04T20:42:04.533Z","user":"test","service":"postgresql","stage":"run","logged":"2024-01-04T20:42:04.533Z"}
{"database":"[unknown]","level":"LOG","msg":"PID 44 in cancel request did not match any process","pid":45,"time":"2024-01-04T20:42:04.537Z","user":"[unknown]","service":"postgresql","stage":"run","logged":"2024-01-04T20:42:04.537Z"}

So PID 44 is the connection which fails, but then it seems pgx sends the cancel over the connection with PID 45? Is this valid?

@jackc
Owner

jackc commented Jan 13, 2024

I can do a PR to split reading and writing here if you would be OK with that?

Sure. I suggest making the max message size configurable from the tests though (maybe with a test that is in the pgx package instead of the pgx_test package). This would make it easier to test all the boundary conditions, make the tests faster, and avoid writing multiple GB to the storage device every time the tests are run.

So pid 44 is connection which fails, but then it seems pgx sends the cancel over the connection with pid 45? Is this valid?

Yes. Canceling requests requires creating a new connection just to trigger the cancellation. See https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-CANCELING-REQUESTS.

mitar added a commit to mitar/pgx that referenced this issue Jan 14, 2024
@mitar
Contributor Author

mitar commented Jan 14, 2024

I made #1875.

mitar added a commit to mitar/pgx that referenced this issue Jan 14, 2024
mitar added a commit to mitar/pgx that referenced this issue Jan 14, 2024
jackc pushed a commit that referenced this issue Jan 15, 2024