
unexpected EOF on client connection with an open transaction #5641

Open

woozhijun opened this issue Nov 8, 2021 · 33 comments
@woozhijun

When we upgraded from v8.0.0 to v10.0.0, I started seeing this error in the Postgres logs (via docker logs -f xxxx):
unexpected EOF on client connection with an open transaction

I'm not sure whether it affects the service?

@songdeebuzzni

songdeebuzzni commented Jan 13, 2022

This is an error that occurs because the db client (sqlalchemy?) terminated abnormally.

The same error occurs even if localhost and the (committed internal) IPs are additionally allowed in "/var/lib/postgresql/data/pg_hba.conf".

It seems to happen right after the connection is closed or a rollback fails.

@silvaartur

Solved?

@susodapop
Contributor

Thanks for the ping. No solution so far, because I've not been able to reproduce this locally on version 10.1. Does anyone have steps to reproduce? Are these errors emitted around specific requests?

@KeycapCaper

I'm not sure about steps to reproduce, but it does seem to be related to the volume of records. I'm currently experiencing the issue on a couple of queries. If I set a smaller date range of records to return, it doesn't hit the error. Once I walked up the date range, it started hitting this issue somewhere after 22,000 records returned. I also confirmed it wasn't a specific bad record by taking multiple sample datasets from various date ranges.

@susodapop
Contributor

Is this a regression from V8?

@ribtoks

ribtoks commented Nov 12, 2022

Also hitting this in v10

@puttehi

puttehi commented Feb 2, 2023

Also seeing this in Redash 10.0.0 (9c928bd) with PostgreSQL 9.6.24.

The log lines seem to arrive in batches of 1-3 at 15 s or 30 s intervals. The regular interval makes it feel like some keepalive ping or such.

@OnkarVO7

Facing the same issue while migrating from v8 to v10. Is there any solution to fix this?

@songdeebuzzni

@OnkarVO7 Please copy and paste your docker-compose.yml file.

@george74greece

george74greece commented May 25, 2023

Hello, I have the same issue when trying to upgrade from v8 to v10:
postgres_1 | LOG: unexpected EOF on client connection with an open transaction

ubuntu@ip-10-233-135-229:/opt/redash$ cat docker-compose.yml
version: "2"
x-redash-service: &redash-service
  image: redash/redash:10.1.0.b50633
  depends_on:
    - postgres
    - redis
  env_file: /opt/redash/env
  restart: always
services:
  server:
    <<: *redash-service
    command: server
    ports:
      - "5000:5000"
    environment:
      REDASH_WEB_WORKERS: 4
  scheduler:
    <<: *redash-service
    command: scheduler
  scheduled_worker:
    <<: *redash-service
    command: worker
  adhoc_worker:
    <<: *redash-service
    command: worker
  redis:
    image: redis:5.0-alpine
    restart: always
  postgres:
    image: postgres:9.6-alpine
    env_file: /opt/redash/env
    volumes:
      - /opt/redash/postgres-data:/var/lib/postgresql/data
    restart: always
  nginx:
    image: redash/nginx:latest
    ports:
      - "80:80"
    depends_on:
      - server
    links:
      - server:redash
    restart: always
  worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "periodic emails default"
      WORKERS_COUNT: 1
ubuntu@ip-10-233-135-229:/opt/redash$

@Samuel29

Samuel29 commented Jun 7, 2023

Hi, same issue for me on a fresh install via Helm on a K8s cluster.

@JPGallo1510

Facing the same issue while migrating from v8 to v10. Is there any solution to fix this?

@OnkarVO7 I have the same issue, did you solve it?

@lpong
Contributor

lpong commented Dec 6, 2023

Has this problem been resolved yet? I am also troubled by it.

@alavi-sorrek

Has this been resolved?

@guidopetri
Contributor

Hi folks, unless you give us instructions to reproduce the error, we can't resolve it.

@alavi-sorrek

As stupid as this sounds, it's by following the walkthrough video for updating to v10 from v8. Have you or someone on the team tried to follow those instructions recently to see if they are still correct? I tried doing it 3 or 4 times, each time on a fresh AWS EC2 using the pre-baked AMI.

Here's the link: https://redash.io/help/open-source/setup

@alavi-sorrek

To troubleshoot I tried several things - force closing any open processes that might be running postgres (which wasn't installed), I tried installing postgres, I made sure there weren't any active connections, I tried setting it up once by only logging in after the upgrade, etc.

@guidopetri
Contributor

guidopetri commented Jan 30, 2024

Well, a couple of things.

Have you or someone on the team

Redash is community owned now, there is no team of dedicated developers.

using the pre-baked AMI
updating to v10

Both of these indicate that you're using a pretty old version of Redash; we don't really maintain the AMIs anymore, and v10 is from several years ago (let alone v8). My recommendation would be to back up your database, set Redash up using either the v10 docker image directly or the redash:preview image, and wipe the AMI machine.

I don't personally have access to running any AMI things, so I'm unable to reproduce this bug if you're only using the AMIs. Does the same issue happen if you use the docker images?

@ribtoks

ribtoks commented Jan 30, 2024

unless you give us instructions to reproduce the error, we can't resolve it.

@guidopetri For me the reproduction is docker-compose up

@guidopetri
Contributor

@ribtoks are you starting from scratch? or an existing db?

@ribtoks

ribtoks commented Jan 30, 2024

@guidopetri An existing DB. But "from scratch" instruction also requires you to "init" the DB first so it's kind of always from existing DB in such sense.

@guidopetri
Contributor

guidopetri commented Feb 3, 2024

I can repro! 🎉

I can also assert that it's definitely related to refreshing schemas. The screenshot below has the scheduled_worker (in my case, responsible for refreshing schemas even on-demand) logs at the top, and postgres logs at the bottom. Whenever I click the "refresh schema" button on the UI, I get a new log hit on the top asserting the schema got refreshed, and on the bottom that there's the EOF error.

From an initial look, it appears the postgres runner doesn't have the get_schema() method defined, and as a result it breaks the connection. I'm not sure how the schema gets refreshed without that method defined, so this needs more investigation, but maybe all we need is to define that method?

[screenshot: scheduled_worker logs on top, postgres EOF error logs on bottom]

@guidopetri
Contributor

Some more investigation:

  • this is definitely related to refreshing schemas
  • this doesn't seem to really affect the service nor postgres
  • the postgres runner does have a get_schema definition, it's just inherited from a base class ("OOP'd in")
  • the connection that gets created is async and gets closed, but no rollback/commit is issued. I tried adding a rollback and it did not fix this. I also tried making the connection non-async and it also did not work (even with a rollback call).
  • the EOF log line comes up after RQ saves the result (in redis?), and this doesn't seem to be related to any of the statsd tracking we have

I only have access to a postgres data source; can anyone confirm if something similar happens on different data sources?
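For illustration only (this is not Redash's actual code, and note that guidopetri reports adding a rollback did not silence the log in Redash itself): the cleanup ordering under discussion — end the transaction before the socket drops — can be sketched as a small DB-API-style context manager. Postgres emits "unexpected EOF on client connection with an open transaction" when a client closes the connection while a transaction is still open; the hypothetical `tidy_transaction` helper below guarantees a commit or rollback always happens before close.

```python
from contextlib import contextmanager


@contextmanager
def tidy_transaction(conn):
    """Ensure an open transaction is ended before the connection drops.

    Works with any DB-API-style connection exposing commit(), rollback()
    and close(). If a connection is closed mid-transaction, Postgres logs
    "unexpected EOF on client connection with an open transaction".
    """
    try:
        yield conn
        conn.commit()      # end the transaction cleanly on success
    except Exception:
        conn.rollback()    # ...or roll it back on failure
        raise
    finally:
        conn.close()       # only close after commit/rollback ran
```

A usage sketch would be `with tidy_transaction(psycopg2.connect(dsn)) as conn: ...`, so the connection never reaches `close()` with an uncommitted transaction still open.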

@ribtoks

ribtoks commented Feb 3, 2024

@guidopetri nope, only using Postgres so far. Great progress btw!

@guidopetri
Contributor

@ribtoks @alavi-sorrek - to be clear - do you see this message on the postgres db backing redash, or on a postgres data source?

(again, in my case I only have one postgres server, so I can't tell)

@alavi-sorrek

alavi-sorrek commented Feb 3, 2024 via email

@guidopetri
Contributor

Hmm. What data source are you using then?

@alavi-sorrek

alavi-sorrek commented Feb 3, 2024 via email

@wtfiwtz

wtfiwtz commented Mar 19, 2024

I found a potentially related issue.
It seems the version of gunicorn in Redash 10.1.0 is v20.0.4, which sits midway through their patching of this Keep-Alive behaviour: benoitc/gunicorn#2297

If you have hit the graceful shutdown request count, then the processing loop could be terminated early, closing the connection. I think this situation could also cause this log error if you've opened any database connections for running the query.

I'm about to test an upgrade of gunicorn to (I think) 20.1.0 to see if it fixes it for us in production. We only see this under load, and whilst it has nothing to do with the load balancer itself, our AWS ALB returns a HTTP 502 Bad Gateway because the connection has been dropped (you see "502 -" in the access logs). This happens repeatedly under load for bigger JSON requests with respect to this patch that we applied - #78 (comment).

NOTE: maybe only with async workers, not sync workers (for Keep Alive)
NOTE 2: On second thought, DB access would be on the 'rq' workers, not the HTTP request handlers. Maybe it's unrelated then!
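A hedged sketch of the workaround discussed here (the values are illustrative, not Redash's shipped defaults): in a `gunicorn.conf.py`, the keep-alive window can be raised above the ALB's idle timeout (60 s by default) so the load balancer — not gunicorn — closes idle connections, which avoids the ALB sending a request into a socket gunicorn already dropped (the "502 -" in the access logs). All settings below are real gunicorn configuration options.

```python
# gunicorn.conf.py -- illustrative values, not Redash's shipped defaults.

# Keep idle connections open longer than the ALB idle timeout (60 s by
# default), so the load balancer closes idle connections first and never
# routes a request into a socket gunicorn has already torn down.
keepalive = 650

# Recycle workers gracefully; the jitter staggers restarts so several
# workers do not shut down (dropping in-flight keep-alive sockets) at once.
max_requests = 1000
max_requests_jitter = 100

# Surface worker errors that would otherwise only show up as ALB 502s.
errorlog = "-"
loglevel = "info"
```

Only `keepalive` matters for the 502 symptom; the `max_requests` jitter addresses the graceful-shutdown request count mentioned above.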

wtfiwtz added a commit to orchestrated-io/redash that referenced this issue Mar 19, 2024
@wtfiwtz

wtfiwtz commented Mar 20, 2024

Seems to be stable in production with gunicorn v21.0.1 🎉

wtfiwtz added a commit to orchestrated-io/redash that referenced this issue Mar 20, 2024
…#4)

* Fix 502 Bad Gateway error from gunicorn keep-alive default setting of 2 seconds
   See https://www.ikw.cz/aws-alb-gunicorn-error-502-bad-gateway-fix
* Log gunicorn errors
* Upgrade gunicorn to latest
   See getredash#5641 (comment)
   and benoitc/gunicorn#2297
@mackenzieclark

As stupid as this sounds, it's by following the walkthrough video for updating to v10 from v8. Have you or someone on the team tried to follow those instructions recently to see if they are still correct? I tried doing it 3 or 4 times, each time on a fresh AWS EC2 using the pre-baked AMI.

Here's the link: https://redash.io/help/open-source/setup

I was having the same problem and resolved it by using the Bitnami AMI. https://bitnami.com/stack/redash

@mashanz

mashanz commented Nov 13, 2024

WDYT if we migrate from Gunicorn to Granian for better performance and stability?
https://github.com/emmett-framework/granian

And from Poetry to uv for a faster dependency resolver?
https://github.com/astral-sh/uv

@mashanz

mashanz commented Nov 13, 2024

Seems to be stable in production with gunicorn v21.0.1 🎉

Let's see, I will try it
