Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Postgres queries stuck in status:active freezing our apps that use Lucia #1710

Open
MikeSpock opened this issue Oct 16, 2024 · 7 comments
Labels
bug Something ain't right...

Comments

@MikeSpock
Copy link

MikeSpock commented Oct 16, 2024

Package

​@lucia-auth/adapter-postgresql

Describe the bug

We have a worrying issue on production.

We are using Lucia with a Postgres.js adapter. Queries from pg_stat_activity start to hang on status "active". They can be in that status for hours, and don't go away until the server is restarted / apps redeployed. The queries are the simplest they can be, SELECT by an indexed session id column [session validation]). They pile up quickly, and the app's pool thread limit is quickly reached (it is 10 by default), freezing the app.

  • DB: AWS RDS Postgres
  • Apps: SvelteKit apps
  • Postgres adapter: PostgresJS

The issue:

  • pg_stat_activity shows that queries pile up in status: active
  • often with the same query_id
  • the very same SELECT is executed under 10ms on the server (it is not a long-running query: "fetching: 7 ms")

Here is an example of stuck queries from the pg_stat_activity. As you may see from this example, the query_id is the same, and on is added 9 minutes after the first query.
I usually see queries duplicating like that in the pg_stat_activity query.

[
  {
    "datid": 16461,
    "datname": "auth-db",
    "pid": 21110,
    "leader_pid": null,
    "usesysid": 16460,
    "usename": "auth-db-user",
    "application_name": "postgres.js",
    "client_addr": "x.x.x.132",
    "client_hostname": null,
    "client_port": 37434,
    "backend_start": "2024-10-16 17:39:13.649736 +00:00",
    "xact_start": "2024-10-16 18:03:22.498475 +00:00",
    "query_start": "2024-10-16 18:03:22.498475 +00:00",
    "state_change": "2024-10-16 18:03:22.498478 +00:00",
    "wait_event_type": "Client",
    "wait_event": "ClientRead",
    "state": "active",
    "backend_xid": null,
    "backend_xmin": null,
    "query_id": -6194834080809489076,
    "query": "SELECT * FROM \"user_session\" WHERE id = $1",
    "backend_type": "client backend"
  },
  {
    "datid": 16461,
    "datname": "auth-db",
    "pid": 20583,
    "leader_pid": null,
    "usesysid": 16460,
    "usename": "auth-db-user",
    "application_name": "postgres.js",
    "client_addr": "x.x.x.132",
    "client_hostname": null,
    "client_port": 46586,
    "backend_start": "2024-10-16 17:30:48.670474 +00:00",
    "xact_start": "2024-10-16 18:03:28.508803 +00:00",
    "query_start": "2024-10-16 18:03:28.508803 +00:00",
    "state_change": "2024-10-16 18:03:28.508805 +00:00",
    "wait_event_type": "Client",
    "wait_event": "ClientRead",
    "state": "active",
    "backend_xid": null,
    "backend_xmin": null,
    "query_id": -6194834080809489076,
    "query": "SELECT * FROM \"user_session\" WHERE id = $1",
    "backend_type": "client backend"
  }
]
@MikeSpock MikeSpock added the bug Something ain't right... label Oct 16, 2024
@pilcrowonpaper
Copy link
Member

Are you sure this is a Lucia issue? You may need to close your database connections if you're using a lambda etc
https://github.com/porsager/postgres?tab=readme-ov-file#connection-timeout

@MikeSpock
Copy link
Author

MikeSpock commented Oct 17, 2024

I'm not 100% sure, but we are not using the DB for any other stuff, just Lucia, which means that the only queries that the DB receives are Lucia queries.
Also, we are running the apps in ECS containers, not Lambda.
So I thought if there is any special connection handling necessary for not having stuck queries in the DB, it probably ​should be implemented in @lucia-auth/adapter-postgresql.

@pilcrowonpaper
Copy link
Member

Are you by chance passing an undefined session ID? It's possible that Postgres is waiting for the value of $1

@pilcrowonpaper
Copy link
Member

Can you try querying the database manually and see if you can recreate the issue? I kinda doubt this is an issue with Lucia and adapters aren't expected to manage connections

@zsp03
Copy link

zsp03 commented Oct 21, 2024

I'm not 100% sure, but we are not using the DB for any other stuff, just Lucia, which means that the only queries that the DB receives are Lucia queries. Also, we are running the apps in ECS containers, not Lambda. So I thought if there is any special connection handling necessary for not having stuck queries in the DB, it probably ​should be implemented in @lucia-auth/adapter-postgresql.

Use a runtime validator such as Zod if your app keeps having connection issue which resulted in some variable not get passed since Typescript only handles error at compiles. On large projects this could happen often.

Lucia pg adapter execute whatever is passed from the function as string into an unsafe query which could be anything.

https://github.com/lucia-auth/lucia/blob/fc016ca8deb62b1925298ac2625254afa5ae1531/packages/adapter-postgresql/src/drivers/postgresjs.ts#L28C2-L30C3

@MikeSpock
Copy link
Author

Thank you @pilcrowonpaper and @zsp03 for your inputs. I wrote stress tests that try to make these stuck queries happen on my local env, with no success.
These queries are stuck in active status, which if I'm not mistaken happen when they received the prepared sql statement, and are waiting for the BIND command so that $1 parameter would get defined.

Sidenote:
I also noticed in our error tracking system some aborted requests from the browsers to our API endpoints which use Lucia for session validation.
My hunch is that when an aborted request is exactly in this unfortunate state, and the postgre thread gets killed, the query stays in hanging status. This should be handled at some level, arguably the PostgreJS or the adapter or Lucia/app level itself. The earlier ones make more sense.

I did try to reproduce aborted queries on my local machine, but no luck. I guess the roundtrip on the docker setup is much much lower than in prod (usually under 10ms), which makes it impossible to randomly abort the query just at the right time.

@zsp03
Copy link

zsp03 commented Oct 21, 2024

These queries are stuck in active status, which if I'm not mistaken happen when they received the prepared sql statement, and are waiting for the BIND command so that $1 parameter would get defined.

I don't think $1 would ever get defined since from Lucia perspective, the queries had already been sent. It's not undefined either since postgres.js won't allow it. According to their repo

Postgres.js won't accept undefined as values in tagged template queries since it becomes ambiguous what to do with the value. If you want to set something to null, use null explicitly.

Notice in the pg adapter base.ts, the $1 is written as string with args userId instead of variable value ${userId}. This is so that it's written as is since getAll is using sql.unsafe

public async getUserSessions(userId: UserId): Promise<DatabaseSession[]> {
		const result = await this.controller.getAll<SessionSchema>(
			`SELECT * FROM ${this.escapedSessionTableName} WHERE user_id = $1`,
			[userId]
		);
		return result.map((val) => {
			return transformIntoDatabaseSession(val);
		});
	}

Based on your log, the query sent is SELECT * FROM \"user_session\" WHERE id = $1
The userId never made it but the escaped string this.escapedSessionTableName does since it's translated directly when reaching getAll function. The userId args would later get transform/substituted inside the $1. I can't quite figure out where but it happened when it reaches postgres.js part of execution. This is probably where the aborted request happened.

This should be handled at some level, arguably the PostgreJS or the adapter or Lucia/app level itself. The earlier ones make more sense.

Lucia is moving away from adapters, i don't think this is possible in Lucia side. It would require you to make custom adapter.

What you can do is add some runtime validation to check the session object and use values on the queries instead of args. This would make sure everything is correct before passing it into the queries and the db driver won't need to do extra. You can use the current postgres adapter as a reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something ain't right...
Projects
None yet
Development

No branches or pull requests

3 participants