Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SKIP LOCKED #65

Open
ukd1 opened this issue Jul 29, 2020 · 12 comments
Open

SKIP LOCKED #65

ukd1 opened this issue Jul 29, 2020 · 12 comments
Labels
question Further information is requested

Comments

@ukd1
Copy link

ukd1 commented Jul 29, 2020

Hey - love the project, was checking out how you were doing things as I maintain https://github.com/queueclassic/Queue_Classic. Just wondering what issues you found / if you benchmarked advisory locks over SKIP LOCKED? Are you using advisory locks as the support goes a little further back in Postgres? Have they gotten faster?

FYI, we found great improvements in speed with SKIP LOCKED over previous ways. If advisory locks are faster again, I'll be taking a look. Benchmarks here - QueueClassic/queue_classic#303 (comment).

@bensheldon
Copy link
Owner

Thank you @ukd1. It means a lot to me for you to open this issue!

I haven’t benchmarked Advisory Locks over SKIP LOCKED; I honestly wasn’t even aware of that Postgres functionality until I released GoodJob 😓

Let me write out what GoodJob currently does and why:

  • This is the advisory lock implementation I’ve used on multiple projects and has gone through a couple of engineers‘ hands so I feel somewhat confident in it (but I’m always learning):
    SELECT "good_jobs".*
    FROM "good_jobs"
    WHERE "good_jobs"."id" IN (
    WITH "rows" AS (
    SELECT "good_jobs"."id"
    FROM "good_jobs"
    WHERE "good_jobs"."priority" = 99
    ORDER BY "good_jobs"."priority" DESC
    )
    SELECT "rows"."id"
    FROM "rows"
    WHERE pg_try_advisory_lock(('x'||substr(md5('good_jobs' || "rows"."id"::text), 1, 16))::bit(64)::bigint)
    LIMIT 2
    )
    ORDER BY "good_jobs"."priority" DESC
    SQL
  • Probably not necessary for a job queue, but I have found useful that Advisory Locks don’t prevent other connections from updating the Advisory Locked record. Which is an element of the appeal for Advisory Locks is my familiarity and the benefit of the implementation being extractable/reusable, though that’s a little selfish on my part to reduce my personal workload.
  • Being able to find a lockable row and lock it in a single query is important (seems possible with SKIP LOCKED too)
  • Being able to inspect the lock status via pg_locks has been useful both for debugging
  • Session-level Advisory Locks are nice because they don’t involve holding open a transaction.

Things I’m not sure about directly comparing Advisory Locks with SKIP LOCKED:

  • The underlying job query itself isn’t very performant or cleanly indexable because of time series (scheduled_at < now() ORDER BY priority DESC), regardless of the locking strategy. But that can be benchmarked
  • Advisory Locks don’t impact database migrations because they aren’t true row locks. I’m not sure if using a SELECT ... FOR UPDATE would make orchestrating deploys/migrations a requirement which could limit the general usability of GoodJob
  • Unless I misunderstand it, SKIP LOCKED would require holding open a database transaction for the life of the job. This seems desirable for Document issues with PgBouncer and session-level Advisory Locks #52, but my own world is filled with 15 minute selenium scripts and long-running transactions is something I would like to avoid, though obviously involves trade-offs.

Also, in regards to benchmarking, I would generally say that “database performance” is more important to me to mean “don’t thrash the database” than “raw speed”. Those things are very linked, but figured it could be useful to write out that train of thought for posterity too 😄

...all of that being written out, I should set up a benchmark rig for GoodJob. There are a number of questions that would help answer. Thank you 🙏

@bensheldon bensheldon added the question Further information is requested label Aug 11, 2020
@ukd1
Copy link
Author

ukd1 commented Sep 6, 2020

Ya! There is a "standard" benchmark the Que author wrote (https://github.com/chanks/queue-shootout) - if you wanted to join in on that one I'm sure he'd accept a PR.

Interesting re your use case of the 15 minute scripts - you're right SKIP LOCKED requires a transaction. Session advisory locks still need the connection though; is the transaction on top of that causing issues? FYI SKIP LOCK does do the single row in one query too, which is nice - and you can use it in an update (e.g. to set it to being processed in some way). If you wanted to close the transaction, you could skip-lock update it to the process id, then close the transaction + connection if you wanted, do the 15 minute work, reconnect and update it as done. Also would need some kinda retry / clear logic that you probably get "free" by holding open the session today...

Advisory Locks don’t prevent other connections from updating the Advisory Locked record

ya, kinda nice like that - one is totally free to manage however one wants!

Anyway, was just super curious as to your project / design!

@bensheldon
Copy link
Owner

To close this out, I feel confident that GoodJob will stick with Advisory Locks instead of transition to PG row-level locks, either:

  • keeping a FOR UPDATE lock for the entire duration of the job, and thus having to wrap the entire execution of the job in a transaction (long-lived transactions bad)
  • using a short-lived FOR UPDATE lock transaction and using alocked_at/locked_by column to preserve lock state. Performance-wise this is probably optimal, but would necessitate either a process heartbeat or job-timeout mechanism to avoid abandoned jobs if a process exits without cleaning up (complexity).

I might revisit this in the far future (~v3) and will be monitoring this Issue if folks have additional thoughts or even better, implementation support.

@ukd1
Copy link
Author

ukd1 commented Sep 11, 2021

@bensheldon makes sense, and I get the reasoning!

@bensheldon bensheldon reopened this Sep 15, 2023
@ukd1
Copy link
Author

ukd1 commented Sep 15, 2023

🥳 exited!

@bensheldon
Copy link
Owner

@ukd1 I wanted to reopen this issue because I'm close to switching over from advisory locks to row-level locks. I had a question about QueueClassic's lock strategy. I saw that your query looks like this:

WITH selected_job AS (
  SELECT id
  FROM queue_classic_jobs
  WHERE
    locked_at IS NULL AND
    q_name = $1 AND
    scheduled_at <= now()
  LIMIT 1
  FOR NO KEY UPDATE SKIP LOCKED
)
UPDATE queue_classic_jobs
SET
  locked_at = now(),
  locked_by = pg_backend_pid()
FROM selected_job
WHERE queue_classic_jobs.id = selected_job.id
RETURNING *

...and I found the PR where it moves the inline SELECT out into a CTE. That makes sense!

Does that CTE need to be MATERIALIZED on newer versions of Postgres that don't materialize by default?

@ukd1
Copy link
Author

ukd1 commented Sep 16, 2023

So, I've not done any testing since it came out mostly as I didn't notice any regression now the default (as I understand it) is that the optimizer chooses rather than pre Postgres 12 all CTEs were MATERIALIZED. i.e. I think / it seems mostly the optimizer enables it "fine" that I didn't see a regression...

But as far as I understand, it's only materializing the results of that query per run (i.e. not like a view; it's point is if you're doing multiple other queries off that one WITH... I think?), and seeing as my (+ I assume yours) query only uses it once (and basically CTE for me is just for clarity) I don't see it making much of a pro being on - maybe actually being off would be faster?

However, I'd be interested in some testing...

For me to put in QC, I'd have to drop support for 9.6 (lol, I should), or support checking the version / some kinda flag. So, not super hard!

@ukd1
Copy link
Author

ukd1 commented Sep 16, 2023

BTW - would be curious on the move from advisory locks --> skip locked? What changed?

@bensheldon
Copy link
Owner

BTW - would be curious on the move from advisory locks --> skip locked? What changed?

Here's some of my thoughts on what's changed: #831

For materializing, I did see a regression in Postgres 12+. The issue seemed to be that not explicitly MATERIALIZing the CTE was leading to Postgres inlining the CTE (as if it was a nested SELECT), and then (this is the bad part) that allows the query planner to lift conditions out of the CTE into the parent query (and thus the LIMIT may not properly applied). It's explained sorta here:

https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-CTE-MATERIALIZATION

I think Postgres's intent of non-MATERIALIZEd CTEs is that one can write prettier SQL queries with CTEs but have them perform like nested SELECTs. So if the query is improper as a nested select, it's likely improper as a non-materialized CTE.

@bensheldon
Copy link
Owner

Also, I'm digging into this because I'm expecting GoodJob's row-locking query to look like what you're using in Queue Classic 😊

@ukd1
Copy link
Author

ukd1 commented Sep 17, 2023

Nice. Feel free to use as much as you wish if it's helpful, and lmk if I can help out.

@ukd1
Copy link
Author

ukd1 commented Sep 17, 2023

BTW - would be curious on the move from advisory locks --> skip locked? What changed?

Here's some of my thoughts on what's changed: #831

For materializing, I did see a regression in Postgres 12+. The issue seemed to be that not explicitly MATERIALIZing the CTE was leading to Postgres inlining the CTE (as if it was a nested SELECT), and then (this is the bad part) that allows the query planner to lift conditions out of the CTE into the parent query (and thus the LIMIT may not properly applied). It's explained sorta here:

https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-CTE-MATERIALIZATION

I think Postgres's intent of non-MATERIALIZEd CTEs is that one can write prettier SQL queries with CTEs but have them perform like nested SELECTs. So if the query is improper as a nested select, it's likely improper as a non-materialized CTE.

Ok, then I need to look in to it more carefully. I'll do some benchmarking for QC with it like it is now, and on + off too. Thanks for the pointer, I'd basically ignored it as it seemed ok...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
Status: In progress
Status: No status
Development

No branches or pull requests

2 participants