-
-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run-once guarantee? #151
Comments
@jrochkind thanks for the additional context on this 🙏 It was discussed in #59 which might be a good place to continue the conversation. |
ok thanks. I could copy this discussion over to that closed issue, would you like me to do that? I remain suspicious of the "guarantee that a completely-performed job will run once and only once", I'm not sure I really believe the "only once" can be guaranteed. I think I have an example where it might run more than once, as above. But am I mistaken? In general, you are confident that good_job really can guarantee "completely-performed only once", never more than once, regardless of what errors may happen? |
You're welcome to copy it over. I think the phrase is open to interpretation. I originally had a strict statement close to what you're expecting ("at least once"), but that also caused confusion. It looks like that during a recent documentation overhaul there was an additional statement that was dropped: "Writing reliable, transactional, and idempotent |
So honestly I'm not sure the "completely-performed" is clear what it means, or if the distinction is relevant once explained. I think it's probably really "at least once", not "exactly once". Sidekiq says:
In the context of an example, that I'm not sure is analagous here or not:
https://github.com/mperham/sidekiq/wiki/FAQ Maybe (?) something more like: "GoodJob will normally run each enqueued job once. GoodJob guarantees that regardless of error conditions all jobs will be run at least once, they won't be dropped. As with most other job queue systems, In unusual circumstances a job can be run more than once so it is important jobs are written to be idempotent." |
I was interested in the discussion at #65, which I think got me understanding the intent of the advisory lock use here. Regarding @ukd1's question about transactions vs sessions... I'm not sure, but I could believe holding open a transaction for 15 minutes has performance implications that simply holding open a session does not. I believe long-term open connections/sessions are an expected usage pattern for postgres (or any db), but long-term transactions being open are not.
However, thinking through this led me to wonder if I had an example backing up my questioning of the README statement:
In general, this is a hard thing to guarantee in concurrent systems. Most systems don't try to guarantee "exactly once", but something more like "always run at LEAST once, USUALLY run only once, occasionally in some edge cases it might run more than once." Which is one reason it's often said it's important for your ActiveJobs to be idempotent -- so it's okay if they run more than once. (For instance, here's sidekiq on it)
Since it's so rare to make the "run exactly once" claim, I was a a bit suspicious of it. If i understand how advisory locks are being used.... is this an example showing how it can fail and wind up more than once? --
What if the network connecction to postgres gets interupted in minute 7 of the 15 minute job. The session-level advisory lock would be dropped, and another worker free to take the job, correct? However, the original job is still running -- and may even be able to save it's work to the database, because if it was an intermittent network interuption, ActiveRecord in some cases may transparently reestablish the connection again.
Is this a case where a job might be run twice, completely-performed? You do hedge with "completely-performed", but I think it could still be. Although if a job were run once 99% performed, and then again 100% performed... I'm not sure it actually matters that much that it wasn't "completely-performed" twice?
I may have the details of this potential problem not quite right. But I'm suspicious of the "Exactly right" guarantee, and wonder how confident you really are this can be guaranteed even with unexpected infrastructure failures like intermittent network interuption etc. I don't think any other systems, including sidekiq, make a "exactly once" guarantee -- it may be safer for everyone to realize that a job can in some cases be run more than once and must be idempotent.
The text was updated successfully, but these errors were encountered: