Ideally I wouldn't need a job queue at all and could do everything in parallel, but rate limits on APIs and platform limits like 256 instances per service on Unikraft make this problematic down the road. Lambda has a 15-minute timeout, and I'd rather not deal with AWS Batch, ECS, or EC2. ML inference services may also not be deployable in a serverless way, requiring a job queue to make sure they don't get overloaded.
Some of the managed products in this space are hatchet.run and riverqueue.com. Both support workflows (albeit in a limited fashion), but they're also job queues that can limit concurrency. They're quite expensive compared to the resources you get: Hatchet is $150/month for 4 concurrent workers, and I could probably get 200 for that price on Unikraft.
Job scheduling: A job is scheduled according to certain constraints. For instance, you may want to define that no more than 9 jobs of a certain type can execute simultaneously, and that the maximal rate at which you can start such jobs is 300 per second.
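To make those constraints concrete, here's a minimal sketch (not any library's actual API; the module name, limits, and interval are illustrative) of a GenServer that enforces both a concurrency cap and a start-rate cap via a token counter refilled every second:

```elixir
defmodule Sketch.Scheduler do
  use GenServer

  @max_concurrency 9    # no more than 9 jobs of this type at once
  @rate_limit 300       # job starts allowed per refill interval
  @interval_ms 1_000    # refill interval: one second

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Ask permission to start a job; returns :ok or {:error, :overloaded}.
  def acquire, do: GenServer.call(__MODULE__, :acquire)

  # Report a finished job so its concurrency slot frees up.
  def release, do: GenServer.cast(__MODULE__, :release)

  @impl true
  def init(:ok) do
    :timer.send_interval(@interval_ms, :refill)
    {:ok, %{running: 0, tokens: @rate_limit}}
  end

  @impl true
  def handle_call(:acquire, _from, %{running: r, tokens: t} = s)
      when r < @max_concurrency and t > 0 do
    {:reply, :ok, %{s | running: r + 1, tokens: t - 1}}
  end

  def handle_call(:acquire, _from, s), do: {:reply, {:error, :overloaded}, s}

  @impl true
  def handle_cast(:release, s), do: {:noreply, %{s | running: max(s.running - 1, 0)}}

  @impl true
  def handle_info(:refill, s), do: {:noreply, %{s | tokens: @rate_limit}}
end
```

A worker would call `Sketch.Scheduler.acquire/0` before starting a job and `Sketch.Scheduler.release/0` when it finishes; an `{:error, :overloaded}` reply is the signal to queue the job instead of running it immediately.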
Job queueing: When load is higher than the scheduling limits, additional jobs are queued by the system to be run later when load clears. Certain rules govern queues: are they dequeued in FIFO or LIFO order? How many jobs can the queue take before it is full? Is there a deadline after which jobs should be rejected? When we hit the queue limits, we reject the job, which provides a feedback mechanism to the client of the queue so it can take action.
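A sketch of those queueing rules, assuming a bounded FIFO held in a GenServer (swapping `:queue.in/2` for `:queue.in_r/2` would give LIFO); the size limit, deadline, and names are made up for illustration:

```elixir
defmodule Sketch.BoundedQueue do
  use GenServer

  @max_size 1_000       # beyond this the queue is full and rejects
  @deadline_ms 30_000   # entries older than this are dropped on dequeue

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Returns :ok, or {:error, :queue_full} so the client can back off.
  def enqueue(job), do: GenServer.call(__MODULE__, {:enqueue, job})

  # Returns {:ok, job} or :empty.
  def dequeue, do: GenServer.call(__MODULE__, :dequeue)

  @impl true
  def init(:ok), do: {:ok, %{q: :queue.new(), size: 0}}

  @impl true
  def handle_call({:enqueue, _job}, _from, %{size: n} = s) when n >= @max_size do
    {:reply, {:error, :queue_full}, s}
  end

  def handle_call({:enqueue, job}, _from, %{q: q, size: n} = s) do
    # Timestamp each entry so the deadline rule can be checked on dequeue.
    entry = {System.monotonic_time(:millisecond), job}
    {:reply, :ok, %{s | q: :queue.in(entry, q), size: n + 1}}
  end

  def handle_call(:dequeue, _from, s) do
    {reply, state} = pop_fresh(s)
    {:reply, reply, state}
  end

  # Pop entries until one is within its deadline; expired entries are dropped.
  defp pop_fresh(%{q: q, size: n} = s) do
    case :queue.out(q) do
      {{:value, {enqueued_at, job}}, rest} ->
        state = %{s | q: rest, size: n - 1}
        age = System.monotonic_time(:millisecond) - enqueued_at
        if age > @deadline_ms, do: pop_fresh(state), else: {{:ok, job}, state}

      {:empty, _} ->
        {:empty, s}
    end
  end
end
```

Rejecting at enqueue time rather than silently dropping is what gives the caller the feedback loop described above.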
Sampling and dampening: Periodic samples of the Erlang VM can provide information about the health of the system in general. If we have high CPU load or high memory usage, we apply dampening to the scheduling rules: we may lower the concurrency count or the rate at which we execute jobs. When the health problem clears, we remove the dampener and run at full speed again.
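A sketch of what such a sampler might look like, using the run-queue length and total VM memory as rough health proxies (a real implementation might use `:cpu_sup` from `os_mon` or scheduler utilization instead); the thresholds, sample interval, 0.5 dampening factor, and names are all illustrative:

```elixir
defmodule Sketch.Sampler do
  use GenServer

  @sample_ms 5_000                        # sample the VM every 5 seconds
  @run_queue_limit 50                     # schedulable work backing up
  @memory_limit 2 * 1024 * 1024 * 1024    # 2 GiB of total VM memory

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Current dampening factor in [0.0, 1.0]; 1.0 means run at full speed.
  def factor, do: GenServer.call(__MODULE__, :factor)

  @impl true
  def init(:ok) do
    :timer.send_interval(@sample_ms, :sample)
    {:ok, 1.0}
  end

  @impl true
  def handle_call(:factor, _from, factor), do: {:reply, factor, factor}

  @impl true
  def handle_info(:sample, _previous) do
    run_queue = :erlang.statistics(:total_run_queue_lengths)
    memory = :erlang.memory(:total)

    factor =
      if run_queue > @run_queue_limit or memory > @memory_limit do
        0.5   # unhealthy: halve concurrency/rate until the next healthy sample
      else
        1.0   # healthy: remove the dampener
      end

    {:noreply, factor}
  end
end
```

The scheduler could then multiply its limits by this factor, e.g. `trunc(9 * Sketch.Sampler.factor())`, and automatically return to full speed once samples look healthy again.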
A feature request from Discord: https://discord.com/channels/1156433345631232100/1272247538556080281