-
Hmm, forgive me if I am misunderstanding here. I would design this slightly differently, because I don't think the queue should be "holding" jobs for some later logic; it should be performing them asap. GoodJob also doesn't have any batch functionality right now; it is entirely ActiveJob semantics for performing jobs. I would design this with something like:

```ruby
# When you want a user to be reindexed
some_user.touch(:needs_to_be_indexed_at)

class BatchUserIndexJob < ApplicationJob
  def perform
    return if User.where.not(needs_to_be_indexed_at: nil).count < 10

    User.where.not(needs_to_be_indexed_at: nil).first(10).each do |user|
      UserIndexJob.perform_later(user) # <= queue it, or maybe just do it right here in the loop?
      user.update(needs_to_be_indexed_at: nil)
    end
  end
end
```

...and then queue `BatchUserIndexJob` every 30 seconds using GoodJob's cron or some external cron system.
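For the scheduling step, a sketch of what the GoodJob cron configuration might look like, assuming the cron feature is enabled (fugit, which GoodJob uses to parse cron expressions, accepts an optional leading seconds field; the `batch_user_index` key is an arbitrary name for the schedule entry):

```ruby
# config/initializers/good_job.rb -- sketch, assuming GoodJob's cron feature
Rails.application.configure do
  config.good_job.enable_cron = true
  config.good_job.cron = {
    batch_user_index: {         # arbitrary key naming this schedule entry
      cron: "*/30 * * * * *",   # fugit extended syntax: every 30 seconds
      class: "BatchUserIndexJob"
    }
  }
end
```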
-
The issue with using a cron is that you need to add a field to each of the tables you want to handle this way, and it needs to be indexed as well so your queries are not too slow. It will also have problems efficiently processing a large number of jobs: for example, if we re-index 300,000 users, there will be one job processing them in batch, then 2 jobs after 30 seconds, then 3 jobs after 60 seconds… With other queuing systems, you "simply" reserve X jobs, process them all at once when you either hit the count limit or the configured timeout, and then mark them all as done. I also understand it can require moving too far away from ActiveJob semantics (a…
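The "reserve X jobs, process at the count limit or the timeout" pattern can be sketched in plain Ruby. Everything here is illustrative, not a `good_job` API: a plain Ruby `Queue` stands in for a real job backend, and `BatchConsumer` with its method names is hypothetical.

```ruby
# Illustrative sketch of the "reserve up to N jobs or wait until a timeout"
# pattern. A plain Ruby Queue stands in for a real job backend.
class BatchConsumer
  def initialize(queue, batch_size:, timeout:)
    @queue = queue
    @batch_size = batch_size
    @timeout = timeout
  end

  # Reserve up to @batch_size jobs, returning early with a smaller batch
  # once the timeout elapses.
  def reserve
    batch = []
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
    while batch.size < @batch_size
      break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      begin
        batch << @queue.pop(true) # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        sleep(0.01) # queue is empty: wait briefly and re-check until the deadline
      end
    end
    batch
  end

  # Reserve a batch and process it; if processing fails, push every job
  # back onto the queue so it can be reprocessed later.
  def consume
    batch = reserve
    return [] if batch.empty?
    yield batch
    batch
  rescue StandardError
    batch.each { |job| @queue << job }
    []
  end
end
```

In a real backend the reservation step would also have to lock the reserved jobs so that concurrent workers don't pick them up twice.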
-
We are currently considering switching to `good_job` for our job processing, and I am wondering if consuming jobs in batches is doable.

The use case: we have multiple jobs that could benefit from batching their writes to a storage backend (events to an analytics store, re-indexing of some records in our search system).

Ideally, we would like to say that a specific job (`ReindexUserJob`) should be consumed in batches of 10, with a timeout of 30 seconds. This would mean that a method on the job (`perform_bulk`?) would be called with at most 10 jobs as arguments, but could be called with fewer than 10 jobs if we did not enqueue 10 jobs in the last 30 seconds. If the method fails, then all the jobs are put back in the queue to be reprocessed later.

What do you think about it? Is it doable without too much pain in `good_job`?
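The count-or-timeout contract described above can be sketched in plain Ruby. `BulkBuffer` and every name in it are hypothetical, not a `good_job` feature, and a real implementation would also need a timer thread to flush an idle buffer rather than checking only when a new job arrives:

```ruby
# Hypothetical sketch of the batching contract described above: collect jobs
# and flush them in bulk once the count limit is reached or the timeout has
# elapsed. On failure the batch stays buffered, mimicking "put back in the
# queue to be reprocessed later".
class BulkBuffer
  def initialize(limit:, timeout:, &flush)
    @limit = limit       # e.g. 10 jobs
    @timeout = timeout   # e.g. 30 seconds
    @flush = flush       # the perform_bulk-style callback
    @jobs = []
    @first_enqueued_at = nil
  end

  def push(job, now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @first_enqueued_at ||= now
    @jobs << job
    flush! if @jobs.size >= @limit || now - @first_enqueued_at >= @timeout
  end

  def flush!
    batch = @jobs
    @flush.call(batch)   # may raise; the jobs stay buffered if it does
    @jobs = []
    @first_enqueued_at = nil
    batch
  rescue StandardError
    []
  end
end
```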