Database-backed asynchronous priority queue system -- Extracted from Shopify

BJK/delayed_job

Delayed::Job

Delayed_job (or DJ) encapsulates the common pattern of asynchronously executing longer tasks in the background.

It is a direct extraction from Shopify where the job table is responsible for a multitude of core tasks. Amongst those tasks are:

  • sending massive newsletters
  • image resizing
  • http downloads
  • updating smart collections
  • updating solr, our search server, after product changes
  • batch imports
  • spam checks

Follow us on Twitter to get updates and notices about new releases.

Installation

delayed_job 2.1 only supports Rails 3.0+. See the 2.0 branch for Rails 2.

To install, add delayed_job to your Gemfile and run `bundle install`:

gem 'delayed_job'

After delayed_job is installed, you will need to set up the backend.

Backends

delayed_job supports multiple backends for storing the job queue. See the wiki for other backends besides Active Record.

The default is Active Record, which requires a jobs table.

$ script/rails generate delayed_job
$ rake db:migrate

Queuing Jobs

Call .delay.method(params) on any object and it will be processed in the background.

# without delayed_job
@user.activate!(@device)

# with delayed_job
@user.delay.activate!(@device)

If a method should always be run in the background, you can call #handle_asynchronously after the method declaration:

class Device
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end

device = Device.new
device.deliver

handle_asynchronously accepts the same options as delay. In addition, the values can be Proc objects, which are evaluated at call time rather than when the class is loaded. Some examples:

class LongTasks
  def send_mailer
    # Some other code
  end
  handle_asynchronously :send_mailer, :priority => 20

  def in_the_future
    # Some other code
  end
  # 5.minutes.from_now will be evaluated when in_the_future is called
  handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }

  def self.when_to_run
    2.hours.from_now
  end

  def call_a_class_method
    # Some other code
  end
  handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }

  attr_reader :how_important

  def call_an_instance_method
    # Some other code
  end
  handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
end
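
The call-time evaluation of Proc values can be pictured with a small stand-alone sketch. `resolve_options` and `Task` are hypothetical names written for this illustration, not part of delayed_job's API. The sketch assumes that a one-argument Proc receives the object the method was called on, as in the `how_important` example above, and uses `Time.now + 300` in place of `5.minutes.from_now` so that it runs without Rails.

```ruby
# Hypothetical helper, for illustration only (not delayed_job's own code).
# Static values pass through untouched; Proc values are called each time
# the options are resolved, so every call sees fresh values.
def resolve_options(options, receiver)
  options.each_with_object({}) do |(key, value), resolved|
    resolved[key] =
      if value.is_a?(Proc)
        # A one-argument Proc is handed the receiver; others are called bare.
        value.arity == 1 ? value.call(receiver) : value.call
      else
        value
      end
  end
end

Task = Struct.new(:how_important)

options = {
  :priority => Proc.new { |task| task.how_important },
  :run_at   => Proc.new { Time.now + 300 } # stands in for 5.minutes.from_now
}

first  = resolve_options(options, Task.new(3))
second = resolve_options(options, Task.new(7))

puts first[:priority]  # priority taken from the first task
puts second[:priority] # priority taken from the second task
```

Because the Procs stay in the original hash and are only called during resolution, the same declaration yields a different priority and run time for each call.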

Rails 3 Mailers

Due to how mailers are implemented in Rails 3, we had to add a small workaround to get delayed_job to work.

# without delayed_job
Notifier.signup(@user).deliver

# with delayed_job
Notifier.delay.signup(@user)

Remove the .deliver method to make it work. It’s not ideal, but it’s the best we could do for now.

Running Jobs

script/delayed_job can be used to manage a background process which will start working off jobs. Make sure you’ve run `script/rails generate delayed_job`.

$ RAILS_ENV=production script/delayed_job start
$ RAILS_ENV=production script/delayed_job stop

# Runs two workers in separate processes.
$ RAILS_ENV=production script/delayed_job -n 2 start
$ RAILS_ENV=production script/delayed_job stop

Workers can run on any computer, as long as they have access to the database and their clocks are in sync. Keep in mind that each worker will check the database at least every 5 seconds.

You can also invoke rake jobs:work, which will start working off jobs. You can cancel the rake task with CTRL-C.

Custom Jobs

Jobs are simple Ruby objects with a method called perform. Any object which responds to perform can be stuffed into the jobs table. Job objects are serialized to YAML so that they can later be resurrected by the job runner.

class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end

Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', Customers.find(:all).collect(&:email))
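
The serialize-then-resurrect cycle can be tried without a database. The following is a minimal sketch using only Ruby's standard `yaml` library; `EchoJob` is a made-up job class, and delayed_job's own serialization may differ in detail from plain `YAML.dump`.

```ruby
require 'yaml'

# Stand-in job for illustration; any object that responds to #perform works.
class EchoJob < Struct.new(:text, :recipients)
  def perform
    recipients.map { |r| "#{text} -> #{r}" }
  end
end

job     = EchoJob.new('lorem ipsum...', ['a@example.com', 'b@example.com'])
handler = YAML.dump(job) # roughly the kind of string kept in the handler column

# Newer Psych versions refuse to load arbitrary classes through YAML.load,
# so use unsafe_load where it exists. Only do this with trusted data.
restored = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(handler) : YAML.load(handler)

puts handler
puts restored.perform
```

The restored object is a fresh `EchoJob` with the same fields, which is why a worker in another process can call `perform` on it.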

Hooks

You can define hooks on your job that will be called at different stages in the process:

class ParanoidNewsletterJob < NewsletterJob
  def enqueue(job)
    record_stat 'newsletter_job/enqueue'
  end

  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end

  def before(job)
    record_stat 'newsletter_job/start'
  end

  def after(job)
    record_stat 'newsletter_job/after'
  end

  def success(job)
    record_stat 'newsletter_job/success'
  end

  def error(job, exception)
    notify_hoptoad(exception)
  end

  def failure
    page_sysadmin_in_the_middle_of_the_night
  end
end
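
To make the firing order concrete, here is a stand-alone sketch with a toy runner. `RecordingJob` and `run_with_hooks` are hypothetical names, and the runner is not delayed_job's worker. It assumes the order before, perform, success on a clean run, error when perform raises, and after in both cases. The enqueue and failure hooks are left out because they depend on queueing and retry limits.

```ruby
# Toy payload that records which hooks were called, in order.
class RecordingJob
  attr_reader :events

  def initialize(should_fail)
    @should_fail = should_fail
    @events = []
  end

  def before(job)
    @events << :before
  end

  def perform
    @events << :perform
    raise 'boom' if @should_fail
  end

  def success(job)
    @events << :success
  end

  def error(job, exception)
    @events << :error
  end

  def after(job)
    @events << :after
  end
end

# Assumed hook order, for illustration only. nil stands in for the job
# record. A real worker would also reschedule the job after an error;
# this toy runner simply swallows the exception.
def run_with_hooks(payload)
  payload.before(nil)
  payload.perform
  payload.success(nil)
rescue StandardError => e
  payload.error(nil, e)
ensure
  payload.after(nil)
end

ok  = RecordingJob.new(false)
bad = RecordingJob.new(true)
run_with_hooks(ok)
run_with_hooks(bad)

p ok.events
p bad.events
```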

Gory Details

The library revolves around a delayed_jobs table which looks as follows:

create_table :delayed_jobs, :force => true do |table|
  table.integer  :priority, :default => 0      # Allows some jobs to jump to the front of the queue
  table.integer  :attempts, :default => 0      # Provides for retries, but still fail eventually.
  table.text     :handler                      # YAML-encoded string of the object that will do work
  table.text     :last_error                   # Reason for last failure
  table.datetime :run_at                       # When to run. Could be Time.zone.now for immediately, or sometime in the future.
  table.datetime :locked_at                    # Set when a client is working on this object
  table.datetime :failed_at                    # Set when all retries have failed (actually, by default, the record is deleted instead)
  table.string   :locked_by                    # Who is working on this object (if locked)
  table.timestamps
end

On failure, the job is scheduled again in 5 seconds + N ** 4, where N is the number of retries.

The default Worker.max_attempts is 25. After this, the job is either deleted (default), or left in the database with “failed_at” set.
With the default of 25 attempts, the last retry will be 20 days later, with the last interval being almost 100 hours.
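
The figures above can be checked with a few lines of arithmetic. This sketch assumes that N runs from 1 to 24, meaning a job is rescheduled after each of its first 24 failures and the 25th failure is final; `retry_delay` is a name made up for the example.

```ruby
# Delay in seconds before the next retry after the Nth failed attempt,
# taken from the formula above: 5 seconds + N ** 4.
def retry_delay(attempts)
  5 + attempts**4
end

max_attempts = 25 # the default Worker.max_attempts

# Assumption: a job that keeps failing is rescheduled after attempts 1..24.
delays = (1...max_attempts).map { |n| retry_delay(n) }

total_days = delays.sum / 86_400.0
last_hours = delays.last / 3_600.0

puts format('total wait: %.1f days, last interval: %.1f hours', total_days, last_hours)
```

Under that assumption the delays add up to 1,763,140 seconds, about 20.4 days, and the final interval is 331,781 seconds, about 92 hours.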

The default Worker.max_run_time is 4.hours. If your job takes longer than that, another computer could pick it up. It’s up to you to
make sure your job doesn’t exceed this time. You should set this to the longest time you think the job could take.
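
The reason a long-running job can be picked up twice is that a lock is treated as stale once it is older than max_run_time. A minimal sketch of that check, with the hypothetical names `lock_expired?` and `MAX_RUN_TIME`, and assuming the comparison is made against `locked_at`:

```ruby
MAX_RUN_TIME = 4 * 60 * 60 # seconds; mirrors the 4.hours default

# Hypothetical predicate, for illustration only: a job may be picked up
# when nobody holds it, or when its lock is older than the max run time.
def lock_expired?(locked_at, now, max_run_time = MAX_RUN_TIME)
  locked_at.nil? || locked_at < now - max_run_time
end

now = Time.now

puts lock_expired?(nil, now)                # never locked
puts lock_expired?(now - 60, now)           # locked a minute ago
puts lock_expired?(now - 5 * 60 * 60, now)  # locked five hours ago
```

A job that is still running after five hours would look abandoned to a second worker, which is why max_run_time should be at least as long as your slowest job.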

By default, it will delete failed jobs (and it always deletes successful jobs). If you want to keep failed jobs, set
Delayed::Worker.destroy_failed_jobs = false. The failed jobs will be marked with non-null failed_at.

By default all jobs are scheduled with priority = 0, which is top priority. You can change this by setting Delayed::Worker.default_priority to something else. Lower numbers have higher priority.
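
Since a lower number means a higher priority, picking the next job amounts to an ascending sort. The sketch below uses a made-up `QueuedJob` struct and plain arrays instead of the jobs table, and assumes that ties are broken by the earlier run_at and that jobs scheduled for the future are skipped.

```ruby
QueuedJob = Struct.new(:name, :priority, :run_at)

now = Time.now

jobs = [
  QueuedJob.new('newsletter', 20, now - 30),
  QueuedJob.new('spam check',  0, now - 10),
  QueuedJob.new('resize',      0, now - 20),
  QueuedJob.new('import',     -5, now + 3600) # scheduled for the future
]

# Only jobs whose run_at has passed are eligible; among those, the lowest
# priority number wins and the earlier run_at breaks ties.
ready = jobs.select { |job| job.run_at <= now }
            .sort_by { |job| [job.priority, job.run_at] }

puts ready.map(&:name).join(', ')
```

Here 'import' has the most urgent priority but is not due yet, so the two priority 0 jobs run first, oldest first, followed by the newsletter.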

It is possible to disable delayed jobs for testing purposes. Set Delayed::Worker.delay_jobs = false to execute all jobs in real time.

Here is an example of changing job parameters in Rails:

# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 60
Delayed::Worker.max_attempts = 3
Delayed::Worker.max_run_time = 5.minutes
Delayed::Worker.delay_jobs = !Rails.env.test?

Cleaning up

You can invoke rake jobs:clear to delete all jobs in the queue.

Mailing List

Join us on the mailing list at http://groups.google.com/group/delayed_job

How to contribute

If you find what looks like a bug:

  1. Search the mailing list to see if anyone else had the same issue.
  2. Check the GitHub issue tracker to see if anyone else has reported the issue.
  3. If you don’t see anything, create an issue with information on how to reproduce it.

If you want to contribute an enhancement or a fix:

  1. Fork the project on GitHub.
  2. Make your changes with tests.
  3. Commit the changes without making changes to the Rakefile or any other files that aren’t related to your enhancement or fix.
  4. Send a pull request.
