How to properly report errors to error tracker service #159

morgoth · 2020-10-05T13:19:08Z

I have issues with proper error handling.

My setup looks like this:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  around_perform do |job, block|
    block.call
  rescue StandardError => e
    Rollbar.error(e) # Or any other error tracker service
    raise
  end
end

This works fine for most of the cases.
However when in some job I have defined "discard_on":

class SomeJob < ApplicationJob
  discard_on SomeError
  
  def perform
     # SomeError might be raised here
  end
end

The error is also submitted to issue tracker (the job is properly discarded), but it should not as I explicitly wrote to be discarded.

I cannot find a clean way to workaround this issue.

Is there anyone facing the same problem? Any ideas how it could be solved?

bensheldon · 2020-10-05T15:26:26Z

@morgoth Thanks for flagging this! I looked into it and learned a few things.

The rescue_on / retry_on / discard_on hooks all run after any around_perform blocks.

The way that the Sentry ActiveJob integration works is that it calls rescue_with_handler which directly invokes any rescuable hooks: https://github.com/getsentry/sentry-ruby/blob/d288d55db245017eb500eda85f2c2fe48f97c52e/lib/raven/integrations/rails/active_job.rb#L9-L32

To use this in your code, you'd write it like this:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  around_perform do |job, block|
    block.call
  rescue StandardError => e
    rescue_handler_result = rescue_with_handler(e)
    next rescue_handler_result if rescue_handler_result

    Rollbar.error(e) # Or any other error tracker service
    raise
  end
end

This should probably be documented in the Readme. Error handling is quite a hill to climb 😓

morgoth · 2020-10-05T15:54:30Z

This doesn't work for me.

The retry_on StandardError, wait: :exponentially_longer, attempts: 10 defines a handler, so there will be always some result from rescue_handler_result = rescue_with_handler(e), thus it will never reach reporting code.

bensheldon · 2020-10-05T16:45:07Z

@morgoth you're right. I'm gonna have to go back to the drawing board. Briefly, I think the thing we need here is for retry_on to have something like retry_on StandardError, on_retry: -> { |error, job| # do something }... which unfortunately doesn't exist. Here's some related discussion on Rails: rails/rails#38907

I think a core challenge here is there is that we want retry_on to be able to differentiate between "retry on this known situation and don't worry about it" and "retry when something unexpected happens and definitely let me know".

bensheldon · 2020-10-05T16:53:16Z

Just an idea, retry_on could be monkey-patched with a callback invoked on retry:

module ActiveJob
  module Exceptions
    module ClassMethods
      def retry_on(*exceptions, wait: 3.seconds, attempts: 5, queue: nil, priority: nil, callback: nil)
        rescue_from(*exceptions) do |error|
          executions = executions_for(exceptions)

          if executions < attempts
            callback.call(self, error) if callback # <= PROPOSED FUNCTIONALITY
            retry_job wait: determine_delay(seconds_or_duration_or_algorithm: wait, executions: executions), queue: queue, priority: priority, error: error
          else
            if block_given?
              instrument :retry_stopped, error: error do
                yield self, error
              end
            else
              instrument :retry_stopped, error: error
              raise error
            end
          end
        end
      end
    end
  end
end

morgoth · 2020-10-05T17:52:11Z

Another idea - use Rails instrumentation.
retry_on is calling retry_job which has instrumentation built in https://github.com/rails/rails/blob/7bfb9276e37913f9828046cd5198e8560ead3ed5/activejob/lib/active_job/exceptions.rb#L122-L126

I'm not sure if there are all information needed available (would be nice to have access to job instance.

morgoth · 2020-10-06T12:24:24Z

Yes, looks like instrumentation is working pretty well. I need to test it a bit more to be sure, but something like this:

# config/initializers/good_job.rb
ActiveSupport::Notifications.subscribe /(enqueue_retry|retry_stopped)\.active_job/ do |event|
  job = event.payload[:job]
  Rollbar.error(event.payload[:error],
    job: job.class.name,
    job_id: job.job_id,
    use_exception_level_filters: true,
    arguments: job.arguments
  )
end

works fine. Vide https://edgeguides.rubyonrails.org/active_support_instrumentation.html#enqueue-retry-active-job
It also covers mailer errors.

morgoth · 2020-12-10T14:04:50Z

I have problems with using retry_on StandardError, wait: :exponentially_longer, attempts: 10
In tests every error is catched and retried, thus if you write MyJob.perform_now(:wrong, :nr, :of_arguments) in tests it will silently pass, ie it won't throw an ArgumentError.

Not sure how to make it right.

morgoth · 2020-12-11T09:11:26Z

So this mention is also problematic https://github.com/bensheldon/good_job#retries as it has unexpected side effects in tests.
@bensheldon Are you using this setup in your app? Did you encounter problems described above?

bensheldon · 2020-12-11T16:41:05Z

@morgoth

This is the setup I'm using, which does infinite retries---though that would imply that my tests should time out if they encounter an exception, but I have not observed that:

# config/initializers/good_job.rb
GoodJob.preserve_job_records = true

ActiveJob::Base.queue_adapter = if Rails.env.development?
                                  GoodJob::Adapter.new(execution_mode: :external)
                                else
                                  :good_job
                                end

GoodJob.on_thread_error = ->(exception) { Raven.capture_exception(exception) }
ActionMailer::MailDeliveryJob.retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

# app/jobs/application_job.rb
class ApplicationJob < ActiveJob::Base
  JobTimeoutError = Class.new(StandardError)

  queue_as :default
  retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

  around_perform do |_job, block|
    block.call
  rescue StandardError => e
    Raven.capture_exception(e)
    raise
  end

  around_perform do |_job, block|
    Timeout.timeout(10.minutes, JobTimeoutError) do
      block.call
    end
  end
end

You're right though that ActiveJob error handling leaves a lot to be desired, as I think this is challenge of ActiveSupport::Rescuable/rescue_from to differentiate between functional behavior and operational configuration.

I don't have an answer I'm excited about 🤔 You could stub out the retry_on or retry_job methods in your tests as no-ops. You could also add a guard in the retry_on logic for something like unless Rails.env.test?.

morgoth · 2021-05-26T07:43:41Z

After some time, I found the setup that works fine for me. Here is a summary:

# app/lib/active_job_error_handler.rb
# This reimplements ActiveJob#retry_on method with added error notifications to Rollbar
module ActiveJobErrorHandler
  extend ActiveSupport::Concern

  MAX_RETRIES = 10

  included do
    rescue_from StandardError do |error|
      executions = executions_for([StandardError])

      Rollbar.error(...) # or any other error tracker

      if executions < MAX_RETRIES
        delay = determine_delay(seconds_or_duration_or_algorithm: :exponentially_longer, executions: executions)
        retry_job(wait: delay, error: error)
      else
        # Propagate to GoodJob, so it is stored for inspection
        raise
      end
    end
  end
end

# config/initializers/good_job.rb
require "active_job_error_handler"

# More good job setup if needed
ActionMailer::MailDeliveryJob.include(ActiveJobErrorHandler)

# test/test_helper.rb
class ApplicationJob
  # Redefine, to not swallow regular code errors, which otherwise would be put into retry queue and silently skipped
  # Testing retries should be done using assert_retried_job method
  def retry_job(error:, **)
    raise error
  end
end
class ActiveSupport::TestCase
  # used for explicit testing of retried jobs
  def assert_retried_job(job_class, by_error: StandardError)
    job_class.any_instance.expects(:retry_job).with { |attrs|
      assert_instance_of(by_error, attrs[:error])
    }
  end
end

morgoth closed this as completed May 26, 2021

coezbek mentioned this issue Dec 8, 2021

Background notifications for Shoryuken/ActiveJob smartinez87/exception_notification#368

Open

bensheldon mentioned this issue Apr 22, 2023

Add after_discard method to ActiveJob rails/rails#48010

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to properly report errors to error tracker service #159

How to properly report errors to error tracker service #159

morgoth commented Oct 5, 2020 •

edited

Loading

bensheldon commented Oct 5, 2020

morgoth commented Oct 5, 2020

bensheldon commented Oct 5, 2020

bensheldon commented Oct 5, 2020

morgoth commented Oct 5, 2020

morgoth commented Oct 6, 2020

morgoth commented Dec 10, 2020

morgoth commented Dec 11, 2020

bensheldon commented Dec 11, 2020

morgoth commented May 26, 2021

How to properly report errors to error tracker service #159

How to properly report errors to error tracker service #159

Comments

morgoth commented Oct 5, 2020 • edited Loading

bensheldon commented Oct 5, 2020

morgoth commented Oct 5, 2020

bensheldon commented Oct 5, 2020

bensheldon commented Oct 5, 2020

morgoth commented Oct 5, 2020

morgoth commented Oct 6, 2020

morgoth commented Dec 10, 2020

morgoth commented Dec 11, 2020

bensheldon commented Dec 11, 2020

morgoth commented May 26, 2021

morgoth commented Oct 5, 2020 •

edited

Loading