Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to properly report errors to error tracker service #159

Closed
morgoth opened this issue Oct 5, 2020 · 10 comments
Closed

How to properly report errors to error tracker service #159

morgoth opened this issue Oct 5, 2020 · 10 comments

Comments

@morgoth
Copy link
Collaborator

morgoth commented Oct 5, 2020

I have issues with proper error handling.

My setup looks like this:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  around_perform do |job, block|
    block.call
  rescue StandardError => e
    Rollbar.error(e) # Or any other error tracker service
    raise
  end
end

This works fine for most of the cases.
However when in some job I have defined "discard_on":

class SomeJob < ApplicationJob
  discard_on SomeError
  
  def perform
     # SomeError might be raised here
  end
end

The error is also submitted to issue tracker (the job is properly discarded), but it should not as I explicitly wrote to be discarded.

I cannot find a clean way to workaround this issue.

Is there anyone facing the same problem? Any ideas how it could be solved?

@bensheldon
Copy link
Owner

@morgoth Thanks for flagging this! I looked into it and learned a few things.

The rescue_on / retry_on / discard_on hooks all run after any around_perform blocks.

The way that the Sentry ActiveJob integration works is that it calls rescue_with_handler which directly invokes any rescuable hooks: https://github.com/getsentry/sentry-ruby/blob/d288d55db245017eb500eda85f2c2fe48f97c52e/lib/raven/integrations/rails/active_job.rb#L9-L32

To use this in your code, you'd write it like this:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  around_perform do |job, block|
    block.call
  rescue StandardError => e
    rescue_handler_result = rescue_with_handler(e)
    next rescue_handler_result if rescue_handler_result

    Rollbar.error(e) # Or any other error tracker service
    raise
  end
end

This should probably be documented in the Readme. Error handling is quite a hill to climb 😓

@morgoth
Copy link
Collaborator Author

morgoth commented Oct 5, 2020

This doesn't work for me.

The retry_on StandardError, wait: :exponentially_longer, attempts: 10 defines a handler, so there will be always some result from rescue_handler_result = rescue_with_handler(e), thus it will never reach reporting code.

@bensheldon
Copy link
Owner

@morgoth you're right. I'm gonna have to go back to the drawing board. Briefly, I think the thing we need here is for retry_on to have something like retry_on StandardError, on_retry: -> { |error, job| # do something }... which unfortunately doesn't exist. Here's some related discussion on Rails: rails/rails#38907

I think a core challenge here is there is that we want retry_on to be able to differentiate between "retry on this known situation and don't worry about it" and "retry when something unexpected happens and definitely let me know".

@bensheldon
Copy link
Owner

Just an idea, retry_on could be monkey-patched with a callback invoked on retry:

module ActiveJob
  module Exceptions
    module ClassMethods
      def retry_on(*exceptions, wait: 3.seconds, attempts: 5, queue: nil, priority: nil, callback: nil)
        rescue_from(*exceptions) do |error|
          executions = executions_for(exceptions)

          if executions < attempts
            callback.call(self, error) if callback # <= PROPOSED FUNCTIONALITY
            retry_job wait: determine_delay(seconds_or_duration_or_algorithm: wait, executions: executions), queue: queue, priority: priority, error: error
          else
            if block_given?
              instrument :retry_stopped, error: error do
                yield self, error
              end
            else
              instrument :retry_stopped, error: error
              raise error
            end
          end
        end
      end
    end
  end
end

@morgoth
Copy link
Collaborator Author

morgoth commented Oct 5, 2020

Another idea - use Rails instrumentation.
retry_on is calling retry_job which has instrumentation built in https://github.com/rails/rails/blob/7bfb9276e37913f9828046cd5198e8560ead3ed5/activejob/lib/active_job/exceptions.rb#L122-L126

I'm not sure if there are all information needed available (would be nice to have access to job instance.

@morgoth
Copy link
Collaborator Author

morgoth commented Oct 6, 2020

Yes, looks like instrumentation is working pretty well. I need to test it a bit more to be sure, but something like this:

# config/initializers/good_job.rb
ActiveSupport::Notifications.subscribe /(enqueue_retry|retry_stopped)\.active_job/ do |event|
  job = event.payload[:job]
  Rollbar.error(event.payload[:error],
    job: job.class.name,
    job_id: job.job_id,
    use_exception_level_filters: true,
    arguments: job.arguments
  )
end

works fine. Vide https://edgeguides.rubyonrails.org/active_support_instrumentation.html#enqueue-retry-active-job
It also covers mailer errors.

@morgoth
Copy link
Collaborator Author

morgoth commented Dec 10, 2020

I have problems with using retry_on StandardError, wait: :exponentially_longer, attempts: 10
In tests every error is catched and retried, thus if you write MyJob.perform_now(:wrong, :nr, :of_arguments) in tests it will silently pass, ie it won't throw an ArgumentError.

Not sure how to make it right.

@morgoth
Copy link
Collaborator Author

morgoth commented Dec 11, 2020

So this mention is also problematic https://github.com/bensheldon/good_job#retries as it has unexpected side effects in tests.
@bensheldon Are you using this setup in your app? Did you encounter problems described above?

@bensheldon
Copy link
Owner

@morgoth

This is the setup I'm using, which does infinite retries---though that would imply that my tests should time out if they encounter an exception, but I have not observed that:

# config/initializers/good_job.rb
GoodJob.preserve_job_records = true

ActiveJob::Base.queue_adapter = if Rails.env.development?
                                  GoodJob::Adapter.new(execution_mode: :external)
                                else
                                  :good_job
                                end

GoodJob.on_thread_error = ->(exception) { Raven.capture_exception(exception) }
ActionMailer::MailDeliveryJob.retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

# app/jobs/application_job.rb
class ApplicationJob < ActiveJob::Base
  JobTimeoutError = Class.new(StandardError)

  queue_as :default
  retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY

  around_perform do |_job, block|
    block.call
  rescue StandardError => e
    Raven.capture_exception(e)
    raise
  end

  around_perform do |_job, block|
    Timeout.timeout(10.minutes, JobTimeoutError) do
      block.call
    end
  end
end

You're right though that ActiveJob error handling leaves a lot to be desired, as I think this is challenge of ActiveSupport::Rescuable/rescue_from to differentiate between functional behavior and operational configuration.

I don't have an answer I'm excited about 🤔 You could stub out the retry_on or retry_job methods in your tests as no-ops. You could also add a guard in the retry_on logic for something like unless Rails.env.test?.

@morgoth
Copy link
Collaborator Author

morgoth commented May 26, 2021

After some time, I found the setup that works fine for me. Here is a summary:

# app/lib/active_job_error_handler.rb
# This reimplements ActiveJob#retry_on method with added error notifications to Rollbar
module ActiveJobErrorHandler
  extend ActiveSupport::Concern

  MAX_RETRIES = 10

  included do
    rescue_from StandardError do |error|
      executions = executions_for([StandardError])

      Rollbar.error(...) # or any other error tracker

      if executions < MAX_RETRIES
        delay = determine_delay(seconds_or_duration_or_algorithm: :exponentially_longer, executions: executions)
        retry_job(wait: delay, error: error)
      else
        # Propagate to GoodJob, so it is stored for inspection
        raise
      end
    end
  end
end
# config/initializers/good_job.rb
require "active_job_error_handler"

# More good job setup if needed
ActionMailer::MailDeliveryJob.include(ActiveJobErrorHandler)
# test/test_helper.rb
class ApplicationJob
  # Redefine, to not swallow regular code errors, which otherwise would be put into retry queue and silently skipped
  # Testing retries should be done using assert_retried_job method
  def retry_job(error:, **)
    raise error
  end
end
class ActiveSupport::TestCase
  # used for explicit testing of retried jobs
  def assert_retried_job(job_class, by_error: StandardError)
    job_class.any_instance.expects(:retry_job).with { |attrs|
      assert_instance_of(by_error, attrs[:error])
    }
  end
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants