Add logging #99

Merged · 16 commits · Apr 29, 2014
Changes from 11 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -12,3 +12,5 @@ stdout
tags
tags.*
*.gem
.ruby-version
Collaborator:

Thank you!

.ruby-gemset
17 changes: 17 additions & 0 deletions lib/resque/failure/multiple_with_retry_suppression.rb
@@ -1,4 +1,5 @@
require 'resque/failure/multiple'
require 'resque/plugins/retry/log_formatting'
Contributor Author:

After using this, I realized that I needed logging in the failure backend as well. Extracted the formatting to one place, though the actual log call needed to be slightly different - see below.

Collaborator:

Good call


module Resque
module Failure
@@ -17,6 +18,7 @@ module Failure
# Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression
#
class MultipleWithRetrySuppression < Multiple
include Resque::Plugins::Retry::LogFormatting
# Called when the job fails
#
# If the job will retry, suppress the failure from the other backends.
@@ -25,10 +27,14 @@ class MultipleWithRetrySuppression < Multiple
#
# @api private
def save
log 'failure backend save', payload, exception

if !(retryable? && retrying?)
log "!(#{retryable?} && #{retryable? && retrying?}) - sending failure to superclass", payload, exception
Contributor Author:

This was tricky. Calling retrying? is unsafe unless you've first verified that retryable? returns true. BUT, what I want in this log message is to figure out what value both methods returned. Simply logging the entire expression doesn't allow that. I tried making retrying? safe to call, but that caused further complications, so I figured a complex log line would be better than increasing the complexity of other methods.

Collaborator:

I would prefer that you didn't re-invoke the retryable? and retrying? methods here, just capture them into variables on the line before:

retryable, retrying = retryable?, retrying?
if !(retryable && retrying)
  log "!(#{retryable} && #{retrying}) - sending failure to superclass", payload, exception
  ...

What do you think?

Contributor Author:

Yeah, they do hit Redis, so it makes sense. Problem is that retrying? cannot be called unless retryable? has returned true, so it would need to look like:

retryable = retryable?
job_being_retried = retryable && retrying?
if job_being_retried
  log "#{retryable ? 'retryable' : 'non-retryable'} job is not being retried - sending failure to superclass", payload, exception
elsif #...

Or something like that? That might actually be a better log message anyway…

Collaborator:

Let's do:

retryable = retryable?
if retryable && retrying?
  log "#{retryable ? 'retryable' : 'non-retryable'} job is not being retried - sending failure to superclass", payload, exception
  ...

I don't understand how you could get in there if the job is not "retryable?" however.

Contributor Author:

My pseudo-code was the opposite of what was there. That block only runs if the job is not being retried, so the point of the log message is to distinguish "job isn't retryable at all" from "job IS retryable, but isn't being retried" (which was the crucial distinction for unlocking #97).
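
For reference, a minimal sketch of where this exchange seems to land: capture both values once before branching, and word the message around the retryable/retrying distinction (illustrative only, not the final committed code).

retryable = retryable?
retrying = retryable && retrying?  # retrying? is only safe to call once retryable? has returned true
if !retrying
  log "#{retryable ? 'retryable' : 'non-retryable'} job is not being retried - sending failure to superclass", payload, exception
  cleanup_retry_failure_log!
  super
elsif retry_delay > 0
  # ...unchanged: save details for resque-web
end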

cleanup_retry_failure_log!
super
elsif retry_delay > 0
log "retry_delay: #{retry_delay} > 0 - saving details for resque-web", payload, exception
Contributor Author:

Please verify that this is indeed what happens here. The only use of the failure_key I could find was in the resque-web extensions; as far as I could tell, this has no bearing on the actual retry logic at all.

Collaborator:

I'm not terribly concerned about this code path, perhaps just annotate the message w/

"retry_delay: #{retry_delay} > 0 - saving details in Redis"

data = {
:failed_at => Time.now.strftime("%Y/%m/%d %H:%M:%S"),
:payload => payload,
@@ -40,6 +46,8 @@ def save
}

Resque.redis.setex(failure_key, 2*retry_delay, Resque.encode(data))
else
log "retry_delay: #{retry_delay} <= 0 - ignoring", payload, exception
Contributor Author:

Again, not sure if this captures the intent of doing nothing for this case. Obviously I can use this message to see what happened, but if there's a better way to state what this case is, please let me know and I can make a better log message.

end
end

@@ -82,6 +90,15 @@ def retrying?
def cleanup_retry_failure_log!
Resque.redis.del(failure_key) if retryable?
end

def log(message,payload=nil,exception=nil)
Contributor Author:

Since log(message) is part of the Multiple or Base class that comes with Resque, this has to be compatible.

I initially just deferred to super, e.g. super(format_message(message, args, exception)), but that caused tests to fail: the base class calls log on its @worker, and for some of the tests @worker is nil, which Resque doesn't check for. So either Resque has a bug or the test doesn't sufficiently simulate the real world, but that didn't feel like a problem I should be solving in this PR.

Collaborator:

I think we should keep this simple and not make assumptions about the base implementation. Let's not hide/override the base log method.

def log_message(message, payload=nil, exception=nil)
   ...
end

I know this makes more work for you, but I think it is a bit less risky

Contributor Author:

Yeah, makes sense. Do you think this should just get extracted to the module and then the plugin can also use e.g. log_message (as opposed to just sharing the formatting)?

Collaborator:

Yeah we can put it in the module, that's a good idea.

Contributor Author:

Cool. Might not get to this today, but will as soon as I can.
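
A rough sketch of the extraction agreed on here, assuming the shared module keeps format_message and gains a log_message helper so the base backend's log method is never overridden (names follow the suggestions above; not the final committed code).

module Resque
  module Plugins
    module Retry
      module LogFormatting
        def format_message(message, args = nil, exception = nil)
          exception_portion = exception.nil? ? '' : " [#{exception.class}/#{exception}]"
          "#{args.inspect}#{exception_portion}: #{message}"
        end

        # Named log_message so it does not shadow log on Resque's failure backends.
        def log_message(message, args = nil, exception = nil)
          Resque.logger.info format_message(message, args, exception) if Resque.logger
        end
      end
    end
  end
end

The failure backend would then call log_message with (payload || {})['args'] instead of defining its own log.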

if Resque.logger
args = (payload || {})['args']
Resque.logger.info format_message(message,args,exception)
else
super(message)
end
end
end
end
end
58 changes: 47 additions & 11 deletions lib/resque/plugins/retry.rb
@@ -1,4 +1,5 @@
require 'digest/sha1'
require 'resque/plugins/retry/log_formatting'

module Resque
module Plugins
@@ -35,6 +36,7 @@ module Plugins
# end
#
module Retry
include Resque::Plugins::Retry::LogFormatting

# Raised if the retry-strategy cannot be determined or has conflicts
#
@@ -45,7 +47,7 @@ class AmbiguousRetryStrategyException < StandardError; end
#
# @api private
def self.extended(receiver)
if receiver.instance_variable_get("@fatal_exceptions") && receiver.instance_variable_get("@retry_exceptions")
if receiver.instance_variable_get('@fatal_exceptions') && receiver.instance_variable_get('@retry_exceptions')
raise AmbiguousRetryStrategyException.new(%{You can't define both "@fatal_exceptions" and "@retry_exceptions"})
end
end
@@ -55,7 +57,7 @@ def self.extended(receiver)
# @api private
def inherited(subclass)
super(subclass)
subclass.instance_variable_set("@retry_criteria_checks", retry_criteria_checks.dup)
subclass.instance_variable_set('@retry_criteria_checks', retry_criteria_checks.dup)
end

# @abstract You may override to implement a custom retry identifier,
@@ -81,7 +83,7 @@ def retry_identifier(*args)
#
# @api public
def redis_retry_key(*args)
['resque-retry', name, retry_identifier(*args)].compact.join(":").gsub(/\s/, '')
['resque-retry', name, retry_identifier(*args)].compact.join(':').gsub(/\s/, '')
end

# Maximum number of retrys we can attempt to successfully perform the job
@@ -230,17 +232,25 @@ def retry_exceptions
# @api public
def retry_criteria_valid?(exception, *args)
# if the retry limit was reached, dont bother checking anything else.
return false if retry_limit_reached?
if retry_limit_reached?
log 'retry limit reached', args, exception
return false
end

# We always want to retry if the exception matches.
should_retry = retry_exception?(exception)

# call user retry criteria check blocks.
retry_criteria_checks.each do |criteria_check|
should_retry ||= !!instance_exec(exception, *args, &criteria_check)
retry_based_on_exception = retry_exception?(exception)
log "Exception is #{retry_based_on_exception ? '' : 'not '}sufficient for a retry", args, exception

retry_based_on_criteria = false
unless retry_based_on_exception
# call user retry criteria check blocks.
retry_criteria_checks.each do |criteria_check|
retry_based_on_criteria ||= !!instance_exec(exception, *args, &criteria_check)
end
end
log "user retry criteria is #{retry_based_on_criteria ? '' : 'not '}sufficient for a retry", args, exception
Contributor Author:

Rewrote this once I saw it in real life. "NOT retrying based on criteria" left the impression that we are not retrying at all. Hopefully these messages more clearly paint the narrative of what's going on.

Collaborator:

Sounds good.


should_retry
retry_based_on_exception || retry_based_on_criteria
end

# Retry criteria checks
@@ -297,11 +307,14 @@ def retry_criteria_check(&block)
#
# @api private
def try_again(exception, *args)
log 'try_again', args, exception
# some plugins define retry_delay and have it take no arguments, so rather than break those,
# we'll just check here to see whether it takes the additional exception class argument or not
temp_retry_delay = ([-1, 1].include?(method(:retry_delay).arity) ? retry_delay(exception.class) : retry_delay)

retry_in_queue = retry_job_delegate ? retry_job_delegate : self
log "retry delay: #{temp_retry_delay} for class: #{retry_in_queue}", args, exception
Contributor Author:

Realized that retry_in_queue is not a queue name but a class name. Could change the variable name, too?
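
A small sketch of the rename floated here, using retry_job_class as a purely hypothetical replacement name:

retry_job_class = retry_job_delegate ? retry_job_delegate : self
log "retry delay: #{temp_retry_delay} for class: #{retry_job_class}", args, exception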


if temp_retry_delay <= 0
# If the delay is 0, no point passing it through the scheduler
Resque.enqueue(retry_in_queue, *args_for_retry(*args))
Expand All @@ -322,12 +335,15 @@ def try_again(exception, *args)
#
# @api private
def before_perform_retry(*args)
log 'before_perform_retry', args
@on_failure_retry_hook_already_called = false

# store number of retry attempts.
retry_key = redis_retry_key(*args)
Resque.redis.setnx(retry_key, -1) # default to -1 if not set.
@retry_attempt = Resque.redis.incr(retry_key) # increment by 1.
log "attempt: #{@retry_attempt} set in Redis", args
@retry_attempt
end

# Resque after_perform hook
@@ -336,6 +352,7 @@ def before_perform_retry(*args)
#
# @api private
def after_perform_retry(*args)
log 'after_perform_retry, clearing retry key', args
clean_retry_key(*args)
end

@@ -350,11 +367,16 @@ def after_perform_retry(*args)
#
# @api private
def on_failure_retry(exception, *args)
return if @on_failure_retry_hook_already_called
log 'on_failure_retry', args, exception
if @on_failure_retry_hook_already_called
log 'on_failure_retry_hook_already_called', args, exception
return
end

if retry_criteria_valid?(exception, *args)
try_again(exception, *args)
else
log 'retry criteria not sufficient for retry', args, exception
clean_retry_key(*args)
end

@@ -381,9 +403,23 @@ class << self; self end.class_eval{ undef_method(mname) } rescue nil
#
# @api private
def clean_retry_key(*args)
log 'clean_retry_key', args
Resque.redis.del(redis_retry_key(*args))
end

# Log messages through the Resque logger. Generally not for application
# logging, just for the inner workings of Resque and plugins.
#
# message:: message to log
# args:: args of the resque job in context
# exception:: the exception that might be causing a retry
#
# @api private
def log(message, args, exception=nil)
Contributor Author:

A few notes about this:

  • I didn't want to include Resque::Logging as a) we don't need the different severity levels and b) those method names are super-common and would squash any jobs that do something similar (like my jobs :)
  • Checking for Resque.logger is mirroring what's in Resque::Worker - presumably there is a good reason, but I guess generally logging should not take down your system
  • The environment variable is there so that a released version of this library would behave the same way, just in case the addition of logging causes some problem. This means people can opt-in, and they can opt-in on a per-worker basis, which seems handy. The value of "true" is debatable. I prefer it, since ENV can only store strings, but I know some people like 1. If someone feels strongly, we can change it.
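
The env-var opt-in described in the last bullet is not visible in this hunk; a hedged sketch of what such a guard could look like, using RESQUE_RETRY_LOGGING as a purely hypothetical variable name:

def log(message, args, exception = nil)
  # Hypothetical opt-in flag; the real variable name is not shown in this diff.
  return unless ENV['RESQUE_RETRY_LOGGING'] == 'true'
  Resque.logger.info format_message(message, args, exception) if Resque.logger
end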

# Resque::Worker will handle interpreting LOGGER, VERBOSE, and VVERBOSE
# since everyone is sharing Resque.logger.
Resque.logger.info format_message(message,args,exception) if Resque.logger
end
end
end
end
12 changes: 12 additions & 0 deletions lib/resque/plugins/retry/log_formatting.rb
@@ -0,0 +1,12 @@
module Resque
module Plugins
module Retry
module LogFormatting
def format_message(message,args=nil,exception=nil)
Contributor Author:

Seemed the best way to share this formatting. The other option would be to expose the log method in Resque::Plugins::Retry, but I figured this would be better.

Collaborator:

Good idea here. We could create "lib/resque/plugins/retry/logging.rb" as well.

exception_portion = exception.nil? ? '' : " [#{exception.class}/#{exception}]"
message = "#{args.inspect}#{exception_portion}: #{message}"
end
end
end
end
end