Improve robustness around redis connection issues #26

michaelcameron · 2012-12-03T21:42:43Z

We've had periodic issues with Redis: after a connection problem the workers appear to reconnect successfully, but then fail with an odd cast exception when they receive a new job. They stay alive and keep polling until the next job arrives, which may be much later, and then fail with:

java.lang.ClassCastException: java.lang.Long cannot be cast to [B
        at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:182)
        at redis.clients.jedis.Connection.getBulkReply(Connection.java:171)
        at redis.clients.jedis.Jedis.lpop(Jedis.java:1090)
        at net.greghaines.jesque.worker.WorkerImpl.poll(WorkerImpl.java:487)
        at net.greghaines.jesque.worker.WorkerImpl.run(WorkerImpl.java:230)
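
One possible reading of this trace, which is an assumption and not something confirmed in this thread, is that the reply stream got out of step after the reconnect: the caller expected a bulk reply (a byte array) for LPOP, but the next reply read was an integer reply left over from an earlier command. The sketch below only illustrates that failure mode. ReplyCastDemo, nextReply, and readBulkReply are hypothetical names, not Jedis code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: mimics a client that casts whatever reply it reads
// to the type it expects. None of these names come from Jedis.
final class ReplyCastDemo {

    // Hypothetical stand-in for "read the next reply": returns the next
    // parsed object from the queue, whatever its type (null if empty).
    static Object nextReply(Deque<Object> wire) {
        return wire.poll();
    }

    // Expects a bulk reply (byte[]), as a caller of LPOP would.
    static byte[] readBulkReply(Deque<Object> wire) {
        return (byte[]) nextReply(wire);
    }

    public static void main(String[] args) {
        Deque<Object> wire = new ArrayDeque<>();
        // Assumed scenario: a stale integer reply is still queued...
        wire.add(Long.valueOf(1L));
        // ...ahead of the bulk reply the caller is actually waiting for.
        wire.add("job-payload".getBytes(StandardCharsets.UTF_8));
        try {
            readBulkReply(wire);
        } catch (ClassCastException e) {
            // The Long is read where a byte[] was expected.
            System.out.println("Desynchronized reply: " + e.getMessage());
        }
    }
}
```

If this is what is happening, a connection that reports itself as connected can still be unusable, which is why a check that exchanges data with the server is more informative than a connected flag.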

While troubleshooting, I wanted to make a few changes to help find the root cause:

There appears to be a code path where an exception can occur on reconnect but is never logged. In WorkerImpl.recoverFromException, if anything other than a JedisConnectionException is thrown during reconnect, the exception propagates up to run, which only has a try/finally, so nothing logs it.
The code assumes that a connected Jedis object is healthy, but the JedisPool implementation in Jedis itself uses a stronger condition: jedis.isConnected() && jedis.ping().equals("PONG"). This tests the connection further by actually exchanging data with the server.
I wanted to tweak the recoverFromException implementation in grails-jesque first, since I already have a GrailsWorkerImpl subclass, but I could not access some of the private variables needed to make it work. I changed some of those to protected so I can try more things, if needed, before making another pull request.
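
The first two points above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code in this pull request: RedisLink is a hypothetical interface standing in for the handful of client calls involved (isConnected and ping are named above; connect and disconnect are assumed), and ReconnectSketch, isHealthy, and reconnect are made-up names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ReconnectSketch {
    private static final Logger LOG = Logger.getLogger(ReconnectSketch.class.getName());

    /** Hypothetical stand-in for the client calls used here; not a Jedis type. */
    interface RedisLink {
        void disconnect();
        void connect();
        boolean isConnected();
        String ping();
    }

    /** Stronger health test: connected AND answers PING with PONG. */
    static boolean isHealthy(RedisLink link) {
        try {
            return link.isConnected() && "PONG".equals(link.ping());
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Health check failed", e);
            return false;
        }
    }

    /** Tries up to maxAttempts reconnects and logs every failure, whatever its type. */
    static boolean reconnect(RedisLink link, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                link.disconnect();
                link.connect();
                if (isHealthy(link)) {
                    return true;
                }
                LOG.warning("Reconnect attempt " + attempt + " did not pass the ping test");
            } catch (RuntimeException e) {
                // Catch broadly so that no reconnect failure escapes unlogged.
                LOG.log(Level.WARNING, "Reconnect attempt " + attempt + " failed", e);
            }
        }
        return false;
    }
}
```

Writing "PONG".equals(link.ping()) instead of link.ping().equals("PONG") is a small design choice in this sketch: it avoids a NullPointerException if the ping call ever returns null.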

michaelcameron · 2012-12-06T19:22:29Z

Can you release a snapshot version based on this change and the other changes you merged in?

Improve robustness around jedis connection issues

fc8e50f

ghost assigned gresrun Dec 4, 2012

gresrun added a commit that referenced this pull request Dec 4, 2012

Merge pull request #26 from michaelcameron/master

063e7cb

Improve robustness around redis connection issues

gresrun merged commit 063e7cb into gresrun:master Dec 4, 2012

michaelcameron mentioned this pull request Feb 26, 2013

Ability to set an exception handler to the worker? michaelcameron/grails-jesque#23

Closed
