Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #90

pradeepsimha143 · 2014-08-25T15:51:08Z

Hi Team.

I am trying to produce and consume Kafka messages using node library (kafka-node), I am using HighLevelConsumer API. But I keep on getting this exception at random times. and node.js server stops.

FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110]
at new FailedToRebalanceConsumerError (/home/strg/project/kafkaBroker/node_modules/kafka-node/lib/errors/FailedToRebalanceConsumerError.js:11:11)
at /home/strg/project/kafkaBroker/node_modules/kafka-node/lib/highLevelConsumer.js:141:71

I am not sure what is the issue in this?

I have kept zookeeper timeout as: 50000.

This is my High level consumer code:

consumer = new Consumer(
client,
[
{ topic: consumeTopic } //consumeTopic is the topic which user provided
],
{
autoCommit: false
}

);

consumer.on('message', function (message) {
console.log(message);
}

If I restart the server and it works fine, but again after I keep on getting this exception. Can anyone please guide me in this? I am not able to understand what does this exception means. I tried restarting the zookeeper server and kafka server but still I am facing this exception. Any help on this would be very helpful, as I am very new to Kafka

jezzalaycock · 2014-08-27T16:55:02Z

Under what circumstances is the rebalance happening - are you stopping the consumer using a CTRL-C?

jcastill0 · 2015-03-17T17:29:12Z

I also get that exception occasionally. Restarting my consumer fixes it, but obviously that is not the way to fix it. I'm using a HighLevelConsumer as well, with only one zookeeper and one broker. I'm using the latest 0.8.2.1 version of Kafka.

2015-03-17T17:20:37.379Z - error: error FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110]
at new FailedToRebalanceConsumerError (/Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/lib/errors/FailedToRebalanceConsumerError.js:11:11)
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/lib/highLevelConsumer.js:170:51
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/lib/highLevelConsumer.js:419:17
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/node_modules/async/lib/async.js:240:13
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/node_modules/async/lib/async.js:144:21
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/node_modules/async/lib/async.js:237:17
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/node_modules/async/lib/async.js:600:34
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/lib/highLevelConsumer.js:399:29
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/node_modules/async/lib/async.js:144:21
at /Users/jcastillo/dev/svcbus/ConsumerSFSvc/node_modules/kafka-node/lib/highLevelConsumer.js:389:41

jcastill0 · 2015-03-23T22:47:12Z

Anyone else having this problem? It happens periodically for me when I start the consumer (HighLevel).
As I said above, my kafka setup is very simple, just one broker and one zookeeper. Usually after the second attempt it will not throw that exception. I'm using the latest version (0.2.24).
Any insight would be appreicated.

thanks

** julio

AhmedSoliman · 2015-04-05T13:31:52Z

I'm facing the same problem too, however, I'm generating the clientId in a random fashion to avoid creating two consumers (on the same topic) with the same clientId using:

clientId = "worker-" + Math.floor(Math.random() * 10000)

ericdolson · 2015-04-24T15:11:26Z

Is there any progress being made on this? I am deploying an application across several nodes elastically and am getting this error about every-other time I start up an instance. I tried upping the retry attempts to 30 (in the HighLevelConsumer's rebalance() function) and it would get up to as high as 24 before finally succeeding. I am nervous to just pick a big number and expect that to work though.

@jcastill0 In what way are you able to restart your high-level consumer? I am trying to use consumer.on('error', ...) as a way to catch and restart, but can I reuse my consumer? My client? I would appreciate a pointer :)

syymza · 2015-06-25T10:12:32Z

Same problem here and @AhmedSoliman 's solution does not seem to help... Any news?

CWSpear · 2015-06-29T22:21:06Z

Randomizing the group ID worked for my tests. I don't understand enough of kafka to know if that will mess up production if production uses a fixed group ID? Or maybe production can use a random client ID and all will be well?

Ovidiu-S · 2015-07-16T17:09:22Z

Is this a node issue or kafka ? I am having the same problem.

jezzalaycock · 2015-07-16T21:58:20Z

Node Exists is normally due to the zookeeper timeout. The ephemeral nodes under certain circumstances (CTRL-C for instance) don't get removed. If you're not bothered about balancing a number of consumers on a topic then I suggest you try the normal KafkaConsumer and not the HighLevel one

felipesabino · 2015-07-21T21:19:21Z

@CWSpear How are you dealing with offsets commits while using random consumer group ID?

I was sure that was what was used to keep track of consumed offset, and we use a fixed group ID in production for that reason...

CWSpear · 2015-07-21T23:08:02Z

@felipesabino no idea. I'm actually using a company-specific library wrapped around HighLevelConsumer, and I have dug through the code some, but I haven't been able to get very deep, so many of the inner-workings are over my head.

I'm pretty sure something's going on not in my code specifically, but either in the company's lib, or in kafka and just trying to get to the bottom of it. It's been rather bothersome, and I'm not the only one experiencing issues similar to this, but for now, randomizing the IDs works for tests. It's QA's problem now, right? ;-)

felipesabino · 2015-07-22T22:37:06Z

We managed to easily reproduce this errors in our environment by killing the consumer process and starting it again quickly.

We noticed that whenever our server restated, we tried reconnecting before zookeeper killed the connection (session) on its side this exception would be thrown.

To know that zookeeper killed the connection, look for a message that looks like the following:

INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14eb7c676540001, timeout of 30000ms exceeded

So far we manage to avoid any FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] error on servers restarts just by delaying any reconnection by at least this session timeout time. You can find this value for your server on you zookeeper config file on the maxSessionTimeout param - http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html

Also, this behavior is consistent with what @CWSpear reported, as randomizing the clientId will force zookeeper to create a new session for every new connection attempt and the exception would not be thrown. But that is far from ideal, as the clientId is what will be used to keep track of your committed offsets...

We are still observing if it will occur randomly, if that is the case may be a similar approach should be taken with the rebalance logic. Anyways, will keep you posted.

Ovidiu-S · 2015-08-12T16:54:14Z

I managed to get rid of the issue by setting

maxSessionTimeout=5000

and delaying the client connection by 6 seconds. Just use timeout() for that, and it will work just fine

felipesabino · 2015-08-12T18:28:40Z

@Ovidiu-S I could not find any maxSessionTimeout in the docs or code... do you mean sessionTimeout from node-zookeeper-client?

Ovidiu-S · 2015-08-13T13:42:32Z

@felipesabino I am referring to the zookeeper server config, not the client. It is the maxSessionTimeout in the zoo.cfg file

bendpx · 2015-09-08T11:02:15Z

+1

barockok · 2016-02-04T03:44:14Z

any update ? I'm facing the same problem 😦

Ovidiu-S · 2016-02-04T08:05:17Z

@barock19 I recommend switching to Kafka 0.9 and the new (zookeeper free) client, when it releases : https://github.com/oleksiyk/kafka

Until then ... just bypass the re-balancing issue with the above fix.

hyperlink · 2016-02-04T15:13:36Z

This problem went away when I add a handler to handler for the CTRL+C case. This ensures the consumer/client is cleaned up otherwise you are at the mercy of whenever the zookeeper node timesout.

process.on('SIGINT', function() {
    highLevelConsumer.close(true, function(){
        process.exit();
    })
});

jezzalaycock · 2016-02-04T17:28:31Z

As suggested by @hyperlink the problem is down to the fact that the ephemeral nodes are no relinquished in zk when issuing a cntrl-c (SIGINT). Under normal failure cases the nodes are released as expected.

Moving to kafka 0,9 will require wholesale changes to the node client - however I believe the kafka guys are creating the client node - so it might be that we can simply switch to using that when available.

mlcloudsec · 2016-03-25T18:03:03Z

I'm having the same issue. Tried changing the zoo.cfg maxSessionTimeout and also closing the high level consumer before SIGINT. Also tried to close the client in itself. Same result

rtorrero · 2016-04-29T10:03:33Z

Using the suggested handler, with a small modification fixed the issue for me:

Add a connection.close() on the callback
Put the process.exit() in the connection.close() callback.

springuper · 2016-05-06T03:37:55Z

@hyperlink's method works fine for me.

UPDATE: It still happens, and I have find out the real problem, please refer to #369

pcothenet · 2016-07-12T18:23:14Z

Same issue on my side. And I can't catch the SIGINT because AWS Elastic Beanstalk somehow does not send one. I'm pretty sure @springuper PR might fix this.

snatesan · 2016-07-13T22:02:22Z

I have seen this issue occurs when the client (zoo keeper) looses connection and soon after it get connection.
Rebalance logic should account for zookeeper session time out as already specified here #90 (comment)

pradeepsimha143 changed the title ~~FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110]~~ Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] Aug 25, 2014

dmitrig01 mentioned this issue Nov 17, 2014

High-level consumer rebalance algorithm having issues #124

Closed

felipesabino mentioned this issue Aug 25, 2015

Intermittent FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] - when consumer starts #234

Closed

springuper mentioned this issue May 11, 2016

bugfix: make rebalance the right way #369

Closed

crzidea mentioned this issue May 24, 2018

HighLevelConsumer throw FailedToRebalanceConsumerError: NODE_EXISTS when rebalancing #981

Merged

hyperlink closed this as completed Sep 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #90

Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #90

pradeepsimha143 commented Aug 25, 2014

jezzalaycock commented Aug 27, 2014

jcastill0 commented Mar 17, 2015

jcastill0 commented Mar 23, 2015

AhmedSoliman commented Apr 5, 2015

ericdolson commented Apr 24, 2015

syymza commented Jun 25, 2015

CWSpear commented Jun 29, 2015

Ovidiu-S commented Jul 16, 2015

jezzalaycock commented Jul 16, 2015

felipesabino commented Jul 21, 2015

CWSpear commented Jul 21, 2015

felipesabino commented Jul 22, 2015

Ovidiu-S commented Aug 12, 2015

felipesabino commented Aug 12, 2015

Ovidiu-S commented Aug 13, 2015

bendpx commented Sep 8, 2015

barockok commented Feb 4, 2016

Ovidiu-S commented Feb 4, 2016

hyperlink commented Feb 4, 2016

jezzalaycock commented Feb 4, 2016

mlcloudsec commented Mar 25, 2016

rtorrero commented Apr 29, 2016 •

edited

Loading

springuper commented May 6, 2016 •

edited

Loading

pcothenet commented Jul 12, 2016

snatesan commented Jul 13, 2016

Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #90

Keep on getting FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #90

Comments

pradeepsimha143 commented Aug 25, 2014

jezzalaycock commented Aug 27, 2014

jcastill0 commented Mar 17, 2015

jcastill0 commented Mar 23, 2015

AhmedSoliman commented Apr 5, 2015

ericdolson commented Apr 24, 2015

syymza commented Jun 25, 2015

CWSpear commented Jun 29, 2015

Ovidiu-S commented Jul 16, 2015

jezzalaycock commented Jul 16, 2015

felipesabino commented Jul 21, 2015

CWSpear commented Jul 21, 2015

felipesabino commented Jul 22, 2015

Ovidiu-S commented Aug 12, 2015

felipesabino commented Aug 12, 2015

Ovidiu-S commented Aug 13, 2015

bendpx commented Sep 8, 2015

barockok commented Feb 4, 2016

Ovidiu-S commented Feb 4, 2016

hyperlink commented Feb 4, 2016

jezzalaycock commented Feb 4, 2016

mlcloudsec commented Mar 25, 2016

rtorrero commented Apr 29, 2016 • edited Loading

springuper commented May 6, 2016 • edited Loading

pcothenet commented Jul 12, 2016

snatesan commented Jul 13, 2016

rtorrero commented Apr 29, 2016 •

edited

Loading

springuper commented May 6, 2016 •

edited

Loading