Zookeeper connections are precious and other ZK administrivia #52

Closed
cbaenziger opened this issue Dec 17, 2014 · 1 comment

@cbaenziger
Member

If one is running a large cluster and does not scale node[:bcpc][:hadoop][:zookeeper][:maxClientCnxns], one gets to enjoy ZooKeeper unavailability and a spew of log entries like:

2014-12-16 17:11:38,921 [myid:12] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /1.2.3.4 - max is 500
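
A minimal sketch of raising that limit through a Chef environment override (the attribute path is the one named above; the environment name and the value of 2048 are placeholders to be sized against the cluster's real per-host client counts):

    # environments/example.rb -- hypothetical Chef environment file
    name 'example'
    override_attributes(
      'bcpc' => {
        'hadoop' => {
          'zookeeper' => {
            # rendered into zoo.cfg as maxClientCnxns; 0 removes the per-IP limit
            'maxClientCnxns' => 2048
          }
        }
      }
    )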

Ideally, we can create monitoring in Zabbix from the JMX metrics ZooKeeper already exposes (a rough polling sketch appears below the list), under:
org.apache.ZooKeeperService -> ReplicatedServer_id<nodeNumber> -> replica.<nodeNumber> -> Follower (or Leader) ->

  • Attributes ->
    • PendingRevalidationCount
    • AvgRequestLatency
    • MaxRequestLatency
    • MinRequestLatency
    • NumAliveConnections
    • OutstandingRequests
    • PacketsReceived
    • PacketsSent
  • Connections ->
    • client IP ->
      • connection ptr ->
        • OutstandingRequest
        • PacketsReceived
        • PacketsSent
        • MinRequestLatency
        • MaxRequestLatency
        • AvgRequestLatency
        • LastLatency
    • ...
  • InMemoryDataTree ->
    • Attributes ->
      • NodeCount
      • WatchCount

The question is how to see rejected connections, which I am not finding among these beans. Regardless, I think a lot of useful cluster monitoring can be done here.
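
Until the Zabbix/JMX plumbing exists, a rough stopgap (not JMX, but many of the same counters) is ZooKeeper's "mntr" four-letter command, which a Zabbix UserParameter or zabbix_sender wrapper could poll. A minimal sketch in Ruby, assuming ZooKeeper 3.4+ on the standard client port:

    #!/usr/bin/env ruby
    # Rough sketch: poll a ZooKeeper server's "mntr" four-letter command and
    # print a few of the counters discussed above. Host/port arguments are
    # assumptions; adjust to the cluster layout.
    require 'socket'

    host = ARGV[0] || 'localhost'
    port = (ARGV[1] || 2181).to_i

    stats = {}
    TCPSocket.open(host, port) do |sock|
      sock.write('mntr')
      # the server answers with tab-separated key/value lines, then closes
      sock.read.each_line do |line|
        key, value = line.chomp.split("\t", 2)
        stats[key] = value if key
      end
    end

    %w[zk_num_alive_connections zk_outstanding_requests zk_avg_latency
       zk_max_latency zk_znode_count zk_watch_count].each do |k|
      puts "#{k}=#{stats[k]}" if stats.key?(k)
    end

Note this still does not surface rejected connections either; the warning in the log excerpt above may be the only signal that maxClientCnxns is being hit.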

cbaenziger changed the title from "Zookeeper connections are precious" to "Zookeeper connections are precious and other ZK administrivia" on Dec 17, 2014
@cbaenziger
Member Author

Some other things we likely want to care about (from http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_advancedConfiguration) are:

Auto-Purging Snapshots and Transaction Logs

 autopurge.snapRetainCount

    (No Java system property)

    New in 3.4.0: When enabled, ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest. Defaults to 3. Minimum value is 3.
autopurge.purgeInterval

    (No Java system property)

    New in 3.4.0: The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
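
In zoo.cfg terms, enabling the purge described above amounts to something like the following (the 24-hour interval is only an example, not a recommendation):

    # keep the three newest snapshots and their transaction logs,
    # and run the purge task once a day (0, the default, leaves purging off)
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24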

Leader only Coordinates

If we have more than three nodes in the quorum and are running ZooKeeper for Kafka, we probably want this (a sketch of the switch follows the quoted documentation below).

 leaderServes

    (Java system property: zookeeper.leaderServes)

    Leader accepts client connections. Default value is "yes". The leader machine coordinates updates. For higher update throughput at the slight expense of read throughput the leader can be configured to not accept clients and focus on coordination. The default to this option is yes, which means that a leader will accept client connections.
    Note

    Turning on leader selection is highly recommended when you have more than three ZooKeeper servers in an ensemble.
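
If we do go this route, the switch is the zookeeper.leaderServes system property named above, e.g. passed as a JVM option to the ZooKeeper server process (how chef-bcpc would thread a JVM flag through to ZooKeeper still needs checking):

    # hypothetical JVM option for the ZooKeeper server
    -Dzookeeper.leaderServes=no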
