Zookeeper connections are precious and other ZK administrivia #52

Closed
cbaenziger opened this issue Dec 17, 2014 · 1 comment

@cbaenziger
Member

If one is running a large cluster and does not scale node[:bcpc][:hadoop][:zookeeper][:maxClientCnxns], one gets to enjoy ZooKeeper unavailability and a spew of log entries like:

2014-12-16 17:11:38,921 [myid:12] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /1.2.3.4 - max is 500
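
A minimal sketch of raising that limit through a Chef environment override (the attribute path is the one named above; the environment name and the value of 2048 are placeholders to be sized against the cluster's real per-host client counts):

    # environments/example.rb -- hypothetical Chef environment file
    name 'example'
    override_attributes(
      'bcpc' => {
        'hadoop' => {
          'zookeeper' => {
            # rendered into zoo.cfg as maxClientCnxns; 0 removes the per-IP limit
            'maxClientCnxns' => 2048
          }
        }
      }
    )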

Ideally, we can create monitoring in Zabbix from the JMX metrics ZooKeeper already exposes (a rough polling sketch appears below the list), under:
org.apache.ZooKeeperService -> ReplicatedServer_id<nodeNumber> -> replica.<nodeNumber> -> Follower (or Leader) ->

  • Attributes ->
    • PendingRevalidationCount
    • AvgRequestLatency
    • MaxRequestLatency
    • MinRequestLatency
    • NumAliveConnections
    • OutstandingRequests
    • PacketsReceived
    • PacketsSent
  • Connections ->
    • client IP ->
      • connection ptr ->
        • OutstandingRequest
        • PacketsReceived
        • PacketsSent
        • MinRequestLatency
        • MaxRequestLatency
        • AvgRequestLatency
        • LastLatency
    • ...
  • InMemoryDataTree ->
    • Attributes ->
      • NodeCount
      • WatchCount

The question is how to see rejected connections, which I am not finding among these beans. Regardless, I think a lot of useful cluster monitoring can be done here.
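
Until the Zabbix/JMX plumbing exists, a rough stopgap (not JMX, but many of the same counters) is ZooKeeper's "mntr" four-letter command, which a Zabbix UserParameter or zabbix_sender wrapper could poll. A minimal sketch in Ruby, assuming ZooKeeper 3.4+ on the standard client port:

    #!/usr/bin/env ruby
    # Rough sketch: poll a ZooKeeper server's "mntr" four-letter command and
    # print a few of the counters discussed above. Host/port arguments are
    # assumptions; adjust to the cluster layout.
    require 'socket'

    host = ARGV[0] || 'localhost'
    port = (ARGV[1] || 2181).to_i

    stats = {}
    TCPSocket.open(host, port) do |sock|
      sock.write('mntr')
      # the server answers with tab-separated key/value lines, then closes
      sock.read.each_line do |line|
        key, value = line.chomp.split("\t", 2)
        stats[key] = value if key
      end
    end

    %w[zk_num_alive_connections zk_outstanding_requests zk_avg_latency
       zk_max_latency zk_znode_count zk_watch_count].each do |k|
      puts "#{k}=#{stats[k]}" if stats.key?(k)
    end

Note this still does not surface rejected connections either; the warning in the log excerpt above may be the only signal that maxClientCnxns is being hit.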

cbaenziger changed the title from "Zookeeper connections are precious" to "Zookeeper connections are precious and other ZK administrivia" on Dec 17, 2014
@cbaenziger
Member Author

Some other things we likely want to care about (from http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_advancedConfiguration) are:

Auto-Purging Snapshots and Transaction Logs

 autopurge.snapRetainCount

    (No Java system property)

    New in 3.4.0: When enabled, ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest. Defaults to 3. Minimum value is 3.
autopurge.purgeInterval

    (No Java system property)

    New in 3.4.0: The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
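
In zoo.cfg terms, enabling the purge described above amounts to something like the following (the 24-hour interval is only an example, not a recommendation):

    # keep the three newest snapshots and their transaction logs,
    # and run the purge task once a day (0, the default, leaves purging off)
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24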

Leader only Coordinates

If we have more than three nodes in the quorum and are running ZooKeeper for Kafka, we probably want this (a sketch of the switch follows the quoted documentation below).

 leaderServes

    (Java system property: zookeeper.leaderServes)

    Leader accepts client connections. Default value is "yes". The leader machine coordinates updates. For higher update throughput at the slight expense of read throughput the leader can be configured to not accept clients and focus on coordination. The default to this option is yes, which means that a leader will accept client connections.
    Note

    Turning on leader selection is highly recommended when you have more than three ZooKeeper servers in an ensemble.
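
If we do go this route, the switch is the zookeeper.leaderServes system property named above, e.g. passed as a JVM option to the ZooKeeper server process (how chef-bcpc would thread a JVM flag through to ZooKeeper still needs checking):

    # hypothetical JVM option for the ZooKeeper server
    -Dzookeeper.leaderServes=no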
