
ZooKeeper Data Layout

Flavio Junqueira edited this page Jan 10, 2018 · 2 revisions

We use Apache ZooKeeper to store stream metadata. For each stream, we have the following znodes:

  1. segment table (size is a function of the number of segments)
  2. history table (size is a function of the number of scale operations)
  3. index table (size is a function of the number of scale operations)
  4. TruncationRecord (size is a function of the number of deleted segments)
  5. SealedSegmentsRecord (size is a function of the number of sealed segments)
  6. configuration (constant size)
  7. state (constant size)
  8. transaction records (active txns, completed txns)

According to this list, we have 7 fixed znodes per stream (items 1 to 7), plus an arbitrary number of txn records that depends on the volume of transactions. At the moment, we do not clean up txn records, so they accumulate over time; this is an issue we will have to tackle sooner or later. Assuming an average of 10k bytes per stream on ZK (a rough estimate, not based on any deployment information), we can sustain about 100k streams per GB of ZK metadata at the moment.
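
The estimate above is simple arithmetic, and a small sketch makes it easy to redo with different assumptions. The function and constant names below are hypothetical and not part of any codebase. The 10k-byte average is the rough figure from the text, 1 GB is taken as 10^9 bytes, and the znode count assumes one znode per retained txn record, which this page does not state explicitly.

```python
# Back-of-the-envelope sizing for per-stream ZooKeeper metadata.
# All names here are illustrative; the numbers are the rough estimates
# from the text, not measurements from a deployment.

FIXED_ZNODES_PER_STREAM = 7      # items 1 to 7 in the list above
AVG_BYTES_PER_STREAM = 10_000    # assumed average metadata size per stream
BYTES_PER_GB = 10**9             # assumption: decimal gigabyte


def znodes_for_stream(txn_records: int) -> int:
    """Znodes held for one stream, assuming one znode per retained txn record."""
    if txn_records < 0:
        raise ValueError("txn_records must be non-negative")
    return FIXED_ZNODES_PER_STREAM + txn_records


def streams_per_gb(avg_bytes_per_stream: int = AVG_BYTES_PER_STREAM) -> int:
    """Number of streams whose metadata fits in one GB of ZK data."""
    if avg_bytes_per_stream <= 0:
        raise ValueError("avg_bytes_per_stream must be positive")
    return BYTES_PER_GB // avg_bytes_per_stream


if __name__ == "__main__":
    print(streams_per_gb())          # 100000 with the default assumptions
    print(znodes_for_stream(25))     # 32: 7 fixed znodes + 25 txn records
```

Because txn records are not cleaned up, the real per-stream average grows with transaction volume, so the streams-per-GB figure is best treated as an upper bound for streams that use transactions heavily.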

There is no znode specific to a segment. Segments are recorded as part of the stream metadata. Consequently, we can have many more segments than we can have streams, which is actually desirable.
