ZooKeeper Data Layout
We use Apache ZooKeeper to store stream metadata. For each stream, we keep the following znodes:
- segment table (size is a function of the number of segments)
- history table (size is a function of the number of scale operations)
- index table (size is a function of the number of scale operations)
- TruncationRecord (size is a function of the number of deleted segments)
- SealedSegmentsRecord (size is a function of the number of sealed segments)
- configuration (constant size)
- state (constant size)
- transaction records (active txns, completed txns)
According to this list, we have 7 znodes per stream, plus an arbitrary number of txn records that depends on the volume of transactions. At the moment, we do not clean up txn records, so they accumulate over time; this is an issue we will have to tackle sooner or later. Assuming an average of 10 KB per stream on ZK (a rough estimate, not based on any deployment information), we can currently sustain about 100k streams per GB of ZK metadata.
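The capacity figure above is simple division, and it can be re-run with a different per-stream average once real deployment numbers are available. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and the rough 10 KB per-stream estimate from the text; the function name `streams_per_budget` is hypothetical and not part of Pravega.

```python
# Back-of-the-envelope capacity estimate for stream metadata in ZooKeeper.
# Assumptions: decimal units (1 GB = 10**9 bytes) and the rough 10 KB
# per-stream average quoted in the text. txn records are not counted.

GB = 10**9
AVG_BYTES_PER_STREAM = 10_000  # rough estimate, not measured


def streams_per_budget(budget_bytes: int, avg_bytes_per_stream: int) -> int:
    """Return how many streams fit in the given metadata budget."""
    if avg_bytes_per_stream <= 0:
        raise ValueError("avg_bytes_per_stream must be positive")
    return budget_bytes // avg_bytes_per_stream


print(streams_per_budget(GB, AVG_BYTES_PER_STREAM))  # 100000
```

Because txn records are not cleaned up at the moment, the real per-stream footprint grows over time, so this estimate is an upper bound under the stated assumptions.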
There is no znode specific to a segment. Segments are recorded as part of the stream metadata. Consequently, we can have many more segments than we can have streams, which is actually desirable.
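To illustrate why the number of segments does not affect the number of znodes, here is a small in-memory model. It is only a sketch: the `StreamMetadata` class and its methods are hypothetical, the znode names are copied from the list above, and txn records are left out. Adding segments grows the contents of the segment table, while the znode count per stream stays fixed.

```python
# Toy model of per-stream metadata. This is not Pravega code; the class and
# method names are hypothetical and exist only to illustrate the layout.

FIXED_ZNODES = (
    "segment table",
    "history table",
    "index table",
    "TruncationRecord",
    "SealedSegmentsRecord",
    "configuration",
    "state",
)


class StreamMetadata:
    def __init__(self) -> None:
        # One entry per znode; each value stands in for the znode's payload.
        self.znodes = {name: [] for name in FIXED_ZNODES}

    def add_segment(self, segment_id: int) -> None:
        # A new segment becomes a record inside the segment table znode,
        # not a znode of its own.
        self.znodes["segment table"].append(segment_id)

    def znode_count(self) -> int:
        return len(self.znodes)


stream = StreamMetadata()
for segment_id in range(1000):
    stream.add_segment(segment_id)

print(stream.znode_count())                 # 7
print(len(stream.znodes["segment table"]))  # 1000
```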