Stream Metadata Layout (draft)
Stream metadata is stored primarily in Table Segments. This page summarizes the main data structures used for stream metadata. The main reference for this information is PDP-32:
https://github.com/pravega/pravega/wiki/PDP-32:-Improve-scalability-of-Controller-Metadata
Three data structures that are present for all streams are:
- Stream segment record: contains stream segment information -- size is 32 bytes
- Epoch record: contains information about an epoch and the segments the epoch comprises -- size is 16 bytes + (# of segments) * 32 bytes
- History time series record: captures the delta between two consecutive epochs -- size is 16 bytes + (# of segments created/sealed) * 32 bytes
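As a rough illustration of the size formulas above, the following sketch computes the epoch and history record sizes. The function and constant names are hypothetical, and it assumes that "(# of segments created/sealed)" means the number of segments created plus the number sealed in the epoch transition.

```python
# Sizes taken from the list above; all names here are illustrative only.
SEGMENT_RECORD_BYTES = 32  # one stream segment record
RECORD_HEADER_BYTES = 16   # fixed part of an epoch or history record


def epoch_record_bytes(num_segments):
    """Size of an epoch record covering num_segments segments."""
    return RECORD_HEADER_BYTES + num_segments * SEGMENT_RECORD_BYTES


def history_record_bytes(num_created, num_sealed):
    """Size of a history time series record.

    Assumption: created and sealed segments are both counted.
    """
    return RECORD_HEADER_BYTES + (num_created + num_sealed) * SEGMENT_RECORD_BYTES


if __name__ == "__main__":
    # An epoch with 4 segments.
    print(epoch_record_bytes(4))
    # A scale-up that seals 1 segment and creates 2 replacements.
    print(history_record_bytes(num_created=2, num_sealed=1))
```

Under these assumptions, both records grow linearly with the number of segments involved, so streams that scale often accumulate history records faster than streams with a fixed segment count.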
When segments are sealed, they are added to a sealed segment shard. The segments are sharded to avoid having a single data structure grow large as many segments are sealed over time. This requires a single data structure:
- Sealed segment map shard: contains a map of segment ids to size -- size is (# of segments in the shard) * 16 bytes
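To make the sharding idea concrete, here is a minimal sketch that groups sealed segments into shards and applies the per-shard size formula above. The rule that assigns a segment to a shard is not described on this page, so the sketch assumes a hypothetical rule (integer division of the segment number by a fixed shard size); the names and the shard size of 1000 are illustrative only.

```python
from collections import defaultdict

ENTRY_BYTES = 16                 # one segment id -> size entry, per the formula above
HYPOTHETICAL_SHARD_SIZE = 1000   # assumed value, not taken from the source


def shard_of(segment_number, shard_size=HYPOTHETICAL_SHARD_SIZE):
    """Assumed assignment rule: consecutive segment numbers share a shard."""
    return segment_number // shard_size


def shard_sizes(sealed_segment_numbers, shard_size=HYPOTHETICAL_SHARD_SIZE):
    """Return a dict mapping shard number to its estimated size in bytes."""
    counts = defaultdict(int)
    for number in sealed_segment_numbers:
        counts[shard_of(number, shard_size)] += 1
    return {shard: count * ENTRY_BYTES for shard, count in counts.items()}


if __name__ == "__main__":
    print(shard_sizes([0, 1, 2, 1000, 2500]))
```

Whatever the actual assignment rule is, the point of sharding is the same: each shard stays bounded in size, so sealing more segments adds shards instead of growing one map.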
If the stream has a retention policy configured, the following data structures are also created and stored:
- Retention set: contains a list of stream cut reference records -- size is (# of records) * 16 bytes
- Stream cut reference record: records time and size -- size is 16 bytes
- Stream cut record: contains stream cut information for each cut -- size is 16 bytes + (# of segments in stream cut) * 16 bytes
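The retention formulas above can be combined into a rough estimate. This sketch uses hypothetical names and assumes there is exactly one stream cut record for each reference record in the retention set, which this page does not state explicitly.

```python
# Sizes taken from the list above; all names here are illustrative only.
REFERENCE_RECORD_BYTES = 16   # one stream cut reference record (time and size)
STREAM_CUT_HEADER_BYTES = 16  # fixed part of a stream cut record
STREAM_CUT_ENTRY_BYTES = 16   # per segment in the stream cut


def retention_set_bytes(num_records):
    """Size of the retention set holding num_records reference records."""
    return num_records * REFERENCE_RECORD_BYTES


def stream_cut_record_bytes(num_segments):
    """Size of one stream cut record spanning num_segments segments."""
    return STREAM_CUT_HEADER_BYTES + num_segments * STREAM_CUT_ENTRY_BYTES


def retention_metadata_bytes(segments_per_cut):
    """Estimate total retention metadata.

    segments_per_cut lists the segment count of each retained stream cut.
    Assumption: one stream cut record per reference in the retention set.
    """
    cuts = sum(stream_cut_record_bytes(n) for n in segments_per_cut)
    return retention_set_bytes(len(segments_per_cut)) + cuts


if __name__ == "__main__":
    # Three retained stream cuts spanning 4, 4, and 8 segments.
    print(retention_metadata_bytes([4, 4, 8]))
```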
In general, it is difficult to estimate the byte overhead of stream metadata, since it changes as the stream evolves, and may contain temporary information such as transaction metadata.