PDP 22: BookKeeper-based Storage (Tier 2) for Pravega
Status: Under Discussion
Related issues: 1949, 431
There are multiple reasons why users may want a Pravega Tier-2 based on Apache BookKeeper. Here are a few:
- Trial deployments: The user just wants a small distributed Pravega deployment without the hassle of deploying and managing a separate Tier-2 storage. Currently Pravega needs a deployment of Tier-1 (Apache BookKeeper) as well as Tier-2; this may be too much effort for trying out Pravega initially.
- Deployments with a small footprint: Users may want to use Pravega without storing data long term. With Apache BookKeeper as Tier-2, it can serve as a temporary Tier-2 for the short lifetime of the data.
- Two BookKeeper deployments: The user runs two Apache BookKeeper deployments: one configured to store a small amount of data on fast disks, used as Tier-1, and another configured with relatively slower disks that can store large amounts of data, used as Tier-2.
Option 1: Store data directly in Tier-1
Currently, data is written to Tier-1 in the form of a DurableLog and then written permanently to Tier-2. If the user has a single deployment of Apache BookKeeper, it may make sense to store data directly in Tier-1 instead of first writing it as a DurableLog and then writing it again to Tier-2. When the same BookKeeper deployment serves both tiers, writing to it once as a DurableLog and then again as Tier-2 storage is inefficient.
Status: Implementing this involves changing the basic design of Pravega, and it needs more discussion and a change in the basic approach. It is not recommended to follow this approach.
Option 2: Implement the Storage interface using BookKeeper
This approach involves implementing the Storage interface using BookKeeper.
We have decided that we will go with Option 2.
Apache BookKeeper provides low-level primitives for a write-ahead log. We need to build infrastructure around it, called a "StorageLedger", to ensure that it can be used as a general-purpose storage implementation. Irrespective of the approach chosen above, a StorageLedger implementation is necessary. A StorageLedger ties a number of BookKeeper ledgers together to represent one storage entity, similar to a file. The metadata for this entity is stored in ZooKeeper.
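To make this concrete, here is a minimal sketch of what the StorageLedger metadata kept in ZooKeeper might look like. All names are hypothetical and not taken from the Pravega codebase; the sketch also illustrates why concat can be a pure metadata operation, as noted in the list below.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of StorageLedger metadata, as it might be serialized
 * into a ZooKeeper znode. A StorageLedger stitches several BookKeeper
 * ledgers into one file-like entity; each component ledger records the
 * global byte offset at which it starts.
 */
class StorageLedgerMetadata {
    /** One component BookKeeper ledger of the StorageLedger. */
    static class ComponentLedger {
        final long ledgerId;     // BookKeeper ledger id
        final long startOffset;  // global offset where this ledger begins
        ComponentLedger(long ledgerId, long startOffset) {
            this.ledgerId = ledgerId;
            this.startOffset = startOffset;
        }
    }

    final String name;                                       // entity (segment) name
    final List<ComponentLedger> ledgers = new ArrayList<>(); // ordered by startOffset
    long length;                                             // total bytes across all ledgers
    boolean readOnly;                                        // read/write permission flag

    StorageLedgerMetadata(String name) {
        this.name = name;
    }

    /**
     * Concat as a pure metadata operation: append the source's ledger list
     * to this one, shifting the source's offsets by this entity's length.
     * No ledger data is copied.
     */
    void concat(StorageLedgerMetadata source) {
        for (ComponentLedger c : source.ledgers) {
            ledgers.add(new ComponentLedger(c.ledgerId, c.startOffset + this.length));
        }
        this.length += source.length;
    }
}
```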
Here are some of the operations we need in order to implement the Storage interface, and the way each can be implemented:
- Append at an offset / at the end: BookKeeper ledger APIs support this.
- Read at a given offset (random access for reads): BookKeeper allows reading a given entry through the LedgerHandle.read API. We need to build an algorithm to map an offset to an entry id (see the sketch after this list).
- Concat (atomic if possible): Concat will be a metadata operation.
- Truncate (currently optional): This can be a metadata operation.
- Update read/write permission: This will be a metadata operation.
- Store metadata: Metadata will be stored in ZooKeeper.
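The offset-to-entry-id mapping mentioned above could work roughly as follows. This is a minimal sketch that assumes two things the real design would have to provide: the metadata tracks each component ledger's global start offset, and the start offset of every entry within a ledger is also indexed (for example, rebuilt by scanning the ledger once when it is opened). Names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of mapping a global StorageLedger offset to a
 * (ledgerId, entryId) pair for random-access reads. BookKeeper entries have
 * variable sizes, so both lookups are "find the last element that starts at
 * or before the requested offset", done here with TreeMap.floorEntry.
 */
class OffsetToEntryMapper {
    /** Where to start reading for a requested global offset. */
    record Location(long ledgerId, long entryId, long offsetWithinEntry) {}

    /** Per-entry index of one component ledger. */
    static class LedgerIndex {
        final long ledgerId;
        /** Ledger-relative start offset of each entry -> entry id.
         *  Always contains a mapping 0 -> first entry id. */
        final TreeMap<Long, Long> entriesByStartOffset = new TreeMap<>();
        LedgerIndex(long ledgerId) { this.ledgerId = ledgerId; }
    }

    /** Global start offset of each component ledger -> its index. */
    private final TreeMap<Long, LedgerIndex> ledgersByStartOffset = new TreeMap<>();

    Location locate(long globalOffset) {
        // Find the last ledger that starts at or before the requested offset.
        Map.Entry<Long, LedgerIndex> ledger = ledgersByStartOffset.floorEntry(globalOffset);
        if (ledger == null) {
            throw new IllegalArgumentException("offset " + globalOffset + " precedes this StorageLedger");
        }
        long offsetInLedger = globalOffset - ledger.getKey();
        // Find the last entry in that ledger that starts at or before the offset.
        Map.Entry<Long, Long> entry =
                ledger.getValue().entriesByStartOffset.floorEntry(offsetInLedger);
        return new Location(ledger.getValue().ledgerId,
                entry.getValue(),
                offsetInLedger - entry.getKey());
    }
}
```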
Pravega already has one implementation of a storage ledger in BookkeeperLog.java. However, it is geared towards the DurableLog implementation and does not provide much of the behavior expected from the Storage interface.
Open question: Should we improve the existing BookkeeperLog to implement a generic storage, or implement a separate storage ledger for Tier-2?
Update: It was decided that we will have separate implementations for BookKeeper-based Tier-1 and Tier-2, mainly because the expectations and behavior of Tier-1 and Tier-2 are different.
If ownership of the StorageLedger changes before the previous owner closes it successfully, the LAC (LastAddConfirmed) may not have been advanced to the latest entry. Because of this, some data may not be immediately visible to the new owner. Pravega also uses Tier-2 to store segment state, which must be both updatable and immediately visible for a correct representation of the segment.
To overcome this shortcoming, openRead and openWrite both need to fence and take complete ownership of the ledger. Read-only access, as well as read-write access, needs to explicitly own the ledger.
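A minimal sketch of what this looks like with the BookKeeper client API: a recovery open (openLedger) fences the ledger, so any previous writer fails on its next write, and it advances the LAC to the true end of the ledger; openLedgerNoRecovery does neither. The surrounding class and method names here are illustrative.

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

/**
 * Illustrative sketch: both read and write opens use BookKeeper's recovery
 * open, so even a read-only open fences the ledger and sees all data that
 * was acknowledged to the previous owner.
 */
class FencingOpenSketch {
    LedgerHandle openOwned(BookKeeper bk, long ledgerId, byte[] passwd) throws Exception {
        // Recovery open: fences the ledger and recovers the LastAddConfirmed,
        // guaranteeing the tail of the ledger is visible to this owner.
        // The DigestType must match the one used when the ledger was created.
        return bk.openLedger(ledgerId, BookKeeper.DigestType.MAC, passwd);
        // By contrast, bk.openLedgerNoRecovery(ledgerId, digestType, passwd)
        // would neither fence nor advance the LAC, and could miss recent
        // entries, which is exactly the visibility problem described above.
    }
}
```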