PDP 22: BK based Storage (Tier 2) for Pravega
arvindkandhare edited this page Oct 6, 2017
·
9 revisions
Status: Under Creation
Related issues:
There are several reasons why users, or Pravega itself, may want a Tier-2 based on Apache BookKeeper. Here are a few:
- Trial deployments: The user wants a very small distributed Pravega deployment without the hassle of deploying and managing a separate Tier-2 storage. Currently Pravega needs a deployment of Tier-1 (Apache BookKeeper) as well as Tier-2. This may be too much effort for someone trying out Pravega for the first time.
- Deployment with small footprint: Users may not need Pravega to store data long term. With Apache BookKeeper as Tier-2, it can serve as temporary Tier-2 storage for the short lifetime of the data.
- Two BK deployments: The user runs two Apache BookKeeper deployments. One, configured to store a small amount of data on fast disks, is used as Tier-1. The other is configured to use relatively slower disks that can store large amounts of data.
- Write data only once to BookKeeper: Currently data is written to Tier-1 in the form of a durable log and then written permanently to Tier-2. If the user has a single Apache BookKeeper deployment, it is inefficient to write the data to that deployment as a durable log and then write it again to the same deployment as Tier-2 storage. In that case it may make sense to store the data directly in Tier-1. Status: This requires a change in the basic approach and needs further discussion.
- Use Apache BookKeeper as any other Tier-2.
We have decided to go with option 2.
Apache BookKeeper provides low-level primitives for a write-ahead log. We need to build infrastructure around it, called a "Storage ledger", so that it can be used as a general-purpose storage implementation.
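To illustrate the gap between log primitives and general-purpose storage, here is a minimal sketch of one piece a storage ledger would likely need: an index that translates a byte offset into a position within append-only entries. BookKeeper addresses data by ledger and entry id rather than by byte offset, so something along these lines is assumed to be necessary for random-access reads. The class `EntryOffsetIndex` and its in-memory entry list are hypothetical stand-ins; no BookKeeper API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: byte-offset reads over an append-only list of entries. */
class EntryOffsetIndex {
    // Stand-in for the entries of a ledger; entries are only ever appended.
    private final List<byte[]> entries = new ArrayList<>();
    // Maps the byte offset at which an entry starts to that entry's id.
    private final TreeMap<Long, Integer> startOffsets = new TreeMap<>();
    private long length = 0;

    /** Appends one entry and returns the new total length in bytes. */
    long append(byte[] data) {
        if (data.length == 0) {
            return length; // nothing to index
        }
        startOffsets.put(length, entries.size());
        entries.add(data.clone());
        length += data.length;
        return length;
    }

    long length() {
        return length;
    }

    /** Reads count bytes starting at offset, possibly spanning several entries. */
    byte[] read(long offset, int count) {
        if (offset < 0 || count < 0 || offset + count > length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", count=" + count);
        }
        byte[] result = new byte[count];
        int copied = 0;
        while (copied < count) {
            // Find the entry whose start offset is the greatest one <= the current position.
            Map.Entry<Long, Integer> e = startOffsets.floorEntry(offset + copied);
            byte[] entry = entries.get(e.getValue());
            int within = (int) (offset + copied - e.getKey());
            int n = Math.min(entry.length - within, count - copied);
            System.arraycopy(entry, within, result, copied, n);
            copied += n;
        }
        return result;
    }
}
```

In a real implementation the index itself would have to be persisted or rebuilt on recovery, which this sketch does not address.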
Irrespective of the approach above, a storage ledger implementation is necessary. Here are some of the operations it must support in order to implement the Storage interface:
- Append at an offset / end
- Read at a given offset (Random access for read)
- Concat (atomic if possible)
- Truncate (optional currently)
- Update read/write permission
- Store metadata
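The operations listed above can be sketched as a small in-memory class. This is only an illustration of the expected semantics under stated assumptions, not Pravega's actual Storage interface: the class name `StorageLedgerSketch`, every method signature, and the choice to model the read/write permission as a `seal` flag are hypothetical. Metadata here is limited to the length, the truncation offset, and the sealed flag, and the sketch is single-threaded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the storage ledger operations. */
class StorageLedgerSketch {
    private static final class Segment {
        byte[] data = new byte[0];
        long startOffset = 0;   // reads below this offset are rejected after truncate
        boolean sealed = false; // a sealed segment is read-only
    }

    private final Map<String, Segment> segments = new HashMap<>();

    void create(String name) {
        if (segments.putIfAbsent(name, new Segment()) != null) {
            throw new IllegalStateException("segment exists: " + name);
        }
    }

    boolean exists(String name) {
        return segments.containsKey(name);
    }

    long length(String name) {
        return segment(name).data.length;
    }

    /** Appends at the given offset, which must equal the current length. */
    long append(String name, long offset, byte[] bytes) {
        Segment s = segment(name);
        if (s.sealed) {
            throw new IllegalStateException("segment is sealed: " + name);
        }
        if (offset != s.data.length) {
            throw new IllegalArgumentException("bad offset: " + offset);
        }
        byte[] grown = Arrays.copyOf(s.data, s.data.length + bytes.length);
        System.arraycopy(bytes, 0, grown, s.data.length, bytes.length);
        s.data = grown;
        return s.data.length;
    }

    /** Random-access read of count bytes starting at offset. */
    byte[] read(String name, long offset, int count) {
        Segment s = segment(name);
        if (offset < s.startOffset || count < 0 || offset + count > s.data.length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", count=" + count);
        }
        return Arrays.copyOfRange(s.data, (int) offset, (int) offset + count);
    }

    /** Appends a sealed source segment to the target, then deletes the source. */
    void concat(String target, long offset, String source) {
        Segment src = segment(source);
        if (!src.sealed) {
            throw new IllegalStateException("source must be sealed: " + source);
        }
        append(target, offset, src.data); // source is kept if this throws
        segments.remove(source);
    }

    /** Marks everything before offset as no longer readable. */
    void truncate(String name, long offset) {
        Segment s = segment(name);
        if (offset < s.startOffset || offset > s.data.length) {
            throw new IllegalArgumentException("bad truncation offset: " + offset);
        }
        s.startOffset = offset;
    }

    void seal(String name) {
        segment(name).sealed = true;
    }

    private Segment segment(String name) {
        Segment s = segments.get(name);
        if (s == null) {
            throw new IllegalArgumentException("no such segment: " + name);
        }
        return s;
    }
}
```

Requiring the caller to pass the expected offset on `append` and `concat` is one way to detect a stale writer; whether the real implementation should enforce this is a design choice the sketch assumes rather than settles.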
Pravega already has one managed ledger implementation in BookkeeperLog.java. It is geared towards the durable log and lacks much of the behavior expected from the Storage interface. Open Question: Should we extend the existing managed ledger into a generic storage implementation, or implement a separate managed ledger for Tier-2?