Is it possible to have read replicas backed by same set of SST files in the remote storage? #116
Rocksdb-cloud provides a way to put SSTs on remote storage for durability and to separate compute from storage. As I understand it, the main benefit of using rocksdb-cloud is to scale compute independently of storage.

We have one instance of rocksdb-cloud running, which puts its SST files on remote storage such as S3. Now, if that single instance cannot serve the read load because of its compute limitations, is it possible to have read replicas backed by the same set of SSTs in the remote storage, so that read requests can be load balanced across the replicas to get the desired throughput?

Comments
You would have to clone the original db and use the clone on another machine. Cloning the db means that the same set of cloud SST files is used by the other instance as well. One example here: https://github.com/rockset/rocksdb-cloud/blob/master/cloud/examples/clone_example.cc
Thanks @dhruba for the prompt response. A few more questions related to this:
Hi @sanayak, when you open a rocksdb-cloud instance you specify a source cloud path and a destination cloud path. rocksdb-cloud uses the files in both directories to open the database, but all new files produced by the db are written to the destination cloud path only.

If you have multiple instances using the same database, one will be the primary instance and the others will be replica instances. The primary opens the db with (srcpath=A, destpath=A), so new writes to the primary go to the cloud bucket path called A. If you want a replica, open the same rocksdb-cloud database with (srcpath=A, destpath=B). The replica starts off as a clone of the primary database as of the time the db is opened from path A, and any files created by the compaction process on the replica are written to path B.

The replica db does not automatically follow updates from the primary db; it is an independent clone with its own life, and you can write to it if you want. If you want the replica to follow the updates in the primary db, periodically close the replica and re-clone it from the primary by reopening it with (srcpath=A, destpath=C).
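A minimal sketch of the two open modes described above. The bucket name, region, local paths, and the exact `NewAwsEnv()` / `DBCloud::Open()` signatures are assumptions patterned on cloud/examples/clone_example.cc, not a definitive recipe:

```cpp
#include <cassert>
#include <string>

#include "rocksdb/cloud/db_cloud.h"

using namespace rocksdb;

DBCloud* OpenInstance(const std::string& src_path,
                      const std::string& dest_path,
                      const std::string& local_path) {
  CloudEnvOptions cloud_env_options;
  CloudEnv* cenv = nullptr;
  // One bucket, two object paths: the "A"/"B" directories from the comment above.
  Status s = CloudEnv::NewAwsEnv(
      Env::Default(),
      "my-bucket", src_path, "us-west-2",   // source of the existing SST files
      "my-bucket", dest_path, "us-west-2",  // where this instance writes new files
      cloud_env_options, nullptr, &cenv);
  assert(s.ok());

  Options options;
  options.env = cenv;
  options.create_if_missing = true;

  DBCloud* db = nullptr;
  s = DBCloud::Open(options, local_path, /*persistent_cache_path=*/"",
                    /*persistent_cache_size_gb=*/0, &db);
  assert(s.ok());
  return db;
}

// Primary: reads and writes cloud path A.
//   DBCloud* primary = OpenInstance("A", "A", "/tmp/db_primary");
// Replica: clones A at open time, writes its own new files to B.
//   DBCloud* replica = OpenInstance("A", "B", "/tmp/db_replica");
```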
Thanks a lot @dhruba for the detailed explanation. A couple of follow-up questions:
Thanks @dhruba.
Just curious whether there is any mechanism to move SSTs across storage tiers (from local storage to remote storage) based on when they were flushed/closed. Is it possible to define such tiering policies with rocksdb-cloud? That would approximately simulate hot vs. cold data semantics from a rocksdb perspective.
You might be able to use CheckPointToCloud() to manually copy all local SST files to the cloud: https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/cloud/db_cloud.h#L55
Thanks @dhruba. Looking at the implementation of CheckPointToCloud() in https://github.com/rockset/rocksdb-cloud/blob/master/cloud/db_cloud_impl.cc, it moves all the SST files to cloud storage. I think it would need a different implementation if we want to filter out SSTs that do not qualify for copying (i.e. SSTs that were created very recently).
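A hypothetical sketch of that filtering idea, assuming a RocksDB version where SstFileMetaData exposes file_creation_time. `CopyFileToCloud()` is a placeholder, not a real rocksdb-cloud API; the actual upload path would have to be wired up separately:

```cpp
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/metadata.h"

// Hypothetical helper: upload one local file to the cloud bucket.
// rocksdb-cloud does not expose a public API for this exact purpose,
// so it is left as a placeholder.
void CopyFileToCloud(const std::string& local_file_path);

// Copy only "cold" SSTs: files whose creation time is at or before the cutoff.
void CheckpointColdSsts(rocksdb::DB* db, uint64_t cutoff_unix_seconds) {
  std::vector<rocksdb::LiveFileMetaData> files;
  db->GetLiveFilesMetaData(&files);
  for (const auto& f : files) {
    // file_creation_time is 0 when unknown; treat such files as hot and skip.
    if (f.file_creation_time != 0 &&
        f.file_creation_time <= cutoff_unix_seconds) {
      CopyFileToCloud(f.db_path + f.name);  // f.name begins with '/'
    }
  }
}
```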
Hi @dhruba, I noticed that in the Rockset read-replica setup, the read replica tails new updates from the log. Does that mean every read replica does compaction too? Wouldn't that be wasted work? Is there a way to avoid it by having the read replica share the same files as the primary after the primary's compaction, without reopening the read replica?
You can configure your rocksdb-cloud instances so that your replicas do not need to do compaction. The replica instance can periodically re-open the db with ephemeral_resync_on_open = true.
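A sketch of such a replica refresh loop, assuming the ephemeral_resync_on_open flag in CloudEnvOptions; `OpenReplica()` and `ServeReads()` are hypothetical placeholders for the open-as-(srcpath=A, destpath=B) step and the serving loop:

```cpp
#include <chrono>

#include "rocksdb/cloud/db_cloud.h"

using namespace rocksdb;

// Hypothetical helpers: open the replica as (srcpath=A, destpath=B) with the
// given cloud options, and serve reads from it for a fixed window.
DBCloud* OpenReplica(const CloudEnvOptions& cloud_opts);
void ServeReads(DBCloud* replica, std::chrono::minutes window);

void ReplicaRefreshLoop() {
  CloudEnvOptions cloud_env_options;
  // On open, an ephemeral clone re-syncs against the source path so it
  // picks up the primary's latest (already-compacted) files.
  cloud_env_options.ephemeral_resync_on_open = true;

  for (;;) {
    DBCloud* replica = OpenReplica(cloud_env_options);
    ServeReads(replica, std::chrono::minutes(10));
    delete replica;  // close, then re-open to follow the primary's new updates
  }
}
```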