[metasrv] Support sled tree level transaction #2422

Closed
2 of 10 tasks
Tracked by #2525
ariesdevil opened this issue Oct 25, 2021 · 14 comments
Labels
A-meta Area: databend meta serive v0.6 v0.6 version

Comments

@ariesdevil
Collaborator

ariesdevil commented Oct 25, 2021

Summary

Support sled tree level transaction for metasrv

Tasks
Implementation path:

@ariesdevil ariesdevil self-assigned this Oct 25, 2021
@drmingdrmer
Member

There seem to be a bunch of updates to make to the code base to achieve this task.

Some sub-issues would make this task much easier for others to understand and would let other people get involved. 😃

@BohuTANG BohuTANG added the A-meta Area: databend meta serive label Oct 27, 2021
@ariesdevil ariesdevil added the v0.6 v0.6 version label Nov 1, 2021
@BohuTANG BohuTANG changed the title [Store] Support sled tree level transaction [metasvr] Support sled tree level transaction Nov 14, 2021
@ariesdevil
Collaborator Author

ariesdevil commented Nov 16, 2021

Implementation path:

  1. Identify all the code that needs transaction support.
  2. Define the necessary transaction API and data structures.
  3. Test.

Current problems

  1. sled's transaction API does not currently support range operations. (For our current usage, we can work around this.)
  2. sled's transaction API is not well designed. (It seems the sled author has not worked on it for a long time; we could consider maintaining this project ourselves.)

problems ref: spacejam/sled#1143 (comment)

@ariesdevil
Collaborator Author

ariesdevil commented Nov 16, 2021

current impl: #2871 #3102

The transaction API that we discussed looks like this:

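// Sketch only: signatures are elided and `cmd` is assumed to be in scope.
impl StateMachine {

    fn apply(&self) {
        // Run the whole command inside one sled transaction.
        self.tree.transaction(|view| {
            // Wrap sled's transactional view in our own tree type.
            let txn_tree = TxnSledTree::from(view);
            let res = self.apply_cmd(txn_tree, cmd);
            match res {
                // translate to our error code.
            }
        })
    }
}

// Common interface shared by the plain tree and the transactional tree.
trait TreeAPI {
    fn get();
    fn insert();
    fn remove();
    fn apply_batch();
}

struct TxnSledTree {

}

impl TreeAPI for TxnSledTree {}

impl TreeAPI for SledTree {}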
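
To make the shape of this design concrete, here is a minimal, self-contained sketch that uses only the standard library. A `BTreeMap` stands in for sled, and `TreeApi`'s method signatures, `MemTree`, `TxnView`, `Cmd`, and the `transaction` helper are all hypothetical illustrations, not databend's or sled's actual types. The point it tries to show is that `apply_cmd` can be written once against the trait and then run either on a plain tree or on a transactional view that buffers writes until commit.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified version of the `TreeAPI` trait discussed above.
trait TreeApi {
    fn get(&self, key: &str) -> Option<String>;
    fn insert(&mut self, key: &str, value: &str) -> Option<String>;
    fn remove(&mut self, key: &str) -> Option<String>;
}

/// Stand-in for a non-transactional tree (the role of `SledTree`).
#[derive(Default)]
struct MemTree {
    data: BTreeMap<String, String>,
}

impl TreeApi for MemTree {
    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).cloned()
    }
    fn insert(&mut self, key: &str, value: &str) -> Option<String> {
        self.data.insert(key.to_string(), value.to_string())
    }
    fn remove(&mut self, key: &str) -> Option<String> {
        self.data.remove(key)
    }
}

/// Stand-in for a transactional view (the role of `TxnSledTree`).
/// Writes are buffered; a `None` entry marks a pending delete.
struct TxnView<'a> {
    base: &'a MemTree,
    writes: BTreeMap<String, Option<String>>,
}

impl TreeApi for TxnView<'_> {
    fn get(&self, key: &str) -> Option<String> {
        match self.writes.get(key) {
            Some(pending) => pending.clone(), // read our own buffered write
            None => self.base.get(key),
        }
    }
    fn insert(&mut self, key: &str, value: &str) -> Option<String> {
        let prev = self.get(key);
        self.writes.insert(key.to_string(), Some(value.to_string()));
        prev
    }
    fn remove(&mut self, key: &str) -> Option<String> {
        let prev = self.get(key);
        self.writes.insert(key.to_string(), None);
        prev
    }
}

/// Hypothetical command type applied by the state machine.
enum Cmd {
    Set(String, String),
    Delete(String),
}

/// Written once against the trait; works on `MemTree` and `TxnView` alike.
fn apply_cmd<T: TreeApi>(tree: &mut T, cmd: &Cmd) -> Result<Option<String>, String> {
    match cmd {
        Cmd::Set(key, value) => Ok(tree.insert(key, value)),
        Cmd::Delete(key) => match tree.remove(key) {
            Some(prev) => Ok(Some(prev)),
            None => Err(format!("key not found: {}", key)),
        },
    }
}

/// Runs `f` on a buffered view and applies the writes only if `f` succeeds.
fn transaction<F>(tree: &mut MemTree, f: F) -> Result<(), String>
where
    F: FnOnce(&mut TxnView<'_>) -> Result<(), String>,
{
    let writes = {
        let mut view = TxnView { base: tree, writes: BTreeMap::new() };
        f(&mut view)?; // on error the buffered writes are simply dropped
        view.writes
    };
    for (key, value) in writes {
        match value {
            Some(v) => { tree.data.insert(key, v); }
            None => { tree.data.remove(&key); }
        }
    }
    Ok(())
}
```

In this sketch, calling `transaction(&mut tree, |txn| { apply_cmd(txn, &cmd)?; Ok(()) })` either applies every write made by the closure or none of them. The `Err` branch is where a real state machine would translate the failure into its own error code.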

@ariesdevil ariesdevil changed the title [metasvr] Support sled tree level transaction [metasrv] Support sled tree level transaction Nov 23, 2021
@drmingdrmer
Member

Maybe it is time to discuss what the next move on this issue should be?

@ariesdevil
Collaborator Author

Yup, tomorrow I'll list the ideal transaction APIs that I want the KV store to have, and then we can discuss whether we can achieve that goal.

@jyizheng
Contributor

jyizheng commented Dec 7, 2021

I guess transactions are needed here because range scans are not atomic in the Bw-tree by design. In other words, sled (an implementation of the Bw-tree) does not support snapshot isolation, which is very important for analytic queries. The paper below provides a way to implement snapshot isolation for another KV store called KVell.

https://www.usenix.org/system/files/osdi20-lepers.pdf

The amount of work might be smaller than implementing a transaction manager from scratch.
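
To illustrate what snapshot isolation buys for range scans, here is a minimal multi-version sketch that uses only the standard library. It is a generic MVCC illustration, not the mechanism from the linked paper and not something sled provides. `VersionedStore`, `snapshot`, and `scan_at` are hypothetical names chosen for this example.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical multi-version store: every key keeps all of its versions,
/// indexed by commit sequence number. A `None` value is a tombstone.
#[derive(Default)]
struct VersionedStore {
    versions: BTreeMap<String, BTreeMap<u64, Option<String>>>,
    last_seq: u64,
}

impl VersionedStore {
    fn put(&mut self, key: &str, value: &str) -> u64 {
        self.write(key, Some(value.to_string()))
    }

    fn delete(&mut self, key: &str) -> u64 {
        self.write(key, None)
    }

    /// Appends a new version instead of overwriting the old one.
    fn write(&mut self, key: &str, value: Option<String>) -> u64 {
        self.last_seq += 1;
        self.versions
            .entry(key.to_string())
            .or_default()
            .insert(self.last_seq, value);
        self.last_seq
    }

    /// A snapshot is just the sequence number of the latest commit.
    fn snapshot(&self) -> u64 {
        self.last_seq
    }

    /// Range scan over `[start, end)` as of snapshot `snap`.
    /// Versions committed after `snap` are invisible to the scan.
    fn scan_at(&self, snap: u64, start: &str, end: &str) -> Vec<(String, String)> {
        self.versions
            .range::<str, _>((Bound::Included(start), Bound::Excluded(end)))
            .filter_map(|(key, history)| {
                // Newest version with sequence number <= snap, if any.
                let (_, value) = history.range(..=snap).next_back()?;
                value.clone().map(|v| (key.clone(), v))
            })
            .collect()
    }
}
```

A reader that takes `let snap = store.snapshot()` before a scan sees the same set of keys and values no matter how many writes land while the scan is in progress, which is the property a non-atomic range scan lacks.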

@drmingdrmer
Member

@jyizheng
Thank you man!

@ariesdevil
Collaborator Author

Some investigation:

  1. The Bw-tree used by Microsoft also does not support interactive transactions; they use a separate transaction component for this.
  2. The CMU Database Group and Intel Labs put a lot of work into implementing a Bw-tree-based store, but its performance was not as high as expected:

In this work, we introduced OpenBw-Tree, our clean slate Bw-Tree implementation. OpenBw-Tree incorporates a number of optimizations that were not described in the original Bw-Tree papers. Experimental results show that OpenBw-Tree outperforms the original Bw-Tree design. Nevertheless, even our optimized OpenBw-Tree, is still considerably slower than other state-of-the-art in-memory index structures like SkipList, Masstree and ART. OpenBw-Tree is also slower than a B+Tree that uses optimistic lock coupling, which indicates that lock-freedom does not always pay off in comparison with modern lock-based synchronization techniques.

  3. RocksDB supports both snapshot isolation and interactive transactions.

cc: @drmingdrmer @jyizheng @BohuTANG

@ariesdevil
Collaborator Author

ariesdevil commented Dec 8, 2021

As a result, Sled meets our current needs, although the transaction implementation is not so elegant.

@ariesdevil
Collaborator Author

ariesdevil commented Dec 8, 2021

Some concerns about switching to RocksDB:

  1. RocksDB has a large number of config options that take time to learn and tune.
  2. RocksDB is a C++ project; for personal reasons, I don't want a C++ project integrated into our Rust project :)
  3. RocksDB's main goal is to support Facebook (Meta)'s MyRocks, so it has many features we don't need. If we use RocksDB, we may have to maintain it, or at least become familiar with its code.

@ariesdevil
Collaborator Author

I think we should write down our metasrv roadmap to see whether we have further performance and functional requirements; then we can decide whether to switch.

@drmingdrmer
Member

spacejam/sled#1390

@drmingdrmer
Member

@ariesdevil Let's update the status of this issue :DDD

@ariesdevil
Collaborator Author

We have left a follow-up issue (#3309) and are closing this one now. If you have any ideas, feel free to reopen it.


4 participants