# rocksdb performance bottom up
# Dismember an ox as skillfully as a butcher – understand the performance of our system step by step
1. Performance of our basic components
   1. Local procedure calls (LPC via tasking::enqueue)
      1. Intra-thread => how many cycles for each local EMPTY task
         - Tianyi: 1 thread, 3.1 million/s
         - Zhenyu: understand why each EMPTY task costs 700+ instructions
      2. Inter-thread => what is the cost of the task queue
         - Tianyi: 2 threads, 0.2 million/s
         - Zhenyu: is this a reasonable overhead? What is the raw performance of the synchronization primitives used in the task queues? (see the sketch after this section)
   2. Network providers => see whether we can achieve the raw device performance
      1. Local machine
      2. Remote machine
   3. Rpc framework with simulated network provider => what is the CPU cost of the rpc stack in rDSN
   4. Rpc framework with native/fast network provider => end-to-end rpc performance
      1. Local machine
         - Tianyi: 280k/s
         - Zhenyu: What is the setting? It seems too good compared to 1.1.2
         - Tianyi: 1.1.2 is a blocking tick-tock benchmark that enqueues alternately between the two threads
      2. Remote machine
   5. Aio performance => see whether we can achieve the raw device performance
      1. Read/write performance on disk
      2. Read/write performance on SSD
   6. Synchronization primitives?
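As a first data point for the question in 1.1.2, the following standalone sketch (standard C++, not rDSN code) measures the raw throughput of a mutex + condition-variable hand-off of EMPTY tasks between two threads; the queue and counters are illustrative stand-ins for the rDSN task queue. If this raw number is already close to the 0.2 million/s measured in 1.1.2, the overhead is dominated by the synchronization primitives themselves; if it is much higher, the gap points at the task-queue implementation.

```cpp
// Standalone sketch (not rDSN code): raw cost of a mutex + condition-variable
// hand-off between two threads, for comparison with the inter-thread
// tasking::enqueue throughput in 1.1.2.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

int main()
{
    constexpr int kTasks = 1000000;

    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::function<void()>> queue;   // stand-in for an rDSN task queue
    bool done = false;

    // Consumer: pops and runs EMPTY tasks until the producer signals completion.
    std::thread consumer([&] {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [&] { return !queue.empty() || done; });
                if (queue.empty()) return;       // done and fully drained
                task = std::move(queue.front());
                queue.pop_front();
            }
            task();                              // empty body: measures overhead only
        }
    });

    auto start = std::chrono::steady_clock::now();

    // Producer: enqueue kTasks empty tasks.
    for (int i = 0; i < kTasks; ++i) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            queue.emplace_back([] {});
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
    }
    cv.notify_one();
    consumer.join();

    auto secs = std::chrono::duration<double>(
                    std::chrono::steady_clock::now() - start).count();
    std::printf("cross-thread hand-off: %.2f million tasks/s\n",
                kTasks / secs / 1e6);
    return 0;
}
```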
2. Computation in replication => gap between this and 1.3 should be small
   - Start 1 replica server only, with the simulated network provider, to avoid network communication
   - Use the empty aio provider or a RAM disk to minimize disk operations
   - Use 1 partition and max_replica_degree = 1 to minimize the interference of the meta server and cross-thread contention in the replica server
   - Tianyi: qps = 100k for empty requests, 40k for write requests
   - Zhenyu: the slowdown is too large even compared to 1.4, why? (see the qps harness sketch below)
3. Network + computation performance in replication/1 server => gap between this and 1.4 should be small
   - native network provider atop of step 2
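For reference, a minimal closed-loop qps harness of the kind that would produce the numbers in step 2 is sketched below; `send_request_sync` is a hypothetical placeholder for the real client call into the replica server, not an rDSN API. Note that in a closed loop qps is roughly num_clients divided by per-request latency, so the comparison with 1.4 is only meaningful if the client concurrency is the same in both tests.

```cpp
// Minimal closed-loop qps harness (sketch). send_request_sync() is a
// hypothetical placeholder for the real client call into the replica
// server; it is NOT an rDSN API.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Placeholder so the sketch runs; replace with the real (blocking) client call.
static bool send_request_sync(bool /*is_write*/)
{
    std::this_thread::sleep_for(std::chrono::microseconds(10)); // fake service time
    return true;
}

static void run_qps_test(bool is_write, int num_clients, int seconds)
{
    std::atomic<long> completed{0};
    std::atomic<bool> stop{false};

    std::vector<std::thread> clients;
    for (int i = 0; i < num_clients; ++i)
        clients.emplace_back([&] {
            // Closed loop: each client keeps exactly one request outstanding.
            while (!stop.load(std::memory_order_relaxed))
                if (send_request_sync(is_write))
                    completed.fetch_add(1, std::memory_order_relaxed);
        });

    std::this_thread::sleep_for(std::chrono::seconds(seconds));
    stop = true;
    for (auto& t : clients) t.join();

    std::printf("%s qps = %.0f\n", is_write ? "write" : "empty",
                (double)completed.load() / seconds);
}

int main()
{
    run_qps_test(false, 16, 10);   // empty requests
    run_qps_test(true, 16, 10);    // write requests
    return 0;
}
```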
4. IO performance of mutation log w/ native aio provider => gap between this and 1.5 should be small (see the raw disk baseline sketch after this item)
   - Write
   - Replay
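A raw baseline for the Write measurement can be taken outside rDSN. The sketch below (assuming Linux/POSIX) measures sequential appends with write() + fdatasync(); the block size and file name are arbitrary choices. A Replay baseline is the same loop with read() and without the fdatasync().

```cpp
// Standalone sketch (POSIX): raw sequential-append baseline for the
// mutation log write path. Block size and path are arbitrary choices.
#include <chrono>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

int main()
{
    constexpr size_t kBlock = 64 * 1024;      // bytes per append
    constexpr int    kCount = 10000;          // number of appends

    int fd = ::open("mlog_baseline.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    std::vector<char> block(kBlock, 'x');
    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < kCount; ++i) {
        if (::write(fd, block.data(), block.size()) != (ssize_t)block.size()) {
            std::perror("write");
            return 1;
        }
        ::fdatasync(fd);                       // mimic a durable log append
    }

    auto secs = std::chrono::duration<double>(
                    std::chrono::steady_clock::now() - start).count();
    ::close(fd);

    std::printf("appends/s = %.0f, throughput = %.1f MB/s\n",
                kCount / secs, kCount * kBlock / secs / (1024.0 * 1024.0));
    return 0;
}
```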
5. Network + computation + IO performance in replication/1 server => gap between this and 1.5 should be small
   - native aio provider atop of step 3
6. Network + computation performance + throttling in replication/1 server (see the throttling sketch after this item)
   - large client concurrency atop of step 3
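Since the throttling mechanism is still listed under to-be-fixed below, one common scheme is sketched here (assuming C++20): cap the number of outstanding requests per client with a counting semaphore. `send_request_async` is a hypothetical placeholder, not an rDSN API.

```cpp
// Sketch of one common throttling scheme (assuming C++20): cap the number
// of outstanding requests per client with a counting semaphore.
// send_request_async() is a hypothetical placeholder, not an rDSN API.
#include <chrono>
#include <cstdio>
#include <functional>
#include <semaphore>
#include <thread>

constexpr int kMaxOutstanding = 128;
std::counting_semaphore<kMaxOutstanding> slots(kMaxOutstanding);

// Placeholder async send so the sketch runs; replace with the real client call.
static void send_request_async(std::function<void()> on_reply)
{
    std::thread([cb = std::move(on_reply)] {
        std::this_thread::sleep_for(std::chrono::microseconds(100)); // fake latency
        cb();
    }).detach();
}

static void throttled_send()
{
    slots.acquire();                                // blocks once kMaxOutstanding requests are in flight
    send_request_async([] { slots.release(); });    // free the slot when the reply arrives
}

int main()
{
    for (int i = 0; i < 10000; ++i)
        throttled_send();

    // Drain: wait for all outstanding requests to complete before exiting.
    for (int i = 0; i < kMaxOutstanding; ++i)
        slots.acquire();

    std::printf("issued 10000 throttled requests\n");
    return 0;
}
```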
7. Network + computation performance in replication/2 servers => ???
   - replica_count = 2 atop of step 3
8. Network + computation performance in replication/3 servers => ???
   - replica_count = 3 atop of step 3
- // TODO: more combinations
- End-to-end using the nativerun tool
- End-to-end using the fastrun tool
To-be-fixed:
- throttling mechanism