forked from valkey-io/valkey-doc
RDMA is the abbreviation of remote direct memory access. It is a technology that enables computers in a network to exchange data in main memory without involving the processor, cache, or operating system of either computer. This means RDMA performs better than TCP; test results show that Valkey Over RDMA achieves ~2.5x the QPS with lower latency.

In recent years, RDMA has become popular in data centers; in particular, the RoCE (RDMA over Converged Ethernet) architecture has been widely adopted. Cloud vendors have also started to offer RDMA instances to accelerate networking performance, so end users can benefit from the improvement easily.

Introduce the Valkey Over RDMA protocol as a new transport for Valkey. For now, 4 commands are defined:
- GetServerFeature & SetClientFeature: these two commands are used to negotiate features for future extension. No features are defined in this version. Flow control and multi-buffer support may be added in the future, which will require feature negotiation.
- Keepalive
- RegisterXferMemory: the core command for transferring the actual payload.

The 'TX buffer' and 'RX buffer' are built on RDMA remote memory, written with RDMA WRITE / RDMA WRITE WITH IMM. This is similar to (but not the same as) several mechanisms introduced in the following papers:
- Socksdirect: datacenter sockets can be fast and compatible <https://dl.acm.org/doi/10.1145/3341302.3342071>
- LITE Kernel RDMA Support for Datacenter Applications <https://dl.acm.org/doi/abs/10.1145/3132747.3132762>
- FaRM: Fast Remote Memory <https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf>

Link: valkey-io/valkey#477

Co-authored-by: Xinhao Kong <[email protected]>
Co-authored-by: Huaping Zhou <[email protected]>
Co-authored-by: zhuo jiang <[email protected]>
Co-authored-by: Yiming Zhang <[email protected]>
Co-authored-by: Jianxi Ye <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
1 parent f4ce160, commit a84c13b
Showing 3 changed files with 179 additions and 0 deletions.
@@ -0,0 +1,173 @@
---
title: "RDMA support"
linkTitle: "RDMA support"
description: Valkey Over RDMA support
---

Valkey supports the RDMA connection type as a module that must be compiled
as a shared library and is loaded dynamically on demand.

## Getting Started

RDMA stands for Remote Direct Memory Access. It is a technology that allows computers in
a network to exchange data directly in main memory, without involving the processors,
caches, or operating systems of either computer.
As a result, RDMA offers better performance than TCP. Test results indicate that
Valkey Over RDMA achieves approximately 2 times the QPS of Valkey over TCP, with lower latency.

Please note that Valkey Over RDMA is currently supported only on Linux.

## Running manually

To run a Valkey server in RDMA mode manually:

    ./src/valkey-server --protected-mode no \
         --loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379

The RDMA bind address and port can also be changed at runtime with a command:

    192.168.122.100:6379> CONFIG SET rdma-port 6380

It's also possible to have both RDMA and TCP available; there is no
conflict between TCP (port 6379) and RDMA (port 6379). Example:

    ./src/valkey-server --protected-mode no \
         --loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379 \
         --port 6379

Note that the network interface (192.168.122.100 in this example) must support
RDMA. To check whether a server supports RDMA:

    ~# rdma dev show      (from a recent version of the iproute2 package)

Or:

    ~# ibv_devices        (from the ibverbs-utils package on Debian/Ubuntu)

## Protocol

The protocol defines the queue pair (QP) type RC (Reliable Connection, which, like TCP,
is reliable and connection-oriented), the communication commands, and the payload
exchange mechanism.
The protocol depends solely on the RDMA (aka InfiniBand) specification
and is independent of both software (including the OS and user libraries)
and hardware (including vendors and low-level transports).

The Valkey Over RDMA protocol is split into a control plane
(to exchange control messages) and a data plane (to transfer the actual Valkey payload).

### Control message

Control messages use a fixed 32-byte format, defined by the following structures:
```C
#include <stdint.h>

typedef struct ValkeyRdmaFeature {
    /* one of the Opcodes defined below */
    uint16_t opcode;
    /* select features */
    uint16_t select;
    uint8_t rsvd[20];
    /* feature bits */
    uint64_t features;
} ValkeyRdmaFeature;

typedef struct ValkeyRdmaKeepalive {
    /* one of the Opcodes defined below */
    uint16_t opcode;
    uint8_t rsvd[30];
} ValkeyRdmaKeepalive;

typedef struct ValkeyRdmaMemory {
    /* one of the Opcodes defined below */
    uint16_t opcode;
    uint8_t rsvd[14];
    /* address of a transfer buffer which is used to receive remote streaming data,
     * aka 'RX buffer address'. The remote side should use this as 'TX buffer address' */
    uint64_t addr;
    /* length of the 'RX buffer' */
    uint32_t length;
    /* the RDMA remote key of 'RX buffer' */
    uint32_t key;
} ValkeyRdmaMemory;

typedef union ValkeyRdmaCmd {
    ValkeyRdmaFeature feature;
    ValkeyRdmaKeepalive keepalive;
    ValkeyRdmaMemory memory;
} ValkeyRdmaCmd;
```

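The 32-byte size can be checked at compile time. The sketch below repeats the structures so that it compiles on its own. It asserts their sizes and field offsets, which hold on common ABIs, and fills in a `RegisterXferMemory` (opcode 3) message. The helper `build_register_memory` is hypothetical. This document does not state the byte order of the fields, so the sketch stores them in host byte order; a real implementation may convert them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies of the control-message structures, repeated for self-containment. */
typedef struct ValkeyRdmaFeature {
    uint16_t opcode;
    uint16_t select;
    uint8_t rsvd[20];
    uint64_t features;
} ValkeyRdmaFeature;

typedef struct ValkeyRdmaKeepalive {
    uint16_t opcode;
    uint8_t rsvd[30];
} ValkeyRdmaKeepalive;

typedef struct ValkeyRdmaMemory {
    uint16_t opcode;
    uint8_t rsvd[14];
    uint64_t addr;
    uint32_t length;
    uint32_t key;
} ValkeyRdmaMemory;

typedef union ValkeyRdmaCmd {
    ValkeyRdmaFeature feature;
    ValkeyRdmaKeepalive keepalive;
    ValkeyRdmaMemory memory;
} ValkeyRdmaCmd;

/* Every variant must fill exactly 32 bytes with no hidden padding. */
static_assert(sizeof(ValkeyRdmaFeature) == 32, "feature is 32 bytes");
static_assert(offsetof(ValkeyRdmaFeature, features) == 24, "features offset");
static_assert(sizeof(ValkeyRdmaKeepalive) == 32, "keepalive is 32 bytes");
static_assert(sizeof(ValkeyRdmaMemory) == 32, "memory is 32 bytes");
static_assert(offsetof(ValkeyRdmaMemory, addr) == 16, "addr offset");
static_assert(offsetof(ValkeyRdmaMemory, length) == 24, "length offset");
static_assert(offsetof(ValkeyRdmaMemory, key) == 28, "key offset");
static_assert(sizeof(ValkeyRdmaCmd) == 32, "command is 32 bytes");

/* Hypothetical helper: describe the local RX buffer to the peer.
 * Reserved bytes are zeroed; values stay in host byte order. */
static void build_register_memory(ValkeyRdmaCmd *cmd, uint64_t rx_addr,
                                  uint32_t rx_len, uint32_t rkey) {
    memset(cmd, 0, sizeof(*cmd));
    cmd->memory.opcode = 3; /* RegisterXferMemory */
    cmd->memory.addr = rx_addr;
    cmd->memory.length = rx_len;
    cmd->memory.key = rkey;
}
```

The resulting `ValkeyRdmaCmd` is what would be handed to `ibv_post_send` with `IBV_WR_SEND`, as described under "Operations of RDMA".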
### Opcodes

| Command | Value | Description |
| :----: | :----: | :----: |
| `GetServerFeature` | 0 | required; get the features offered by the Valkey server |
| `SetClientFeature` | 1 | required; negotiate features and set them on the Valkey server |
| `Keepalive` | 2 | required; detect unexpected orphan connections |
| `RegisterXferMemory` | 3 | required; tell the remote side about the local 'RX transfer buffer', which the remote side then uses as its 'TX transfer buffer' |

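A receiver has to map the `opcode` field of an incoming `ValkeyRdmaCmd` to one of the commands in this table. The sketch below uses a hypothetical enum and lookup function, `ValkeyRdmaOpcode` and `valkey_rdma_opcode_name`. The function returns `NULL` for any value outside the four defined opcodes; this document does not specify how an implementation should react to such a value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum mirroring the opcode table above. */
typedef enum ValkeyRdmaOpcode {
    GetServerFeature = 0,
    SetClientFeature = 1,
    Keepalive = 2,
    RegisterXferMemory = 3,
} ValkeyRdmaOpcode;

/* Map a received opcode to its command name; returns NULL for values
 * that this version of the protocol does not define. */
static const char *valkey_rdma_opcode_name(uint16_t opcode) {
    switch (opcode) {
    case GetServerFeature:
        return "GetServerFeature";
    case SetClientFeature:
        return "SetClientFeature";
    case Keepalive:
        return "Keepalive";
    case RegisterXferMemory:
        return "RegisterXferMemory";
    default:
        return NULL;
    }
}
```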
### Operations of RDMA

- To send a control message: RDMA '**`ibv_post_send`**' with opcode '**`IBV_WR_SEND`**', carrying a
  'ValkeyRdmaCmd' structure.
- To receive a control message: RDMA '**`ibv_post_recv`**', with a receive buffer
  the size of 'ValkeyRdmaCmd'.
- To transfer stream data: RDMA '**`ibv_post_send`**' with opcodes '**`IBV_WR_RDMA_WRITE`**' (optional) and
  '**`IBV_WR_RDMA_WRITE_WITH_IMM`**' (required). Data segments are written into a connection as
  RDMA [WRITE][WRITE][WRITE]...[WRITE WITH IMM], and the total length of the written data is carried
  in the immediate data (a 32-bit unsigned integer).

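The stream-data rule can be followed without RDMA hardware by simulating it in memory. In the sketch below, which illustrates the framing and is not the Valkey implementation, `memcpy()` stands in for the `IBV_WR_RDMA_WRITE` and `IBV_WR_RDMA_WRITE_WITH_IMM` work requests. The names `SimRxBuffer`, `sim_rdma_write`, `sim_rdma_write_with_imm` and `send_segments` are hypothetical. The sketch reads "the length of total buffer" as the total number of bytes written by the whole [WRITE]...[WRITE WITH IMM] sequence, and it skips any byte-order conversion of the immediate data.

```c
#include <stdint.h>
#include <string.h>

/* In-memory stand-in for the peer's registered RX buffer. */
typedef struct SimRxBuffer {
    uint8_t data[256];
    uint32_t length;   /* advertised size of the RX buffer */
    uint32_t received; /* bytes the receiver has been notified about */
} SimRxBuffer;

/* Simulated RDMA WRITE: places bytes silently, with no receiver notification. */
static void sim_rdma_write(SimRxBuffer *rx, uint32_t offset,
                           const void *src, uint32_t len) {
    memcpy(rx->data + offset, src, len);
}

/* Simulated RDMA WRITE WITH IMM: places the last segment, then the
 * receiver's completion delivers imm (total bytes of the batch). */
static void sim_rdma_write_with_imm(SimRxBuffer *rx, uint32_t offset,
                                    const void *src, uint32_t len,
                                    uint32_t imm) {
    memcpy(rx->data + offset, src, len);
    rx->received += imm;
}

/* Sender: writes nseg segments as [WRITE]...[WRITE WITH IMM] starting at
 * tx_offset. Returns the new offset, or UINT32_MAX if the batch does not fit. */
static uint32_t send_segments(SimRxBuffer *rx, uint32_t tx_offset,
                              const char *const *segs, int nseg) {
    uint32_t total = 0;
    for (int i = 0; i < nseg; i++)
        total += (uint32_t)strlen(segs[i]);
    if (nseg == 0 || tx_offset > rx->length || total > rx->length - tx_offset)
        return UINT32_MAX;

    uint32_t off = tx_offset;
    for (int i = 0; i < nseg; i++) {
        uint32_t len = (uint32_t)strlen(segs[i]);
        if (i < nseg - 1)
            sim_rdma_write(rx, off, segs[i], len);
        else
            sim_rdma_write_with_imm(rx, off, segs[i], len, total);
        off += len;
    }
    return off;
}
```

For example, sending the segments `"*1\r\n"`, `"$4\r\n"` and `"PING\r\n"` from offset 0 returns 14. The receiver is notified only once, by the final WRITE WITH IMM, and learns about all 14 bytes at that point.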
### Maximum WQEs of RDMA

This protocol currently defines no specific limit on the number of WQEs (work queue elements).
The recommended number of WQEs is 1024.
Flow control for WQEs MAY be defined or implemented in the future.

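The protocol leaves WQE limits to the implementation. A sender may therefore keep some local bookkeeping so that it never posts more send work requests than its queue pair was created for. The sketch below uses the hypothetical names `SendQueueBudget`, `sq_try_reserve` and `sq_complete`, and uses the recommended depth of 1024. It assumes every send work request is signaled, meaning each one produces its own completion. This is local accounting only, not the protocol-level flow control mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Recommended depth from the text; the protocol does not enforce it. */
#define VALKEY_RDMA_RECOMMENDED_WQES 1024u

/* Hypothetical per-connection send-queue accounting. */
typedef struct SendQueueBudget {
    uint32_t max_wqes;    /* depth the QP was created with */
    uint32_t outstanding; /* posted but not yet completed */
} SendQueueBudget;

/* Call before posting a send WR. Returns false when the queue is full,
 * in which case the caller should reap completions first. */
static bool sq_try_reserve(SendQueueBudget *sq) {
    if (sq->outstanding >= sq->max_wqes)
        return false;
    sq->outstanding++;
    return true;
}

/* Call with the number of send completions reaped from the CQ. */
static void sq_complete(SendQueueBudget *sq, uint32_t n) {
    sq->outstanding = (n > sq->outstanding) ? 0 : sq->outstanding - n;
}
```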
### The workflow of this protocol
```
                                                                       valkey-server
                                                                       listen RDMA port
valkey-client
                 -------------------RDMA connect-------------------->
                                                                       accept connection
                 <--------------- Establish RDMA --------------------
                 --------Get server feature [@IBV_WR_SEND] --------->
                 --------Set client feature [@IBV_WR_SEND] --------->
                                                                       setup RX buffer
                 <---- Register transfer memory [@IBV_WR_SEND] ------
[@ibv_post_recv]
setup TX buffer
                 ----- Register transfer memory [@IBV_WR_SEND] ----->
                                                                       [@ibv_post_recv]
                                                                       setup TX buffer
                 -- Valkey commands [@IBV_WR_RDMA_WRITE_WITH_IMM] -->
                 <- Valkey response [@IBV_WR_RDMA_WRITE_WITH_IMM] ---
                                      .......
                 -- Valkey commands [@IBV_WR_RDMA_WRITE_WITH_IMM] -->
                 <- Valkey response [@IBV_WR_RDMA_WRITE_WITH_IMM] ---
                                      .......
RX is full
                 ----- Register transfer memory [@IBV_WR_SEND] ----->
                                                                       [@ibv_post_recv]
                                                                       setup TX buffer
                 <- Valkey response [@IBV_WR_RDMA_WRITE_WITH_IMM] ---
                                      .......
                                                                       RX is full
                 <---- Register transfer memory [@IBV_WR_SEND] ------
[@ibv_post_recv]
setup TX buffer
                 -- Valkey commands [@IBV_WR_RDMA_WRITE_WITH_IMM] -->
                 <- Valkey response [@IBV_WR_RDMA_WRITE_WITH_IMM] ---
                                      .......
                 -------------------RDMA disconnect----------------->
                 <------------------RDMA disconnect------------------
```

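The "RX is full" steps in the workflow can be sketched from the sender's side. The sender writes into the peer's RX buffer, as advertised by the most recent `RegisterXferMemory`, until that buffer is used up. It then waits for the peer to register a new buffer. The names `TxState`, `tx_on_register_memory` and `tx_reserve` are hypothetical. The assumption that a new registration restarts writing at offset 0 of the new buffer is this sketch's own, not a statement of the protocol.

```c
#include <stdint.h>

/* Hypothetical sender-side view of the peer's current RX buffer. */
typedef struct TxState {
    uint64_t remote_addr; /* peer 'RX buffer address', used as TX address */
    uint32_t remote_len;  /* peer 'RX buffer' length */
    uint32_t remote_key;  /* peer RDMA remote key */
    uint32_t offset;      /* bytes already written into this buffer */
} TxState;

/* On RegisterXferMemory from the peer: switch to the new buffer. */
static void tx_on_register_memory(TxState *tx, uint64_t addr, uint32_t len,
                                  uint32_t key) {
    tx->remote_addr = addr;
    tx->remote_len = len;
    tx->remote_key = key;
    tx->offset = 0;
}

/* Reserve up to `want` bytes in the peer buffer. Returns the number of
 * bytes granted and stores the remote address to target with RDMA WRITE
 * (WITH IMM). A return of 0 means the peer's RX is full: wait for the
 * next RegisterXferMemory before sending more. */
static uint32_t tx_reserve(TxState *tx, uint32_t want, uint64_t *remote_dst) {
    uint32_t room = tx->remote_len - tx->offset;
    uint32_t n = want < room ? want : room;
    if (n == 0)
        return 0;
    *remote_dst = tx->remote_addr + tx->offset;
    tx->offset += n;
    return n;
}
```

`tx_reserve` may grant fewer bytes than requested. This fits the stream nature of the payload, because the remainder can be sent once the peer registers its next RX buffer.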
The Valkey Over RDMA protocol is designed to transfer stream data efficiently and
is similar to, though not the same as, several mechanisms introduced in academic papers:

* [Socksdirect: datacenter sockets can be fast and compatible](https://dl.acm.org/doi/10.1145/3341302.3342071)
* [LITE Kernel RDMA Support for Datacenter Applications](https://dl.acm.org/doi/abs/10.1145/3132747.3132762)
* [FaRM: Fast Remote Memory](https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf)