Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: add ‘BATCH_GET’ rpc request for read optimization #884

Closed
cauchy1988 opened this issue Jan 19, 2022 · 10 comments
Closed

Feature: add ‘BATCH_GET’ rpc request for read optimization #884

cauchy1988 opened this issue Jan 19, 2022 · 10 comments
Labels
type/enhancement Indicates new feature requests

Comments

@cauchy1988
Copy link
Contributor

cauchy1988 commented Jan 19, 2022

Feature Request

Is your feature request related to a problem? Please describe:

We would like to implement a reading optimization interface for the pegasus client

Describe the feature you'd like:

the way we wanna to realize is as follows:
(1) first, on the server side, we add a read interface with the binding task code 'RPC_RRDB_RRDB_BATCH_GET'. This new interface of each server handles the reading of multiple items in the same partition. The items being requested can have different hashkey which is the key difference to the 'RPC_RRDB_RRDB_MULTI_GET' interface
(2) second, we realize a interface in the client side, taking java client as an example, the declaration is as following:

public int batchGet3(List<Pair<byte[], byte[]>> keys, List<Pair<PException, byte[]>> results, int timeout) throws PException

(3)the main design is as follows:
a、RPC_RRDB_RRDB_BATCH_GET request rpc protocol

struct batch_get_request {
    1:list<full_key> keys;
}
 
struct full_key {
    1:base.blob hash_key;
    2:base.blob sorted_key;
}
 
struct batch_get_response {
    1:i32               error;
    2:list<full_data>   data;
    3:i32               app_id;
    4:i32               partition_index;
    6:string            server;
}
 
struct full_data {
    1:base.blob hash_key;
    2:base.blob sorted_key;
    3:base.blob value;
    4:bool exists //true 表述数据存在, false 表示数据不存在
}
 
service rrdb
{
  batch_get_response batch_get(1:batch_get_request request);
}

b、The main process of the client side
image

Looking forward to your opinions!

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

We have done a performance test with JMeter on a three node machine:
The comparison of time consumption and throughput is as follows:
  latenry-average(ms) latency-median(ms) 90%(ms) 95%(ms) 99%(ms) throughput
batchGet2 1 1 2 2 4 622.0/sec
batchGet3 1 1 2 2 3 849.1/sec
We generate 1000 pieces of data per request
As can be seen in the above figure: The performance of reads of using the java client interface batchGet3 by using BATCH_GET RPC is more preferable than the client interface batchGet2 by using GET RPC

@cauchy1988 cauchy1988 added the type/enhancement Indicates new feature requests label Jan 19, 2022
@cauchy1988 cauchy1988 changed the title add ‘BATCH_GET add ‘BATCH_GET’ rpc request for read optimization Jan 19, 2022
@foreverneverer
Copy link
Contributor

foreverneverer commented Jan 21, 2022

Consider whether there is a more general resolution to support all operations, including get, set, del and so on.

One example you can refer to checkAndMutate, which can batch send different write operation, and only support same hashKey.

Refer to checkAndMutate, You may need define rdb like this:

struct batch_request
{
...
    1:list<request>   batch_list;
...
}

struct request {
    1:operation_type operation; // it is very important, you need judge the request  type
    2: base.blob       hash_key // this is diff from checkAndmutate, which don't include it
    3:base.blob        sort_key;
    4:base.blob        value;
    5:i32              set_expire_ts_seconds;
}

On server side, you just need one method to judge operation_type and forward corresponding existed method, of course, which is also like checkAndMutate handler of server side.

This just is one suggestion, I haven't make sure whether it is feasible, However, I think the interface should be as general as possible, rather than one requirement need one interface.

@foreverneverer
Copy link
Contributor

You may need define rdb like this:

You may need provide two rdb type for on_client_write and on_client_read

@cauchy1988
Copy link
Contributor Author

cauchy1988 commented Jan 21, 2022

Consider whether there is a more general resolution to support all operations, including get, set, del and so on.

One example you can refer to checkAndMutate, which can batch send different write operation, and only support same hashKey.

Refer to checkAndMutate, You may need define rdb like this:

struct batch_request
{
...
    1:list<request>   batch_list;
...
}

struct request {
    1:operation_type operation; // it is very important, you need judge the request  type
    2: base.blob       hash_key // this is diff from checkAndmutate, which don't include it
    3:base.blob        sort_key;
    4:base.blob        value;
    5:i32              set_expire_ts_seconds;
}

On server side, you just need one method to judge operation_type and forward corresponding existed method, of course, which is also like checkAndMutate handler of server side.

This just is one suggestion, I haven't make sure whether it is feasible, However, I think the interface should be as general as possible, rather than one requirement need one interface.

Thank you, it is very helpful to me; I will think more roughly and come up with a more considerable solution, when this is done, i'll consult you for later 'niubility' advise

@cauchy1988
Copy link
Contributor Author

cauchy1988 commented Jan 23, 2022

You may need define rdb like this:

You may need provide two rdb type for on_client_write and on_client_read

I feel that adding a batch interface that mixes read, write, and delete operations is not very appropriate, for the following reasons:

  1. The example of check_and_mutate you quoted is an example of cas-like operation on a hashkey, which is different from the usual read, write, and delete.
  2. The rpc interface with a single responsibility is simple and clear. As far as the kv database is concerned, there are actually not many rpc interfaces for real reading and writing.
  3. 'BATCH_GET' rpc interface style is consistent with the previous interface styles of GET, SET, MULTI_GET, etc.
  4. The subsequent functions of "batch set" and "batch del" are still under discussion. I think pegasus may not support these two functions, because the success of "partial modification" will bring additional design and consideration, while "batch get" is only for reading, Even if the read fails, it will not affect the data

@acelyc111
Copy link
Member

acelyc111 commented Jan 24, 2022

checkAndMutate is a SET type interface in all cases, it is different from batchGet/Set/Del interfaces.

  1. The responses are different, for SET/DEL type interfaces, there are no values responed, but for GET type interfaces, you should design a thrfit struct to contain values, and the related hashkeys and sortkeys along.
  2. Mix all type of requests in one RPC will make code coupled. Imagine the pains we suffered from multi_get_request.

So I support to distinguish these RPCs, and try to avoid overdesign.

@foreverneverer
Copy link
Contributor

1

  1. there are no values responed, but for GET type interfaces,

Some explanations added before as follow:

You may need provide two rdb type for on_client_write and on_client_read.

So actually we need define wrtite_request and read_request rpc

2

  1. Mix all type of requests in one RPC will make code coupled

On the contrary, I think that using a unified interface will make code maintenance easier. In addition, we can refer https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html to consider it again

3

  1. The subsequent functions of "batch set" and "batch del" are still under discussion. I think pegasus may not support these two functions, because the success of "partial modification" will bring additional design and consideration

Haha, actually, I hope support batch set to improve online irrigation data rate.

I think may need others join to decide it. @neverchanje @hycdong @levy5307 @Smityz @GiantKing

@acelyc111
Copy link
Member

Doesn't dynamodb's BatchWriteItem API only operate on puts and deletes ? BatchGetItem is for get operations.

@foreverneverer
Copy link
Contributor

Yeah, so I say that define two type, read_request and write_request, read support get and so on, write support set, del and so on.

dynamodb don't have multi operation, We can consider that if support batch_multi_get.

@cauchy1988
Copy link
Contributor Author

Yeah, so I say that define two type, read_request and write_request, read support get and so on, write support set, del and so on.

dynamodb don't have multi operation, We can consider that if support batch_multi_get.

read operation only have 'one type' that is 'get'; so if i have not misunderstood your ideas, i can implement the "batch read" rpc interface based on following thrift definitions

struct read_request {
    1:dsn.blob hash_key;
    2:dsn.blob sort_key;
}

struct batch_read_request {
    1:list<read_request> requests;
}

struct read_response {
    1:dsn.blob hash_key;
    2:dsn.blob sort_key;
    3:dsn.blob value;
    4:bool exists // true mean data exists, false means not
}

struct batch_read_response {
    1:i32               error;
    2:list<read_response>   data;
    3:i32               app_id;
    4:i32               partition_index;
    6:string            server;
}

it's almostly the same with what i have initally comed up;

'batch set' and ' batch del' rpc interface use common thrift definition and add a 'operation_type' field to the definition to differentiate them ?

@cauchy1988
Copy link
Contributor Author

Yeah, so I say that define two type, read_request and write_request, read support get and so on, write support set, del and so on.

dynamodb don't have multi operation, We can consider that if support batch_multi_get.

if i misunderstood you ideas, then i want chat with you in weixin; heihei

@cauchy1988 cauchy1988 changed the title add ‘BATCH_GET’ rpc request for read optimization feat: add ‘BATCH_GET’ rpc request for read optimization Jan 28, 2022
@cauchy1988 cauchy1988 changed the title feat: add ‘BATCH_GET’ rpc request for read optimization Feature: add ‘BATCH_GET’ rpc request for read optimization Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Indicates new feature requests
Projects
None yet
Development

No branches or pull requests

3 participants