
feat(backup): 4. meta server send backup_request to replica #1112

Merged: 3 commits into apache:backup_restore-dev, Aug 19, 2022

Conversation

@hycdong (Contributor) commented Aug 12, 2022

#1081

This pull request implements the meta server sending backup_request to the replica servers:

  1. the meta server creates the backup_engine instance via init_backup (already merged in #1102)
  2. it writes the app_metadata file to the remote file system by calling write_app_info
  3. it updates the app backup_status from UNINITIALIZED to CHECKPOINTING by calling update_backup_item_on_remote_storage
  4. it sends backup_request to all primary partitions by calling backup_app_partition
  5. it handles responses and resends requests via on_backup_reply (to be implemented in a further PR)
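The five steps above can be sketched roughly as follows. This is an illustrative C++ sketch, not the actual Pegasus code: the type `backup_engine_sketch`, its members, and the method bodies are all hypothetical; only the function names mirror those mentioned in the PR description.

```cpp
#include <vector>

// Hypothetical sketch of the backup flow; names mirror the PR's functions,
// but signatures and state are illustrative, not the real Pegasus API.
enum class backup_status { UNINITIALIZED, CHECKPOINTING };

struct backup_engine_sketch {
    backup_status status = backup_status::UNINITIALIZED;
    std::vector<int> primaries;     // indices of primary partitions
    std::vector<int> requests_sent; // partitions a backup_request went to

    void write_app_info() {
        // step 2: persist the app_metadata file on the remote file system
    }

    void update_backup_item_on_remote_storage() {
        // step 3: advance the persisted status before contacting replicas
        status = backup_status::CHECKPOINTING;
    }

    void backup_app_partition() {
        // step 4: send a backup_request to every primary partition
        for (int p : primaries)
            requests_sent.push_back(p);
    }

    void start() {
        write_app_info();
        update_backup_item_on_remote_storage();
        backup_app_partition();
        // step 5: on_backup_reply handles the responses (a later PR)
    }
};
```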

Besides, this PR also updates the following related content:

  • update backup_request and backup_response in thrift
  • implement some common functions in backup_restore_common
  • rename cold_backup_constant to backup_constant
  • migrate the cold_backup_root config to gflags
  • add related unit tests

@hycdong hycdong marked this pull request as ready for review August 15, 2022 02:49
3:i64 backup_id;
4:backup_status status;
5:optional dsn.error_code checkpoint_err;
6:optional dsn.error_code upload_err;
Contributor
Why do we need two error_codes? Isn't one error_code plus a hint message enough?

@hycdong (Contributor, Author) Aug 17, 2022

Good idea, the meta server can distinguish the error by its backup_status.
Done.

Contributor

OK, then what is the difference between 1:dsn.error_code err and 5:optional dsn.error_code checkpoint_upload_err;?

Contributor Author

err is used for the backup_request RPC itself, e.g. ERR_OK or ERR_INVALID_STATE. checkpoint_upload_err is used for the backup checkpoint and upload: if creating the backup checkpoint fails, checkpoint_upload_err won't be ERR_OK.

Contributor

So I think just a single err is still OK. In other RPCs, I remember we have always defined only one err.

@foreverneverer (Contributor) Aug 17, 2022

With your design, the sender needs to judge the response as follows:

if (rpc == ok) {
    if (resp.err == ok) {
        if (resp.checkpoint_err == ok) {
            // and if you define even more fine-grained error codes as you say,
            // maybe also:
            // if (err_a == ...) {
            //     if (err_b == ...) { ... }
            // }
        }
    }
}

This logic is also redundant. I don't suggest this design; you can refer to the original RPC definition, which I think is elegant.

Contributor Author

Actually, bulk load still separates rpc_error, download_error and ingestion_err; you can reference the structure partition_bulk_load_state.

And in my current design, there won't be that many nested if conditions as in your example. It just looks like:

if (rpc != ok) { /* handle rpc error */ }
if (resp.err != ok) { /* handle request error */ }
if (resp.checkpoint_upload_err != ok) { /* handle checkpoint/upload error */ }

If I combined err and checkpoint_upload_err into one, the code would look like:

if (rpc != ok) { /* handle rpc error */ }
if (resp.err != ok) {
    if (/* it was originally resp.err */) {
        // handle request error
    }
    if (/* it was originally resp.checkpoint_upload_err */) {
        // handle checkpoint/upload error
    }
}

I think it is okay to have two errors to distinguish the different failure stages: there is only one extra field, but it makes the structure clearer. In my previous design, the checkpoint error and upload error were also separated, so I have already compromised my logic :-)
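The flat two-error handling being argued for here can be written out as compilable code. Note that error_code and backup_response_sketch below are illustrative stand-ins for dsn.error_code and the real thrift-generated response type, not the actual Pegasus definitions:

```cpp
#include <string>

// Stand-in for dsn.error_code; values are illustrative.
enum class error_code { ERR_OK, ERR_INVALID_STATE, ERR_CHECKPOINT_FAILED };

// Mirrors the two-error layout discussed in this thread.
struct backup_response_sketch {
    error_code err;                   // result of the backup_request RPC itself
    error_code checkpoint_upload_err; // result of checkpoint/upload progress
};

// Flat, non-nested handling: each error is checked exactly once,
// and each check maps to one recovery action.
std::string handle(error_code rpc_err, const backup_response_sketch &resp) {
    if (rpc_err != error_code::ERR_OK)
        return "retry rpc";
    if (resp.err != error_code::ERR_OK)
        return "handle request error";
    if (resp.checkpoint_upload_err != error_code::ERR_OK)
        return "handle checkpoint/upload error";
    return "ok";
}
```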

@foreverneverer (Contributor) Aug 18, 2022

Well, let's wait for others' opinions; if they have no objections, I'm willing to consider it a reasonable design.

Contributor Author

@acelyc111 Please give us your suggestion about this comment~ @foreverneverer has a different opinion from me, and we cannot persuade each other.

@acelyc111 (Member) Aug 18, 2022

If I didn't miss anything, I didn't find where checkpoint_upload_err is actually used yet. Let's keep this point in mind and see how it will be used later; if it turns out to be inelegant, we can point it out then.

src/rdsn/src/common/backup_restore_common.cpp (resolved)
@@ -64,24 +64,28 @@ struct configuration_restore_request
9:optional string restore_path;
}

// meta -> replica
struct backup_request
Member

Will you use another RPC code? Is there any problem if a user uses an old-version shell tool to try to control the cluster?

Contributor Author

This RPC is not used from the shell tool to the meta server; it goes from the meta server to the replica server, so it won't trigger that control problem.

@acelyc111 (Member) Aug 18, 2022

OK, thanks.
I meant how to keep compatibility: if the meta servers are on the new version and the replica servers are on the old version, what will happen when we ask the cluster to do a backup? Is it necessary to add a new RPC code for the new implementation?

Contributor Author

If the meta server is on the new version and the replica server on the old version, I don't consider that a case we need to keep compatible. I think running different versions on the meta and replica servers is dangerous, because a new version, especially a feature version, of the meta server will issue many new RPCs the replica server cannot recognize. So it is not necessary to add a new RPC code only for the new-meta/old-replica case.

Member

Rolling update is a common case which many users have to face. We will not ensure every feature works well in this case, but we should at least avoid server crashes.
If we rewrite the RPC message, we'd better add a new RPC code for it and leave the old one as deprecated.
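The practice suggested here (register a new RPC code for the rewritten message and keep the old one as deprecated, so an old-version peer never hits an unrecognized code) can be sketched as a minimal dispatcher. The codes and handler registry below are hypothetical, not the actual rDSN RPC framework:

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical RPC codes: the old one stays registered as deprecated,
// the rewritten message gets a fresh code.
enum rpc_code { RPC_COLD_BACKUP = 1, RPC_BACKUP_V2 = 2 };

struct rpc_dispatcher {
    std::map<int, std::function<std::string()>> handlers;

    // An unknown code yields an error instead of crashing the server,
    // which is the property rolling updates rely on.
    std::string dispatch(int code) const {
        auto it = handlers.find(code);
        return it == handlers.end() ? "ERR_HANDLER_NOT_FOUND" : it->second();
    }
};
```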

Contributor Author

Rolling update is a common case, but it is not recommended to update the meta server first; we recommend updating the replica servers first, so a new-version meta server with old-version replicas is a rare case.

@acelyc111 (Member) Aug 19, 2022

No matter which side is on the new version: won't a new replica server still crash if it receives an old backup request from an old-version meta server?

src/rdsn/src/common/backup_restore_common.h (resolved)
src/rdsn/src/common/backup_restore_common.cpp (outdated, resolved)
src/rdsn/src/common/backup_restore_common.cpp (outdated, resolved)
src/rdsn/src/common/backup_restore_common.h (resolved)
src/rdsn/src/meta/meta_backup_engine.cpp (resolved)
src/rdsn/src/meta/meta_backup_engine.h (resolved)
src/rdsn/src/meta/meta_backup_engine.h (outdated, resolved)
src/rdsn/src/meta/test/meta_backup_engine_test.cpp (outdated, resolved)
src/rdsn/src/meta/test/meta_backup_engine_test.cpp (outdated, resolved)
@acelyc111 (Member) left a comment

LGTM.
But I still doubt whether there is any compatibility problem.

@hycdong hycdong merged commit e2e3f7e into apache:backup_restore-dev Aug 19, 2022
@hycdong hycdong deleted the backup_5 branch August 19, 2022 06:03
@hycdong (Contributor, Author) commented Aug 19, 2022

> LGTM. But I still doubt whether there is any compatibility problem.

Okay, I will add a compatibility explanation after all the code is merged.

3 participants