feat(backup): 4. meta server send backup_request to replica #1112

Merged
merged 3 commits into from
Aug 19, 2022
21 changes: 12 additions & 9 deletions src/rdsn/src/common/backup.thrift
@@ -64,24 +64,27 @@ struct configuration_restore_request
     9:optional string restore_path;
 }

+// meta -> replica
 struct backup_request
Member

Will you use a different RPC code? Would there be any problem if a user tries to control the cluster with an old version of the shell tool?

Contributor Author

This RPC is not sent from the shell tool to the meta server; it goes from the meta server to the replica server, so it won't cause that control problem.

Member (@acelyc111, Aug 18, 2022)

OK, thanks.
I meant: how do we keep compatibility? If the meta servers are on the new version and the replica servers are on the old version, what will happen if we ask the cluster to do a backup? Is it necessary to add a new RPC code for the new implementation?

Contributor Author

If the meta server is on the new version and the replica server is on the old version, I don't consider that a compatible configuration. I think it is dangerous for the meta server and the replica server to run different versions: with a new version, especially a feature release, the meta server will send many new RPCs that the old replica server cannot recognize. So I don't think it is necessary to add a new RPC code just to stay compatible in the new-meta, old-replica case.

Member

Rolling updates are a common case that many users have to face. We don't guarantee that every feature works well during one, but we should at least avoid server crashes.
If we rewrite the RPC message, it would be better to add a new RPC code for it and leave the old one as deprecated.

Contributor Author

Rolling updates are common, but updating the meta server first is not recommended. We recommend updating the replica servers first, so it is rare for the meta server to be on the new version while the replica servers are on the old one.

Member (@acelyc111, Aug 19, 2022)

It doesn't matter which one is on the new version. Won't the new replica server still crash if it receives an old backup request from an old-version meta server?
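The suggestion in this thread, registering a new RPC code for the rewritten message and keeping the old code only as a deprecated entry, could be sketched roughly as below. This is a self-contained illustration under stated assumptions, not rDSN's dispatch code: `rpc_dispatcher`, the code names `RPC_COLD_BACKUP` and `RPC_BACKUP_V2`, and the `ERR_*` strings are all hypothetical.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an RPC dispatch table; this is not the rDSN API.
class rpc_dispatcher
{
public:
    using handler = std::function<std::string(const std::string &)>;

    void register_handler(const std::string &code, handler h)
    {
        _handlers[code] = std::move(h);
    }

    // An unknown code yields an error reply instead of reaching a handler
    // that would misread the payload.
    std::string dispatch(const std::string &code, const std::string &payload) const
    {
        const auto it = _handlers.find(code);
        if (it == _handlers.end()) {
            return "ERR_HANDLER_NOT_FOUND";
        }
        return it->second(payload);
    }

private:
    std::unordered_map<std::string, handler> _handlers;
};

// A new-version replica keeps the old code registered, but only to reject it
// explicitly; the rewritten message travels under a separate (hypothetical) code.
inline rpc_dispatcher make_new_replica_dispatcher()
{
    rpc_dispatcher d;
    d.register_handler("RPC_COLD_BACKUP",
                       [](const std::string &) { return std::string("ERR_NOT_SUPPORTED"); });
    d.register_handler("RPC_BACKUP_V2",
                       [](const std::string &) { return std::string("ERR_OK"); });
    return d;
}
```

With this shape, an old meta server that still sends the old code gets an explicit error reply from a new replica, and an old replica that has never heard of the new code answers with a "handler not found" style error rather than decoding a message layout it does not understand.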

 {
-    1:dsn.gpid pid;
-    2:policy_info policy;
-    3:string app_name;
-    4:i64 backup_id;
+    1:dsn.gpid pid;
+    2:string app_name;
+    3:i64 backup_id;
+    4:backup_status status;
+    5:string backup_provider_type;
     // user specified backup_path.
-    5:optional string backup_path;
+    6:optional string backup_root_path;
 }

 struct backup_response
 {
     1:dsn.error_code err;
     2:dsn.gpid pid;
-    3:i32 progress; // the progress of the cold_backup
-    4:string policy_name;
-    5:i64 backup_id;
-    6:i64 checkpoint_total_size;
+    3:i64 backup_id;
+    4:backup_status status;
+    5:optional dsn.error_code checkpoint_upload_err;
+    6:optional i32 upload_progress;
+    7:optional i64 checkpoint_total_size;
 }

// clear all backup resources (including backup contexts and checkpoint dirs) of this policy.
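Since `upload_progress` is now an optional field, a receiver has to handle the case where it is unset. The sketch below shows one way that could look. It uses a plain C++ mirror of the field rather than the Thrift-generated type, and it assumes `upload_progress` is reported on the same 0 to 1000 scale as `backup_constant::PROGRESS_FINISHED`, which the diff does not state. `backup_response_view`, `is_upload_finished` and `upload_percent` are hypothetical names.

```cpp
#include <cstdint>

// Plain mirror of the optional progress field in the new backup_response.
// The real type is generated by Thrift and looks different.
struct backup_response_view
{
    bool has_upload_progress = false; // whether the optional field was set
    int32_t upload_progress = 0;
};

// Value taken from backup_constant::PROGRESS_FINISHED in this PR.
constexpr int32_t PROGRESS_FINISHED = 1000;

// Assumption: upload_progress uses the same 0..1000 scale as PROGRESS_FINISHED.
inline bool is_upload_finished(const backup_response_view &resp)
{
    return resp.has_upload_progress && resp.upload_progress >= PROGRESS_FINISHED;
}

inline double upload_percent(const backup_response_view &resp)
{
    // An unset optional field is treated as "no progress reported yet".
    const int32_t progress = resp.has_upload_progress ? resp.upload_progress : 0;
    return 100.0 * progress / PROGRESS_FINISHED;
}
```

Treating an unset field as zero progress is a design choice of this sketch; a caller could equally well distinguish "not reported" from "0" if that matters for scheduling.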
54 changes: 48 additions & 6 deletions src/rdsn/src/common/backup_restore_common.cpp
@@ -15,16 +15,25 @@
 // specific language governing permissions and limitations
 // under the License.

+#include <dsn/dist/fmt_logging.h>
+#include <dsn/utility/filesystem.h>
+
 #include "backup_restore_common.h"

 namespace dsn {
 namespace replication {
-const std::string cold_backup_constant::APP_METADATA("app_metadata");
-const std::string cold_backup_constant::APP_BACKUP_STATUS("app_backup_status");
-const std::string cold_backup_constant::CURRENT_CHECKPOINT("current_checkpoint");
-const std::string cold_backup_constant::BACKUP_METADATA("backup_metadata");
-const std::string cold_backup_constant::BACKUP_INFO("backup_info");
-const int32_t cold_backup_constant::PROGRESS_FINISHED = 1000;
+
+DSN_DEFINE_string("replication",
+                  cold_backup_root,
+                  "",
+                  "cold backup remote block service storage path prefix");
+
+const std::string backup_constant::APP_METADATA("app_metadata");
+const std::string backup_constant::APP_BACKUP_STATUS("app_backup_status");
+const std::string backup_constant::CURRENT_CHECKPOINT("current_checkpoint");
+const std::string backup_constant::BACKUP_METADATA("backup_metadata");
+const std::string backup_constant::BACKUP_INFO("backup_info");
+const int32_t backup_constant::PROGRESS_FINISHED = 1000;

 const std::string backup_restore_constant::FORCE_RESTORE("restore.force_restore");
 const std::string backup_restore_constant::BLOCK_SERVICE_PROVIDER("restore.block_service_provider");
@@ -36,5 +45,38 @@ const std::string backup_restore_constant::BACKUP_ID("restore.backup_id");
 const std::string backup_restore_constant::SKIP_BAD_PARTITION("restore.skip_bad_partition");
 const std::string backup_restore_constant::RESTORE_PATH("restore.restore_path");

+std::string get_backup_root(const std::string &backup_root,
+                            const std::string &user_defined_root_path)
+{
+    if (user_defined_root_path.empty()) {
+        return backup_root;
+    }
+    return utils::filesystem::path_combine(user_defined_root_path, backup_root);
+}
+
+std::string get_backup_path(const std::string &root,
+                            const std::string &app_name,
+                            const int32_t app_id,
+                            const int64_t backup_id,
+                            const bool is_compatible)
+{
+    std::string str_app = fmt::format("{}_{}", app_name, app_id);
+    if (!is_compatible) {
+        return fmt::format("{}/{}/{}", root, str_app, backup_id);
+    } else {
+        return fmt::format("{}/{}/{}", root, backup_id, str_app);
+    }
+}
+
+std::string get_backup_meta_path(const std::string &root,
+                                 const std::string &app_name,
+                                 const int32_t app_id,
+                                 const int64_t backup_id,
+                                 const bool is_compatible)
+{
+    return fmt::format("{}/meta",
+                       get_backup_path(root, app_name, app_id, backup_id, is_compatible));
+}
+
} // namespace replication
} // namespace dsn
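To make the path composition in this file concrete, here is a standalone sketch that mirrors the three helpers using only the standard library. It is not the PR's code: it replaces `fmt::format` with string concatenation, and its `path_combine` is a simplified stand-in that assumes `utils::filesystem::path_combine` joins two components with a single `/`. The `sketch` namespace and the sample values are made up for illustration.

```cpp
#include <cstdint>
#include <string>

namespace sketch {

// Simplified stand-in for utils::filesystem::path_combine (assumption:
// it joins the two components with a single '/').
inline std::string path_combine(const std::string &a, const std::string &b)
{
    return a + "/" + b;
}

// <user_defined_root_path>/<backup_root>, or just <backup_root> if no user root is given.
inline std::string get_backup_root(const std::string &backup_root,
                                   const std::string &user_defined_root_path)
{
    if (user_defined_root_path.empty()) {
        return backup_root;
    }
    return path_combine(user_defined_root_path, backup_root);
}

// New layout: <root>/<app_name>_<app_id>/<backup_id>
// Compatible (old) layout: <root>/<backup_id>/<app_name>_<app_id>
inline std::string get_backup_path(const std::string &root,
                                   const std::string &app_name,
                                   int32_t app_id,
                                   int64_t backup_id,
                                   bool is_compatible = false)
{
    const std::string str_app = app_name + "_" + std::to_string(app_id);
    const std::string str_id = std::to_string(backup_id);
    if (!is_compatible) {
        return root + "/" + str_app + "/" + str_id;
    }
    return root + "/" + str_id + "/" + str_app;
}

inline std::string get_backup_meta_path(const std::string &root,
                                        const std::string &app_name,
                                        int32_t app_id,
                                        int64_t backup_id,
                                        bool is_compatible = false)
{
    return get_backup_path(root, app_name, app_id, backup_id, is_compatible) + "/meta";
}

} // namespace sketch
```

For example, with a user root of `user_root`, a backup root of `onebox`, app `temp` with id 2 and backup id 1660000000000, the sketch yields `user_root/onebox/temp_2/1660000000000` for the new layout and `user_root/onebox/1660000000000/temp_2` for the compatible layout.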
73 changes: 71 additions & 2 deletions src/rdsn/src/common/backup_restore_common.h
@@ -18,14 +18,19 @@
 #pragma once

 #include <string>
+
+#include <dsn/cpp/rpc_holder.h>
+#include <dsn/tool-api/gpid.h>
+#include <dsn/utility/flags.h>

 #include "backup_types.h"
-#include <dsn/cpp/rpc_holder.h>

 namespace dsn {
 namespace replication {

-class cold_backup_constant
+DSN_DECLARE_string(cold_backup_root);
+
+class backup_constant
 {
 public:
     static const std::string APP_METADATA;
@@ -52,5 +57,69 @@ class backup_restore_constant
     static const std::string RESTORE_PATH;
 };

+/// The directory structure on block service
+///
+/// (<root> = <user_define_root>/<cluster>)
+///
+/// <root>/<app_name>_<app_id>/<backup_id>/<pidx>/chkpt_<ip>_<port>/***.sst
+///                                       /<pidx>/chkpt_<ip>_<port>/CURRENT
+///                                       /<pidx>/chkpt_<ip>_<port>/IDENTITY
+///                                       /<pidx>/chkpt_<ip>_<port>/MANIFEST
+///                                       /<pidx>/chkpt_<ip>_<port>/OPTIONS
+///                                       /<pidx>/chkpt_<ip>_<port>/LOG
+///                                       /<pidx>/chkpt_<ip>_<port>/backup_metadata
+///                                       /<pidx>/current_checkpoint
+///                                       /<pidx>/data_version
+///
+/// ......other partitions......
+///
+/// <root>/<app_name>_<app_id>/<backup_id>/meta/app_metadata
+/// <root>/<app_name>_<app_id>/<backup_id>/backup_info
+///
+
+///
+/// The usage of files:
+/// 1, app_metadata : the metadata of the app, the same as the app's app_info
+/// 2, backup_metadata : records the information of a checkpoint, including every
+/// file's name, size and md5
+/// 3, current_checkpoint : specifies which checkpoint directory is valid
+/// 4, data_version : the partition's data_version
+/// 5, backup_info : records the information of this backup
+
+// TODO(heyuchen): add other common functions
+// get_compatible_backup_root is only used to restore a compatible backup
+
+// The backup root path on block service
+// if user_defined_root_path is not empty
+// - return <user_defined_root_path>/<backup_root>
+// else
+// - return <backup_root>
+std::string get_backup_root(const std::string &backup_root,
+                            const std::string &user_defined_root_path);
+
+// The backup path on block service
+// if is_compatible = false (root is the return value of get_backup_root function)
+// - return <root>/<app_name>_<app_id>/<backup_id>
+// else (only used to restore a compatible backup, root is the return value of
+// get_compatible_backup_root function)
+// - return <root>/<backup_id>/<app_name>_<app_id>
+std::string get_backup_path(const std::string &root,
+                            const std::string &app_name,
+                            const int32_t app_id,
+                            const int64_t backup_id,
+                            const bool is_compatible = false);
+
+// The backup meta path on block service
+// if is_compatible = false (root is the return value of get_backup_root function)
+// - return <root>/<app_name>_<app_id>/<backup_id>/meta
+// else (only used to restore a compatible backup, root is the return value of
+// get_compatible_backup_root function)
+// - return <root>/<backup_id>/<app_name>_<app_id>/meta
+std::string get_backup_meta_path(const std::string &root,
+                                 const std::string &app_name,
+                                 const int32_t app_id,
+                                 const int64_t backup_id,
+                                 const bool is_compatible = false);
+
} // namespace replication
} // namespace dsn
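The directory layout documented in this header also implies per-partition paths below the backup path. The helpers below are hypothetical and not part of this PR (note the `TODO` about adding other common functions); they only spell out how the `<pidx>/chkpt_<ip>_<port>` and `<pidx>/current_checkpoint` entries from the comment could be derived from the value returned by `get_backup_path`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers derived from the directory layout comment; not in this PR.

// <backup_path>/<pidx>
inline std::string get_partition_path(const std::string &backup_path, int32_t pidx)
{
    return backup_path + "/" + std::to_string(pidx);
}

// <backup_path>/<pidx>/chkpt_<ip>_<port>
inline std::string get_checkpoint_dir(const std::string &backup_path,
                                      int32_t pidx,
                                      const std::string &ip,
                                      int32_t port)
{
    return get_partition_path(backup_path, pidx) + "/chkpt_" + ip + "_" + std::to_string(port);
}

// <backup_path>/<pidx>/current_checkpoint
inline std::string get_current_checkpoint_file(const std::string &backup_path, int32_t pidx)
{
    return get_partition_path(backup_path, pidx) + "/current_checkpoint";
}
```

Given a backup path of `root/temp_2/1660000000000`, partition 0 uploaded from `127.0.0.1:34801` would land in `root/temp_2/1660000000000/0/chkpt_127.0.0.1_34801`; the address and port here are sample values only.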
3 changes: 0 additions & 3 deletions src/rdsn/src/common/replication_common.cpp
@@ -398,9 +398,6 @@ void replication_options::initialize()
                                          learn_app_max_concurrent_count,
                                          "max count of learning app concurrently");

-    cold_backup_root = dsn_config_get_value_string(
-        "replication", "cold_backup_root", "", "cold backup remote storage path prefix");
-
     cold_backup_checkpoint_reserve_minutes =
         (int)dsn_config_get_value_uint64("replication",
                                          "cold_backup_checkpoint_reserve_minutes",
1 change: 0 additions & 1 deletion src/rdsn/src/common/replication_common.h
@@ -109,7 +109,6 @@ class replication_options

     int32_t learn_app_max_concurrent_count;

-    std::string cold_backup_root;
     int32_t cold_backup_checkpoint_reserve_minutes;

     int32_t max_concurrent_bulk_load_downloading_count;