Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curvebs: support 512 aligned IO #1231

Closed
wants to merge 2 commits into from

Conversation

wu-hanqing
Copy link
Contributor

@wu-hanqing wu-hanqing commented Mar 31, 2022

What problem does this PR solve?

Issue Number: close #985

Problem Summary:

What is changed and how it works?

What's Changed:

How it Works:

Side effects(Breaking backward compatibility? Performance regression?):

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

@wu-hanqing wu-hanqing force-pushed the bs/512-aligned-io branch 7 times, most recently from 2932059 to 09cef26 Compare April 3, 2022 01:55
@wu-hanqing
Copy link
Contributor Author

recheck

1 similar comment
@wu-hanqing
Copy link
Contributor Author

recheck

@wu-hanqing wu-hanqing force-pushed the bs/512-aligned-io branch 3 times, most recently from 726becd to a7fe2e2 Compare April 3, 2022 11:45
@wu-hanqing
Copy link
Contributor Author

recheck

@wu-hanqing wu-hanqing force-pushed the bs/512-aligned-io branch 2 times, most recently from 94f1947 to b668629 Compare June 23, 2022 12:36
@wu-hanqing
Copy link
Contributor Author

recheck

@wu-hanqing
Copy link
Contributor Author

recheck

# it should consist with `block_size` in chunkfilepool.meta_path and `mds.volume.blockSize` in MDS's configurations
# for clone chunk and snapshot chunk, it's also the minimum granularity that each bit represents
# if set to |512|, we need 4096 bytes bitmap for each chunk, so meta_page_size should be 8192 or larger.
global.block_size=4096
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this value be verified on the mds side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have three verifications:

  1. when a chunkserver starts, we will verify global.block_size in chunkserver's conf and chunkfilepool's metafile, if they aren't equal, chunkserver will quit.
  2. then, if the chunkserver is newly deployed, and in the registration process, we will verify chunkserver's block size and block size recorded in MDS, if they aren't equal, chunkserver registration will fail.
  3. MDS's conf also has a block_size, and when it starts, we will verify this item with block size recorded in etcd, if they aren't equal, mds will quit.

and for compatibility,

  • if chunkfilepool's metafile doesn't have block size, we treat it as 4096.
  • if etcd doesn't block size, we will insert it according to the value in conf.

@@ -25,3 +25,4 @@ s3.throttle.iopsWriteLimit={{ s3_throttle_iopsWriteLimit }}
s3.throttle.bpsTotalMB= {{ s3_throttle_bpsTotalLimit }}
s3.throttle.bpsReadMB= {{ s3_throttle_bpsReadLimit }}
s3.throttle.bpsWriteMB= {{ s3_throttle_bpsWriteLimit }}
s3.useVirtualAddressing=False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this configuration for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was introduced by #1253

std::string path = "/sys/block/nbd" + std::to_string(nbd_index)
+ "/queue/hw_sector_size";
std::string path =
sys_block_path(nbd_index) + "/queue/logical_block_size";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the check items change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we check hw_sector_size before, but what we set is block size when mapping the device,

https://github.com/opencurve/curve-nbd/blob/6231d3bc4954a5f2427148475cf39a2fdc4e3bad/src/NBDController.cpp#L50

@@ -140,6 +142,27 @@ struct ChunkServiceOptions {
std::shared_ptr<InflightThrottle> inflightThrottle;
};

inline CopysetNodeOptions::CopysetNodeOptions()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default configuration is only used when unit testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't modify it, just moved it from source file to header, because it's trivial.

@@ -433,9 +438,9 @@ CSErrorCode CSChunkFile::Paste(const char * buf, off_t offset, size_t length) {

// The request above must be pagesize aligned
// the starting page index number of the paste area
uint32_t beginIndex = offset / pageSize_;
uint32_t beginIndex = offset / blockSize_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes need to be changed.

// The request above must be pagesize aligned
// the starting page index number of the paste area

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@@ -488,9 +494,9 @@ CSErrorCode CSChunkFile::Read(char * buf, off_t offset, size_t length) {
if (isCloneChunk_) {
// The request above must be pagesize aligned
// the starting page index number of the paste area
uint32_t beginIndex = offset / pageSize_;
uint32_t beginIndex = offset / blockSize_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@@ -63,12 +63,14 @@ int Register::RegisterToMDS(ChunkServerMetadata *metadata) {
req.set_externalip(ops_.chunkserverExternalIp);
}
req.set_port(ops_.chunkserverPort);
req.set_blocksize(ops_.blockSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for verification on the mds side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@@ -0,0 +1,65 @@
/*
Copyright 2015 Glen Joseph Fernandes
([email protected])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as curvefs/src/common/fast_align.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix, curvefs/src/common/fast_align.h is removed

@@ -62,6 +62,8 @@ const char CLONEINFOKEYEND[] = "13";
const char DISCARDSEGMENTKEYPREFIX[] = "13";
const char DISCARDSEGMENTKEYEND[] = "14";

const char BLOCKSIZEKEY[] = "14blocksize";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use "15blocksize" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

}

auto bitmapBytes = FLAGS_fileSize / FLAGS_blockSize / 8;
if (bitmapBytes > FLAGS_metaPageSize * 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bitmapBytes > (FLAGS_metaPageSize-specified field), it should not work?
Here means if bitmapBytes = 4k, FLAGS_metaPageSize = 1k,that's ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, I added a hard code value 4096 - 512 to indicate minimum size usage for encoding and storing source location for clone chunk, so that, the meta page size should be great than or equal to 3584 + bitmap bytes

@@ -101,6 +101,8 @@ chunk_size=16777216
chunkserver_walfilepool_segment_size=8388608
retain_pool=False
walfilepool_use_chunk_file_pool=True
chunkserver_meta_page_size=4096
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider chunkserver_meta_page_size, which is automatically calculated by chunkserver according to the chunkserver_block_size?

@@ -500,6 +507,12 @@ void MDS::InitCurveFSOptions(CurveFSOption *curveFSOptions) {
"mds.curvefs.minFileLength", &curveFSOptions->minFileLength);
conf_->GetValueFatalIfFail(
"mds.curvefs.maxFileLength", &curveFSOptions->maxFileLength);
conf_->GetValueFatalIfFail("mds.curvefs.blockSize", &g_block_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not define g_block_size in curveFSOptions instead of global variable

void TopologyServiceManager::RegistChunkServer(
const ChunkServerRegistRequest *request,
ChunkServerRegistResponse *response) {
if (request->has_blocksize()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that chunkserver gets blocksize from mds instead of configuring blocksize individually and then verifying


DEFINE_validator(metaPageSize, &ValidateMetaPageSize);

DEFINE_uint32(blockSize, 4096, "minimum io alignment supported");
Copy link
Member

@xu-chaojie xu-chaojie Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we check blockSize and metaPageSize at chunkserver startup ?

@ilixiaocui
Copy link
Contributor

cc @wu-hanqing Please resubmit to the master branch if necessary.

@ilixiaocui ilixiaocui closed this Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

potential inconsistent data when map a clone volume by curve-nbd with 512 block-size
3 participants