Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid memcpy when using RocksDB slice #2183

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

liam-thunder
Copy link

What problem does this PR solve?

Issue Number: #1583

What is changed and how it works?

What's Changed:

  1. add a new interface to expose raw string buffer
  2. remove useless Value() called from iterator
  3. use butil::IOBufAppender to improve append

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

Copy link
Contributor

@wu-hanqing wu-hanqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. 👍

BTW, you can reply cicheck to trigger CI.

@@ -50,6 +50,8 @@ class Iterator {

virtual std::string Value() = 0;

virtual void Value(const char** data, size_t* size) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return absl::string_view and remove previous virtual std::string Value() = 0?

And also, virtual std::string Key() = 0

Copy link
Author

@liam-thunder liam-thunder Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose returning string_view from RocksDBStorageIterator is ok, but I'm not sure whether returning string_view from Key() / Value() is safe for other derived iterators, since their returned string's life time may not be longer than this string_view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only exception is

return InternalKey(entryType_, partitionId_, key);
, it returns std::string from a temporary.

Maybe we can store the temporary as member variable and returns string_view from it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only exception is

return InternalKey(entryType_, partitionId_, key);

, it returns std::string from a temporary.
Maybe we can store the temporary as member variable and returns string_view from it?

Yes, we could use a member variable to store every string returned from InternalKey to endure its life time.
But changing Iterator's Key() & Value() return type will involve changes to various upper layer logics, if this is acceptable, I can start to modify these logics. @wu-hanqing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only exception is

return InternalKey(entryType_, partitionId_, key);

, it returns std::string from a temporary.
Maybe we can store the temporary as member variable and returns string_view from it?

Yes, we could use a member variable to store every string returned from InternalKey to endure its life time. But changing Iterator's Key() & Value() return type will involve changes to various upper layer logics, if this is acceptable, I can start to modify these logics. @wu-hanqing

I think it's acceptable.

uint64_t chunkIndex,
const std::string& value) {
uint64_t chunkIndex,
const char* data, size_t data_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, if absl::string_view works, then we can use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same suggestion: https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h

// A `string_view` provides a lightweight view into the string data provided by
// a `std::string`, double-quoted string literal, character array, or even
// another `string_view`. A `string_view` does *not* own the string to which it
// points, and that data cannot be modified through the view.

Copy link
Author

@liam-thunder liam-thunder Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same suggestion: https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h

// A `string_view` provides a lightweight view into the string data provided by
// a `std::string`, double-quoted string literal, character array, or even
// another `string_view`. A `string_view` does *not* own the string to which it
// points, and that data cannot be modified through the view.

Yes, if we can get a string_view from Value(), we can use it here,

How about changing iterator's interface to void Value(absl::string_view& sv) = 0, and let each derived iterator decides its own way to implement its zero copy Value()?

returning string_view is definitely neater solution here, but I still have the same concern as I replayed above, using string_view as returned value of Value() is not safe for other derived iterators. As stated in string_view's comments:

// Due to lifetime issues, a `string_view` is sometimes a poor choice for a
// return value and usually a poor choice for a data member. If you do use a
// `string_view` this way, it is your responsibility to ensure that the object
// pointed to by the `string_view` outlives the `string_view`.

For RocksDBStorageIterator, since RocksDB has already guaranteed that its key & value will live as long as the iterator's life time, returning string_view is ok, but I don't see other iterators have such guarantee.

@ilixiaocui
Copy link
Contributor

image

Please commit by git commit -s to Signed-off

@liam-thunder
Copy link
Author

cicheck

@liam-thunder
Copy link
Author

image

Please commit by git commit -s to Signed-off

I've forced pushed a signed-off commit, but this DCO check doesn't restart by replying 'cicheck'..

@ilixiaocui
Copy link
Contributor

image Please commit by `git commit -s` to Signed-off

I've forced pushed a signed-off commit, but this DCO check doesn't restart by replying 'cicheck'..

I saw the corresponding commit message, and now manually set the pass.

@liam-thunder
Copy link
Author

cicheck

1 similar comment
@ilixiaocui
Copy link
Contributor

cicheck

Signed-off-by: liam-thunder <chenyou[dot]fdu[at]gmail.com>
@liam-thunder
Copy link
Author

@wu-hanqing @ilixiaocui
Updated initial version using string_view, still have two missing pieces:

  1. for StorageKey's derived class (Key4Inode, Prefix4AllInode, ...), their ParseFromString implementation uses StringToUl to convert string to unsigned long, but for string_view, I haven't found any solid implementation

    inline bool StringToUl(const std::string &value, uint32_t *out) noexcept {
    try {
    *out = std::stoul(value);
    return true;
    } catch (std::invalid_argument &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    } catch (std::out_of_range &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    }
    }

    bool Key4Inode::ParseFromString(const std::string& value) {
    std::vector<std::string> items;
    SplitString(value, ":", &items);
    return items.size() == 3 && CompareType(items[0], keyType_) &&
    StringToUl(items[1], &fsId) && StringToUll(items[2], &inodeId);
    }

  2. To avoid memcpy here, SDel's interface needs modification, this will involves changes to BaseStorage and all its derived classes

    // TODO: how to avoid copy here?
    for (auto skey : key2del) {
    if (!txn->SDel(std::string(table4S3ChunkInfo_), std::string(skey)).ok()) {
    LOG(ERROR) << "Delete key failed, skey=" << skey;
    return MetaStatusCode::STORAGE_INTERNAL_ERROR;
    }

Needs your advices on these two problems, thanks!

@liam-thunder
Copy link
Author

cicheck

@wu-hanqing
Copy link
Contributor

@wu-hanqing @ilixiaocui Updated initial version using string_view, still have two missing pieces:

  1. for StorageKey's derived class (Key4Inode, Prefix4AllInode, ...), their ParseFromString implementation uses StringToUl to convert string to unsigned long, but for string_view, I haven't found any solid implementation

    inline bool StringToUl(const std::string &value, uint32_t *out) noexcept {
    try {
    *out = std::stoul(value);
    return true;
    } catch (std::invalid_argument &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    } catch (std::out_of_range &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    }
    }

    bool Key4Inode::ParseFromString(const std::string& value) {
    std::vector<std::string> items;
    SplitString(value, ":", &items);
    return items.size() == 3 && CompareType(items[0], keyType_) &&
    StringToUl(items[1], &fsId) && StringToUll(items[2], &inodeId);
    }

  2. To avoid memcpy here, SDel's interface needs modification, this will involves changes to BaseStorage and all its derived classes

    // TODO: how to avoid copy here?
    for (auto skey : key2del) {
    if (!txn->SDel(std::string(table4S3ChunkInfo_), std::string(skey)).ok()) {
    LOG(ERROR) << "Delete key failed, skey=" << skey;
    return MetaStatusCode::STORAGE_INTERNAL_ERROR;
    }

Needs your advices on these two problems, thanks!

Both problems are tough.

For the first one, absl::from_chars only supports floating-point currently, and std::from_chars support integer types but it's C++17 feature.

For the second one, should we change all function parameters from const std::string & to absl::string_view? But it seems there is a lot of work.

I suggest we can keep the status quo.

@ilixiaocui What's your point?

@ilixiaocui
Copy link
Contributor

ilixiaocui commented Feb 3, 2023

@wu-hanqing @ilixiaocui Updated initial version using string_view, still have two missing pieces:

  1. for StorageKey's derived class (Key4Inode, Prefix4AllInode, ...), their ParseFromString implementation uses StringToUl to convert string to unsigned long, but for string_view, I haven't found any solid implementation

    inline bool StringToUl(const std::string &value, uint32_t *out) noexcept {
    try {
    *out = std::stoul(value);
    return true;
    } catch (std::invalid_argument &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    } catch (std::out_of_range &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    }
    }

    bool Key4Inode::ParseFromString(const std::string& value) {
    std::vector<std::string> items;
    SplitString(value, ":", &items);
    return items.size() == 3 && CompareType(items[0], keyType_) &&
    StringToUl(items[1], &fsId) && StringToUll(items[2], &inodeId);
    }

  2. To avoid memcpy here, SDel's interface needs modification, this will involves changes to BaseStorage and all its derived classes

    // TODO: how to avoid copy here?
    for (auto skey : key2del) {
    if (!txn->SDel(std::string(table4S3ChunkInfo_), std::string(skey)).ok()) {
    LOG(ERROR) << "Delete key failed, skey=" << skey;
    return MetaStatusCode::STORAGE_INTERNAL_ERROR;
    }

Needs your advices on these two problems, thanks!

Both problems are tough.

For the first one, absl::from_chars only supports floating-point currently, and std::from_chars support integer types but it's C++17 feature.

For the second one, should we change all function parameters from const std::string & to absl::string_view? But it seems there is a lot of work.

I suggest we can keep the status quo.

@ilixiaocui What's your point?

Indeed trouble. @liam-thunder Are you interested in testing the delay ratio of this interface on the current calling chain? If this part has a large proportion of delay in the call chain, it is still very meaningful to modify. If it is not the main blocking point on the call chain, maintain the status quo?

@liam-thunder
Copy link
Author

liam-thunder commented Feb 6, 2023

How about using unit test and MemoryStorage to identify these interfaces and StringToUl's delay ratio? I think I can try to do it. But I don't have enough resource to identify delay ratio in deployed CurveFS with a end2end test. @ilixiaocui @wu-hanqing

@wu-hanqing @ilixiaocui Updated initial version using string_view, still have two missing pieces:

  1. for StorageKey's derived class (Key4Inode, Prefix4AllInode, ...), their ParseFromString implementation uses StringToUl to convert string to unsigned long, but for string_view, I haven't found any solid implementation

    inline bool StringToUl(const std::string &value, uint32_t *out) noexcept {
    try {
    *out = std::stoul(value);
    return true;
    } catch (std::invalid_argument &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    } catch (std::out_of_range &e) {
    LOG(ERROR) << "decode string:{" << value << "} to number err:"
    << e.what();
    return false;
    }
    }

    bool Key4Inode::ParseFromString(const std::string& value) {
    std::vector<std::string> items;
    SplitString(value, ":", &items);
    return items.size() == 3 && CompareType(items[0], keyType_) &&
    StringToUl(items[1], &fsId) && StringToUll(items[2], &inodeId);
    }

  2. To avoid memcpy here, SDel's interface needs modification, this will involves changes to BaseStorage and all its derived classes

    // TODO: how to avoid copy here?
    for (auto skey : key2del) {
    if (!txn->SDel(std::string(table4S3ChunkInfo_), std::string(skey)).ok()) {
    LOG(ERROR) << "Delete key failed, skey=" << skey;
    return MetaStatusCode::STORAGE_INTERNAL_ERROR;
    }

Needs your advices on these two problems, thanks!

Both problems are tough.
For the first one, absl::from_chars only supports floating-point currently, and std::from_chars support integer types but it's C++17 feature.
For the second one, should we change all function parameters from const std::string & to absl::string_view? But it seems there is a lot of work.
I suggest we can keep the status quo.
@ilixiaocui What's your point?

Indeed trouble. @liam-thunder Are you interested in testing the delay ratio of this interface on the current calling chain? If this part has a large proportion of delay in the call chain, it is still very meaningful to modify. If it is not the main blocking point on the call chain, maintain the status quo?

@ilixiaocui
Copy link
Contributor

@liam-thunder Hope this artical can help you.

The monitoring deployment of curvefs can refer to: https://github.com/opencurve/curveadm/wiki/curvefs-monitor-deployment.

rocksdb is used to store metadata, and the operation frequency is relatively high to add, delete, modify and check inodes and dentry. After mounting the curvefs file system, you can execute ls, mkdir and other commands, and then check the time-consuming of each processing part involved through monitoring.

@wuhongsong wuhongsong added enhancement improve feature repairing bug repairing labels Feb 16, 2023
@wuhongsong
Copy link
Contributor

@liam-thunder Keep on going? or any help?

@liam-thunder
Copy link
Author

@liam-thunder Keep on going? or any help?

Sorry for late response, I'm quite busy recently, and also couldn't spare enough test machines to do the performance test. I'll try to get back to this issue later.

@liam-thunder
Copy link
Author

liam-thunder commented Mar 30, 2023

@ilixiaocui @wu-hanqing @wuhongsong I'm working on this PR again, and try to build a single node CurveFS, but I'm not sure whether a single node CurveFS enough for this benchmark test.

@Cyber-SiKu
Copy link
Contributor

@liam-thunder Keep on going? or any help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants