Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: implement value_shcema_v0 #722

Merged
merged 20 commits into from
May 25, 2021

Conversation

levy5307
Copy link
Contributor

@levy5307 levy5307 commented Apr 20, 2021

What problem does this PR solve?

proposal https://github.com/apache/incubator-pegasus/blob/master/rfcs/2020-10-09-data-version-v3.md
implement value_schema_v0, which represents data version 0.

Checklist

Tests
  • Unit test

@neverchanje
Copy link
Contributor

If your intention of this PR is to add new value schema more conveniently, I can give a simple solution:

struct value_schema_v2
{
    static constexpr size_t version_field_offset = 0;
    typedef version_type = uint8_t;

    static constexpr size_t expire_ts_field_offset = 1;
    typedef expire_ts_type = uint32_t;

    static constexpr size_t timetag_field_offset = 5;
    typedef timetag_type = uint64_t;

    value_schema_v2() {
      _write_buf.resize(sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint64_t));
      _write_slices.resize(2);
    }

    rocksdb::SliceParts generate_value(uint8_t version, uint32_t expire_ts, uint64_t timetag, dsn::string_view user_data) {
      data_output out(_write_buf);
      out.write<uint8_t>(version);
      out.write<uint32_t>(expire_ts);
      out.write<uint64_t>(user_data);

      _write_buf[0] = rocksdb::Slice(_write_buf.data, )
      return {&_write_slices[0], static_cast<int>(_write_slices.size())};
    }

    static uint8_t extract_version(string_view val);
    static uint32_t extract_expire_ts(string_view val);
    static uint64_t extract_timetag(string_view val);
    static blob:: extract_user_data(std::string&& val);

private:
    std::string _write_buf;
    std::vector<rocksdb::Slice> _write_slices;
};

void extract_expire_ts_from_value(int version, string_view user_data);
void extract_timetag_from_value(int version, string_view user_data);
void extract_version_from_value(int version, string_view user_data);

As you can see, this code is very similar to our thrift-generated code. It can be generated with a very simple template language, for example, yaml.

0:
  - expire_ts : 
      type: uint32_t
      size: 4

1:
  - expire_ts : 
      type: uint32_t
      size: 4
  - timetag : 
      type: uint64_t
      size: 8

2:
  - version : 
      type: uint8_t
      size: 1
  - expire_ts : 
      type: uint32_t
      size: 4
  - timetag : 
      type: uint64_t
      size: 8

I would suggest implementing a tool that can generate the above code according to the template, instead of using class inheritance.
The value encoding/decoding is extremely critical and requires efficiency. I think we should avoid runtime type deduction as much as we can, for example:

void extract_expire_ts_from_value(value_schema *schema, string_view user_data) {
    auto field = schema->extract_field(raw_value, pegasus::value_field_type::EXPIRE_TIMESTAMP); // type deduction #1
    auto expire_ts_field = static_cast<expire_timestamp_field *>(segment.get()); // type deduction #2
    return expire_ts_field->expire_ts;

There're 2 deductions here. I'm not indicating that this code will definitely perform poorly. But technically, I would prefer to write code without potential slow points.

Using a code generator can also avoid bugs so long as the tool itself is stable. So when we add another field in the future, we only need to modify the template file, rather than introduce more code.

src/base/value_schema_manager.cpp Outdated Show resolved Hide resolved
src/base/test/value_schema_test.cpp Outdated Show resolved Hide resolved
src/base/value_schema_v0.cpp Outdated Show resolved Hide resolved
@levy5307
Copy link
Contributor Author

levy5307 commented Apr 27, 2021

If your intention of this PR is to add new value schema more conveniently, I can give a simple solution:

struct value_schema_v2
{
    static constexpr size_t version_field_offset = 0;
    typedef version_type = uint8_t;

    static constexpr size_t expire_ts_field_offset = 1;
    typedef expire_ts_type = uint32_t;

    static constexpr size_t timetag_field_offset = 5;
    typedef timetag_type = uint64_t;

    value_schema_v2() {
      _write_buf.resize(sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint64_t));
      _write_slices.resize(2);
    }

    rocksdb::SliceParts generate_value(uint8_t version, uint32_t expire_ts, uint64_t timetag, dsn::string_view user_data) {
      data_output out(_write_buf);
      out.write<uint8_t>(version);
      out.write<uint32_t>(expire_ts);
      out.write<uint64_t>(user_data);

      _write_buf[0] = rocksdb::Slice(_write_buf.data, )
      return {&_write_slices[0], static_cast<int>(_write_slices.size())};
    }

    static uint8_t extract_version(string_view val);
    static uint32_t extract_expire_ts(string_view val);
    static uint64_t extract_timetag(string_view val);
    static blob:: extract_user_data(std::string&& val);

private:
    std::string _write_buf;
    std::vector<rocksdb::Slice> _write_slices;
};

void extract_expire_ts_from_value(int version, string_view user_data);
void extract_timetag_from_value(int version, string_view user_data);
void extract_version_from_value(int version, string_view user_data);

As you can see, this code is very similar to our thrift-generated code. It can be generated with a very simple template language, for example, yaml.

0:
  - expire_ts : 
      type: uint32_t
      size: 4

1:
  - expire_ts : 
      type: uint32_t
      size: 4
  - timetag : 
      type: uint64_t
      size: 8

2:
  - version : 
      type: uint8_t
      size: 1
  - expire_ts : 
      type: uint32_t
      size: 4
  - timetag : 
      type: uint64_t
      size: 8

I would suggest implementing a tool that can generate the above code according to the template, instead of using class inheritance.
The value encoding/decoding is extremely critical and requires efficiency. I think we should avoid runtime type deduction as much as we can, for example:

void extract_expire_ts_from_value(value_schema *schema, string_view user_data) {
    auto field = schema->extract_field(raw_value, pegasus::value_field_type::EXPIRE_TIMESTAMP); // type deduction #1
    auto expire_ts_field = static_cast<expire_timestamp_field *>(segment.get()); // type deduction #2
    return expire_ts_field->expire_ts;

There're 2 deductions here. I'm not indicating that this code will definitely perform poorly. But technically, I would prefer to write code without potential slow points.

Using a code generator can also avoid bugs so long as the tool itself is stable. So when we add another field in the future, we only need to modify the template file, rather than introduce more code.

I don't think it's a good idea

  1. we should add a different interface for each filed, for example: extract_version_from_value. This means you should add a interface if you add a filed. And In this interface, you should deal with all of these version even if this version does't support it. The code is not elegant enough if we implement it like this. To be honest, that's what I did initially, but all of us thought it wasn't good enough.
  2. static_cast doesn't cost too much time. And there is only one type deduction, not two.
  3. It's too complex to implement a tool to generate these code.

src/base/pegasus_value_schema.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_field.h Outdated Show resolved Hide resolved
src/base/value_schema_v0.cpp Show resolved Hide resolved
@hycdong hycdong merged commit 79bb927 into apache:master May 25, 2021
@hycdong hycdong mentioned this pull request Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants