Use flexible format when persisting Region #8590

Closed
CalvinNeo opened this issue Dec 26, 2023 · 1 comment · Fixed by #8614
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments


CalvinNeo commented Dec 26, 2023

Enhancement

The persisted format of a Region is currently:

|-------------------------|
|      Region Meta        |
| Extra Region Meta Flag  |
|    Extra Region Meta    |
|       Region Data       |
|-------------------------|

We are going to change it to what we call the "flexible" format:

|-------------------------|
|      Region Meta        |
|         flag 1          |
|        length 1         |
|        payload 1        |
|         flag 2          |
|        length 2         |
|        payload 2        |
|   ..................    |
|       Region Data       |
|-------------------------|

We will support several special extensions:

  1. EagerTruncate provides compatibility with the previous HAS_EAGER_TRUNCATE_INDEX flag. It has no length field.
  2. Finished must be the last flag among the extensions. When it is parsed, the extension flag parser stops and hands the read buffer over to the RegionData parser.
  3. Unrecognizable means the flag is not recognized, because it was written by a cluster running a newer version.
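
The parsing rules above can be sketched as follows. This is only an illustration, not the TiFlash implementation: the flag values, the little-endian UInt32 width of the flag and length fields, the UInt64 payload of EagerTruncate, and the name `parse_extensions` are all assumptions, since the issue does not specify them. It also assumes that an unrecognized flag still carries a length field, which is what allows it to be skipped.

```python
import struct

# Hypothetical flag values; the issue does not define the numeric encoding.
EAGER_TRUNCATE = 1          # compatibility extension, has no length field
FINISHED = 2                # terminates the list of extensions
KNOWN = {3: "example_ext"}  # hypothetical extension with a length field


def parse_extensions(buf: bytes, pos: int = 0):
    """Return (extensions, unrecognized_flags, offset of Region Data)."""
    extensions = {}
    unrecognized = []
    while True:
        (flag,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if flag == FINISHED:
            # Stop here; the caller hands buf[pos:] to the RegionData parser.
            return extensions, unrecognized, pos
        if flag == EAGER_TRUNCATE:
            # No length field: the payload size is fixed (assumed UInt64).
            (index,) = struct.unpack_from("<Q", buf, pos)
            pos += 8
            extensions["eager_truncate"] = index
            continue
        # Every other extension is flag, length, payload.
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        payload = buf[pos:pos + length]
        pos += length
        if flag in KNOWN:
            extensions[KNOWN[flag]] = payload
        else:
            # Written by a newer version: skip the payload using its length.
            unrecognized.append(flag)
```

A reader that does not know flag 99 can still reach Region Data, because the length field says how many bytes to skip.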

Some other changes:

  1. After V3, we will no longer check whether the storage version being read is below the current version.

Benefits

Currently, we use bits of a UInt32 to mark configurations from extensions such as eager gc on serverless clusters. This strategy raises the following problems:

  1. As TiFlash develops further, the 32 bits can represent only 32 features and may eventually be used up. One mitigation is to reserve the 31st bit as an extension mark, indicating that further structures carrying other extension data follow. However, it is hard to choose the layout of those "following structures". Should it be 32 bits? Then what if those are used up again? Should it be 32kb? Does that waste too much space? Should we use a flexible format? Then why not use it directly now?
  2. Consider downgrading. We may encounter unrecognized flags. There are two ways to handle this. One way is to move the "Extra Region Meta" part behind RegionData, read until the first unrecognized flag, then stop reading and discard the remaining data. This also relies on allocating bits from LSB to MSB. The other way is to change to the flexible format, with some compatibility handling that we will discuss later.
  3. Consider that we will deprecate some features in later versions. With the flexible format, we only need to clean the data once and persist the region without the flag. Then we get rid of the structure for good.
  4. Consider that we have many features, some exclusive to serverless mode and some exclusive to op mode. For example, suppose we introduce an op-exclusive feature B after the eager gc feature (feature A), which is only useful in serverless mode. To parse feature B, we would have to parse feature A first. It would not cost much, but it is awful.
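
To make the downgrade problem with bit flags concrete, here is a small sketch of how a reader could find the first bit it does not recognize in the UInt32, scanning from LSB to MSB. The bit position of `HAS_EAGER_TRUNCATE_INDEX` and the helper name `first_unrecognized_bit` are assumptions for illustration only.

```python
# Hypothetical bit assignment; the real position is not stated in the issue.
HAS_EAGER_TRUNCATE_INDEX = 1 << 0
KNOWN_MASK = HAS_EAGER_TRUNCATE_INDEX


def first_unrecognized_bit(flags: int):
    """Return the lowest bit position unknown to this reader, or None."""
    unknown = flags & ~KNOWN_MASK & 0xFFFFFFFF
    if unknown == 0:
        return None
    # unknown & -unknown isolates the lowest set bit.
    return (unknown & -unknown).bit_length() - 1
```

With bit flags alone, the reader learns that an unknown feature is present but not how many bytes its data occupies, which is why everything after that point has to be discarded. A per-extension length field removes that limitation.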

CalvinNeo added the type/enhancement label Dec 26, 2023

CalvinNeo commented Dec 28, 2023

Since TiFlash 7.5.x will continue to use V2, we will make the following changes:

  1. We don't use the EagerGc extension now; eager gc is considered a permanent field of TiFlash.
  2. We don't use Finished now, since we can't parse the extra Finished field and can't tell the difference between the old and new versions of V2.
  3. We don't use Unrecognized for unknown fields; instead, we introduce MaxKnownFields to record the maximum known fields.

So the layout will be:

|------- 31 bits for size of extensions -------|-- 1 bit to compat eager gc --|
|------------------------------ eager gc -------------------------------------|
|---------------------------- ext_type 1 -------------------------------------|
|----------------------------- length 1 --------------------------------------|
|----------------------------- payload 1 -------------------------------------|
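
The first word of this layout can be sketched as below. This assumes the eager gc compatibility bit is the least significant bit and the size of extensions occupies the upper 31 bits, which is one reading of the diagram; the helper names are hypothetical, and the unit of the size field is not specified in the issue.

```python
EAGER_GC_BIT = 0x1  # assumed: lowest bit, kept for eager gc compatibility


def pack_header(ext_size: int, has_eager_gc: bool) -> int:
    """Combine the 31-bit size of extensions with the 1-bit eager gc flag."""
    if not 0 <= ext_size < (1 << 31):
        raise ValueError("size of extensions must fit in 31 bits")
    return (ext_size << 1) | (EAGER_GC_BIT if has_eager_gc else 0)


def unpack_header(word: int):
    """Split the UInt32 back into (ext_size, has_eager_gc)."""
    return word >> 1, bool(word & EAGER_GC_BIT)
```

For example, `pack_header(0, True)` yields `1`, which is the same value a header with only the eager gc bit set would have under this assumed bit position.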
