Track each SST's timestamp information as user properties #9093

Closed
wants to merge 6 commits into from

@sunlike-Lipo sunlike-Lipo commented Oct 29, 2021

Track each SST's timestamp information as user properties #8959

RocksDB supports a user-defined timestamp feature: an application can specify a timestamp
when writing each key-value pair. When data is flushed from memory to disk, it is written to files called SST files.
Each SST file consists of multiple data blocks and several metadata blocks. Among the metadata
blocks is the properties block, which tracks some predefined properties of the SST file.

This PR collects the min and max timestamps of all keys in the file as properties. With those
properties, it is easier to tell whether the keys in an SST file have timestamps.

The changes involved are as follows:

  1. Add a class TimestampTablePropertiesCollector that collects the min/max timestamp as keys are added to the table.
    The user defines how TimestampTablePropertiesCollector compares key timestamps by
    implementing the Comparator::CompareTimestamp function in the user-defined comparator.
  2. Add corresponding unit tests.
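
The collection logic described in the list above can be sketched in isolation. This is a minimal illustration, not the code in this PR: U64TsComparator, MakeKey, and MinMaxTimestampCollector are hypothetical names, and the sketch assumes a fixed 8-byte timestamp stored in the last bytes of each key and compared as a host-endian integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a user-defined comparator with 8-byte timestamps.
// It is NOT the RocksDB Comparator interface.
struct U64TsComparator {
  std::size_t timestamp_size() const { return sizeof(std::uint64_t); }

  // Returns a negative, zero, or positive value, like a three-way compare.
  int CompareTimestamp(const std::string& a, const std::string& b) const {
    std::uint64_t x = 0, y = 0;
    std::memcpy(&x, a.data(), sizeof(x));
    std::memcpy(&y, b.data(), sizeof(y));
    return x < y ? -1 : (x > y ? 1 : 0);
  }
};

// Appends the timestamp bytes to the user key (assumed key layout).
std::string MakeKey(const std::string& user_key, std::uint64_t ts) {
  std::string key = user_key;
  key.append(reinterpret_cast<const char*>(&ts), sizeof(ts));
  return key;
}

class MinMaxTimestampCollector {
 public:
  explicit MinMaxTimestampCollector(const U64TsComparator* cmp) : cmp_(cmp) {}

  // Called once per key; updates the running min and max timestamps.
  void Add(const std::string& key_with_ts) {
    const std::size_t ts_sz = cmp_->timestamp_size();
    if (key_with_ts.size() < ts_sz) return;  // malformed key: ignore it
    const std::string ts = key_with_ts.substr(key_with_ts.size() - ts_sz);
    if (max_ts_.empty() || cmp_->CompareTimestamp(ts, max_ts_) > 0) {
      max_ts_ = ts;
    }
    if (min_ts_.empty() || cmp_->CompareTimestamp(ts, min_ts_) < 0) {
      min_ts_ = ts;
    }
  }

  const std::string& min_timestamp() const { return min_ts_; }
  const std::string& max_timestamp() const { return max_ts_; }

 private:
  const U64TsComparator* cmp_;
  std::string min_ts_;  // empty means "no timestamp seen yet"
  std::string max_ts_;
};
```

Feeding the collector keys with timestamps 7, 3, and 9 would leave 3 as the minimum and 9 as the maximum, at the cost of two timestamp comparisons per key.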

@sunlike-Lipo sunlike-Lipo changed the title from "Track each SST's timestamp information as user properties #8959" to "Track each SST's timestamp information as user properties" Oct 29, 2021
@sunlike-Lipo sunlike-Lipo force-pushed the develop branch 4 times, most recently from b5db9d7 to 7bdb3a6 Compare October 29, 2021 12:57
wolfkdy commented Nov 4, 2021

@riversand963 I've reviewed this MR, which my colleague posted.
The original MR has been split into two MRs.
Everything you commented on in the original MR has been fixed.
Could you take a look when you are free? Thanks!

@riversand963 riversand963 left a comment

Thanks @sunlike-Lipo for the PR. I think this PR needs more testing once #9092 gets merged. cc @wolfkdy

db/table_properties_collector.h
}

Status Finish(UserCollectedProperties* properties) override {
properties->insert({"min_timestamp", min_timestamp});
Reviewer comment:

Can we check that the sizes of the min/max timestamps are both equal to cmp_->timestamp_size()?

@sunlike-Lipo sunlike-Lipo Nov 8, 2021

Fine. Already fixed it.
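
A size check of the kind requested above might look like the following sketch. It is an assumption about the shape of the fix, not the code that was committed: FinishTimestampProperties is a hypothetical helper, and a plain std::map stands in for the properties container.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: writes the two properties only if both timestamps
// have the expected width. Returns false and writes nothing otherwise.
bool FinishTimestampProperties(
    const std::string& min_ts, const std::string& max_ts,
    std::size_t expected_ts_size,
    std::map<std::string, std::string>* properties) {
  if (min_ts.size() != expected_ts_size ||
      max_ts.size() != expected_ts_size) {
    return false;  // a timestamp with the wrong width indicates a bug
  }
  (*properties)["min_timestamp"] = min_ts;
  (*properties)["max_timestamp"] = max_ts;
  return true;
}
```

Rejecting mismatched sizes before writing keeps a malformed timestamp from being persisted in the properties of the file.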

wolfkdy commented Nov 4, 2021

@riversand963

> Thanks @sunlike-Lipo for the review. I think this PR needs more testing once #9092 gets merged. cc @wolfkdy

Thanks. We plan to follow the existing unit tests for TablePropertiesCollector subclasses and add unit tests for TimestampTablePropertiesCollector.

sunlike-Lipo commented

@riversand963
I addressed the review comments and added a unit test. Could you please review what else needs to be modified, when convenient?

@riversand963 riversand963 linked an issue Nov 12, 2021 that may be closed by this pull request
riversand963 commented

@sunlike-Lipo Will take a look.
One question: do you by any chance know the overhead introduced by tracking min/max timestamps for each file, since the comparison will be done for each key? It would be nice to include some benchmarking results here. We can assume timestamp size of 8 bytes.

@riversand963 riversand963 left a comment

A few more comments.

facebook-github-bot commented

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot commented

@sunlike-Lipo has updated the pull request. You must reimport the pull request before landing.

sunlike-Lipo commented

> @sunlike-Lipo Will take a look. One question: do you by any chance know the overhead introduced by tracking min/max timestamps for each file, since the comparison will be done for each key? It would be nice to include some benchmarking results here. We can assume timestamp size of 8 bytes.

Sure. I'll do the test and post the benchmarking results as soon as possible.

sunlike-Lipo commented Nov 16, 2021

> @sunlike-Lipo Will take a look. One question: do you by any chance know the overhead introduced by tracking min/max timestamps for each file, since the comparison will be done for each key? It would be nice to include some benchmarking results here. We can assume timestamp size of 8 bytes.

@riversand963
I wrote a unit test (sunlike-Lipo@164ea93) for the benchmark.
The test runs 10 times and records the total time spent flushing SST files.
I use MockEnv to eliminate the effect of disk I/O.

The variables are the value size and whether timestamps are collected. The comparator used is ComparatorWithU64Ts.
The test results are listed below.

                 value_size=10   value_size=100   value_size=1000
collect-ts       4763ms          1544ms           659ms
not-collect-ts   4399ms          1489ms           644ms
downgrade        8.3%            3.7%             2.3%
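
The downgrade row appears to be the relative slowdown of the collect-ts run compared with the not-collect-ts run. A minimal sketch of that arithmetic, assuming this interpretation (OverheadPercent is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>

// Relative slowdown, in percent, of the run with timestamp collection
// compared with the baseline run without it.
double OverheadPercent(double with_ts_ms, double without_ts_ms) {
  return (with_ts_ms - without_ts_ms) / without_ts_ms * 100.0;
}
```

For value_size=10 this gives (4763 - 4399) / 4399, or roughly 8.3%, which matches the reported figure.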

riversand963 commented

Thanks @sunlike-Lipo for addressing the comments. Will take another look soon.

> I wrote a unittest

I wonder if we can move the code to a standalone source file (similar to examples/), compile it against a rocksdb library built with DEBUG_LEVEL=0, and see the performance difference in opt mode too. You can also use db_bench. In unit tests, I think DEBUG_LEVEL is at least 1, which may already add some overhead and make the overhead of the timestamp comparison less obvious.

sunlike-Lipo commented Nov 17, 2021

> Thanks @sunlike-Lipo for addressing the comments. Will take another look soon.
>
> > I wrote a unittest
>
> I wonder if we can move the code to a standalone source file (similar to examples/) and compile with rocksdb library built with DEBUG_LEVEL=0 and see the performance difference in opt mode too. You can also use db_bench. In unit test, I think the DEBUG_LEVEL is at least 1, thus may already have some overhead, making the performance overhead of ts comparison less obvious.

I moved the benchmark into a standalone test (sunlike-Lipo@50a1152).
The test logic is unchanged:
the test runs 10 times and records the total time spent flushing SST files.
I use MemEnv to eliminate the effect of disk I/O.

The variables are the value size and whether timestamps are collected. The comparator is still ComparatorWithU64Ts.
The test results are listed below.

                 value_size=10   value_size=100   value_size=1000
collect-ts       6188ms          2206ms           255ms
not-collect-ts   5806ms          2164ms           249ms
downgrade        6.5%            1.9%             2.4%

I will also run db_bench and post the results soon.

sunlike-Lipo commented Nov 18, 2021

> Thanks @sunlike-Lipo for addressing the comments. Will take another look soon.
>
> > I wrote a unittest
>
> I wonder if we can move the code to a standalone source file (similar to examples/) and compile with rocksdb library built with DEBUG_LEVEL=0 and see the performance difference in opt mode too. You can also use db_bench. In unit test, I think the DEBUG_LEVEL is at least 1, thus may already have some overhead, making the performance overhead of ts comparison less obvious.

@riversand963
The db_bench results are below.
Command:

$ TEST_TMPDIR=/tmp/rocksdb ./db_bench -benchmarks=filluniquerandom -level0_slowdown_writes_trigger 102400 -level0_stop_writes_trigger 102400  -disable_auto_compactions true -user_timestamp_size 8 -compression_type=none  -num=20000000

With ts collector: 22.8 MB/s, 22.5 MB/s, 22.5 MB/s, 22.6 MB/s, 22.6 MB/s
Without ts collector: 22.5 MB/s, 22.3 MB/s, 22.6 MB/s, 22.7 MB/s, 22.6 MB/s

The detailed results are as follows:
With ts collector:
5.193 micros/op 192573 ops/sec 22.8 MB/s
5.263 micros/op 190004 ops/sec 22.5 MB/s
5.266 micros/op 189885 ops/sec 22.5 MB/s
5.242 micros/op 190750 ops/sec 22.6 MB/s
5.228 micros/op 191274 ops/sec 22.6 MB/s

Without ts collector:
5.256 micros/op 190257 ops/sec 22.5 MB/s
5.298 micros/op 188735 ops/sec 22.3 MB/s
5.229 micros/op 191234 ops/sec 22.6 MB/s
5.204 micros/op 192158 ops/sec 22.7 MB/s
5.233 micros/op 191098 ops/sec 22.6 MB/s

@riversand963 riversand963 left a comment

Thanks @sunlike-Lipo for the PR. Left some final minor comments before stamping.

// internal key when Add() is invoked.
//
// @param cmp the Comparator to compare timestamp of key by the
// Comparator::CompareTimestamp function.
Reviewer comment:

Not sure, but the indentation here seems weird.

Author reply:

Fixed.

Comment on lines 154 to 155
properties->insert({"rocksdb.min_timestamp", min_timestamp_});
properties->insert({"rocksdb.max_timestamp", max_timestamp_});
@riversand963 riversand963 Nov 19, 2021

I used sst_dump to dump the properties of generated SST files, and I can see

...
  # rocksdb.max_timestamp: 0x03000000000000000000000000000000
  # rocksdb.merge.operands: 0x00
  # rocksdb.min_timestamp: 0x03000000000000000000000000000000

This makes me wonder whether we can instead use rocksdb.timestamp_min and rocksdb.timestamp_max so that they can be next to each other.
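
The effect of the proposed rename can be illustrated with a container that iterates in lexicographic key order. This is only a sketch: it assumes the dump lists properties sorted by name, as the excerpt above suggests, and SortedNames is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the property names in the order that a lexicographically sorted
// container (here std::map) yields them.
std::vector<std::string> SortedNames(const std::vector<std::string>& names) {
  std::map<std::string, std::string> props;
  for (const auto& name : names) {
    props[name] = "";  // values do not matter for the ordering
  }
  std::vector<std::string> out;
  for (const auto& kv : props) {
    out.push_back(kv.first);
  }
  return out;
}
```

With the original names, rocksdb.merge.operands sorts between rocksdb.max_timestamp and rocksdb.min_timestamp. With a shared rocksdb.timestamp_ prefix, the two timestamp properties become adjacent.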

Author reply:

Fixed it.

Comment on lines 164 to 165
return {{"rocksdb.min_timestamp", Slice(min_timestamp_).ToString(true)},
        {"rocksdb.max_timestamp", Slice(max_timestamp_).ToString(true)}};
@riversand963 riversand963 Nov 19, 2021

Same as previous one.
Can we instead use rocksdb.timestamp_min and rocksdb.timestamp_max so that they can be next to each other?

Author reply:

Fixed it.

Comment on lines 170 to 171
std::string max_timestamp_;
std::string min_timestamp_;
Reviewer comment:

Should these two be explicitly initialized to kDisableUserTimestamp?

Author reply:

Fixed it.

ExtractTimestampFromUserKey(user_key, cmp_->timestamp_size());
if (max_timestamp_ == kDisableUserTimestamp ||
    cmp_->CompareTimestamp(timestamp_in_key, max_timestamp_) > 0) {
  max_timestamp_ = timestamp_in_key.ToString();
Reviewer comment:

If we do

max_timestamp_.assign(timestamp_in_key.data(), timestamp_in_key.size());

will it save us some string creation and copy caused by calling ToString()?
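
The two variants under discussion can be compared side by side in a standalone sketch. ByteView is a hypothetical stand-in for a pointer-plus-length view such as Slice, and both function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal view type: a pointer plus a length.
struct ByteView {
  const char* data;
  std::size_t size;
  std::string ToString() const { return std::string(data, size); }
};

// Variant 1: builds a temporary std::string, then move-assigns it to dst.
void UpdateViaToString(std::string* dst, ByteView v) {
  *dst = v.ToString();
}

// Variant 2: copies the bytes straight into dst, which typically reuses
// the existing buffer when its capacity is already sufficient.
void UpdateViaAssign(std::string* dst, ByteView v) {
  dst->assign(v.data, v.size);
}
```

Both variants leave dst with the same contents. For 8-byte timestamps, the small-string optimization in common standard library implementations may make both allocation-free, so any measurable difference is likely to be small.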

Author reply:

Fixed it.

}
if (min_timestamp_ == kDisableUserTimestamp ||
    cmp_->CompareTimestamp(min_timestamp_, timestamp_in_key) > 0) {
  min_timestamp_ = timestamp_in_key.ToString();
Reviewer comment:

Ditto.

Author reply:

Fixed it.

class TimestampTablePropertiesCollector : public IntTblPropCollector {
 public:
  explicit TimestampTablePropertiesCollector(const Comparator* cmp)
      : cmp_(cmp) {}
Reviewer comment:

Maybe here we can explicitly initialize max_timestamp_ and min_timestamp_.

Author reply:

Fixed it.

sunlike-Lipo commented Nov 19, 2021

@riversand963
I ran the tests again after changing string operator= to string.assign.

The db_bench results are as follows:
With ts collector:
5.175 micros/op 193250 ops/sec 22.9 MB/s
5.149 micros/op 194219 ops/sec 23.0 MB/s
5.223 micros/op 191443 ops/sec 22.6 MB/s
5.167 micros/op 193534 ops/sec 22.9 MB/s
5.150 micros/op 194183 ops/sec 23.0 MB/s

Without ts collector:
5.209 micros/op 191963 ops/sec 22.7 MB/s
5.201 micros/op 192256 ops/sec 22.7 MB/s
5.147 micros/op 194269 ops/sec 23.0 MB/s
5.230 micros/op 191211 ops/sec 22.6 MB/s
5.269 micros/op 189781 ops/sec 22.4 MB/s

The results of the standalone test from sunlike-Lipo@50a1152 are as follows:
Using string.assign:

                 value_size=10   value_size=100   value_size=1000
collect-ts       3851ms          1219ms           202ms
not-collect-ts   3745ms          1181ms           189ms

Using string.operator=:

                 value_size=10   value_size=100   value_size=1000
collect-ts       3868ms          1239ms           204ms
not-collect-ts   3767ms          1189ms           195ms

There seems to be little difference between the results with string.assign and with string.operator=; perhaps the compiler is
clever enough to optimize this.
Anyway, I'll use string.assign because it ought to be faster than string.operator=.

facebook-github-bot commented

@riversand963 merged this pull request in e12753e.
