Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2157 Avoid to decompress Strings while sorting them. Instead use fast comparison provided by the interner #7892

Merged
merged 24 commits into from
Aug 1, 2024

Conversation

nicola-cab
Copy link
Member

@nicola-cab nicola-cab commented Jul 16, 2024

What, How & Why?

This PR is about comparing the compressed string views via their string ids, while sorting results for queries.
Right now, we are decompressing strings and then sorting, which is more expensive.

@nicola-cab nicola-cab added no-changelog no-jira-ticket Skip checking the PR title for Jira reference and removed no-jira-ticket Skip checking the PR title for Jira reference labels Jul 18, 2024
Copy link

coveralls-official bot commented Jul 18, 2024

Pull Request Test Coverage Report for Build nicola.cabiddu_1882

Details

  • 384 of 394 (97.46%) changed or added relevant lines in 8 files are covered.
  • 40 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.009%) to 90.966%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/obj.cpp 18 19 94.74%
src/realm/obj.hpp 0 3 0.0%
src/realm/sort_descriptor.hpp 6 12 50.0%
Files with Coverage Reduction New Missed Lines %
src/realm/dictionary.cpp 1 85.52%
src/realm/index_string.hpp 1 93.48%
test/test_index_string.cpp 1 93.33%
src/realm/sync/noinst/client_impl_base.cpp 2 82.27%
src/realm/sync/noinst/server/server.cpp 2 73.49%
test/object-store/sync/client_reset.cpp 2 99.0%
src/realm/bplustree.cpp 6 73.08%
src/realm/index_string.cpp 8 84.63%
test/fuzz_group.cpp 17 45.44%
Totals Coverage Status
Change from base Build jorgen.edelbo_372: 0.009%
Covered Lines: 219716
Relevant Lines: 241536

💛 - Coveralls

@nicola-cab nicola-cab changed the title RCORE-2157 WIP: Avoid to decompress Strings while sorting them. Instead use fast comparison provided by the interner RCORE-2157 Avoid to decompress Strings while sorting them. Instead use fast comparison provided by the interner Jul 18, 2024
@nicola-cab nicola-cab marked this pull request as ready for review July 18, 2024 15:54
@nicola-cab nicola-cab requested a review from ironage July 18, 2024 16:00
src/realm/path.hpp Outdated Show resolved Hide resolved
@@ -434,20 +476,33 @@ bool BaseDescriptor::Sorter::operator()(IndexPair i, IndexPair j, bool total_ord
const auto& obj = m_columns[t].table->get_object(key_i);
const auto& col_key = m_columns[t].col_key;

cache_i.value = col_key.get_value(obj);
// store stringID instead of the actual string if possible
cache_i.cached_string_id = col_key.get_string_id(obj);
Copy link
Contributor

@ironage ironage Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add unnecessary overhead per comparison when sorting on a non-string column. Ideally instead of doing the check inside the loop, we would have two separate specialized loops to keep other types fast. This might be complex to refactor though because there are already multiple optimizations piled in here.
ie. instead of this pattern:

for (..) {
    if interned string 
        do special comparison
    else 
        do value comparison
    }
}

we should do something like this:

if column supports string interning
    for (...) {
        do special intern comparison if possible
    }
} else {
    for (...) {
        do normal value comparison
    }
}

I think we should have some benchmarks on sorting integers that can back up my claim, I wonder how much this change costs them?

Copy link
Member Author

@nicola-cab nicola-cab Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost of fetching the String ID for a non-mixed/non-string properties is only the cost of checking that the property type may have a string interner associated (check on the column type) + an extra check for the optional type.
Also in the code, we iterate column by column, so the check per col type can only be perform when we are checking the column itself. I can try to run the benchmarks and see..

Copy link
Member Author

@nicola-cab nicola-cab Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think statically it does not make a huge difference, probably we are going to slow down a little bit sorting for all properties that are not strings or mixed, but, it is really difficult to gauge and also doesn't it depend on also by the data we have?

Req runs:    6  SortInt (MemOnly, EncryptionOff):             min  57.16ms (+3.73%)            max  76.90ms (+30.61%)           med  66.13ms (+17.93%)           avg  66.87ms (+18.40%)           stddev  8.98ms (+586.99%)     

Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 321.11ms (+15.11%)           max 397.76ms (+27.72%)           med 333.63ms (+9.98%)            avg 353.77ms (+18.18%)           stddev 36.49ms (+194.74%)     

Req runs:   10  SortIntDictionary (MemOnly, EncryptionOff):     min  37.68ms (-0.91%)            max     56ms (+7.95%)            med  40.05ms (-5.75%)            avg  42.62ms (-1.18%)            stddev  6.44ms (+38.90%)      

Req runs:  225  SortThenLimit (MemOnly, EncryptionOff):         min   1.89ms (+14.04%)           max   3.82ms (+44.19%)           med   2.09ms (+21.53%)           avg   2.17ms (+23.60%)           stddev   284us (+144.66%) 

Another run (same ref data as starting cmp input)


Req runs:    7  SortInt (MemOnly, EncryptionOff):             min  56.62ms (+2.75%)            max  64.11ms (+8.90%)            med  57.26ms (+2.11%)            avg  58.38ms (+3.36%)            stddev  2.67ms (+104.06%)     

Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 298.46ms (+6.99%)            max 332.43ms (+6.74%)            med 304.79ms (+0.47%)            avg 309.70ms (+3.46%)            stddev 13.23ms (+6.88%)       

Req runs:   11  SortIntDictionary (MemOnly, EncryptionOff):     min  36.85ms (-3.08%)            max  39.14ms (-24.55%)           med  37.59ms (-11.52%)           avg  37.72ms (-12.54%)           stddev   629us (-86.44%)      

Req runs:  241  SortThenLimit (MemOnly, EncryptionOff):         min   1.87ms (+12.72%)           max   3.40ms (+28.36%)           med   1.98ms (+14.96%)           avg   2.06ms (+17.05%)           stddev   228us (+95.75%)   

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a pretty big perf regression to me?

Copy link
Member Author

@nicola-cab nicola-cab Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug in my code, I was constantly passing by copy the Extended Column, these are the right perfs:

Req runs:    8  SortInt (MemOnly, EncryptionOff):             min  58.66ms (+6.49%)            max  66.55ms (+11.33%)           med  60.24ms (+4.87%)            avg  61.30ms (+6.42%)            stddev  3.06ms (+70.90%)      

Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 290.56ms (+3.83%)            max 304.51ms (-1.68%)            med 302.37ms (+3.53%)            avg 299.28ms (+1.51%)            stddev  6.12ms (-45.79%)      

Req runs:   10  SortIntDictionary (MemOnly, EncryptionOff):     min  37.89ms (-3.70%)            max  45.80ms (-12.86%)           med  41.27ms (-0.43%)            avg  41.59ms (-3.61%)            stddev  2.53ms (-36.80%)      

Req runs:  231  SortThenLimit (MemOnly, EncryptionOff):         min   1.84ms (+10.23%)           max   2.93ms (-24.94%)           med   1.95ms (+6.03%)            avg   1.98ms (+1.44%)            stddev   144us (-58.33%)      

I am still failing to see why this is a massive problem for non-mixed or non-string columns. It does not seem like it is to me, but I must be wrong. Because you are both complaining about this :-)

I agree that Mixed and Strings are going to be impacted, though. Especially Mixed, because a mixed can hold anything in it…

Copy link
Member Author

@nicola-cab nicola-cab Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I tried BenchmarkSort already. And it is slower. But at the same time it is not the best benchmark either IMO.

Req runs: 5 Sort (MemOnly, EncryptionOff): * min 207.91ms (+72.16%) max 214.60ms (+48.24%) * med 210.27ms (+62.98%) * avg 211.07ms (+60.69%) stddev 2.99ms (-65.96%)

Mainly for these 3 reasons:

  1. The internal DS that the string interner is keeping is not lock free, we are constantly grabbing a lock. Especially for sorting stuff, it is a real problem (since we may be calling the sorting operator a lot) . Jira to cover this: (https://jira.mongodb.org/browse/RCORE-2214)

  2. We are constantly loading a lot of data when we need to access the compressed string leaves in the Trie we built, we need to improve the internal DS: (https://jira.mongodb.org/browse/RCORE-2125)

  3. BenchmarkSort is writing 200k randomly generated strings which are representing some number. This is very much the worse scenario for interning stuff. Also, in the feature branch we are not limiting compression of strings like we do for integers (where we check if there is a sizeable gain before compressing). We just compress all (I suspect we will introduce something).

I've added a benchmark for testing the sorting of strings that are in compressed format (as you suggested). Passing 2 parameters to the test:

  • N length of the strings to store
  • M the number of duplicated strings

You can see that already, with all these limitations, there is a point after that it pays off to sort using the compressed string ID.

Req runs:    5  SortCompressedSmall(10,500) (MemOnly, EncryptionOff):   * min 124.38ms (+30.21%)           max 143.33ms (+17.40%)         * med 138.98ms (+27.34%)         * avg 136.07ms (+24.38%)           stddev  7.89ms (-27.27%)      

Req runs:    5  SortCompressedMedium(50,500) (MemOnly, EncryptionOff):   * min 125.46ms (+11.46%)           max 131.65ms (+9.01%)          * med 128.17ms (+12.81%)         * avg 128.03ms (+11.23%)           stddev  2.47ms (-25.63%)      

Req runs:    5  SortCompressedLarge(100,500) (MemOnly, EncryptionOff):   * min 128.91ms (+16.33%)           max 145.32ms (+26.75%)         * med 131.26ms (+17.72%)         * avg 134.01ms (+19.24%)           stddev  6.69ms (+302.40%)     

Req runs:    5  SortCompressedLarge(1000,5000) (MemOnly, EncryptionOff):     min 196.71ms (-30.88%)           max 208.25ms (-36.51%)           med 202.42ms (-29.76%)           avg 202.50ms (-31.61%)           stddev  4.09ms (-77.60%)      

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the non-string benchmarks, we appear to be less than 10% worse which I think seems reasonable. Ideally, that code would not be affected at all, and there may be a way to do this by templating out the non-string sorting, but I'm not sure it is worth the duplication that will create.

For the BenchmarkSort case, I don't think we should ship this feature until we get that regression down by quite a bit. Can you create a ticket for the third item you suggested so that we don't forget about it? As long as we are tracking those 3 approaches and get them and recheck the benchmarks later, it shouldn't block this PR now.

Copy link
Member Author

@nicola-cab nicola-cab Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I agree.

  1. There is a bit of tech debt in the sorting logic. I tried to make the code templated based on column type, but I quit, because the code was basically duplicated, also we need to extract the column type in at least 2 cases. First one when we set the index pair and the other one when we are sorting the index pairs. I concluded it was too much hassle. But maybe there is a better way...

  2. Yes I agree with you for string benchmarks, in fact all the code is not going into next major. We do have already a jira for this (https://jira.mongodb.org/browse/RCORE-2125). I will drop a comment in order to track this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact all the code is not going into next major

What do you mean by this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant: that the code is not finished, we need to work on the optimizations I mentioned. The feature branch is just our base. Once we complete all the features, it will of course go into next major, like we did for integers.

test/test_query.cpp Outdated Show resolved Hide resolved
@nicola-cab nicola-cab force-pushed the nc/RCORE-2157 branch 7 times, most recently from 0f8e791 to 5841658 Compare July 30, 2024 13:45
return -interner->compare(m_j.get_string(), (StringID)m_i.get_int());
}

// 4. compare string vs any other non-string (this is going to be slow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one side is a string and the other isn't then we don't actually need to fetch the string value because the Mixed comparison is only about the type ordering, not the values. So you could speed this up by just using a constant empty string.

return distribution(generator);
}

static std::string gen_random_string(size_t length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to test random strings. I'd love to see some utf8 strings tested as well. My suggestion would be to look at the tests such as UTF8_Compare_Strings (and the others nearby) where you could modify the str_compare lambda to use compression as well and verify that both modes produce the same sort order.

@ironage
Copy link
Contributor

ironage commented Jul 30, 2024

Could you add a benchmark for sorting compressed strings? The sorting logic is getting very messy and is due to a clean up, and having a benchmark in place for when that happens would be very helpful.

@@ -169,6 +171,45 @@ TEST(UTF8_Compare_Core_utf8)
CHECK_EQUAL(false, str_compare(u16sur2, u16sur2));
}

TEST(UTF8_Compare_Core_utf8_interned_strings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to not make a copy of the tests. I'd suggest modifying the existing tests to do the interned sort inside of the str_compare lambda and check that it is the same result as the non-interned comparison. The benefit of doing it that way, is that we can easily add more values to both cases, and we can directly see if there are any differences in the comparisons rather than doing a diff across tests.

// M is the numer of times we want store the string (number of dups)
template <size_t N, size_t M>
struct BenchmarkSortCompressed : BenchmarkWithStrings {
mutable std::string _name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this mutable? the leading underscore is also unusual for our naming conventions.

@@ -1630,6 +1630,72 @@ struct BenchmarkSort : BenchmarkWithStrings {
}
};

// benchmark for testing compressed string sorting.
// N is the size of the string to generate
// M is the numer of times we want store the string (number of dups)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// M is the numer of times we want store the string (number of dups)
// M is the number of times we want store the string (number of dups)

: BenchmarkWithStrings()
{
if constexpr (N <= 15) {
_name = util::format("SortCompressedSmall(%1,%2)", std::to_string(N), std::to_string(M));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the std::to_string(N) necessary? I would expect util::format to do the right thing with a constant number.


void before_all(DBRef group)
{
BenchmarkWithStringsTable::before_all(group);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this adds a bunch of strings to the table, do you really want those in there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no.. removed.


Obj obj = t->create_object();
obj.set<StringData>(m_col, str);
m_keys.push_back(obj.get_key());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_keys appears to be unused?

@@ -2623,10 +2623,6 @@ TEST(Test_Commit_Compression_Strings)
auto set_s = obj.get_set<String>(col_key_set_string);
auto dictionary_s = obj.get_dictionary(col_key_dict_string);

CHECK_EQUAL(list_s.size(), i + 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to this change, but in one of the N runs I happened to hit a generation of the same string twice. Checking the size is not smart, It is OK to just fetch the value.


parent.destroy_deep();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but I'd suggest using a DeepArrayDestroyGuard on the parent array at the top instead.

Comment on lines 273 to 274
auto id1 = interner.lookup(a);
auto id2 = interner.lookup(b);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could intern the strings on the fly, if they don't exist already. Then you don't need to worry about adding them manually. That should reduce the code duplication and make it easier to add more strings in the future.

With this pattern, you should be able to easily adapt the UTF8_Compare_Strings test as well.

Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for now. The discussion here assures that this feature won't be shipped without another analysis of the performance numbers.

@nicola-cab nicola-cab merged commit 9278ebe into feature/string-compression Aug 1, 2024
20 of 21 checks passed
@nicola-cab nicola-cab deleted the nc/RCORE-2157 branch August 1, 2024 18:46
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants