Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only compute the sibling of a SERVICE once #1556

Merged
merged 14 commits into from
Oct 26, 2024

Conversation

UNEXENU
Copy link
Contributor

@UNEXENU UNEXENU commented Oct 15, 2024

Previously, the sibling of a SERVICE (see #1341 for details) was computed twice. This was not a problem for small siblings result, because they were cached, making the second computation actually a cheap cache read. It was however a problem if the result was too large for the cache. It then was computed twice, although only one of the computations was actually used, because the SERVICE discarded the result because it was too large.
This commit fixes this behavior. The sibling is now only computed once, and only gets explicitly shared with the SERVICE if it is sufficiently small.

Copy link

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.07%. Comparing base (db089f8) to head (d991ef1).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1556      +/-   ##
==========================================
+ Coverage   89.03%   89.07%   +0.03%     
==========================================
  Files         368      368              
  Lines       34183    34227      +44     
  Branches     3860     3866       +6     
==========================================
+ Hits        30434    30486      +52     
+ Misses       2482     2480       -2     
+ Partials     1267     1261       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial approach looks clean (especially not too much code), but we need support for laziness.
Let me know if you need help etc.

Comment on lines 492 to 520
// ____________________________________________________________________________
void Service::precomputeSiblingResult(std::shared_ptr<Operation> left,
std::shared_ptr<Operation> right,
bool rightOnly) {
AD_CORRECTNESS_CHECK(left && right);
auto a = std::dynamic_pointer_cast<Service>(left);
auto b = std::dynamic_pointer_cast<Service>(right);

if ((rightOnly && !b) ||
(!rightOnly && static_cast<bool>(a) == static_cast<bool>(b))) {
return;
}

const auto& [service, sibling] = [&]() {
if (a) {
return std::tie(a, right);
} else {
AD_CORRECTNESS_CHECK(b);
return std::tie(b, left);
}
}();

auto siblingResult = sibling->getResult();
sibling->precomputedResultBecauseSiblingOfService_ = siblingResult;
if (siblingResult->idTable().size() <=
RuntimeParameters().get<"service-max-value-rows">()) {
service->precomputedSiblingResult_ = siblingResult;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although currently neither of the JOIN-like operations support laziness, we still need to prepare the support here.
This function should take another parameter requestLaziness which is then passed to the getResult() call.
If the result is in fact lazy (akak not fullyMaterialized, then you basically store the returned IdTables until either you've consumed everything, or you have too many rows. Then you create a new result with a new generator that first yields all the blocks you've already retrieved and then continues consuming the downstream generator.
For the service result you can just concatenate the tables to a single IdTable to get a materialized result.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pass on everything but the tests.

@@ -92,6 +93,12 @@ string OptionalJoin::getDescriptor() const {
ProtoResult OptionalJoin::computeResult([[maybe_unused]] bool requestLaziness) {
LOG(DEBUG) << "OptionalJoin result computation..." << endl;

// If the right of the RootOperations is a Service, precompute it's sibling's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the right of the RootOperations is a Service, precompute it's sibling's
// If the right of the RootOperations is a Service, precompute the result of its sibling.

Similar typo (its vs it's ) in the other operations.

Comment on lines 505 to 510
auto skipSortOperation = [](std::shared_ptr<Operation>& op) {
auto children = op->getChildren();
if (children.size() == 1) {
op = children[0]->getRootOperation();
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is definitely wrong. There are many other operations with only one child, in particular GROUP BY, DISTINCT, FILTER.... So you have to check whether it's a sort via a dynamic cast.

if ((rightOnly && !static_cast<bool>(b)) ||
(!rightOnly && static_cast<bool>(a) == static_cast<bool>(b))) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both are service requests, you could still in principle compute one of them here, and hope that it's the smaller one.
At least some comment should be left here. IF in the future we figure out a way to better estimate the result size of a service, we could then use the smaller estimate here. But that's not important for now.
As said: A comment why we currently don't support two services should be there at least.

: ComputationMode::FULLY_MATERIALIZED);

if (siblingResult->isFullyMaterialized()) {
sibling->precomputedResultBecauseSiblingOfService_ = siblingResult;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do this step as the second one, then you can std::move(...) the siblingResult.

Comment on lines 557 to 559
cppcoro::generator<Result::IdTableVocabPair> prevGenerator,
cppcoro::generator<Result::IdTableVocabPair>::iterator it)
-> cppcoro::generator<Result::IdTableVocabPair> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an alias inside the Result class for this long type (Result::Generator) or something, please use it.

rows += pair.idTable_.size();
resultPairs.push_back(std::move(pair));

if (rows > RuntimeParameters().get<"service-max-value-rows">()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could access that constant once in advance, and not in the loop. (minor optimization).

sibling->getExecutionContext()->getAllocator()},
LocalVocab{});

for (auto& pair : resultPairs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know or can easily compute the total number of rows and the reserve the idTable.

@@ -109,11 +95,20 @@ class Service : public Operation {
static std::optional<std::string> idToValueForValuesClause(
const Index& index, Id id, const LocalVocab& localVocab);

// Given two child-operations of a Join-, OptionalJoin- or Minus operation,
// this method tries to precompute the result of one if the other one
// (it's sibling) is a Service operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// (it's sibling) is a Service operation.
// (its sibling) is a Service operation.

// Given two child-operations of a Join-, OptionalJoin- or Minus operation,
// this method tries to precompute the result of one if the other one
// (it's sibling) is a Service operation.
// If rightOnly is true (used by OptionalJoin and Minus),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If rightOnly is true (used by OptionalJoin and Minus),
// If `rightOnly` is true (used by `OptionalJoin` and `Minus`),

(try to use this convention also consistently for your other comments)

idTable.clone(), std::vector<ColumnIndex>{0}, LocalVocab{});
operation.precomputedResultBecauseSiblingOfService_ =
std::make_optional(result);
ASSERT_EQ(operation.getResult(), result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also test that it is cleared after returning it.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very very minor nitpicky requests.
Almost ready to merge, great!

src/engine/Operation.h Show resolved Hide resolved
auto children = op->getChildren();
if (children.size() == 1) {
if (static_cast<bool>(std::dynamic_pointer_cast<Sort>(op))) {
auto children = op->getChildren();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto children = op->getChildren();
const auto& children = op->getChildren();

probably suffices (assuming that getRootOperation is const. else leave it and tell me).

@sparql-conformance
Copy link

Copy link

sonarqubecloud bot commented Oct 26, 2024

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@joka921 joka921 changed the title Fix computing SERVICE Sibling twice Only compute the sibling of a SERVICE once Oct 26, 2024
@joka921 joka921 merged commit 9e65d28 into ad-freiburg:master Oct 26, 2024
22 checks passed
joka921 pushed a commit that referenced this pull request Dec 6, 2024
This fixes #1659, an issue introduced in #1556 that made the precomputation of a `SERVICE` sibling in `JOIN` operations impossible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants