-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only compute the sibling of a SERVICE
once
#1556
Only compute the sibling of a SERVICE
once
#1556
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1556 +/- ##
==========================================
+ Coverage 89.03% 89.07% +0.03%
==========================================
Files 368 368
Lines 34183 34227 +44
Branches 3860 3866 +6
==========================================
+ Hits 30434 30486 +52
+ Misses 2482 2480 -2
+ Partials 1267 1261 -6 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial approach looks clean (especially not too much code), but we need support for laziness.
Let me know if you need help etc.
src/engine/Service.cpp
Outdated
// ____________________________________________________________________________ | ||
void Service::precomputeSiblingResult(std::shared_ptr<Operation> left, | ||
std::shared_ptr<Operation> right, | ||
bool rightOnly) { | ||
AD_CORRECTNESS_CHECK(left && right); | ||
auto a = std::dynamic_pointer_cast<Service>(left); | ||
auto b = std::dynamic_pointer_cast<Service>(right); | ||
|
||
if ((rightOnly && !b) || | ||
(!rightOnly && static_cast<bool>(a) == static_cast<bool>(b))) { | ||
return; | ||
} | ||
|
||
const auto& [service, sibling] = [&]() { | ||
if (a) { | ||
return std::tie(a, right); | ||
} else { | ||
AD_CORRECTNESS_CHECK(b); | ||
return std::tie(b, left); | ||
} | ||
}(); | ||
|
||
auto siblingResult = sibling->getResult(); | ||
sibling->precomputedResultBecauseSiblingOfService_ = siblingResult; | ||
if (siblingResult->idTable().size() <= | ||
RuntimeParameters().get<"service-max-value-rows">()) { | ||
service->precomputedSiblingResult_ = siblingResult; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although currently neither of the JOIN-like operations support laziness, we still need to prepare the support here.
This function should take another parameter requestLaziness
which is then passed to the getResult()
call.
If the result is in fact lazy (akak not fullyMaterialized
, then you basically store the returned IdTables until either you've consumed everything, or you have too many rows. Then you create a new result with a new generator that first yields all the blocks you've already retrieved and then continues consuming the downstream generator.
For the service result you can just concatenate the tables to a single IdTable to get a materialized result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pass on everything but the tests.
src/engine/OptionalJoin.cpp
Outdated
@@ -92,6 +93,12 @@ string OptionalJoin::getDescriptor() const { | |||
ProtoResult OptionalJoin::computeResult([[maybe_unused]] bool requestLaziness) { | |||
LOG(DEBUG) << "OptionalJoin result computation..." << endl; | |||
|
|||
// If the right of the RootOperations is a Service, precompute it's sibling's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the right of the RootOperations is a Service, precompute it's sibling's | |
// If the right of the RootOperations is a Service, precompute the result of its sibling. |
Similar typo (its vs it's ) in the other operations.
src/engine/Service.cpp
Outdated
auto skipSortOperation = [](std::shared_ptr<Operation>& op) { | ||
auto children = op->getChildren(); | ||
if (children.size() == 1) { | ||
op = children[0]->getRootOperation(); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is definitely wrong. There are many other operations with only one child, in particular GROUP BY, DISTINCT, FILTER.... So you have to check whether it's a sort via a dynamic cast
.
if ((rightOnly && !static_cast<bool>(b)) || | ||
(!rightOnly && static_cast<bool>(a) == static_cast<bool>(b))) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both are service requests, you could still in principle compute one of them here, and hope that it's the smaller one.
At least some comment should be left here. IF in the future we figure out a way to better estimate the result size of a service, we could then use the smaller estimate here. But that's not important for now.
As said: A comment why we currently don't support two services should be there at least.
src/engine/Service.cpp
Outdated
: ComputationMode::FULLY_MATERIALIZED); | ||
|
||
if (siblingResult->isFullyMaterialized()) { | ||
sibling->precomputedResultBecauseSiblingOfService_ = siblingResult; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do this step as the second one, then you can std::move(...)
the siblingResult
.
src/engine/Service.cpp
Outdated
cppcoro::generator<Result::IdTableVocabPair> prevGenerator, | ||
cppcoro::generator<Result::IdTableVocabPair>::iterator it) | ||
-> cppcoro::generator<Result::IdTableVocabPair> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an alias inside the Result
class for this long type (Result::Generator) or something, please use it.
src/engine/Service.cpp
Outdated
rows += pair.idTable_.size(); | ||
resultPairs.push_back(std::move(pair)); | ||
|
||
if (rows > RuntimeParameters().get<"service-max-value-rows">()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could access that constant once in advance, and not in the loop. (minor optimization).
sibling->getExecutionContext()->getAllocator()}, | ||
LocalVocab{}); | ||
|
||
for (auto& pair : resultPairs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know or can easily compute the total number of rows and the reserve
the idTable
.
src/engine/Service.h
Outdated
@@ -109,11 +95,20 @@ class Service : public Operation { | |||
static std::optional<std::string> idToValueForValuesClause( | |||
const Index& index, Id id, const LocalVocab& localVocab); | |||
|
|||
// Given two child-operations of a Join-, OptionalJoin- or Minus operation, | |||
// this method tries to precompute the result of one if the other one | |||
// (it's sibling) is a Service operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// (it's sibling) is a Service operation. | |
// (its sibling) is a Service operation. |
src/engine/Service.h
Outdated
// Given two child-operations of a Join-, OptionalJoin- or Minus operation, | ||
// this method tries to precompute the result of one if the other one | ||
// (it's sibling) is a Service operation. | ||
// If rightOnly is true (used by OptionalJoin and Minus), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If rightOnly is true (used by OptionalJoin and Minus), | |
// If `rightOnly` is true (used by `OptionalJoin` and `Minus`), |
(try to use this convention also consistently for your other comments)
test/OperationTest.cpp
Outdated
idTable.clone(), std::vector<ColumnIndex>{0}, LocalVocab{}); | ||
operation.precomputedResultBecauseSiblingOfService_ = | ||
std::make_optional(result); | ||
ASSERT_EQ(operation.getResult(), result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also test that it is cleared after returning it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two very very minor nitpicky requests.
Almost ready to merge, great!
src/engine/Service.cpp
Outdated
auto children = op->getChildren(); | ||
if (children.size() == 1) { | ||
if (static_cast<bool>(std::dynamic_pointer_cast<Sort>(op))) { | ||
auto children = op->getChildren(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto children = op->getChildren(); | |
const auto& children = op->getChildren(); |
probably suffices (assuming that getRootOperation
is const. else leave it and tell me).
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
SERVICE
once
Previously, the sibling of a
SERVICE
(see #1341 for details) was computed twice. This was not a problem for small siblings result, because they were cached, making the second computation actually a cheap cache read. It was however a problem if the result was too large for the cache. It then was computed twice, although only one of the computations was actually used, because the SERVICE discarded the result because it was too large.This commit fixes this behavior. The sibling is now only computed once, and only gets explicitly shared with the
SERVICE
if it is sufficiently small.