Provide framework for generic lazily evaluated operation results #1350

Merged 152 commits on Aug 22, 2024
Changes from 149 commits
Commits
80667bd
Rename ResultTable -> Result
RobinTF Apr 19, 2024
31b2c11
Wrap idTable in variant
RobinTF Apr 19, 2024
4d0204c
Add ability to create `Result` from generator
RobinTF Apr 19, 2024
515ed0c
Start fixing caching issues
RobinTF Apr 22, 2024
ca1cbed
Avoid another class of exceptions
RobinTF Apr 22, 2024
9e7f3cb
Optimize imports
RobinTF Apr 23, 2024
4c75d42
Introduce ReusableGenerator class
RobinTF Apr 23, 2024
892e4a5
Try to make caching work
RobinTF Apr 23, 2024
586365c
Fiddle around with const a bit
RobinTF Apr 23, 2024
80e2dbd
Add more TODOs
RobinTF Apr 23, 2024
18ca5b1
Fix TextLimit code after rebase
RobinTF Apr 28, 2024
86a9f4b
Fix compilation issues for ReusableGenerator
RobinTF Apr 28, 2024
7f0a5e7
Remove offset calculations from exporter
RobinTF Apr 28, 2024
aee20dd
Fix typo
RobinTF Apr 28, 2024
7576b2e
Add comments
RobinTF Apr 28, 2024
7765a25
Make supportsLimit private to avoid misuse
RobinTF Apr 28, 2024
f815be8
Properly use minimum limit if present
RobinTF May 1, 2024
90cca50
Start adding code to manipulate code after cache extraction
RobinTF May 1, 2024
694c21f
Implement fallback mechanism for failed cache share
RobinTF May 5, 2024
ea8b81f
Fix accidental edit of Usage.md
RobinTF May 5, 2024
50e4529
Consume result as master
RobinTF May 5, 2024
16eedd8
Add proper condition variables
RobinTF May 10, 2024
bf8f085
Implement code that allows for proper recomputation of cache size
RobinTF May 10, 2024
771eb5b
Refactor a bit
RobinTF May 12, 2024
8aa9060
Aggregate tables at the end of lazy results
RobinTF May 12, 2024
b499c6e
Overload constructor of Result class
RobinTF May 17, 2024
6b3f05c
Try to properly calculate duration
RobinTF May 18, 2024
ff5a6ea
Apply formatting
RobinTF May 18, 2024
b974c7d
Fix compilation on gcc 11 and gcc 12
RobinTF May 18, 2024
8b99020
Add correct visibility modifiers
RobinTF May 18, 2024
5f8ab65
Try fixing the compilation issue for real this time
RobinTF May 18, 2024
3a95ef5
Try to fix compilation issue on macOS
RobinTF May 18, 2024
e2dc667
Implement PoC lazy operation for index scan and filter operations
RobinTF May 20, 2024
070494f
Fix double limit offset row
RobinTF May 20, 2024
f631c13
Fix wrong assertion
RobinTF May 20, 2024
c249628
Properly request lazy results when limit clause is present
RobinTF May 20, 2024
ea7fd79
Fix bugs and segfaults
RobinTF May 20, 2024
1e9d0d6
Formatting
RobinTF May 21, 2024
dfb7ba6
Apply small refactoring change
RobinTF May 21, 2024
7d01e59
Correct wrong order of ternary statement
RobinTF May 22, 2024
5b2335a
Add TODO
RobinTF May 22, 2024
93a5892
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF May 23, 2024
9c445ea
Change how maxSend works
RobinTF May 23, 2024
b9ca4aa
Correct call order
RobinTF May 23, 2024
43dddd0
Correct call order
RobinTF May 23, 2024
4ac7892
Rethink approach to apply limits and offset
RobinTF May 27, 2024
ef17e67
Add back headers
RobinTF May 27, 2024
aabb81b
Add back result limiter for subqueries
RobinTF Jun 1, 2024
66a38b4
Try to fix subtle bug with runtime information detail
RobinTF Jun 1, 2024
999baee
Merge branch 'max-send-changes' into refactor-result-table
RobinTF Jun 5, 2024
c291ff7
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jun 5, 2024
9f17e07
Add back comment
RobinTF Jun 5, 2024
389f3f1
Rename `resultTable` -> `result`
RobinTF Jun 5, 2024
000af28
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jun 6, 2024
ba142a0
Add correctness check to prevent double move due to race condition
RobinTF Jun 9, 2024
44562c7
Start implementing tests for new cache feature and fixing bugs along …
RobinTF Jun 13, 2024
0f3a59a
Some Test cleanup
RobinTF Jun 13, 2024
d226849
Mark variable as maybe_unused
RobinTF Jun 13, 2024
552a268
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jun 13, 2024
cde135a
Restructure recomputeSize a bit to avoid unwanted behaviour
RobinTF Jun 13, 2024
cf6b4c9
Add remaining cache tests
RobinTF Jun 14, 2024
b2138bf
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jun 14, 2024
0c589e3
Add tests for `IteratorWrapper`
RobinTF Jun 14, 2024
c465685
Fix line endings
RobinTF Jun 15, 2024
d17fc7d
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jun 28, 2024
93c2360
Add tests for CacheableGenerator
RobinTF Jun 29, 2024
15b435e
Add Filter tests
RobinTF Jun 29, 2024
633bf06
Clear Cache before running tests
RobinTF Jun 29, 2024
6d5a95e
Add test to fix coverage
RobinTF Jun 30, 2024
55b4fec
Address some sonarcloud issues
RobinTF Jun 30, 2024
e5ceacc
Add tests for ExportQueryExecutionTrees
RobinTF Jun 30, 2024
d172dc8
Divide Result class into 3 dedicated classes
RobinTF Jul 5, 2024
5b004b8
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jul 5, 2024
b95edfd
Remove parameter for supportsLimit
RobinTF Jul 5, 2024
acc99c3
Fix formatting
RobinTF Jul 5, 2024
7900619
Format again
RobinTF Jul 5, 2024
0d0133a
Also perform definedness check for lazy results
RobinTF Jul 5, 2024
27ab692
Drop definedness caching mechanism
RobinTF Jul 5, 2024
2da169f
Add comment
RobinTF Jul 5, 2024
0cbb47d
Split lambdas into dedicated functions
RobinTF Jul 6, 2024
5adda07
Make move/copy constructors explicit
RobinTF Jul 6, 2024
e0cdf18
Fix undefined behaviour
RobinTF Jul 6, 2024
5ad5b8a
Workaround segfault
RobinTF Jul 6, 2024
db187f0
Try different attempt to fix double locking
RobinTF Jul 7, 2024
4ba81bd
Avoid pseudo false-positive thread sanitizer warning
RobinTF Jul 7, 2024
373f009
Restructure code to avoid class of race conditions
RobinTF Jul 11, 2024
27e451e
Clarify currently buggy behaviour
RobinTF Jul 11, 2024
96982aa
Fix macOS build
RobinTF Jul 11, 2024
58a193d
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jul 23, 2024
93a71cc
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jul 24, 2024
9c07e4f
Fix wrong merge conflict resolution
RobinTF Jul 24, 2024
1da9b17
Use `#pragma once`
RobinTF Jul 24, 2024
c6aa641
Simplify caching structure to fix bugs
RobinTF Jul 25, 2024
838a97e
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Jul 31, 2024
d74b89d
Fix size calculation
RobinTF Jul 31, 2024
bfc9d8f
Remove complicated atomic mutex check mechanism
RobinTF Aug 1, 2024
592c670
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 1, 2024
564d54b
Avoid copy
RobinTF Aug 1, 2024
d566d5b
Use timer class to simplify calls
RobinTF Aug 1, 2024
a164ef4
Start adding some documentation
RobinTF Aug 1, 2024
f9f22f4
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 1, 2024
23713cc
Unlock before notifying
RobinTF Aug 1, 2024
c56dd82
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 2, 2024
603a48b
Adjust Cache to simplify code a bit
RobinTF Aug 2, 2024
389a9bf
Merge branch 'helper-master' into refactor-result-table
RobinTF Aug 7, 2024
934015c
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 7, 2024
c7d58a4
Rework code (again) so that generator does not get cached
RobinTF Aug 7, 2024
4b93109
Make diff smaller and add some more tests
RobinTF Aug 7, 2024
7294b10
Fix problems with `RuntimeInformation`
RobinTF Aug 13, 2024
3b9fee8
Simplify timing calculation
RobinTF Aug 13, 2024
8acb05a
Fix sonarqube issues
RobinTF Aug 13, 2024
3ee5510
Use const ref
RobinTF Aug 13, 2024
72d1033
Apply microoptimization with mutex
RobinTF Aug 13, 2024
767a6cb
Another correction for reported result sizes
RobinTF Aug 14, 2024
8c834da
Rename member function
RobinTF Aug 14, 2024
4555dab
compute value directly after waiting without success
RobinTF Aug 14, 2024
d8f7fc1
Check size before aggregating id tables
RobinTF Aug 14, 2024
bbe10ef
Merge Result and ProtoResult back together
RobinTF Aug 14, 2024
345d79c
Re-arrange functions to make diff smaller
RobinTF Aug 14, 2024
a101433
Fix bugs with limit and optimize iteration
RobinTF Aug 15, 2024
b390f4f
Use higher precision timing for `RuntimeInformation`
RobinTF Aug 15, 2024
ce62baa
Incorporate PR comments for `CacheableGenerator` and `ConcurrentCache`
RobinTF Aug 15, 2024
9755da8
Address more PR comments
RobinTF Aug 15, 2024
c0ef64b
Fix unused warnings
RobinTF Aug 15, 2024
44d103f
Fix flickering timing information
RobinTF Aug 15, 2024
4a3f09b
Actually fix the timing issue
RobinTF Aug 15, 2024
0164d9c
Small improvements to query updates
RobinTF Aug 15, 2024
f23849c
Add a lot of documentation
RobinTF Aug 15, 2024
3c69ebf
Rename some functions
RobinTF Aug 15, 2024
383696b
Fix spelling error
RobinTF Aug 15, 2024
d89e896
Fix build on macOS
RobinTF Aug 15, 2024
b5dd60c
Address even more PR comments
RobinTF Aug 16, 2024
4f71329
Add Unit tests for edge case
RobinTF Aug 16, 2024
21beae5
Fix sortedBy and is defined check for multiple `IdTable`s
RobinTF Aug 16, 2024
eadbc01
Consistent separator comment length
RobinTF Aug 16, 2024
cb86d14
Reorder function to fix build
RobinTF Aug 16, 2024
f0434b3
Add unit tests for newly added `ConcurrentCache` features
RobinTF Aug 17, 2024
d181a9f
Add unit tests for lazy index scans
RobinTF Aug 17, 2024
68dcb6f
Fix wrong filter calculations
RobinTF Aug 18, 2024
6c785b2
Add tests for new `Operation` functionality
RobinTF Aug 18, 2024
ef0f872
Rename variable and fix functionality
RobinTF Aug 19, 2024
77b6f58
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 19, 2024
dff4910
Add unit tests for `Result` class
RobinTF Aug 19, 2024
a431021
Add tests for more coverage and skip tests relying on expensive checks
RobinTF Aug 19, 2024
476a691
Add test for edge case and skip test on expensive checks disabled
RobinTF Aug 19, 2024
b8b03f9
Fix sonarcloud issues
RobinTF Aug 20, 2024
1d83e86
Adjust limit values for better coverage
RobinTF Aug 20, 2024
538814f
Address PR comments
RobinTF Aug 20, 2024
d7fcc98
Reduce code duplication in tests
RobinTF Aug 21, 2024
a471bcf
More PR comments
RobinTF Aug 21, 2024
226b577
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 21, 2024
0437773
Merge remote-tracking branch 'ad-freiburg/master' into refactor-resul…
RobinTF Aug 21, 2024
303 changes: 169 additions & 134 deletions src/engine/ExportQueryExecutionTrees.cpp

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions src/engine/ExportQueryExecutionTrees.h
Expand Up @@ -177,4 +177,34 @@ class ExportQueryExecutionTrees {
const QueryExecutionTree& qet,
const parsedQuery::SelectClause& selectClause,
LimitOffsetClause limitAndOffset, CancellationHandle cancellationHandle);

// Helper type that contains an `IdTable` and a view with related indices to
// access the `IdTable` with.
struct TableWithRange {
const IdTable& idTable_;
std::ranges::iota_view<uint64_t, uint64_t> view_;
Member

Do you need the original absolute indices into the `IdTable`? Otherwise you could return a single `std::ranges::subrange<const IdTable&>`, so that all your loops become single loops again and all the SonarCloud issues go away.

Collaborator Author

I need the values as well as a reference to the original `IdTable` to fill in the `ConstructQueryExportContext`.

Collaborator Author

It would be possible, however, to refactor `ConstructQueryExportContext` to take a row reference instead, but that might be better suited for a follow-up PR.

Collaborator Author

Thinking about it, that doesn't solve the original issue: yielding subranges still requires two loops.

};

// Yield all `IdTables` provided by the given `result`.
static cppcoro::generator<const IdTable&> getIdTables(const Result& result);

// Return a range that contains the indices of the rows that have to be
// exported from the `idTable` given the `LimitOffsetClause`. It takes into
// account the LIMIT, the OFFSET, and the actual size of the `idTable`
static cppcoro::generator<TableWithRange> getRowIndices(
LimitOffsetClause limitOffset, const Result& result);

FRIEND_TEST(ExportQueryExecutionTrees, getIdTablesReturnsSingletonIterator);
FRIEND_TEST(ExportQueryExecutionTrees, getIdTablesMirrorsGenerator);
FRIEND_TEST(ExportQueryExecutionTrees, ensureCorrectSlicingOfSingleIdTable);
FRIEND_TEST(ExportQueryExecutionTrees,
ensureCorrectSlicingOfIdTablesWhenFirstIsSkipped);
FRIEND_TEST(ExportQueryExecutionTrees,
ensureCorrectSlicingOfIdTablesWhenLastIsSkipped);
FRIEND_TEST(ExportQueryExecutionTrees,
ensureCorrectSlicingOfIdTablesWhenFirstAndSecondArePartial);
FRIEND_TEST(ExportQueryExecutionTrees,
ensureCorrectSlicingOfIdTablesWhenFirstAndLastArePartial);
FRIEND_TEST(ExportQueryExecutionTrees,
ensureGeneratorIsNotConsumedWhenNotRequired);
};
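
The comment on `getRowIndices` describes slicing a sequence of tables by LIMIT and OFFSET. The following is a minimal sketch of that idea, not the PR's implementation: `Table`, `LimitOffset`, `Slice` and `rowSlices` are hypothetical stand-ins for `IdTable`, `LimitOffsetClause`, `TableWithRange` and the generator, and the sketch returns a vector where the real code yields lazily. It assumes that the OFFSET is consumed across tables first and that the LIMIT caps the total number of exported rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for `IdTable`: one chunk of rows.
using Table = std::vector<int>;

// Hypothetical stand-in for `LimitOffsetClause`.
struct LimitOffset {
  std::optional<uint64_t> limit;  // no value means "no LIMIT"
  uint64_t offset = 0;
};

// Hypothetical stand-in for `TableWithRange`: the rows `[begin, end)` of
// the table at position `tableIndex` have to be exported.
struct Slice {
  std::size_t tableIndex;
  uint64_t begin;
  uint64_t end;
};

// Walk over the tables once, skip `offset` rows in total, then select at
// most `limit` rows in total.
std::vector<Slice> rowSlices(const std::vector<Table>& tables,
                             LimitOffset limitOffset) {
  std::vector<Slice> slices;
  uint64_t toSkip = limitOffset.offset;
  uint64_t remaining =
      limitOffset.limit.value_or(std::numeric_limits<uint64_t>::max());
  // Stop as soon as the LIMIT is exhausted, so later tables are not touched.
  for (std::size_t i = 0; i < tables.size() && remaining > 0; ++i) {
    uint64_t size = tables[i].size();
    if (toSkip >= size) {
      toSkip -= size;  // the OFFSET swallows this table completely
      continue;
    }
    uint64_t begin = toSkip;
    uint64_t count = std::min(size - begin, remaining);
    toSkip = 0;
    remaining -= count;
    slices.push_back({i, begin, begin + count});
  }
  return slices;
}
```

A consumer of such slices needs an outer loop over the slices and an inner loop over each row range, which is the two-loop structure discussed in the review thread on `TableWithRange`.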
62 changes: 40 additions & 22 deletions src/engine/Filter.cpp
Expand Up @@ -43,41 +43,59 @@ string Filter::getDescriptor() const {
}

// _____________________________________________________________________________
ProtoResult Filter::computeResult([[maybe_unused]] bool requestLaziness) {
ProtoResult Filter::computeResult(bool requestLaziness) {
LOG(DEBUG) << "Getting sub-result for Filter result computation..." << endl;
std::shared_ptr<const Result> subRes = _subtree->getResult();
std::shared_ptr<const Result> subRes = _subtree->getResult(requestLaziness);
LOG(DEBUG) << "Filter result computation..." << endl;
checkCancellation();

IdTable idTable{getExecutionContext()->getAllocator()};
idTable.setNumColumns(subRes->idTable().numColumns());
if (subRes->isFullyMaterialized()) {
IdTable result = filterIdTable(subRes, subRes->idTable());
LOG(DEBUG) << "Filter result computation done." << endl;

return {std::move(result), resultSortedOn(), subRes->getSharedLocalVocab()};
}
auto localVocab = subRes->getSharedLocalVocab();
return {[](auto subRes, auto* self) -> cppcoro::generator<IdTable> {
for (IdTable& idTable : subRes->idTables()) {
IdTable result = self->filterIdTable(subRes, idTable);
co_yield result;
}
}(std::move(subRes), this),
resultSortedOn(), std::move(localVocab)};
}

size_t width = idTable.numColumns();
CALL_FIXED_SIZE(width, &Filter::computeFilterImpl, this, &idTable, *subRes);
LOG(DEBUG) << "Filter result computation done." << endl;
checkCancellation();
// _____________________________________________________________________________
IdTable Filter::filterIdTable(const std::shared_ptr<const Result>& subRes,
const IdTable& idTable) {
sparqlExpression::EvaluationContext evaluationContext(
*getExecutionContext(), _subtree->getVariableColumns(), idTable,
getExecutionContext()->getAllocator(), subRes->localVocab(),
cancellationHandle_, deadline_);

return {std::move(idTable), resultSortedOn(), subRes->getSharedLocalVocab()};
// TODO<joka921> This should be a mandatory argument to the
// EvaluationContext constructor.
evaluationContext._columnsByWhichResultIsSorted = subRes->sortedBy();

size_t width = evaluationContext._inputTable.numColumns();
IdTable result = CALL_FIXED_SIZE(width, &Filter::computeFilterImpl, this,
evaluationContext);
checkCancellation();
return result;
}

// _____________________________________________________________________________
template <size_t WIDTH>
void Filter::computeFilterImpl(IdTable* outputIdTable,
const Result& inputResultTable) {
sparqlExpression::EvaluationContext evaluationContext(
*getExecutionContext(), _subtree->getVariableColumns(),
inputResultTable.idTable(), getExecutionContext()->getAllocator(),
inputResultTable.localVocab(), cancellationHandle_, deadline_);

// TODO<joka921> This should be a mandatory argument to the EvaluationContext
// constructor.
evaluationContext._columnsByWhichResultIsSorted = inputResultTable.sortedBy();
IdTable Filter::computeFilterImpl(
sparqlExpression::EvaluationContext& evaluationContext) {
IdTable idTable{getExecutionContext()->getAllocator()};
idTable.setNumColumns(evaluationContext._inputTable.numColumns());

sparqlExpression::ExpressionResult expressionResult =
_expression.getPimpl()->evaluate(&evaluationContext);

const auto input = inputResultTable.idTable().asStaticView<WIDTH>();
auto output = std::move(*outputIdTable).toStatic<WIDTH>();
const auto input = evaluationContext._inputTable.asStaticView<WIDTH>();
auto output = std::move(idTable).toStatic<WIDTH>();
// Clang 17 seems to incorrectly deduce the type, so try to trick it
std::remove_const_t<decltype(output)>& output2 = output;

Expand Down Expand Up @@ -123,7 +141,7 @@ void Filter::computeFilterImpl(IdTable* outputIdTable,

std::visit(visitor, std::move(expressionResult));

*outputIdTable = std::move(output).toDynamic();
return std::move(output).toDynamic();
}

// _____________________________________________________________________________
Expand Down
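
The change to `Filter::computeResult` filters each `IdTable` of a lazy sub-result separately instead of filtering one materialized table. Below is a rough sketch of that per-chunk pattern under simplifying assumptions: `Table`, `ChunkSource`, `filterTable`, `filterLazily` and `fromChunks` are hypothetical names, and a pull-style `std::function` stands in for `cppcoro::generator<IdTable>` so that the example needs only the standard library.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for `IdTable`.
using Table = std::vector<int>;

// Hypothetical stand-in for `cppcoro::generator<IdTable>`: each call returns
// the next chunk, or `std::nullopt` once the source is exhausted.
using ChunkSource = std::function<std::optional<Table>()>;

// Filter a single chunk; plays the role of `filterIdTable` in the diff.
Table filterTable(const Table& input, const std::function<bool(int)>& keep) {
  Table output;
  for (int value : input) {
    if (keep(value)) {
      output.push_back(value);
    }
  }
  return output;
}

// Wrap a source so that every chunk is filtered only when it is requested.
ChunkSource filterLazily(ChunkSource source, std::function<bool(int)> keep) {
  return [source = std::move(source),
          keep = std::move(keep)]() mutable -> std::optional<Table> {
    std::optional<Table> chunk = source();
    if (!chunk.has_value()) {
      return std::nullopt;
    }
    // Like the diff, this yields the filtered chunk even if it is empty.
    return filterTable(*chunk, keep);
  };
}

// Helper that turns a vector of chunks into a `ChunkSource`.
ChunkSource fromChunks(std::vector<Table> chunks) {
  return [chunks = std::move(chunks),
          next = std::size_t{0}]() mutable -> std::optional<Table> {
    if (next == chunks.size()) {
      return std::nullopt;
    }
    return std::move(chunks[next++]);
  };
}
```

Because the filter runs only when the consumer asks for the next chunk, a consumer that stops early (for example after reaching a LIMIT) never pays for filtering the remaining chunks.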
12 changes: 9 additions & 3 deletions src/engine/Filter.h
Expand Up @@ -58,9 +58,15 @@ class Filter : public Operation {
return _subtree->getVariableColumns();
}

ProtoResult computeResult([[maybe_unused]] bool requestLaziness) override;
ProtoResult computeResult(bool requestLaziness) override;

// Perform the actual filter operation of the data provided by
// `evaluationContext`.
template <size_t WIDTH>
void computeFilterImpl(IdTable* outputIdTable,
const Result& inputResultTable);
IdTable computeFilterImpl(
sparqlExpression::EvaluationContext& evaluationContext);

// Run `computeFilterImpl` on the provided IdTable
IdTable filterIdTable(const std::shared_ptr<const Result>& subRes,
const IdTable& idTable);
};
21 changes: 20 additions & 1 deletion src/engine/IndexScan.cpp
Expand Up @@ -116,9 +116,28 @@ VariableToColumnMap IndexScan::computeVariableToColumnMap() const {
std::ranges::for_each(additionalVariables_, addCol);
return variableToColumnMap;
}

// _____________________________________________________________________________
ProtoResult IndexScan::computeResult([[maybe_unused]] bool requestLaziness) {
cppcoro::generator<IdTable> IndexScan::scanInChunks() const {
auto metadata = getMetadataForScan(*this);
if (!metadata.has_value()) {
co_return;
}
auto blocksSpan =
CompressedRelationReader::getBlocksFromMetadata(metadata.value());
std::vector<CompressedBlockMetadata> blocks{blocksSpan.begin(),
blocksSpan.end()};
for (IdTable& idTable : getLazyScan(*this, std::move(blocks))) {
co_yield std::move(idTable);
}
}

// _____________________________________________________________________________
ProtoResult IndexScan::computeResult(bool requestLaziness) {
LOG(DEBUG) << "IndexScan result computation...\n";
if (requestLaziness) {
return {scanInChunks(), resultSortedOn(), LocalVocab{}};
}
IdTable idTable{getExecutionContext()->getAllocator()};

using enum Permutation::Enum;
Expand Down
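
`IndexScan::computeResult` now returns either a lazy generator or a fully materialized table, depending on `requestLaziness`, and `Filter` checks `isFullyMaterialized()` to tell the two apart. One way to picture such a result type is a `std::variant`, in the spirit of the "Wrap idTable in variant" commit. The sketch below is an assumption-laden illustration, not the actual `Result` class: `SketchResult`, `ChunkSource` and `scan` are hypothetical names, and the local vocabulary and sort information are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for `IdTable`.
using Table = std::vector<int>;

// Hypothetical stand-in for `cppcoro::generator<IdTable>`: each call returns
// the next chunk, or `std::nullopt` once the source is exhausted.
using ChunkSource = std::function<std::optional<Table>()>;

// Holds either one materialized table or a lazy source of chunks.
class SketchResult {
 public:
  explicit SketchResult(Table table) : data_(std::move(table)) {}
  explicit SketchResult(ChunkSource chunks) : data_(std::move(chunks)) {}

  bool isFullyMaterialized() const {
    return std::holds_alternative<Table>(data_);
  }

  // Only valid for materialized results.
  const Table& table() const {
    assert(isFullyMaterialized());
    return std::get<Table>(data_);
  }

  // Only valid for lazy results.
  ChunkSource& chunks() {
    assert(!isFullyMaterialized());
    return std::get<ChunkSource>(data_);
  }

 private:
  std::variant<Table, ChunkSource> data_;
};

// Return the blocks one by one if laziness is requested, otherwise
// concatenate them into a single table up front.
SketchResult scan(const std::vector<Table>& blocks, bool requestLaziness) {
  if (requestLaziness) {
    return SketchResult{ChunkSource{
        [blocks, next = std::size_t{0}]() mutable -> std::optional<Table> {
          if (next == blocks.size()) {
            return std::nullopt;
          }
          return blocks[next++];
        }}};
  }
  Table all;
  for (const Table& block : blocks) {
    all.insert(all.end(), block.begin(), block.end());
  }
  return SketchResult{std::move(all)};
}
```

The name `requestLaziness` suggests a request rather than a guarantee, so in this sketch callers branch on `isFullyMaterialized()` instead of assuming which alternative they received.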
4 changes: 3 additions & 1 deletion src/engine/IndexScan.h
Expand Up @@ -103,7 +103,7 @@ class IndexScan final : public Operation {
ScanSpecificationAsTripleComponent getScanSpecification() const;

private:
ProtoResult computeResult([[maybe_unused]] bool requestLaziness) override;
ProtoResult computeResult(bool requestLaziness) override;

vector<QueryExecutionTree*> getChildren() override { return {}; }

Expand All @@ -115,6 +115,8 @@ class IndexScan final : public Operation {

VariableToColumnMap computeVariableToColumnMap() const override;

cppcoro::generator<IdTable> scanInChunks() const;

// Helper functions for the public `getLazyScanFor...` functions (see above).
static Permutation::IdTableGenerator getLazyScan(
const IndexScan& s, std::vector<CompressedBlockMetadata> blocks);
Expand Down