Lazy materialization of index scans #1003

Merged on Jul 12, 2023 (179 commits)
84070c9
A function to merge a join column with a block.
joka921 Apr 25, 2023
06225fb
Continued the refactoring.
joka921 Apr 25, 2023
5e5352b
Implemented a (serial)
joka921 Apr 25, 2023
620231a
In the middle of something not yet working.
joka921 Apr 28, 2023
f2bc0ce
In the middle of everything, first clean up the permutation business.
joka921 May 4, 2023
efe72a7
Changing everything to a single class of MetaData.
joka921 May 4, 2023
6ffc83e
Also delete some additional unused Code.
joka921 May 4, 2023
975c59a
Completely removed the
joka921 May 4, 2023
31883bc
Moved many functions from IndexImpl.h to IndexImpl.cpp
joka921 May 4, 2023
e8388e6
Added some unit tests.
joka921 May 4, 2023
c86d7d0
Changed the members from
joka921 May 4, 2023
6079437
Fixed some code smells.
joka921 May 4, 2023
cacf92d
Merge branch 'no_templated_permutations' into join_with_scan
joka921 May 4, 2023
75897aa
Continuing after merge.
joka921 May 4, 2023
16b45a5
Implemented the IndexScan::lazyScansForJoin method.
joka921 May 4, 2023
b7ee48d
Removed some code smells.
joka921 May 5, 2023
a25914b
Started to write some tests.
joka921 May 5, 2023
1c6db64
Merge branch 'master' into no_templated_permutations
joka921 May 5, 2023
e87b15c
Added some unit tests and
joka921 May 5, 2023
a5aae12
Merge branch 'master' into no_templated_permutations
joka921 May 5, 2023
f0cbb21
Merge in the updated master.
joka921 May 5, 2023
6509d53
Merge branch 'no_templated_permutations' into join_with_scan
joka921 May 5, 2023
a5bd352
In the middle of figuring stuff out.
joka921 May 5, 2023
2a42cfa
I am beginning to understand
joka921 May 25, 2023
3ae1055
Merge branch 'master' into join_with_scan
joka921 May 26, 2023
96c3829
Fixed the merge and compilation and tests
joka921 May 26, 2023
b138441
In the middle of everything
joka921 May 28, 2023
d1de8ab
Continuing some work on this.
joka921 Jun 9, 2023
d70fe89
A first draft of the
joka921 Jun 12, 2023
ecde3a6
The code now passes some unit tests.
joka921 Jun 12, 2023
79f291b
Fixed some more unit tests.
joka921 Jun 12, 2023
1ba3c79
More unit tests run through, but there's still some stuff that I don'…
joka921 Jun 12, 2023
712eb52
Delete a comment.
joka921 Jun 12, 2023
8df3277
The `getBlocksForJoin` method is semantically wrong.
joka921 Jun 13, 2023
76f7320
Currently not yet working, but we are getting there.
joka921 Jun 13, 2023
3b9bbee
The first version seems to work (joining two index scans)
joka921 Jun 14, 2023
c171d6d
The tests compile again, but they fail. TODO<joka921> handle that stuff.
joka921 Jun 14, 2023
f826526
Only unimportant unit tests fail now.
joka921 Jun 14, 2023
8928684
Started to cleanup the central routine.
joka921 Jun 14, 2023
551d567
Several additional cleanups.
joka921 Jun 14, 2023
8006784
Completely cleaned up the JoinAlgorithms.h and the corresponding tests.
joka921 Jun 15, 2023
f53b49f
Merge branch 'master' into join_with_scan
joka921 Jun 15, 2023
c98c5b3
It compiles and the tests run.
joka921 Jun 15, 2023
520c98f
Several simplifications and moved a lot of stuff into the Permutation…
joka921 Jun 15, 2023
83aec3b
A lot of additional cleanups.
joka921 Jun 15, 2023
4afd602
Addressed several sonar and other issues.
joka921 Jun 15, 2023
1ca7406
Start the threadsafe ordered queue and don't materialize unneeded res…
joka921 Jun 15, 2023
5abe837
Fix the merge
joka921 Jun 16, 2023
f816eab
Added declaration of needed function
joka921 Jun 19, 2023
a615359
Merge branch 'master' into only_small_scans_when_query_planning
joka921 Jun 19, 2023
c3ab80a
Merge branch 'refactor_index_scan' into only_small_scans_when_query_p…
joka921 Jun 19, 2023
7284272
Do not read the index scans during query planning.
joka921 Jun 19, 2023
cd9f98c
Remove some code smells and eliminate some code duplication.
joka921 Jun 19, 2023
7b59e3f
Fixed the format.
joka921 Jun 19, 2023
32601bc
Merge branch 'master' into refactor_index_scan
joka921 Jun 19, 2023
feb0117
Changes from a review with Hannah.
joka921 Jun 19, 2023
2cedbc9
Merge branch 'refactor_index_scan' into only_small_scans_when_query_p…
joka921 Jun 19, 2023
b76d825
Merged in the approved version of the other PR.
joka921 Jun 19, 2023
25538af
Merge branch 'master' into only_small_scans_when_query_planning
joka921 Jun 20, 2023
c621c7d
Merge branch 'only_small_scans_when_query_planning' into join_with_scan
joka921 Jun 20, 2023
9a30097
Merged in the latest other PR.
joka921 Jun 20, 2023
a40bab6
Clang format.
joka921 Jun 20, 2023
dffe587
Very minimal changes.
joka921 Jun 20, 2023
c43f66a
Fix two subtle bugs.
joka921 Jun 21, 2023
b72666e
Parallel reading of the blocks.
joka921 Jun 21, 2023
245a87a
Merge branch 'master' into join_with_scan
joka921 Jun 21, 2023
e67d818
Merged in the master and formatted everything.
joka921 Jun 21, 2023
557e719
Added tests for the threadsafe queue.
joka921 Jun 21, 2023
a10c84d
Added tests for the threadsafe queue and fixed the tests for now beca…
joka921 Jun 21, 2023
c07e8d4
Fixed warnings
joka921 Jun 21, 2023
92ac4d9
Fixed several bugs, this should now work.
joka921 Jun 22, 2023
35bbf4f
Commented the tests for the threadsafe queue.
joka921 Jun 22, 2023
79ad8ca
Fixed some tests
joka921 Jun 26, 2023
82adab5
Implement an ordered threadsafe queue and test it.
joka921 Jun 26, 2023
00ea579
Got rid of some code smells in the threadsafe queue.
joka921 Jun 26, 2023
723c043
Got rid of a lot of code duplication
joka921 Jun 26, 2023
0846369
Use 10 blocks.
joka921 Jun 26, 2023
31e2c50
Make the tests more robust for the threadsafe queue.
joka921 Jun 26, 2023
ce58398
Make the tests pass consistently.
joka921 Jun 26, 2023
519d797
Small changes
joka921 Jun 26, 2023
b6e94ee
Fixed the error from yesterday.
joka921 Jun 27, 2023
736b490
Merge branch 'master' into join_with_scan
joka921 Jun 27, 2023
b164415
Simplified the interface.
joka921 Jun 27, 2023
3e2a00f
Make it possible to only get the result from an operation if it can b…
joka921 Jun 27, 2023
64b6652
Add comments.
joka921 Jun 27, 2023
54e4444
Fix the build.
joka921 Jun 27, 2023
50741d0
Hopefully fix a bug on mac, and addressed some code smells.
joka921 Jun 27, 2023
cf94ba4
Make the tests more robust.
joka921 Jun 27, 2023
0453f07
Merge branch 'get_only_cached_result' into join_with_scan
joka921 Jun 27, 2023
66ff69e
Do not use lazy scans when the full result of the scans is already ca…
joka921 Jun 27, 2023
ccff07b
Add some initial tests for the so far untested stuff.
joka921 Jun 27, 2023
a494db2
Some initial unit tests.
joka921 Jun 27, 2023
536de70
Last changes from a review.
joka921 Jun 27, 2023
97ba0f9
Last rename to finish.
joka921 Jun 27, 2023
7148fab
Merge branch 'master' into join_with_scan
joka921 Jun 28, 2023
05a1ee1
Fixed some code smells and removed some code duplication.
joka921 Jun 28, 2023
f983e79
Added comments and several refactorings.
joka921 Jun 28, 2023
07891b8
Some refactorings in the CompressedRelation.cpp file that reduce the…
joka921 Jun 28, 2023
c367566
Several further refactorings.
joka921 Jun 28, 2023
a6d8d54
Added comments etc.
joka921 Jun 28, 2023
213cf8b
Not much done yet.
joka921 Jun 28, 2023
ce774c7
Merge branch 'master' into join_with_scan
joka921 Jun 28, 2023
e5f0e4c
This doesn't work, find the bug.
joka921 Jun 29, 2023
322423c
Several refactorings that fortunately work again.
joka921 Jun 29, 2023
207fff0
Almost all scans return IdTable now.
joka921 Jun 29, 2023
6eb0b68
Changes from Hannah with review.
joka921 Jun 29, 2023
0c906a6
Several additional improvements.
joka921 Jun 30, 2023
5fa348c
Changes from a round of reviews with Hannah.
joka921 Jun 30, 2023
c6104a5
Extend the `generator` class by the functionality to add details.
joka921 Jun 30, 2023
b64da37
Merge branch 'generators_with_additional_info' into join_with_scan
joka921 Jun 30, 2023
dcbae54
Add additional information.
joka921 Jun 30, 2023
0a97511
Add additional details and size information.
joka921 Jun 30, 2023
0e8d2bb
Changed the interface and the bugs.
joka921 Jul 3, 2023
8a86653
Started to do stuff for performance, currently not yet done.
joka921 Jul 3, 2023
6aaa445
Fix the unit tests.
joka921 Jul 3, 2023
e45d064
Still in the middle of anything.
joka921 Jul 3, 2023
34d4a21
Small changes from a review with Hannah.
joka921 Jul 4, 2023
d0d1110
Merge branch 'master' into join_with_scan
joka921 Jul 4, 2023
461c92c
Several performance improvements.
joka921 Jul 4, 2023
5d51c24
Improved some infrastructure and added missing runtime information.
joka921 Jul 5, 2023
6168e39
Added an empty details class as the default.
joka921 Jul 5, 2023
2ac317f
Merge remote-tracking branch 'origin/master' into join_with_scan
Jul 5, 2023
f6fff69
Merged in hannah's merge.
joka921 Jul 5, 2023
99f219f
Fix the unit tests.
joka921 Jul 3, 2023
fc08333
Small changes from a review with Hannah.
joka921 Jul 4, 2023
63ff2f1
Added an empty details class as the default.
joka921 Jul 5, 2023
a4e5db2
Several additional cleanups.
joka921 Jul 5, 2023
180456a
Better comments.
joka921 Jul 5, 2023
16fdc9a
Add the possibility to register an external details object.
joka921 Jul 5, 2023
784ce8a
Merge branch 'generators_with_additional_info' into join_with_scan
joka921 Jul 5, 2023
907992a
Several smaller cleanups and fixed a possible source of the ubsan un…
joka921 Jul 5, 2023
ceee114
Add unit tests for the `getBlocksForJoin` function.
joka921 Jul 5, 2023
9f71a65
A stub for unit tests.
joka921 Jul 5, 2023
25b3249
Tiny changes from a review.
joka921 Jul 5, 2023
79a5b97
Some changes from a review.
joka921 Jul 5, 2023
33d9b14
Changes from review.
joka921 Jul 5, 2023
6a7e901
Try of fix
joka921 Jul 5, 2023
461f594
Make the blocksize of the Permutations configurable via the `Index` c…
joka921 Jul 6, 2023
0205654
Merge branch 'configurable_block_size' into join_with_scan
joka921 Jul 6, 2023
f72c972
Started to write tests for the index scan class.
joka921 Jul 6, 2023
f4b2d1c
Add unit tests for the changes in the `IndexScan`
joka921 Jul 6, 2023
2bb795c
Only the changes for the `AddCombinedRowToTable` etc, without the gen…
joka921 Jul 6, 2023
88c6f3a
Fixed a bug in the generator.
joka921 Jul 6, 2023
5480f6b
Some initial tests for the new methods of the Join class.
joka921 Jul 6, 2023
0b47bc0
Hopefully make the tests pass on MacOS without introducing deadlocks.
joka921 Jul 7, 2023
033c73f
Several additional changes, excluding the ones for updating the status.
joka921 Jul 7, 2023
081b076
Include functionality for explicitly setting the status of an operati…
joka921 Jul 7, 2023
2406aec
Actually use the new status.
joka921 Jul 7, 2023
ad91a03
Safer finishing of the Threadsafe queue.
joka921 Jul 7, 2023
f070a7c
Some improvements, we mostly are lacking comments and a few tests.
joka921 Jul 7, 2023
3e32390
Commented the test for the`IndexScan` class
joka921 Jul 7, 2023
0706a9f
Better stuff.
joka921 Jul 7, 2023
9aa5a5c
Merge branch 'master' into join_with_scan
joka921 Jul 7, 2023
93882ff
Merged in the master and several changes from a review.
joka921 Jul 7, 2023
3837029
Small change from a review.
joka921 Jul 7, 2023
6adc20a
Clang format.
joka921 Jul 7, 2023
ed20f3a
Merge branch 'master' into join_with_scan
joka921 Jul 7, 2023
8b0ac05
Fixed the merge.
joka921 Jul 7, 2023
5f7c172
Fixed the merge.
joka921 Jul 7, 2023
7b256fd
Make the lazy joins with very small
joka921 Jul 10, 2023
30835cc
Materialize small-ish scans
joka921 Jul 10, 2023
0030114
Several improvements and comments.
joka921 Jul 10, 2023
aed7ea0
Use supported comparison.
joka921 Jul 10, 2023
944ed2a
formatted this
joka921 Jul 10, 2023
b620bff
Small changes from a review with Hannah.
joka921 Jul 10, 2023
f82ad65
Better stuff.
joka921 Jul 10, 2023
94cb174
Improve test coverage.
joka921 Jul 11, 2023
54ace64
Implement a useful helper.
joka921 Jul 11, 2023
ef58a42
Merge branch 'threadsafe_ordered_queue' into join_with_scan
joka921 Jul 11, 2023
d57319a
Several improvements of the test coverage. This should be it (once we…
joka921 Jul 11, 2023
c4fa334
Fixed the test.
joka921 Jul 11, 2023
9c8ebcd
Add a simple coroutine that makes working with the queues much simpler.
joka921 Jul 11, 2023
bb6dec0
Merge branch 'noexcept_finish_threadsafe_queue' into join_with_scan
joka921 Jul 11, 2023
65790fc
Merged in the current threadsafe queue, and
joka921 Jul 11, 2023
e9d427b
Merge branch 'master' into join_with_scan
joka921 Jul 11, 2023
dc478dd
Merge branch 'master' into join_with_scan
joka921 Jul 11, 2023
4be2f92
Small changes from a review with Hannah.
joka921 Jul 11, 2023
bb2dbde
Fix a possible race condition in the case that multiple threads are p…
joka921 Jul 12, 2023
2519a74
Fix clang format.
joka921 Jul 12, 2023
91 changes: 71 additions & 20 deletions src/engine/AddCombinedRowToTable.h
@@ -15,12 +15,14 @@
namespace ad_utility {
// This class handles the efficient writing of the results of a JOIN operation
// to a column-based `IdTable`. The underlying assumption is that in both inputs
// the join columns are the first columns.
// the join columns are the first columns. On each call to `addRow`, we only
// store the indices of the matching rows. When a certain buffer size
// (configurable, default value 100'000) is reached, the results are actually
// written to the table.
class AddCombinedRowToIdTable {
std::vector<size_t> numUndefinedPerColumn_;
size_t numJoinColumns_;
IdTableView<0> inputLeft_;
IdTableView<0> inputRight_;
std::optional<std::array<IdTableView<0>, 2>> inputs_;
IdTable resultTable_;

// This struct stores the information, which row indices from the input are
@@ -58,23 +60,27 @@ class AddCombinedRowToIdTable {
public:
// Construct from the number of join columns, the two inputs, and the output.
// The `bufferSize` can be configured for testing.
explicit AddCombinedRowToIdTable(size_t numJoinColumns,
const IdTableView<0>& input1,
const IdTableView<0>& input2, IdTable output,
explicit AddCombinedRowToIdTable(size_t numJoinColumns, IdTableView<0> input1,
IdTableView<0> input2, IdTable output,
size_t bufferSize = 100'000)
: numUndefinedPerColumn_(output.numColumns()),
numJoinColumns_{numJoinColumns},
inputLeft_{input1},
inputRight_{input2},
inputs_{std::array{std::move(input1), std::move(input2)}},
resultTable_{std::move(output)},
bufferSize_{bufferSize} {
AD_CORRECTNESS_CHECK(resultTable_.numColumns() == input1.numColumns() +
input2.numColumns() -
numJoinColumns);
AD_CORRECTNESS_CHECK(input1.numColumns() >= numJoinColumns &&
input2.numColumns() >= numJoinColumns);
AD_CORRECTNESS_CHECK(resultTable_.empty());
checkNumColumns();
}
// Similar to the previous constructor, but the inputs are not given.
// This means that the inputs have to be set by an explicit
// call to `setInput` before adding rows. This is used for the lazy join
// operations (see Join.cpp) where the input changes over time.
explicit AddCombinedRowToIdTable(size_t numJoinColumns, IdTable output,
size_t bufferSize = 100'000)
: numUndefinedPerColumn_(output.numColumns()),
numJoinColumns_{numJoinColumns},
inputs_{std::nullopt},
resultTable_{std::move(output)},
bufferSize_{bufferSize} {}

// Return the number of UNDEF values per column.
const std::vector<size_t>& numUndefinedPerColumn() {
@@ -85,6 +91,7 @@ class AddCombinedRowToIdTable {
// The next free row in the output will be created from
// `inputLeft()[rowIndexA]` and `inputRight()[rowIndexB]`.
void addRow(size_t rowIndexA, size_t rowIndexB) {
AD_EXPENSIVE_CHECK(inputs_.has_value());
indexBuffer_.push_back(
TargetIndexAndRowIndices{nextIndex_, {rowIndexA, rowIndexB}});
++nextIndex_;
@@ -93,10 +100,33 @@ class AddCombinedRowToIdTable {
}
}

// Set or reset the input. All following calls to `addRow` then refer to
// indices in the new input. Before resetting, `flush()` is called, so all the
// rows from the previous inputs get materialized before deleting the old
// inputs. The arguments `inputLeft` and `inputRight` can be an `IdTable`,
// an `IdTableView<0>`, or any other type that has an `asStaticView<0>`
// method that returns an `IdTableView<0>`.
void setInput(const auto& inputLeft, const auto& inputRight) {
auto toView = []<typename T>(const T& table) {
if constexpr (requires { table.template asStaticView<0>(); }) {
return table.template asStaticView<0>();
} else {
return table;
}
};
if (nextIndex_ != 0) {
AD_CORRECTNESS_CHECK(inputs_.has_value());
flush();
}
inputs_ = std::array{toView(inputLeft), toView(inputRight)};
checkNumColumns();
}

// The next free row in the output will be created from
// `inputLeft()[rowIndexA]`. The columns from `inputRight()` will all be set
// to UNDEF.
void addOptionalRow(size_t rowIndexA) {
AD_EXPENSIVE_CHECK(inputs_.has_value());
optionalIndexBuffer_.push_back(
TargetIndexAndRowIndex{nextIndex_, rowIndexA});
++nextIndex_;
@@ -129,6 +159,16 @@ class AddCombinedRowToIdTable {
size_t oldSize = result.size();
AD_CORRECTNESS_CHECK(nextIndex_ ==
indexBuffer_.size() + optionalIndexBuffer_.size());
// Sometimes the left input and right input are not valid anymore, because
// the `IdTable`s they point to have already been destroyed. This case is
// okay, as long as there was a manual call to `flush` (after which
// `nextIndex_ == 0`) before the inputs went out of scope. However, the call
// to `resultTable()` will still unconditionally flush. The following check
// makes this behavior defined.
if (nextIndex_ == 0) {
return;
}
AD_CORRECTNESS_CHECK(inputs_.has_value());
result.resize(oldSize + nextIndex_);

// Sometimes columns are combined where one value is UNDEF and the other one
@@ -147,8 +187,8 @@ class AddCombinedRowToIdTable {
// `nextResultColIdx`-th column of the result.
auto writeJoinColumn = [&result, &mergeWithUndefined, oldSize, this](
size_t colIdx, size_t resultColIdx) {
const auto& colLeft = inputLeft_.getColumn(colIdx);
const auto& colRight = inputRight_.getColumn(colIdx);
const auto& colLeft = inputLeft().getColumn(colIdx);
const auto& colRight = inputRight().getColumn(colIdx);
// TODO<joka921> Implement prefetching.
decltype(auto) resultCol = result.getColumn(resultColIdx);
size_t& numUndef = numUndefinedPerColumn_.at(resultColIdx);
@@ -178,8 +218,8 @@ class AddCombinedRowToIdTable {
// code that was very hard to read for humans.
auto writeNonJoinColumn = [&result, oldSize, this]<bool isColFromLeft>(
size_t colIdx, size_t resultColIdx) {
decltype(auto) col = isColFromLeft ? inputLeft_.getColumn(colIdx)
: inputRight_.getColumn(colIdx);
decltype(auto) col = isColFromLeft ? inputLeft().getColumn(colIdx)
: inputRight().getColumn(colIdx);
// TODO<joka921> Implement prefetching.
decltype(auto) resultCol = result.getColumn(resultColIdx);
size_t& numUndef = numUndefinedPerColumn_.at(resultColIdx);
@@ -217,13 +257,13 @@ class AddCombinedRowToIdTable {
}

// Then the remaining columns from the first input.
for (size_t col = numJoinColumns_; col < inputLeft_.numColumns(); ++col) {
for (size_t col = numJoinColumns_; col < inputLeft().numColumns(); ++col) {
writeNonJoinColumn.operator()<true>(col, nextResultColIdx);
++nextResultColIdx;
}

// Then the remaining columns from the second input.
for (size_t col = numJoinColumns_; col < inputRight_.numColumns(); col++) {
for (size_t col = numJoinColumns_; col < inputRight().numColumns(); col++) {
writeNonJoinColumn.operator()<false>(col, nextResultColIdx);
++nextResultColIdx;
}
@@ -232,5 +272,16 @@ class AddCombinedRowToIdTable {
optionalIndexBuffer_.clear();
nextIndex_ = 0;
}
const IdTableView<0>& inputLeft() const { return inputs_.value()[0]; }

const IdTableView<0>& inputRight() const { return inputs_.value()[1]; }

void checkNumColumns() const {
AD_CONTRACT_CHECK(inputLeft().numColumns() >= numJoinColumns_);
AD_CONTRACT_CHECK(inputRight().numColumns() >= numJoinColumns_);
AD_CONTRACT_CHECK(resultTable_.numColumns() ==
inputLeft().numColumns() + inputRight().numColumns() -
numJoinColumns_);
}
};
} // namespace ad_utility
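The buffering scheme described in the class comment (store only pairs of row indices in `addRow`, materialize them on `flush`, and flush before the inputs are swapped in `setInput`) can be sketched in isolation. The following is a minimal illustration and not the code from this PR: `BufferedJoinWriter`, `Row`, and `Table` are hypothetical names, the tables are row-based vectors rather than a column-based `IdTable`, and exactly one join column is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only (not the QLever types).
// Each row is a vector of values, and `row[0]` is the single join column.
using Row = std::vector<int>;
using Table = std::vector<Row>;

class BufferedJoinWriter {
  const Table* left_ = nullptr;
  const Table* right_ = nullptr;
  Table result_;
  // Pairs of (row index in left input, row index in right input).
  std::vector<std::pair<std::size_t, std::size_t>> indexBuffer_;
  std::size_t bufferSize_;

 public:
  explicit BufferedJoinWriter(std::size_t bufferSize = 100'000)
      : bufferSize_{bufferSize} {}

  // Switch to new inputs. Pending index pairs refer to the old inputs, so
  // they are materialized before the pointers are replaced.
  void setInput(const Table& left, const Table& right) {
    flush();
    left_ = &left;
    right_ = &right;
  }

  // Only remember which rows match; nothing is copied yet.
  void addRow(std::size_t rowIndexA, std::size_t rowIndexB) {
    assert(left_ != nullptr && right_ != nullptr);
    indexBuffer_.emplace_back(rowIndexA, rowIndexB);
    if (indexBuffer_.size() >= bufferSize_) {
      flush();
    }
  }

  // Write all buffered matches to the result. If nothing is pending, the
  // inputs are not touched, so they may already have been destroyed.
  void flush() {
    if (indexBuffer_.empty()) {
      return;
    }
    for (auto [a, b] : indexBuffer_) {
      Row row = (*left_)[a];
      const Row& rightRow = (*right_)[b];
      // Skip the join column of the right row, it is already in `row`.
      row.insert(row.end(), rightRow.begin() + 1, rightRow.end());
      result_.push_back(std::move(row));
    }
    indexBuffer_.clear();
  }

  const Table& resultTable() {
    flush();
    return result_;
  }
};
```

The early return in `flush` plays the same role as the `nextIndex_ == 0` check in the diff: a caller may flush manually, let the inputs go out of scope, and still call `resultTable()` safely afterwards.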
95 changes: 89 additions & 6 deletions src/engine/IndexScan.cpp
@@ -120,14 +120,16 @@ ResultTable IndexScan::computeResult() {
const auto& index = _executionContext->getIndex();
const auto permutedTriple = getPermutedTriple();
if (numVariables_ == 2) {
index.scan(*permutedTriple[0], &idTable, permutation_, _timeoutTimer);
idTable = index.scan(*permutedTriple[0], std::nullopt, permutation_,
_timeoutTimer);
} else if (numVariables_ == 1) {
index.scan(*permutedTriple[0], *permutedTriple[1], &idTable, permutation_,
_timeoutTimer);
idTable = index.scan(*permutedTriple[0], *permutedTriple[1], permutation_,
_timeoutTimer);
} else {
AD_CORRECTNESS_CHECK(numVariables_ == 3);
computeFullScan(&idTable, permutation_);
}
AD_CORRECTNESS_CHECK(idTable.numColumns() == numVariables_);
LOG(DEBUG) << "IndexScan result computation done.\n";

return {std::move(idTable), resultSortedOn(), LocalVocab{}};
@@ -257,9 +259,8 @@ void IndexScan::computeFullScan(IdTable* result,
size_t i = 0;
const auto& permutationImpl =
getExecutionContext()->getIndex().getImpl().getPermutation(permutation);
auto triplesView =
TriplesView(permutationImpl, getExecutionContext()->getAllocator(),
ignoredRanges, isTripleIgnored);
auto triplesView = TriplesView(permutationImpl, ignoredRanges,
isTripleIgnored, _timeoutTimer);
for (const auto& triple : triplesView) {
if (i >= resultSize) {
break;
@@ -278,3 +279,85 @@ std::array<const TripleComponent* const, 3> IndexScan::getPermutedTriple()
return {triple[permutation[0]], triple[permutation[1]],
triple[permutation[2]]};
}

// ___________________________________________________________________________
Permutation::IdTableGenerator IndexScan::getLazyScan(
const IndexScan& s, std::vector<CompressedBlockMetadata> blocks) {
const IndexImpl& index = s.getIndex().getImpl();
Id col0Id = s.getPermutedTriple()[0]->toValueId(index.getVocab()).value();
std::optional<Id> col1Id;
if (s.numVariables_ == 1) {
col1Id = s.getPermutedTriple()[1]->toValueId(index.getVocab()).value();
}
return index.getPermutation(s.permutation())
.lazyScan(col0Id, col1Id, std::move(blocks), s._timeoutTimer);
}

// ________________________________________________________________
std::optional<Permutation::MetadataAndBlocks> IndexScan::getMetadataForScan(
const IndexScan& s) {
auto permutedTriple = s.getPermutedTriple();
const IndexImpl& index = s.getIndex().getImpl();
std::optional<Id> col0Id = permutedTriple[0]->toValueId(index.getVocab());
std::optional<Id> col1Id =
s.numVariables_ == 2 ? std::nullopt
: permutedTriple[1]->toValueId(index.getVocab());
if (!col0Id.has_value() || (!col1Id.has_value() && s.numVariables_ == 1)) {
return std::nullopt;
}

return index.getPermutation(s.permutation())
.getMetadataAndBlocks(col0Id.value(), col1Id);
}

// ________________________________________________________________
std::array<Permutation::IdTableGenerator, 2>
IndexScan::lazyScanForJoinOfTwoScans(const IndexScan& s1, const IndexScan& s2) {
AD_CONTRACT_CHECK(s1.numVariables_ < 3 && s2.numVariables_ < 3);

// This function only works for single column joins. This means that the first
// variable of both scans must be equal, but the second variables of the scans
// (if present) must be different.
const auto& getFirstVariable = [](const IndexScan& scan) {
return scan.numVariables_ == 2 ? *scan.getPermutedTriple()[1]
: *scan.getPermutedTriple()[2];
};

AD_CONTRACT_CHECK(getFirstVariable(s1) == getFirstVariable(s2));
if (s1.numVariables_ == 2 && s2.numVariables_ == 2) {
AD_CONTRACT_CHECK(*s1.getPermutedTriple()[2] != *s2.getPermutedTriple()[2]);
}

auto metaBlocks1 = getMetadataForScan(s1);
auto metaBlocks2 = getMetadataForScan(s2);

if (!metaBlocks1.has_value() || !metaBlocks2.has_value()) {
return {{}};
}
auto [blocks1, blocks2] = CompressedRelationReader::getBlocksForJoin(
metaBlocks1.value(), metaBlocks2.value());

std::array result{getLazyScan(s1, blocks1), getLazyScan(s2, blocks2)};
result[0].details().numBlocksAll_ = metaBlocks1.value().blockMetadata_.size();
result[1].details().numBlocksAll_ = metaBlocks2.value().blockMetadata_.size();
return result;
}

// ________________________________________________________________
Permutation::IdTableGenerator IndexScan::lazyScanForJoinOfColumnWithScan(
std::span<const Id> joinColumn, const IndexScan& s) {
AD_EXPENSIVE_CHECK(std::ranges::is_sorted(joinColumn));
AD_CONTRACT_CHECK(s.numVariables_ == 1 || s.numVariables_ == 2);

auto metaBlocks1 = getMetadataForScan(s);

if (!metaBlocks1.has_value()) {
return {};
}
auto blocks = CompressedRelationReader::getBlocksForJoin(joinColumn,
metaBlocks1.value());

auto result = getLazyScan(s, blocks);
result.details().numBlocksAll_ = metaBlocks1.value().blockMetadata_.size();
return result;
}
20 changes: 20 additions & 0 deletions src/engine/IndexScan.h
@@ -45,6 +45,20 @@ class IndexScan : public Operation {
// can be read from the Metadata.
size_t getExactSize() const { return sizeEstimate_; }

// Return two generators that lazily yield the results of `s1` and `s2` in
// blocks, but only the blocks that can theoretically contain matching rows
// when performing a join on the first column of the result of `s1` with the
// first column of the result of `s2`.
static std::array<Permutation::IdTableGenerator, 2> lazyScanForJoinOfTwoScans(
const IndexScan& s1, const IndexScan& s2);

// Return a generator that lazily yields the result of `s` in blocks, but only
// the blocks that can theoretically contain matching rows when performing a
// join between the first column of the result of `s` and the `joinColumn`.
// Requires that the `joinColumn` is sorted, else the behavior is undefined.
static Permutation::IdTableGenerator lazyScanForJoinOfColumnWithScan(
std::span<const Id> joinColumn, const IndexScan& s);

private:
// TODO<joka921> Make the `getSizeEstimateBeforeLimit()` function `const` for
// ALL the `Operations`.
@@ -89,4 +103,10 @@ class IndexScan : public Operation {
// `permutation_`. For example if `permutation_ == PSO` then the result is
// {&predicate_, &subject_, &object_}
std::array<const TripleComponent* const, 3> getPermutedTriple() const;

// Helper functions for the public `lazyScanForJoinOf...` functions (see above).
static Permutation::IdTableGenerator getLazyScan(
const IndexScan& s, std::vector<CompressedBlockMetadata> blocks);
static std::optional<Permutation::MetadataAndBlocks> getMetadataForScan(
const IndexScan& s);
};