Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefiltering on CompressedBlockMetadata using binary search #1503

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
73 commits
Select commit Hold shift + click to select a range
99da434
Added conversion str to int
realHannes Apr 25, 2024
424d023
templated function for toNumeric, add declaration to NaryExpression.h
realHannes Apr 25, 2024
0117e82
str to num for SparqlExpression implemented + added test
realHannes Apr 25, 2024
35fd0b1
Merge branch 'ad-freiburg:master' into master
realHannes Apr 26, 2024
94356c2
Update src/engine/sparqlExpressions/StringExpressions.cpp
realHannes Apr 26, 2024
decc8ba
Update src/engine/sparqlExpressions/StringExpressions.cpp
realHannes Apr 26, 2024
850152c
Update src/engine/sparqlExpressions/StringExpressions.cpp
realHannes Apr 26, 2024
d650d67
Update src/engine/sparqlExpressions/StringExpressions.cpp
realHannes Apr 26, 2024
46cc697
using now absl::from_chars() and stripping whitespaces for string to …
realHannes Apr 26, 2024
7fc5c28
added new functions to processIriFuntionCall() (for string to number)
realHannes Apr 26, 2024
efb0e24
renaming to: toIntExpression and toDoubleExpression for later more ge…
realHannes Apr 26, 2024
a88537c
made format (clang-format-16)
realHannes Apr 26, 2024
ca1e2e0
Update src/parser/sparqlParser/SparqlQleverVisitor.cpp
realHannes Apr 29, 2024
4adc831
Update src/parser/sparqlParser/SparqlQleverVisitor.cpp
realHannes Apr 29, 2024
d0f0d63
renaming in NaryExpression.h for accordance with other function, addi…
realHannes Apr 29, 2024
a118609
added test coverage for function calls makeIntExpression and make Dou…
realHannes Apr 29, 2024
062052e
toNumeric has now correct behavior and uses absl::from_chars() and st…
realHannes Apr 29, 2024
6d0f42a
made clang-format for NaryExpressionImpl.h
realHannes Apr 29, 2024
f90b8e2
Merge branch 'ad-freiburg:master' into master
realHannes May 6, 2024
fb88493
Merge branch 'ad-freiburg:master' into master
realHannes May 15, 2024
b2eb514
Merge remote-tracking branch 'upstream/master'
realHannes May 16, 2024
b165ac1
Merge branch 'ad-freiburg:master' into master
realHannes Jun 1, 2024
7a3dfb2
Merge branch 'master' of https://github.com/realHannes/qlever
realHannes Jun 1, 2024
fc0ad3a
Merge branch 'ad-freiburg:master' into master
realHannes Jun 6, 2024
f3e6086
Merge branch 'ad-freiburg:master' into master
realHannes Jun 7, 2024
fd4c351
Merge branch 'ad-freiburg:master' into master
realHannes Jun 10, 2024
220c9bf
Merge branch 'ad-freiburg:master' into master
realHannes Jun 12, 2024
a81cb8a
Merge branch 'ad-freiburg:master' into master
realHannes Jun 13, 2024
acc0109
Merge branch 'ad-freiburg:master' into master
realHannes Jun 14, 2024
cb8e560
Merge branch 'ad-freiburg:master' into master
realHannes Jun 20, 2024
2b39970
Merge branch 'ad-freiburg:master' into master
realHannes Jun 28, 2024
a0101e4
Merge branch 'ad-freiburg:master' into master
realHannes Jun 30, 2024
47dee1d
Merge branch 'ad-freiburg:master' into master
realHannes Sep 4, 2024
a8625f9
Merge branch 'ad-freiburg:master' into master
realHannes Sep 9, 2024
7f95069
Merge branch 'ad-freiburg:master' into master
realHannes Sep 12, 2024
28b8a69
Merge branch 'ad-freiburg:master' into master
realHannes Sep 13, 2024
7bb6e01
Merge branch 'ad-freiburg:master' into master
realHannes Sep 14, 2024
edc974a
Add implementation for pre-filtering blocks on their metadata
realHannes Sep 18, 2024
ca6c2ff
Merge branch 'master' into prefilter_on_metadata_relational
realHannes Sep 18, 2024
46be51a
fix spelling error
realHannes Sep 18, 2024
0385b56
add namespace for CompressedBlockPrefiltering.h
realHannes Sep 18, 2024
5707338
use std::ranges::set_union, std::ranges::upper_bound/lower_bound
realHannes Sep 18, 2024
c9687ec
move template specializations into namespace prefilterExpressions
realHannes Sep 18, 2024
d8246f7
fix
realHannes Sep 18, 2024
66b4872
hopefully fix namespace issue
realHannes Sep 18, 2024
e4eddc4
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Sep 20, 2024
b5d7dd7
first adaptations regarding the comments
realHannes Sep 20, 2024
802e3aa
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Sep 23, 2024
662b1d4
applied suggestions from review
realHannes Sep 24, 2024
510aa9e
fix std::move
realHannes Sep 24, 2024
0d7f75b
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Sep 25, 2024
c119200
adjusted CompressedBlockPrefiltering
realHannes Sep 26, 2024
af17181
mark method with override
realHannes Sep 26, 2024
3d7238d
Merge branch 'master' into prefilter_on_metadata_relational
realHannes Oct 8, 2024
5af6138
make further changes to prefiltering procedure and its tests
realHannes Oct 8, 2024
b02dc57
add additional testing on IDs from other datatypes
realHannes Oct 8, 2024
eb86f1b
addition to test Not Expression with And and Or
realHannes Oct 8, 2024
69ed019
changes to test and fix index check fail
realHannes Oct 9, 2024
854322b
increase test coverage for Not expression
realHannes Oct 9, 2024
b2aa689
just add all mixed blocks
realHannes Oct 9, 2024
546c3b3
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Oct 9, 2024
7e61cb1
remove unnecessary values in test context
realHannes Oct 9, 2024
009ebed
for sonar check
realHannes Oct 10, 2024
6c19bf6
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Oct 10, 2024
643aed3
Merge branch 'master' into prefilter_on_metadata_relational
realHannes Oct 23, 2024
027536d
added test fixture (use TEST_F)
realHannes Oct 23, 2024
dbdd467
full test restructuring
realHannes Oct 24, 2024
57831d1
last changes w.r.t. comments
realHannes Oct 24, 2024
a5ba72c
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Oct 25, 2024
861152c
changes for sonar
realHannes Oct 25, 2024
f98d4e7
Merge branch 'ad-freiburg:master' into prefilter_on_metadata_relational
realHannes Oct 27, 2024
e5c3a9f
adjust checkOrder + add comments in test
realHannes Oct 27, 2024
a60b51f
small change to checkOrder
realHannes Oct 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 21 additions & 9 deletions src/global/ValueIdComparators.h
Original file line number Diff line number Diff line change
Expand Up @@ -349,9 +349,12 @@ inline std::vector<std::pair<RandomIt, RandomIt>> getRangesForIndexTypes(
// Helper function: Sort the non-overlapping ranges in `input` by the first
// element, remove the empty ranges, and merge directly adjacent ranges
inline auto simplifyRanges =
[]<typename RandomIt>(std::vector<std::pair<RandomIt, RandomIt>> input) {
// Eliminate empty ranges
std::erase_if(input, [](const auto& p) { return p.first == p.second; });
[]<typename RandomIt>(std::vector<std::pair<RandomIt, RandomIt>> input,
bool removeEmptyRanges = true) {
if (removeEmptyRanges) {
// Eliminate empty ranges
std::erase_if(input, [](const auto& p) { return p.first == p.second; });
}
std::sort(input.begin(), input.end());
if (input.empty()) {
return input;
Expand All @@ -378,9 +381,13 @@ inline auto simplifyRanges =
// 2. The condition x `comparison` value is fulfilled, where value is the value
// of `valueId`.
// 3. The datatype of x and `valueId` are compatible.
//
// When setting the flag argument `removeEmptyRanges` to false, empty ranges
// [`begin`, `end`] where `begin` is equal to `end` will not be discarded.
template <typename RandomIt>
inline std::vector<std::pair<RandomIt, RandomIt>> getRangesForId(
RandomIt begin, RandomIt end, ValueId valueId, Comparison comparison) {
RandomIt begin, RandomIt end, ValueId valueId, Comparison comparison,
bool removeEmptyRanges = true) {
// For the evaluation of FILTERs, comparisons that involve undefined values
// are always false.
if (valueId.getDatatype() == Datatype::Undefined) {
Expand All @@ -389,11 +396,15 @@ inline std::vector<std::pair<RandomIt, RandomIt>> getRangesForId(
// This lambda enforces the invariants `non-empty` and `sorted`.
switch (valueId.getDatatype()) {
case Datatype::Double:
return detail::simplifyRanges(detail::getRangesForIntsAndDoubles(
begin, end, valueId.getDouble(), comparison));
return detail::simplifyRanges(
detail::getRangesForIntsAndDoubles(begin, end, valueId.getDouble(),
comparison),
removeEmptyRanges);
case Datatype::Int:
return detail::simplifyRanges(detail::getRangesForIntsAndDoubles(
begin, end, valueId.getInt(), comparison));
return detail::simplifyRanges(
detail::getRangesForIntsAndDoubles(begin, end, valueId.getInt(),
comparison),
removeEmptyRanges);
case Datatype::Undefined:
case Datatype::VocabIndex:
case Datatype::LocalVocabIndex:
Expand All @@ -405,7 +416,8 @@ inline std::vector<std::pair<RandomIt, RandomIt>> getRangesForId(
case Datatype::BlankNodeIndex:
// For `Date` the trivial comparison via bits is also correct.
return detail::simplifyRanges(
detail::getRangesForIndexTypes(begin, end, valueId, comparison));
detail::getRangesForIndexTypes(begin, end, valueId, comparison),
removeEmptyRanges);
}
AD_FAIL();
}
Expand Down
1 change: 1 addition & 0 deletions src/index/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@ add_library(index
DocsDB.cpp FTSAlgorithms.cpp
PrefixHeuristic.cpp CompressedRelation.cpp
PatternCreator.cpp ScanSpecification.cpp
CompressedBlockPrefiltering.cpp
DeltaTriples.cpp LocalVocabEntry.cpp)
qlever_target_link_libraries(index util parser vocabulary ${STXXL_LIBRARIES})
291 changes: 291 additions & 0 deletions src/index/CompressedBlockPrefiltering.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,291 @@
// Copyright 2024, University of Freiburg,
// Chair of Algorithms and Data Structures
// Author: Hannes Baumann <[email protected]>

#include "index/CompressedBlockPrefiltering.h"

#include "global/ValueIdComparators.h"

namespace prefilterExpressions {

// HELPER FUNCTIONS
//______________________________________________________________________________
// Given a PermutedTriple retrieve the suitable Id w.r.t. a column (index).
static Id getIdFromColumnIndex(const BlockMetadata::PermutedTriple& triple,
size_t columnIndex) {
switch (columnIndex) {
case 0:
return triple.col0Id_;
case 1:
return triple.col1Id_;
case 2:
return triple.col2Id_;
default:
// columnIndex out of bounds
AD_FAIL();
}
};

//______________________________________________________________________________
realHannes marked this conversation as resolved.
Show resolved Hide resolved
// Extract the Ids from the given `PermutedTriple` in a tuple w.r.t. the
// position (column index) defined by `ignoreIndex`. The ignored positions are
// filled with Ids `Id::min()`. `Id::min()` is guaranteed
// to be smaller than Ids of all other types.
static auto getMaskedTriple(const BlockMetadata::PermutedTriple& triple,
size_t ignoreIndex = 3) {
const Id& undefined = Id::min();
switch (ignoreIndex) {
case 3:
return std::make_tuple(triple.col0Id_, triple.col1Id_, triple.col2Id_);
case 2:
return std::make_tuple(triple.col0Id_, triple.col1Id_, undefined);
case 1:
return std::make_tuple(triple.col0Id_, undefined, undefined);
case 0:
return std::make_tuple(undefined, undefined, undefined);
default:
// ignoreIndex out of bounds
AD_FAIL();
}
};

//______________________________________________________________________________
// Check required conditions.
static void checkEvalRequirements(const std::vector<BlockMetadata>& input,
size_t evaluationColumn) {
const auto throwRuntimeError = [](const std::string& errorMessage) {
throw std::runtime_error(errorMessage);
};
// Check for duplicates.
if (auto it = std::ranges::adjacent_find(input); it != input.end()) {
throwRuntimeError("The provided data blocks must be unique.");
}
// Helper to check for fully sorted blocks. Return `true` if `b1 < b2` is
// satisfied.
const auto checkOrder = [](const BlockMetadata& b1, const BlockMetadata& b2) {
if (b1.blockIndex_ < b2.blockIndex_) {
AD_CORRECTNESS_CHECK(getMaskedTriple(b1.lastTriple_) <=
getMaskedTriple(b2.lastTriple_));
return true;
}
if (b1.blockIndex_ == b2.blockIndex_) {
// Given the previous check detects duplicates in the input, the
// correctness check here will never evaluate to true.
// => blockIndex_ assignment issue.
AD_CORRECTNESS_CHECK(b1 == b2);
} else {
AD_CORRECTNESS_CHECK(getMaskedTriple(b1.lastTriple_) >
getMaskedTriple(b2.firstTriple_));
}
return false;
};
if (!std::ranges::is_sorted(input, checkOrder)) {
throwRuntimeError("The blocks must be provided in sorted order.");
}
// Helper to check for column consistency. Returns `true` if the columns for
// `b1` and `b2` up to the evaluation are inconsistent.
const auto checkColumnConsistency =
[evaluationColumn](const BlockMetadata& b1, const BlockMetadata& b2) {
const auto& b1Last = getMaskedTriple(b1.lastTriple_, evaluationColumn);
const auto& b2First =
getMaskedTriple(b2.firstTriple_, evaluationColumn);
return getMaskedTriple(b1.firstTriple_, evaluationColumn) != b1Last ||
b1Last != b2First ||
b2First != getMaskedTriple(b2.lastTriple_, evaluationColumn);
};
if (auto it = std::ranges::adjacent_find(input, checkColumnConsistency);
it != input.end()) {
throwRuntimeError(
"The values in the columns up to the evaluation column must be "
"consistent.");
}
};

//______________________________________________________________________________
// Given two sorted `vector`s containing `BlockMetadata`, this function
// returns their merged `BlockMetadata` content in a `vector` which is free of
// duplicates and ordered.
static auto getSetUnion(const std::vector<BlockMetadata>& blocks1,
const std::vector<BlockMetadata>& blocks2) {
std::vector<BlockMetadata> mergedVectors;
mergedVectors.reserve(blocks1.size() + blocks2.size());
const auto blockLessThanBlock = [](const BlockMetadata& b1,
const BlockMetadata& b2) {
return b1.blockIndex_ < b2.blockIndex_;
};
// Given that we have vectors with sorted (BlockMedata) values, we can
// use std::ranges::set_union. Thus the complexity is O(n + m).
std::ranges::set_union(blocks1, blocks2, std::back_inserter(mergedVectors),
blockLessThanBlock);
mergedVectors.shrink_to_fit();
return mergedVectors;
}

// SECTION PREFILTER EXPRESSION (BASE CLASS)
//______________________________________________________________________________
std::vector<BlockMetadata> PrefilterExpression::evaluate(
const std::vector<BlockMetadata>& input, size_t evaluationColumn) const {
checkEvalRequirements(input, evaluationColumn);
const auto& relevantBlocks = evaluateImpl(input, evaluationColumn);
checkEvalRequirements(relevantBlocks, evaluationColumn);
return relevantBlocks;
};

// SECTION RELATIONAL OPERATIONS
//______________________________________________________________________________
template <CompOp Comparison>
std::unique_ptr<PrefilterExpression>
RelationalExpression<Comparison>::logicalComplement() const {
using enum CompOp;
switch (Comparison) {
case LT:
// Complement X < Y: X >= Y
return std::make_unique<GreaterEqualExpression>(referenceId_);
case LE:
// Complement X <= Y: X > Y
return std::make_unique<GreaterThanExpression>(referenceId_);
case EQ:
// Complement X == Y: X != Y
return std::make_unique<NotEqualExpression>(referenceId_);
case NE:
// Complement X != Y: X == Y
return std::make_unique<EqualExpression>(referenceId_);
case GE:
// Complement X >= Y: X < Y
return std::make_unique<LessThanExpression>(referenceId_);
case GT:
// Complement X > Y: X <= Y
return std::make_unique<LessEqualExpression>(referenceId_);
default:
AD_FAIL();
}
};

//______________________________________________________________________________
template <CompOp Comparison>
std::vector<BlockMetadata> RelationalExpression<Comparison>::evaluateImpl(
const std::vector<BlockMetadata>& input, size_t evaluationColumn) const {
using namespace valueIdComparators;
std::vector<ValueId> valueIdsInput;
// For each BlockMetadata value in vector input, we have a respective Id for
// firstTriple and lastTriple
valueIdsInput.reserve(2 * input.size());
std::vector<BlockMetadata> mixedDatatypeBlocks;

for (const auto& block : input) {
const auto firstId =
getIdFromColumnIndex(block.firstTriple_, evaluationColumn);
const auto secondId =
getIdFromColumnIndex(block.lastTriple_, evaluationColumn);
valueIdsInput.push_back(firstId);
valueIdsInput.push_back(secondId);

if (firstId.getDatatype() != secondId.getDatatype()) {
mixedDatatypeBlocks.push_back(block);
}
}

// Use getRangesForId (from valueIdComparators) to extract the ranges
// containing the relevant ValueIds.
// For pre-filtering with CompOp::EQ, we have to consider empty ranges.
// Reason: The referenceId_ could be contained within the bounds formed by
// the IDs of firstTriple_ and lastTriple_ (set false flag to keep
// empty ranges).
auto relevantIdRanges =
Comparison != CompOp::EQ
? getRangesForId(valueIdsInput.begin(), valueIdsInput.end(),
referenceId_, Comparison)
: getRangesForId(valueIdsInput.begin(), valueIdsInput.end(),
referenceId_, Comparison, false);

// The vector for relevant BlockMetadata values which contain ValueIds
// defined as relevant by relevantIdRanges.
std::vector<BlockMetadata> relevantBlocks;
// Reserve memory, input.size() is upper bound.
relevantBlocks.reserve(input.size());

// Given the relevant Id ranges, retrieve the corresponding relevant
// BlockMetadata values from vector input and add them to the relevantBlocks
// vector.
auto endValueIdsInput = valueIdsInput.end();
for (const auto& [firstId, secondId] : relevantIdRanges) {
// Ensures that index is within bounds of index vector.
auto secondIdAdjusted =
secondId < endValueIdsInput ? secondId + 1 : secondId;
relevantBlocks.insert(
relevantBlocks.end(),
input.begin() + std::distance(valueIdsInput.begin(), firstId) / 2,
// Round up, for Ids contained within the bounding Ids of firstTriple
// and lastTriple we have to include the respective metadata block
// (that block is partially relevant).
input.begin() +
std::distance(valueIdsInput.begin(), secondIdAdjusted) / 2);
}
relevantBlocks.shrink_to_fit();
// Merge mixedDatatypeBlocks into relevantBlocks while maintaining order and
// avoiding duplicates.
return getSetUnion(relevantBlocks, mixedDatatypeBlocks);
};

// SECTION LOGICAL OPERATIONS
//______________________________________________________________________________
template <LogicalOperators Operation>
std::unique_ptr<PrefilterExpression>
LogicalExpression<Operation>::logicalComplement() const {
using enum LogicalOperators;
// Source De-Morgan's laws: De Morgan's laws, Wikipedia.
// Reference: https://en.wikipedia.org/wiki/De_Morgan%27s_laws
if constexpr (Operation == OR) {
// De Morgan's law: not (A or B) = (not A) and (not B)
return std::make_unique<AndExpression>(child1_->logicalComplement(),
child2_->logicalComplement());
} else {
static_assert(Operation == AND);
// De Morgan's law: not (A and B) = (not A) or (not B)
return std::make_unique<OrExpression>(child1_->logicalComplement(),
child2_->logicalComplement());
}
};

//______________________________________________________________________________
std::unique_ptr<PrefilterExpression> NotExpression::logicalComplement() const {
// Logically we complement (negate) a NOT here => NOT cancels out.
// Therefore, we can simply return the child of the respective NOT
// expression after undoing its previous complementation.
return child_->logicalComplement();
};

//______________________________________________________________________________
template <LogicalOperators Operation>
std::vector<BlockMetadata> LogicalExpression<Operation>::evaluateImpl(
const std::vector<BlockMetadata>& input, size_t evaluationColumn) const {
using enum LogicalOperators;
if constexpr (Operation == AND) {
auto resultChild1 = child1_->evaluate(input, evaluationColumn);
return child2_->evaluate(resultChild1, evaluationColumn);
} else {
static_assert(Operation == OR);
return getSetUnion(child1_->evaluate(input, evaluationColumn),
child2_->evaluate(input, evaluationColumn));
}
};

//______________________________________________________________________________
std::vector<BlockMetadata> NotExpression::evaluateImpl(
const std::vector<BlockMetadata>& input, size_t evaluationColumn) const {
return child_->evaluate(input, evaluationColumn);
};

//______________________________________________________________________________
// Necessary instantiation of template specializations
template class RelationalExpression<CompOp::LT>;
template class RelationalExpression<CompOp::LE>;
template class RelationalExpression<CompOp::GE>;
template class RelationalExpression<CompOp::GT>;
template class RelationalExpression<CompOp::EQ>;
template class RelationalExpression<CompOp::NE>;

template class LogicalExpression<LogicalOperators::AND>;
template class LogicalExpression<LogicalOperators::OR>;

} // namespace prefilterExpressions
Loading
Loading