ql:contains-word now can show the score of the word match in the respective text #1397

Merged
merged 56 commits into from
Dec 16, 2024
ea9d39c
ql:contains-word now can show the respective word-score.
Flixtastic Jul 12, 2024
30736ef
Fixed tests and formatted files.
Flixtastic Jul 12, 2024
e752db8
New formatting for Word Score Variables. Changed where necessary and …
Flixtastic Jul 27, 2024
4ef4d93
Merge branch 'ad-freiburg:master' into master
Flixtastic Jul 27, 2024
d52063f
Merge branch 'ad-freiburg:master' into master
Flixtastic Jul 29, 2024
c6fe0c6
Merge branch 'master' of github.com:Flixtastic/qlever.
Flixtastic Jul 29, 2024
d0b9ee8
Added getWordSCoreVariable for std::string_view
Flixtastic Jul 29, 2024
2eade97
Merge branch 'ad-freiburg:master' into master
Flixtastic Sep 23, 2024
595cb57
Merge branch 'ad-freiburg:master' into master
Flixtastic Oct 4, 2024
b4c8c3b
Merge branch 'ad-freiburg:master' into master
Flixtastic Oct 26, 2024
72e5d64
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 12, 2024
d8f9df4
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 15, 2024
29511c6
Made it possible to construct query execution contexts with text inde…
Flixtastic Nov 15, 2024
3855978
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 17, 2024
6021401
Reduced usage of column copying in TextIndexScanForWord.cpp
Flixtastic Nov 17, 2024
d9701ae
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 19, 2024
5f0ce01
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 3, 2024
e2c47cf
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 3, 2024
e6a0cf7
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 4, 2024
ed9fbda
Changed the counting of nofNonLiterals to nofLiterals. Some methods a…
Flixtastic Dec 4, 2024
5ad3d8f
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 4, 2024
af6bd64
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 5, 2024
56ea531
Cleaned up the filtering in TextIndexScanForWord::computeResult and c…
Flixtastic Dec 5, 2024
e1e12e9
renamed nofLiterals to nofLiteralsInTextIndex
Flixtastic Dec 5, 2024
017588c
Removed redundant method getWordScoreVariable
Flixtastic Dec 5, 2024
46666d0
added method appendEscapedWord to escape special chars in Variables
Flixtastic Dec 5, 2024
f36f189
Added two function in the TextIndexScanTestHelpers.h to add content t…
Flixtastic Dec 5, 2024
c62a7e6
Added tests for Scores. Also commented tests and refined them
Flixtastic Dec 5, 2024
89f0b27
Changed the getQec function and the respective makeTestIndex to take …
Flixtastic Dec 5, 2024
058e8ed
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 6, 2024
e8bf56e
Fix the multiple definition error.
joka921 Dec 12, 2024
5173aeb
Merge branch 'master' into flixtastic-master
joka921 Dec 12, 2024
4a15994
Make query planning of index scans fast again (#1674)
joka921 Dec 12, 2024
70964d6
Allow operations to not store their result in the cache (#1665)
joka921 Dec 12, 2024
4237e0d
For C++17, use `range-v3` instead of `std::ranges` (#1667)
joka921 Dec 12, 2024
1adcecb
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
f5eefab
Revert to first sync and then reapply "Reverting the nofLiterals bein…
Flixtastic Dec 12, 2024
583a67a
ql:contains-word now can show the respective word-score.
Flixtastic Dec 12, 2024
e4cb2ed
Fixed tests and formatted files.
Flixtastic Jul 12, 2024
3ce304d
New formatting for Word Score Variables. Changed where necessary and …
Flixtastic Jul 27, 2024
eb8e83a
Added getWordSCoreVariable for std::string_view
Flixtastic Jul 29, 2024
cd4789a
Made it possible to construct query execution contexts with text inde…
Flixtastic Dec 12, 2024
fdba417
Changed the counting of nofNonLiterals to nofLiterals. Some methods a…
Flixtastic Dec 4, 2024
6686325
renamed nofLiterals to nofLiteralsInTextIndex
Flixtastic Dec 5, 2024
0faf3d0
Removed redundant method getWordScoreVariable
Flixtastic Dec 5, 2024
eafd594
added method appendEscapedWord to escape special chars in Variables
Flixtastic Dec 5, 2024
fd01a97
Added two function in the TextIndexScanTestHelpers.h to add content t…
Flixtastic Dec 5, 2024
65842f4
Added tests for Scores. Also commented tests and refined them
Flixtastic Dec 5, 2024
baa10cf
Changed the getQec function and the respective makeTestIndex to take …
Flixtastic Dec 5, 2024
6bb80d3
Fix the multiple definition error.
joka921 Dec 12, 2024
d093d85
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
716e828
Revert to first sync and then reapply "Reverting the nofLiterals bein…
Flixtastic Dec 12, 2024
613a2c4
Merge remote-tracking branch 'origin/master'
Flixtastic Dec 12, 2024
2e32bd3
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
e93f944
Changed some naming to better describe functions
Flixtastic Dec 12, 2024
deb1e37
Changed the ambiguous naming of nofNonLiterals to nofNonLiteralsInTex…
Flixtastic Dec 13, 2024
4 changes: 4 additions & 0 deletions .github/workflows/native-build.yml
@@ -40,6 +40,10 @@ jobs:
- compiler: clang
compiler-version: 13
include:
- compiler: gcc
compiler-version: 11
additional-cmake-options: "-DUSE_CPP_17_BACKPORTS=ON"
build-type: Release
- compiler: clang
compiler-version: 16
asan-flags: "-fsanitize=address -fno-omit-frame-pointer"
22 changes: 21 additions & 1 deletion CMakeLists.txt
@@ -96,6 +96,17 @@ FetchContent_Declare(
SOURCE_SUBDIR runtime/Cpp
)

#################################
# Range v3 (for C++17 backwards compatibility)
################################
FetchContent_Declare(
range-v3
GIT_REPOSITORY https://github.com/joka921/range-v3
GIT_TAG 1dc0b09abab1bdc7d085a78754abd5c6e37a5d0c # 0.12.0
)



################################
# Threading
################################
@@ -184,6 +195,14 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra")
# Enable the specification of additional compiler flags manually from the commandline
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ADDITIONAL_COMPILER_FLAGS}")

# Enable the manual usage of the C++17 backports (currently `range-v3` instead
# of `std::ranges` and the `std::enable_if_t`-based expansion of the concept
# macros from `range-v3`).
set(USE_CPP_17_BACKPORTS OFF CACHE BOOL "Use the C++17 backports (range-v3 and enable_if_t instead of std::ranges and concepts)")
if (${USE_CPP_17_BACKPORTS})
add_definitions("-DQLEVER_CPP_17 -DCPP_CXX_CONCEPTS=0")
endif()

# Enable the specification of additional linker flags manually from the commandline
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ADDITIONAL_LINKER_FLAGS}")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ADDITIONAL_LINKER_FLAGS}")
@@ -321,7 +340,7 @@ FetchContent_Declare(
################################
# Apply FetchContent
################################
- FetchContent_MakeAvailable(googletest ctre abseil re2 stxxl fsst s2 nlohmann-json antlr)
+ FetchContent_MakeAvailable(googletest ctre abseil re2 stxxl fsst s2 nlohmann-json antlr range-v3)
# Disable some warnings in RE2, STXXL, and GTEST
target_compile_options(s2 PRIVATE -Wno-sign-compare -Wno-unused-parameter -Wno-class-memaccess -Wno-comment -Wno-redundant-move -Wno-unknown-warning-option -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-unused-but-set-variable -Wno-unused-function)
target_compile_options(re2 PRIVATE -Wno-unused-parameter)
@@ -333,6 +352,7 @@ include_directories(${ctre_SOURCE_DIR}/single-header)
target_compile_options(fsst PRIVATE -Wno-extra -Wno-all -Wno-error)
target_compile_options(fsst12 PRIVATE -Wno-extra -Wno-all -Wno-error)
include_directories(${fsst_SOURCE_DIR})
include_directories(${range-v3_SOURCE_DIR}/include)
target_compile_options(antlr4_static PRIVATE -Wno-all -Wno-extra -Wno-error -Wno-deprecated-declarations)
# Only required because a lot of classes that do not explicitly link against antlr4_static use the headers.
include_directories(SYSTEM "${antlr_SOURCE_DIR}/runtime/Cpp/runtime/src")
33 changes: 24 additions & 9 deletions e2e/scientists_queries.yaml
@@ -55,31 +55,43 @@ queries:
?t ql:contains-word "RElaT* phySIKalische rela*"
}
checks:
- num_cols: 5
- selected: [ "?x", "?ql_score_t_var_x", "?t", "?ql_matchingword_t_relat", "?ql_matchingword_t_rela" ]
- num_cols: 8
- selected: [ "?x", "?ql_score_t_var_x", "?t", "?ql_score_prefix_t_RElaT", "?ql_matchingword_t_relat", "?ql_score_word_t_phySIKalische", "?ql_score_prefix_t_rela", "?ql_matchingword_t_rela" ]
- contains_row:
- "<Albert_Einstein>"
- null
- null
- null
- "relationship"
- null
- null
- "relationship"
- contains_row:
- "<Albert_Einstein>"
- null
- null
- null
- "relationship"
- null
- null
- "relativity"
- contains_row:
- "<Albert_Einstein>"
- null
- null
- null
- "relativity"
- null
- null
- "relationship"
- contains_row:
- "<Albert_Einstein>"
- null
- null
- null
- "relativity"
- null
- null
- "relativity"

- query: algo-star-female-scientists
@@ -151,7 +163,7 @@ queries:
}
TEXTLIMIT 2
checks:
- num_cols: 7
- num_cols: 9
- num_rows: 18

- query: algor-star-female-born-before-1940
@@ -192,7 +204,7 @@ queries:
}
ORDER BY DESC(?ql_score_text_fixedEntity__60_Ada_95_Lovelace_62_)
checks:
- num_cols: 5
- num_cols: 6
- num_rows: 7
- contains_row:
- "<Ada_Lovelace>"
@@ -202,6 +214,7 @@ queries:
Charles Babbage, also known as' the father of computers', and in
particular, Babbage's work on the Analytical Engine."
- null
- null
- "relationship"
- order_numeric: {"dir": "DESC",
"var" : "?ql_score_text_fixedEntity__60_Ada_95_Lovelace_62_"}
@@ -219,7 +232,7 @@ queries:
ORDER BY DESC(?ql_score_text_fixedEntity__60_Ada_95_Lovelace_62_)
TEXTLIMIT 2
checks:
- num_cols: 5
- num_cols: 6
- num_rows: 3
- contains_row:
- "<Ada_Lovelace>"
@@ -229,6 +242,7 @@ queries:
Charles Babbage, also known as' the father of computers', and in
particular, Babbage's work on the Analytical Engine."
- null
- null
- "relationship"
- order_numeric: {"dir": "DESC",
"var" : "?ql_score_text_fixedEntity__60_Ada_95_Lovelace_62_"}
@@ -246,7 +260,7 @@ queries:
}
TEXTLIMIT 1
checks:
- num_cols: 6
- num_cols: 7
- num_rows: 2
- contains_row:
- "<Ada_Lovelace>"
@@ -255,6 +269,7 @@ queries:
with Somerville to visit Babbage as often as she could."
- null
- null
- null
- "relationship"


@@ -1391,10 +1406,10 @@ queries:
?t ql:contains-word "algo* herm* primary"
}
checks:
- num_cols: 5
- num_cols: 8
- num_rows: 1
- selected: [ "?x", "?ql_score_t_var_x", "?t", "?ql_matchingword_t_algo", "?ql_matchingword_t_herm" ]
- contains_row: [ "<Grete_Hermann>",null,"Hermann's algorithm for primary decomposition is still in use now.","algorithm","hermann" ]
- selected: [ "?x", "?ql_score_t_var_x", "?t", "?ql_score_prefix_t_algo", "?ql_matchingword_t_algo", "?ql_score_prefix_t_herm", "?ql_matchingword_t_herm", "?ql_score_word_t_primary" ]
- contains_row: [ "<Grete_Hermann>",null,"Hermann's algorithm for primary decomposition is still in use now.",null,"algorithm",null,"hermann",null ]


- query : select_asterisk_regex-lastname-stein
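The expected column names in the checks above suggest a naming pattern for the new score variables: `?ql_score_prefix_<textVar>_<prefix>` for a prefix search such as `algo*`, `?ql_score_word_<textVar>_<word>` for a plain word, and special characters written as `_<decimal character code>_` (as in `_60_Ada_95_Lovelace_62_` for `<Ada_Lovelace>`). The following is a minimal sketch of that pattern as inferred from these test expectations only. The function names `escapeForVariable` and `wordScoreVariable` are hypothetical and are not taken from the QLever sources, and the actual implementation (for example the `appendEscapedWord` method mentioned in the commit list) may differ.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: keep ASCII letters and digits, and replace every other
// character by `_<decimal character code>_`. This rule is an assumption
// inferred from the expected variable names in the e2e checks.
inline std::string escapeForVariable(std::string_view s) {
  std::string out;
  for (unsigned char c : s) {
    if (std::isalnum(c)) {
      out.push_back(static_cast<char>(c));
    } else {
      out += '_';
      out += std::to_string(static_cast<int>(c));
      out += '_';
    }
  }
  return out;
}

// Hypothetical helper: build the score variable for one token of a
// `ql:contains-word` literal. A trailing `*` marks a prefix search.
// `textVar` is the text variable without the leading `?` and is appended
// unchanged (another assumption of this sketch).
inline std::string wordScoreVariable(std::string_view textVar,
                                     std::string_view word) {
  assert(!word.empty());
  const bool isPrefix = word.back() == '*';
  if (isPrefix) {
    word.remove_suffix(1);
  }
  std::string name = "?ql_score_";
  name += isPrefix ? "prefix_" : "word_";
  name += textVar;
  name += '_';
  name += escapeForVariable(word);
  return name;
}
```

With these helpers, `wordScoreVariable("t", "algo*")` and `wordScoreVariable("t", "primary")` reproduce two of the column names expected by the last check above. Note that the case of the search term is kept in the score variable (`?ql_score_prefix_t_RElaT`), whereas the matching-word variable in the same check is lowercased (`?ql_matchingword_t_relat`).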
59 changes: 59 additions & 0 deletions src/backports/algorithm.h
@@ -0,0 +1,59 @@
// Copyright 2024, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Johannes Kalmbach <[email protected]>

#pragma once

#include <algorithm>
#include <functional>
#include <range/v3/all.hpp>
#include <utility>

// The following defines namespaces `ql::ranges` and `ql::views` that are almost
// drop-in replacements for `std::ranges` and `std::views`. In C++20 mode (when
// the `QLEVER_CPP_17` macro is not used), these namespaces are simply aliases
// for `std::ranges` and `std::views`. In C++17 mode they contain the ranges and
// views from Eric Niebler's `range-v3` library. NOTE: `ql::ranges::unique`
// currently doesn't work, because the interface to this function is different
// in both implementations. NOTE: There might be other caveats which we are
// currently not aware of, because they only affect functions that we currently
// don't use. For those, the following header can be expanded in the future.
#ifndef QLEVER_CPP_17
#include <concepts>
#include <ranges>
#endif

namespace ql {

namespace ranges {
#ifdef QLEVER_CPP_17
using namespace ::ranges;

// The `view` concept (which is rather important when implementing custom views)
// is in a different namespace in range-v3, so we make it manually accessible.
template <typename T>
CPP_concept view = ::ranges::cpp20::view<T>;
#else
using namespace std::ranges;
#endif
} // namespace ranges

namespace views {
#ifdef QLEVER_CPP_17
using namespace ::ranges::views;
#else
using namespace std::views;
#endif
} // namespace views

// The namespace `ql::concepts` includes concepts that are contained in the
// C++20 standard as well as in `range-v3`.
namespace concepts {
#ifdef QLEVER_CPP_17
using namespace ::concepts;
#else
using namespace std;
#endif
} // namespace concepts

} // namespace ql
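The header above switches a namespace between two implementations depending on a macro, so that call sites can be written once. The following sketch shows the same idea in a form that compiles without `range-v3`. It makes several assumptions: the macro `DEMO_CPP_17` stands in for `QLEVER_CPP_17`, the namespace `demo::ranges` stands in for `ql::ranges`, and a single hand-written `all_of` takes the place of the `range-v3` algorithms in the C++17 branch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <iterator>

// Assumption: `DEMO_CPP_17` plays the role of `QLEVER_CPP_17`. In this sketch
// it is also set automatically when the compiler is not in C++20 mode.
#if !defined(DEMO_CPP_17) && (__cplusplus < 202002L)
#define DEMO_CPP_17
#endif

#ifndef DEMO_CPP_17
#include <ranges>
#endif

namespace demo {
namespace ranges {
#ifdef DEMO_CPP_17
// Hand-written stand-in for the `range-v3` algorithm of the same name.
template <typename Range, typename Pred>
bool all_of(const Range& range, Pred pred) {
  return std::all_of(std::begin(range), std::end(range), pred);
}
#else
// In C++20 mode the namespace simply forwards to the standard library.
using namespace std::ranges;
#endif
}  // namespace ranges
}  // namespace demo

// The call site is written once against `demo::ranges` and compiles in both
// modes.
inline bool allAtMost(const std::array<int, 3>& values, int maxValue) {
  return demo::ranges::all_of(
      values, [maxValue](int el) { return el <= maxValue; });
}
```

A call such as `allAtMost({1, 2, 3}, 3)` behaves the same in both modes, which is what makes replacing `std::ranges::` by `ql::ranges::` at the call sites in this diff a mechanical change.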
17 changes: 17 additions & 0 deletions src/backports/concepts.h
@@ -0,0 +1,17 @@
// Copyright 2024, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Johannes Kalmbach <[email protected]>

#pragma once

// Define the following macros:
// `QL_OPT_CONCEPT(arg)` which expands to `arg` in C++20 mode, and to nothing in
// C++17 mode. It can be used to easily opt out of concepts that are only used
// for documentation and increased safety and not for overload resolution.
// Example usage:
// `QL_OPT_CONCEPT(std::ranges::view) auto x = someFunction();`
#ifdef QLEVER_CPP_17
#define QL_OPT_CONCEPT(arg)
#else
#define QL_OPT_CONCEPT(arg) arg
#endif
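A small sketch of how such an optional-concept macro can be used. It assumes a stand-in macro `DEMO_OPT_CONCEPT` that is switched on the language version rather than on `QLEVER_CPP_17`, and the function `countPositive` is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: `DEMO_OPT_CONCEPT` mirrors `QL_OPT_CONCEPT`, but is switched on
// the language version so that the sketch compiles in both modes.
#if __cplusplus >= 202002L
#include <concepts>
#define DEMO_OPT_CONCEPT(arg) arg
#else
#define DEMO_OPT_CONCEPT(arg)
#endif

inline std::size_t countPositive(const std::vector<int>& values) {
  // C++20: `std::integral auto count = ...` (the constraint is checked).
  // C++17: `auto count = ...` (the constraint silently disappears).
  DEMO_OPT_CONCEPT(std::integral) auto count = std::size_t{0};
  for (int v : values) {
    if (v > 0) {
      ++count;
    }
  }
  return count;
}
```

Because the macro takes a single argument, a constraint that contains a top-level comma (for example a concept with two template arguments) would have to be wrapped or aliased first. The macro is also only suitable where the constraint is documentation or an extra safety check, since in C++17 mode it cannot take part in overload resolution.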
4 changes: 2 additions & 2 deletions src/engine/AddCombinedRowToTable.h
@@ -349,8 +349,8 @@ class AddCombinedRowToIdTable {
// Make sure to reset `mergedVocab_` so it is in a valid state again.
mergedVocab_ = LocalVocab{};
// Only merge non-null vocabs.
- auto range = currentVocabs_ | std::views::filter(toBool) |
-              std::views::transform(dereference);
+ auto range = currentVocabs_ | ql::views::filter(toBool) |
+              ql::views::transform(dereference);
mergedVocab_.mergeWith(std::ranges::ref_view{range});
}
}
4 changes: 2 additions & 2 deletions src/engine/Bind.cpp
@@ -86,8 +86,8 @@ IdTable Bind::cloneSubView(const IdTable& idTable,
const std::pair<size_t, size_t>& subrange) {
IdTable result(idTable.numColumns(), idTable.getAllocator());
result.resize(subrange.second - subrange.first);
- std::ranges::copy(idTable.begin() + subrange.first,
-                   idTable.begin() + subrange.second, result.begin());
+ ql::ranges::copy(idTable.begin() + subrange.first,
+                  idTable.begin() + subrange.second, result.begin());
return result;
}

4 changes: 2 additions & 2 deletions src/engine/CallFixedSize.h
@@ -56,7 +56,7 @@ template <int maxValue, size_t NumValues, std::integral Int>
auto callLambdaForIntArray(std::array<Int, NumValues> array, auto&& lambda,
auto&&... args) {
AD_CONTRACT_CHECK(
- std::ranges::all_of(array, [](auto el) { return el <= maxValue; }));
+ ql::ranges::all_of(array, [](auto el) { return el <= maxValue; }));
using ArrayType = std::array<Int, NumValues>;

// Call the `lambda` when the correct compile-time `Int`s are given as a
@@ -131,7 +131,7 @@ decltype(auto) callFixedSize(std::array<Int, NumIntegers> ints, auto&& functor,
static_assert(NumIntegers > 0);
// TODO<joka921, C++23> Use `std::bind_back`
auto p = [](int i) { return detail::mapToZeroIfTooLarge(i, MaxValue); };
- std::ranges::transform(ints, ints.begin(), p);
+ ql::ranges::transform(ints, ints.begin(), p);

// The only step that remains is to lift our single runtime `value` which
// is in the range `[0, (MaxValue +1)^ NumIntegers]` to a compile-time
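The comments in this hunk describe two steps: values above `MaxValue` are first mapped to zero, and the remaining runtime value is then lifted to a compile-time constant. The following is a minimal sketch of that general technique for a single integer, not the implementation in `CallFixedSize.h`. All names are illustrative, the behavior of `mapToZeroIfTooLarge` is assumed from its name, and the sketch uses a simple linear recursion where the real code may well do something different.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <utility>

namespace demo {
// Assumed behavior: values above `maxValue` are mapped to 0, which then
// stands for "no specialized version for this value".
constexpr int mapToZeroIfTooLarge(int value, int maxValue) {
  return value <= maxValue ? value : 0;
}

// Try the compile-time candidates I, I + 1, ..., MaxValue in turn and call
// `f` with a `std::integral_constant` once the runtime `value` matches.
template <int MaxValue, int I = 0, typename F>
int callWithCompileTimeInt(int value, F&& f) {
  if (value == I) {
    return f(std::integral_constant<int, I>{});
  }
  if constexpr (I < MaxValue) {
    return callWithCompileTimeInt<MaxValue, I + 1>(value, std::forward<F>(f));
  } else {
    throw std::out_of_range("value is outside of [0, MaxValue]");
  }
}
}  // namespace demo

// Illustrative call site: the lambda can use the value as a template
// argument, here as (part of) the size of a `std::array`.
inline int staticArraySize(int runtimeWidth) {
  constexpr int MaxValue = 3;
  const int clamped = demo::mapToZeroIfTooLarge(runtimeWidth, MaxValue);
  return demo::callWithCompileTimeInt<MaxValue>(clamped, [](auto width) {
    std::array<int, decltype(width)::value + 1> buffer{};
    return static_cast<int>(buffer.size());
  });
}
```

In this sketch `staticArraySize(2)` instantiates the lambda with the constant 2, while `staticArraySize(7)` first maps 7 to 0 and therefore ends up in the instantiation for 0.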