Lazy materialization of index scans #1003

Merged on Jul 12, 2023 (179 commits)

@joka921 joka921 commented Jun 15, 2023

This is not quite done yet and will be split up into several
preparatory PRs.

joka921 added 30 commits April 25, 2023 15:22
Also several refactorings for the
IndexScan class.
The `ScanType` is now completely unused inside the `IndexScan` class.
coroutine for scanning.
permutation-templating
out of the code.

TODO:
Many functions can now be moved into the
IndexImpl.cpp
_this to that_ in the
IndexImpl class.
Next step:
actually implement the logic in the JOIN class.
removed some unused functions to
make Codecov happ(y | ier).
# Conflicts:
#	src/index/IndexImpl.Text.cpp
# Conflicts:
#	src/engine/GroupBy.cpp
#	src/engine/IndexScan.cpp
#	src/engine/IndexScan.h
#	src/engine/Join.cpp
#	src/index/Index.cpp
#	src/index/IndexImpl.Text.cpp
#	src/index/IndexImpl.cpp
#	src/index/IndexImpl.h
#	src/index/Permutations.h
#	test/IndexTest.cpp
block joining function,
We still need to write unit tests for it.

@hannahbast hannahbast left a comment

We are getting there, some changes left

@hannahbast hannahbast left a comment

Great, a few more changes as discussed + testing it on UniProt

@hannahbast hannahbast left a comment

This is so AWESOME, three milestones in one.

hannahbast commented Jul 10, 2023

Under certain conditions, index scans are now lazily materialized (and there is a special status for this in the runtime information, see ad-freiburg/qlever-ui#54).

The conditions are: (1) the index scan is an operand of a single-column join operation, (2) the other operand of the join operation contains no UNDEF values in the join column and the query planner knows this, and (3) the number of rows of the index scan, if it were fully materialized, exceeds the value of the runtime parameter lazy-index-scan-max-size-materialization.
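
The three conditions above can be read as a single predicate. The following is only a sketch in Python for brevity (QLever itself is C++); the names `JoinOperand`, `use_lazy_scan`, and all field names are hypothetical and do not come from the QLever code base. Condition (2) is modeled by a flag that stays `True` unless the query planner has established that the join column is free of UNDEF values.

```python
from dataclasses import dataclass


@dataclass
class JoinOperand:
    """Hypothetical summary of one join input (not a QLever type)."""
    is_index_scan: bool
    estimated_rows: int
    # True unless the planner has proven that the join column has no UNDEF.
    join_column_may_contain_undef: bool


def use_lazy_scan(scan: JoinOperand, other: JoinOperand,
                  num_join_columns: int,
                  max_size_materialization: int) -> bool:
    """Return True if all three conditions for lazy materialization hold."""
    return (
        scan.is_index_scan
        and num_join_columns == 1                      # condition (1)
        and not other.join_column_may_contain_undef    # condition (2)
        and scan.estimated_rows > max_size_materialization  # condition (3)
    )
```

Here `max_size_materialization` stands in for the value of lazy-index-scan-max-size-materialization; a scan at or below that size would simply be materialized in full.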

Lazy materialization then means two things: (1) a subset of the blocks is identified that is sufficient to compute the correct result, and (2) these blocks are produced one after the other and buffered only to a limited extent. There are two reasons for the buffering: (1) it is sometimes necessary to have several blocks in memory at the same time to compute cross-products efficiently, and (2) there is an OrderedThreadSafeQueue for the blocks (see #1011 and #1023) with runtime parameters lazy-index-scan-queue-size and lazy-index-scan-num-threads, which allows the blocks to be decompressed in parallel and minimizes waiting times for the join operation consuming the blocks.
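
The two ideas in the paragraph above (selecting only the relevant blocks, then producing them in order with bounded buffering and parallel decompression) can be sketched as follows. This is an illustration in Python under several assumptions, not the OrderedThreadSafeQueue from #1011 and #1023: the `Block` layout, the use of `zlib`, and the names `relevant_blocks` and `lazy_blocks` are all hypothetical, and the other join operand is modeled as a sorted list of join keys.

```python
import bisect
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical compressed block with min/max metadata for the join column."""
    first_key: int
    last_key: int
    payload: bytes


def make_block(keys):
    """Build a block from a sorted, non-empty list of integer keys."""
    data = ",".join(map(str, keys)).encode()
    return Block(keys[0], keys[-1], zlib.compress(data))


def decompress(block):
    return [int(k) for k in zlib.decompress(block.payload).decode().split(",")]


def relevant_blocks(blocks, sorted_join_keys):
    """Yield only blocks whose key range contains at least one join key."""
    for block in blocks:
        i = bisect.bisect_left(sorted_join_keys, block.first_key)
        if i < len(sorted_join_keys) and sorted_join_keys[i] <= block.last_key:
            yield block


def lazy_blocks(blocks, queue_size, num_threads):
    """Yield decompressed blocks in input order.

    At most `queue_size` decompression tasks are pending at any time,
    and up to `num_threads` of them run in parallel.
    """
    pending = deque()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for block in blocks:
            pending.append(pool.submit(decompress, block))
            if len(pending) >= queue_size:
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

In this sketch `queue_size` and `num_threads` play the roles of lazy-index-scan-queue-size and lazy-index-scan-num-threads. Results are taken from the front of the deque, so the consumer sees the blocks in their original order even if a later block finishes decompressing first.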

In all cases where the conditions for lazy materialization are not met, one of the three existing join implementations is used: the standard zipper join, the galloping join, or the general-purpose join (which can handle multiple columns and UNDEF values).
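
For readers unfamiliar with the first two fallback strategies, here is a minimal sketch of a zipper (merge) join and a galloping join on two sorted lists of join keys. It is written in Python for brevity and only returns the matching keys; the function names are hypothetical, and QLever's actual implementations operate on full tables and may differ in detail.

```python
from bisect import bisect_left


def zipper_join(left, right):
    """Merge join of two sorted lists; equal runs produce a cross product."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            key = left[i]
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            result.extend([key] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return result


def gallop(seq, target, lo):
    """First index >= lo with seq[index] >= target (exponential + binary search)."""
    step = 1
    hi = lo
    while hi < len(seq) and seq[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    return bisect_left(seq, target, lo, min(hi, len(seq)))


def galloping_join(small, large):
    """Join a small sorted list against a much larger one by skipping ahead."""
    result = []
    pos = 0
    for key in small:
        pos = gallop(large, key, pos)
        end = pos
        while end < len(large) and large[end] == key:
            result.append(key)
            end += 1
    return result
```

The zipper join advances through both inputs one element at a time, whereas the galloping join skips over long stretches of the larger input, which tends to pay off when one input is much smaller than the other.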

joka921 added 10 commits July 11, 2023 11:49
# Conflicts:
#	src/util/ThreadSafeQueue.h
#	test/CMakeLists.txt
#	test/ThreadSafeQueueTest.cpp
… have the test coverage for the thread safe queue as well).
# Conflicts:
#	src/util/ThreadSafeQueue.h
#	test/ThreadSafeQueueTest.cpp
# Conflicts:
#	test/CMakeLists.txt
# Conflicts:
#	src/util/ThreadSafeQueue.h
#	test/ThreadSafeQueueTest.cpp

@hannahbast hannahbast left a comment

Awesome, only very minor change left

@hannahbast hannahbast left a comment

This is so awesome, three milestones in one!

@hannahbast hannahbast changed the title from "Lazy Join with index scans." to "Lazy materialization of index scans" on Jul 12, 2023
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 27 (rating A)

Coverage: no information
Duplication: 0.0%

@hannahbast hannahbast left a comment

Great catch and thanks for the fix!

@hannahbast hannahbast merged commit 2fefc08 into ad-freiburg:master Jul 12, 2023
@joka921 joka921 deleted the join_with_scan branch July 12, 2023 15:09