Show information about "lazy scans" in "Analysis" view #54

Merged (1 commit) on Aug 4, 2023

hannahbast (Member)

The runtime information shown in the "Analysis" tree now contains additional information, notably information related to "lazy index scans" (see PR ad-freiburg/qlever#1003).

The status of the "normal" operations has been renamed to "fully materialized" and is not explicitly shown in the tree. The status of the lazy index scans is called "lazily materialized".

When hovering over a node in the tree, more detailed information is shown at the bottom, if available.
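A minimal sketch of the display rule described above, in Python. It assumes each tree node is a dict with hypothetical keys `description`, `status`, and `details` (the actual runtime-information format used by qlever-ui may differ): the default status is omitted from the label, any other status is appended, and hover text is produced only when details exist.

```python
from typing import Optional

# Status strings as named in this PR; the dict layout below is hypothetical.
FULLY_MATERIALIZED = "fully materialized"
LAZILY_MATERIALIZED = "lazily materialized"


def node_label(node: dict) -> str:
    """Return the text shown for a node in the "Analysis" tree.

    The default status ("fully materialized") is not shown explicitly;
    any other status, e.g. "lazily materialized", is appended in parentheses.
    """
    label = node["description"]
    status = node.get("status", FULLY_MATERIALIZED)
    if status != FULLY_MATERIALIZED:
        label += f" ({status})"
    return label


def hover_details(node: dict) -> Optional[str]:
    """Return the detail text shown at the bottom on hover, or None if absent."""
    details = node.get("details")
    if not details:
        return None
    return ", ".join(f"{key}: {value}" for key, value in details.items())
```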
hannahbast pushed a commit to ad-freiburg/qlever that referenced this pull request Jul 12, 2023
Under certain conditions, index scans are now lazily materialized (and there is a special status for this in the runtime information, see ad-freiburg/qlever-ui#54).

The conditions are: (1) the index scan is the operand of a single-column join operation, (2) the other operand of the join operation contains no UNDEF values in the join column and the query planner knows this, and (3) the number of rows of the index scan, if it were fully materialized, is above the runtime parameter `lazy-index-scan-max-size-materialization`.
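The three conditions can be read as a single predicate. The following Python sketch assumes a hypothetical `JoinOperand` summary of what the query planner knows about each operand; only the runtime parameter `lazy-index-scan-max-size-materialization` comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class JoinOperand:
    """Hypothetical summary of a join operand, as seen by the query planner."""
    is_index_scan: bool
    # True only if the planner *knows* the join column contains no UNDEF values.
    known_no_undef_in_join_column: bool
    # Number of rows the operand would have if it were fully materialized.
    num_rows_if_materialized: int


def use_lazy_index_scan(scan: JoinOperand, other: JoinOperand,
                        num_join_columns: int,
                        lazy_index_scan_max_size_materialization: int) -> bool:
    """Return True if `scan` should be lazily materialized in this join."""
    return (
        # (1) `scan` is an index scan and an operand of a single-column join.
        scan.is_index_scan and num_join_columns == 1
        # (2) The other operand has no UNDEF values in the join column,
        #     and the planner knows this.
        and other.known_no_undef_in_join_column
        # (3) Full materialization would exceed the configured threshold.
        and scan.num_rows_if_materialized > lazy_index_scan_max_size_materialization
    )
```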

Lazily materialized then means two things: (1) the metadata of the blocks is used to identify a subset of the blocks that is sufficient to compute the correct result, and (2) these blocks are produced one after the other and buffered only to a limited extent. There are two reasons for the buffering: (1) it is sometimes necessary to have several blocks in memory at the same time to compute cross-products efficiently, and (2) there is an `OrderedThreadSafeQueue` for the blocks (see #1011 and #1023) with runtime parameters `lazy-index-scan-queue-size` and `lazy-index-scan-num-threads`, which allows the blocks to be decompressed in parallel and minimizes waiting times for the join operation that consumes them.
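Both steps can be illustrated with a small Python sketch. It assumes hypothetical per-block metadata (the smallest and largest join-column value of each block, with blocks sorted by that column) and a made-up block encoding. The bounded, order-preserving lookahead over `concurrent.futures` is a stand-in for the idea behind `OrderedThreadSafeQueue`, not QLever's implementation; `queue_size` and `num_threads` merely play the roles of `lazy-index-scan-queue-size` and `lazy-index-scan-num-threads`.

```python
import zlib
from bisect import bisect_left
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class BlockMetadata:
    """Hypothetical per-block metadata plus the compressed block itself."""
    first_key: int    # smallest join-column value in the block
    last_key: int     # largest join-column value in the block
    compressed: bytes


def select_blocks(blocks: Sequence[BlockMetadata],
                  sorted_join_values: Sequence[int]) -> List[BlockMetadata]:
    """Keep only blocks whose key range contains at least one value of the
    other operand's sorted join column; other blocks cannot produce matches."""
    selected = []
    for block in blocks:
        i = bisect_left(sorted_join_values, block.first_key)
        if i < len(sorted_join_values) and sorted_join_values[i] <= block.last_key:
            selected.append(block)
    return selected


def decompress(block: BlockMetadata) -> List[int]:
    # Hypothetical encoding: comma-separated integers, zlib-compressed.
    return [int(x) for x in zlib.decompress(block.compressed).decode().split(",")]


def lazy_scan(blocks: Sequence[BlockMetadata], queue_size: int,
              num_threads: int) -> Iterator[List[int]]:
    """Yield decompressed blocks in their original order. At most `queue_size`
    blocks are in flight at once, decompressed by `num_threads` workers."""
    assert queue_size >= 1
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pending = deque()
        for block in blocks:
            pending.append(pool.submit(decompress, block))
            if len(pending) >= queue_size:
                # Waiting on the oldest future preserves block order.
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

Because the generator yields blocks strictly in order, the consuming join can treat them as one sorted column, while a bounded number of later blocks are decompressed in the background.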

In all cases where the conditions for lazy materialization are not met, both operands of the join are fully materialized, and one of the three existing join implementations is used: the standard zipper join, the galloping join (when one operand is much larger than the other), or the general-purpose join (which can handle multiple join columns and UNDEF values).
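For illustration, here is a Python sketch of the two simplest strategies on sorted, single-column inputs without UNDEF values: a zipper (merge) join and a galloping join that uses exponential search in the larger operand. The size ratio `galloping_ratio` that switches to galloping is a hypothetical value, not QLever's actual threshold, and the general-purpose join for multiple columns and UNDEF values is not shown.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def zipper_join(a: Sequence[int], b: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge two sorted join columns; return (i, j) pairs with a[i] == b[j].
    Runs of equal values on both sides produce their cross-product."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            i_end, j_end = i, j
            while i_end < len(a) and a[i_end] == a[i]:
                i_end += 1
            while j_end < len(b) and b[j_end] == b[j]:
                j_end += 1
            result.extend((x, y) for x in range(i, i_end) for y in range(j, j_end))
            i, j = i_end, j_end
    return result


def galloping_join(small: Sequence[int], large: Sequence[int]) -> List[Tuple[int, int]]:
    """For each value of `small`, gallop forward in `large` with doubling steps,
    then binary-search the bracketed range for the first match."""
    result = []
    lo = 0  # invariant: large[k] < value for all k < lo
    for i, value in enumerate(small):
        step, hi = 1, lo
        while hi < len(large) and large[hi] < value:
            lo = hi + 1
            hi = lo + step
            step *= 2
        start = bisect_left(large, value, lo, min(hi, len(large)))
        k = start
        while k < len(large) and large[k] == value:
            result.append((i, k))
            k += 1
        lo = start  # keep `start` so duplicate values in `small` match again
    return result


def join_sorted(a: Sequence[int], b: Sequence[int],
                galloping_ratio: int = 1000) -> List[Tuple[int, int]]:
    """Use galloping if one side is much larger (hypothetical ratio), else zipper."""
    if len(a) * galloping_ratio < len(b):
        return galloping_join(a, b)
    if len(b) * galloping_ratio < len(a):
        return [(i, j) for j, i in galloping_join(b, a)]
    return zipper_join(a, b)
```

Galloping pays off when one side is tiny: each of its values costs roughly a logarithmic number of comparisons in the large side instead of a linear pass over it.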
hannahbast merged commit 5931611 into master Aug 4, 2023