Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregates: use tile metadata. #4468

Merged
merged 2 commits into from
Nov 2, 2023

Conversation

KiterLuc
Copy link
Contributor

This allows the readers to use the tile metadata to do aggregation. The tiles still get loaded into memory even when not required but that will be addressed in a follow up change.


TYPE: IMPROVEMENT
DESC: Aggregates: use tile metadata.

@KiterLuc KiterLuc requested a review from robertbindar October 28, 2023 09:44
@shortcut-integration
Copy link

This pull request has been linked to Shortcut Story #35771: Aggregates use fragment metadata when possible..

This allows the readers to use the tile metadata to do aggregation. The tiles still get loaded into memory even when not required but that will be addressed in a follow up change.

---
TYPE: IMPROVEMENT
DESC: Aggregates: use tile metadata.
@KiterLuc KiterLuc force-pushed the lr/process-full-tiles-aggregates/ch35771 branch from 4f3d499 to 3b9063f Compare October 28, 2023 09:53
Copy link
Contributor

@robertbindar robertbindar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work. I left 2 comments related to improving code comments in some places and 1 extra comment regarding a performance concern in the dense changes.

start,
end - start + 1,
iter.pos_in_tile(),
stride,
iter.cell_slab_coords().data(),
dest_ptr));

// If any cell doesn't match the query condition, signal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, this flag is used to know that if QC is applied successfully (it filters out at least 1 result) on a set of results, you cannot use frag meta statistics in aggregations. Isn't it expensive to do this iteration even though there might even be no use for the set_qc_filtered_results information?

Can we not run the loop when it's not needed and when it is needed, make apply_dense compute the information instead of looping again outside apply_dense logic?

I remember from the last batch of performance fixes in dense that this is a sensitive area and I'm a bit worried we might regress (If I understand correcly here and in the worst case, we'll end up looping over the entire result set).

@KiterLuc KiterLuc merged commit 081bcc5 into dev Nov 2, 2023
55 checks passed
@KiterLuc KiterLuc deleted the lr/process-full-tiles-aggregates/ch35771 branch November 2, 2023 11:41
teo-tsirpanis added a commit to teo-tsirpanis/TileDB that referenced this pull request Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants