[BUG] Investigate multiple calls to cudf::rolling_window() from GpuWindowExec #1931

Open
mythrocks opened this issue Mar 14, 2021 · 4 comments
Labels: performance (A performance related task/issue)

Comments

@mythrocks
Collaborator

This was noticed tangentially, when investigating cudf/pull/7568.

The minimal test-case that reproduced the libcudf issue above consisted of a 0.5KB Parquet dataset containing 15 records across 5 groups.

grouped_rolling_window() seemed to be invoked once per group, instead of once for the entire column, likely in a bid to keep groups whole. If groups could be packed into larger inputs for grouped_rolling_window(), performance should be far better, since there would be fewer calls overall.
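
To illustrate the difference, here is a minimal pure-Python sketch. It is not the libcudf or spark-rapids code, and it says nothing about actual GPU performance. It assumes the rows are already sorted by group key and that the window counts the current row as part of `preceding`. The names `group_offsets`, `rolling_sum_per_group`, and `rolling_sum_batched` are hypothetical, chosen only for this example. The batched version takes the whole column once and clamps each window to the bounds of its group, so groups stay whole without a separate call per group.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def group_offsets(keys: Sequence[int]) -> List[int]:
    """Start offset of each run of equal keys, followed by the total length."""
    if not keys:
        return [0]
    offsets = [0]
    for i in range(1, len(keys)):
        if keys[i] != keys[i - 1]:
            offsets.append(i)
    offsets.append(len(keys))
    return offsets


def rolling_sum(values: Sequence[int], preceding: int, following: int) -> List[int]:
    """Ungrouped rolling sum. `preceding` includes the current row (an assumption of this sketch)."""
    n = len(values)
    return [
        sum(values[max(0, i - preceding + 1): min(n, i + following + 1)])
        for i in range(n)
    ]


def rolling_sum_per_group(
    keys: Sequence[int], values: Sequence[int], preceding: int, following: int
) -> Tuple[List[int], int]:
    """One rolling call per group: the pattern observed in the issue."""
    offsets = group_offsets(keys)
    out: List[int] = []
    calls = 0
    for start, end in zip(offsets, offsets[1:]):
        out.extend(rolling_sum(values[start:end], preceding, following))
        calls += 1
    return out, calls


def rolling_sum_batched(
    keys: Sequence[int], values: Sequence[int], preceding: int, following: int
) -> Tuple[List[int], int]:
    """One call over the whole column; each window is clamped to its group's bounds."""
    offsets = group_offsets(keys)
    out: List[int] = []
    for i in range(len(values)):
        g = bisect_right(offsets, i) - 1  # index of the group containing row i
        start, end = offsets[g], offsets[g + 1]
        lo = max(start, i - preceding + 1)
        hi = min(end, i + following + 1)
        out.append(sum(values[lo:hi]))
    return out, 1


if __name__ == "__main__":
    # 15 records across 5 groups, mirroring the shape of the test case above.
    keys = [g for g in range(5) for _ in range(3)]
    values = list(range(15))
    per_group, calls_a = rolling_sum_per_group(keys, values, 2, 0)
    batched, calls_b = rolling_sum_batched(keys, values, 2, 0)
    assert per_group == batched
    print(calls_a, calls_b)
```

Both variants produce the same results; the only difference is how many times the rolling routine is entered (once per group versus once per column).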

mythrocks added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Mar 14, 2021
mythrocks self-assigned this on Mar 14, 2021
mythrocks added the "performance" (A performance related task/issue) label and removed the "? - Needs Triage" and "bug" labels on Mar 14, 2021
@jlowe
Member

jlowe commented Jan 13, 2022

@mythrocks Is this still relevant?

@revans2
Collaborator

revans2 commented Jan 14, 2022

> @mythrocks Is this still relevant?

We could save some time in building the full offsets that are passed to the underlying window operations. But we have not even measured how much of the time that is taking up.

@mythrocks
Collaborator Author

Sorry I missed this.
I didn't actually see a slowdown, just that there seemed to be multiple calls coming through.

I can close this issue for now, and reopen if we find a slow path here.

@revans2
Collaborator

revans2 commented Jan 28, 2022

I want to keep it open because we know that there is duplicate code being called. It may be small, but I want to preserve this because at some point we are going to want to go through the backlog and start fixing things there. Just unassign yourself from this for now.
