This was noticed tangentially, when investigating cudf/pull/7568.
The minimal test-case that reproduced the libcudf issue above consisted of a 0.5KB Parquet dataset containing 15 records across 5 groups.
grouped_rolling_window() appeared to be invoked once per group, instead of once for the entire column, probably to keep groups whole. If groups could be packed into larger inputs for grouped_rolling_window(), performance should be far better.
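To make the contrast concrete, here is a minimal pure-Python sketch of the two call patterns. It is not the libcudf or Spark plugin code. The function names, the rolling-sum aggregation, and the meaning of `preceding` and `following` (rows before and after the current row, excluding it) are all assumptions made for illustration. It assumes the keys are already sorted, so each group is one contiguous run.

```python
from itertools import groupby


def group_offsets(keys):
    """Return the start index of each run of equal keys, plus the end offset."""
    offsets = [0]
    for _, run in groupby(keys):
        offsets.append(offsets[-1] + sum(1 for _ in run))
    return offsets


def rolling_sum_per_group(keys, values, preceding, following):
    """Hypothetical per-group pattern: one rolling pass for every group."""
    out = []
    offsets = group_offsets(keys)
    for start, end in zip(offsets, offsets[1:]):
        chunk = values[start:end]  # each group is sliced out and handled alone
        for i in range(len(chunk)):
            lo = max(0, i - preceding)
            hi = min(len(chunk), i + following + 1)
            out.append(sum(chunk[lo:hi]))
    return out


def rolling_sum_single_pass(keys, values, preceding, following):
    """Hypothetical packed pattern: one pass over the whole column.

    Groups stay whole because each window is clamped to the offsets
    of the group that contains the current row.
    """
    offsets = group_offsets(keys)
    out = []
    g = 0  # index of the group containing row i
    for i in range(len(values)):
        while i >= offsets[g + 1]:
            g += 1
        lo = max(offsets[g], i - preceding)
        hi = min(offsets[g + 1], i + following + 1)
        out.append(sum(values[lo:hi]))
    return out
```

Both functions return the same values. The second keeps groups whole by clamping window bounds rather than by splitting the input, which is the kind of packing suggested above.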
We could save some time in building the full offsets that are passed to the underlying window operations. But we have not even measured how much of the time that takes up.
I want to keep it open because we know that there is duplicate code being called. The cost may be small, but I want to preserve this because at some point we are going to want to go through the backlog and start fixing things there. Just unassign yourself from this for now.