-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust, python): rolling_groupy was returning incorrect results when offset was positive #9082
Conversation
…n offset was positive
assert_eq!(groups[0], [0, 5]); | ||
assert_eq!(groups[1], [1, 5]); | ||
assert_eq!(groups[2], [2, 5]); | ||
assert_eq!(groups[3], [3, 5]); | ||
assert_eq!(groups[4], [4, 5]); | ||
assert_eq!(groups[5], [5, 4]); | ||
assert_eq!(groups[6], [6, 3]); | ||
assert_eq!(groups[7], [7, 2]); | ||
assert_eq!(groups[8], [8, 0]); | ||
assert_eq!(groups[0], [1, 4]); // (00:00, 02:00] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this test was right to begin with
if the dates are:
[00:00, 00:30, ..., 3:30, 4:00]
and the period is 2h
and the offset is 0h
, then the first window is (00:00, 02:00]
. So that includes points: [00:30, 01:00, 01:30, 02:00]
, i.e. the starting offset is 1
, and the length is 4
@@ -355,7 +355,7 @@ pub(crate) fn groupby_values_iter_partial_lookbehind<'a>( | |||
let b = Bounds::new(lower, upper); | |||
|
|||
for &t in &time[lagging_offset..] { | |||
if b.is_member(t, closed_window) || lagging_offset == i { | |||
if b.is_member(t, closed_window) || (offset.negative && lagging_offset == i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we branch on offset.negative
outside the loop?
Then we have two loops that compute the lagging_offset
. The loops are super small, so almost no code bloat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually get here if offset.negative
isn't set? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I can tell, only in this test
polars/polars/polars-time/src/windows/test.rs
Lines 711 to 734 in dc7bfd2
let offset = Duration::parse("0h"); | |
let g0 = groupby_values_iter_full_lookahead( | |
period, | |
offset, | |
&dates, | |
closed_window, | |
tu, | |
NO_TIMEZONE.copied(), | |
0, | |
None, | |
) | |
.collect::<PolarsResult<Vec<_>>>() | |
.unwrap(); | |
let g1 = groupby_values_iter_partial_lookbehind( | |
period, | |
offset, | |
&dates, | |
closed_window, | |
tu, | |
NO_TIMEZONE.copied(), | |
) | |
.collect::<PolarsResult<Vec<_>>>() | |
.unwrap(); | |
assert_eq!(g0, g1); |
maybe it's better to just remove that test then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this part of the test, reverted the change to groupby_values_iter_partial_lookbehind
, and expanded the tests on the python side by doing quite a bit more parametrisation
@@ -400,7 +400,16 @@ pub(crate) fn groupby_values_iter_full_lookahead<'a>( | |||
|
|||
let b = Bounds::new(lower, upper); | |||
|
|||
debug_assert!(i < time.len()); | |||
// find starting point of window | |||
for &t in &time[i..] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we have a nested loop now.
Can we maybe keep the old behavior for offset= 0
?
We can branch on the offset=?
case and have a loop for both cases.
Or don't we hit this path on the offset=0
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, thanks - yes, but only if the window is closed on the left (if it's open on the left, then time[i]
shouldn't be included in the group)
@@ -398,7 +404,21 @@ pub(crate) fn groupby_values_iter_full_lookahead( | |||
|
|||
let b = Bounds::new(lower, upper); | |||
|
|||
debug_assert!(i < time.len()); | |||
// find starting point of window | |||
match inner_loop_needed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still adds a branch.
Let's do the branch completely before iter().enumerate()
. That makes all paths optimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so something like
match inner_loop_needed {
true => {
time[start_offset..upper_bound]
.iter()
.enumerate()
.map(move |(mut i, lower)| {
i += start_offset;
let lower = add(&offset, *lower, tz.as_ref())?;
let upper = add(&period, lower, tz.as_ref())?;
let b = Bounds::new(lower, upper);
// find starting point of window
for &t in &time[i..] {
if b.is_member(t, closed_window) {
break;
}
i += 1;
}
if i >= time.len() {
return Ok((i as IdxSize, 0));
}
let slice = unsafe { time.get_unchecked(i..) };
let len = slice.partition_point(|v| b.is_member(*v, closed_window));
Ok((i as IdxSize, len as IdxSize))
})
}
false => time[start_offset..upper_bound]
.iter()
.enumerate()
.map(move |(mut i, lower)| {
i += start_offset;
let lower = add(&offset, *lower, tz.as_ref())?;
let upper = add(&period, lower, tz.as_ref())?;
let b = Bounds::new(lower, upper);
debug_assert!(i < time.len());
let slice = unsafe { time.get_unchecked(i..) };
let len = slice.partition_point(|v| b.is_member(*v, closed_window));
Ok((i as IdxSize, len as IdxSize))
}),
}
}
?
This would give
= note: expected struct `std::iter::Map<std::iter::Enumerate<std::slice::Iter<'_, _>>, [closure@/home/marcogorelli/polars-dev/polars/polars-time/src/windows/groupby.rs:401:22: 401:43]>`
found struct `std::iter::Map<std::iter::Enumerate<std::slice::Iter<'_, _>>, [closure@/home/marcogorelli/polars-dev/polars/polars-time/src/windows/groupby.rs:427:18: 427:39]>`
= note: no two closures, even if identical, have the same type
= help: consider boxing your closure and/or using it as a trait object
help: you could change the return type to be a boxed trait object
|
381 | ) -> Box<dyn Iterator<Item = PolarsResult<(IdxSize, IdxSize)>> + TrustedLen + '_> {
| ~~~~~~~ +
help: if you change the return type to expect trait objects, box the returned expressions
|
398 ~ Box::new(time[start_offset..upper_bound]
399 | .iter()
...
421 | Ok((i as IdxSize, len as IdxSize))
422 ~ }))
423 | }
424 ~ false => Box::new(time[start_offset..upper_bound]
425 | .iter()
...
438 | Ok((i as IdxSize, len as IdxSize))
439 ~ })),
|
, is that OK? Wouldn't the perf implications of using Box
be worse than just matching a boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, then we can do the branch on the caller location. Meaning we split into 2 functions. We already have done this earlier. We have a full_lookahead
, a partial_lookbehind
and a full_lookbehind
. So now we need 1 extra, I believe?
Thanks @MarcoGorelli. Looking great now. 👍 |
cool, thanks for your review! |
…n offset was positive (pola-rs#9082)
There's room for optimisation, but this at least closes #9077 (besides, rolling windows looking into the future don't strike me as a common use-case)