-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): Add ddof option to rolling_var and rolling_std #8957
Conversation
@@ -12,9 +12,9 @@ impl< | |||
T: NativeType + IsFloat + std::iter::Sum + AddAssign + SubAssign + Div<Output = T> + NumCast, | |||
> RollingAggWindowNoNulls<'a, T> for MeanWindow<'a, T> | |||
{ | |||
fn new(slice: &'a [T], start: usize, end: usize) -> Self { | |||
fn new(slice: &'a [T], start: usize, end: usize, params: Option<RollingFnParams>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a Box<dyn Any>
. Then every function can downcast to its specific arguments type. In this PR's case no function will do that except for var
and std
.
If other functions need different arguments they can pass their own type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a look. I'm relatively new to Rust so what makes that better than doing it this way?
To be extra clear, the type definition is
pub enum RollingFnParams {
RollingVarParams { ddof: u8 },
}
and I figured other variants could be added later but that explicitly defining what each function required was clearer and could be handled by a match where appropriate while preventing arbitrary input from going to these functions.
With the way you're suggesting, would I make RollingVarParams
and any future param sets their own types? Mostly I'm curious about what issues you see here so I can learn something. Is it just pointless to define a type this vague?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On mobile:
Ddof doesn't make any sense to pass to other rolling function like a rolling sum. That's why I propose using dyn Any. Then every function can only accept its own parameter type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for replying so quickly.
In my limited understanding, dyn Any would have the same problems as what I'm doing but even worse. Nothing would stop someone from passing anything at all to a rolling function. With the way it is now, at least there are defined enum variants for what kind of parameters each rolling function should accept and they have to be set ahead of time. Yeah, you could pass MeanParams (if they existed) to variance or VarParams to mean but it's up to the individual kernel to decide what to do just like with dyn Any. Is it better because I can then give each window an associated type through it's trait and cast the Any to that directly beforehand?
The only alternative I can think of that doesn't leave it up to the window itself is to have a separate type for parameters to each rolling function and put a generic type parameter on RollingOptions
. But even then I think the function parameter types would have to be related somehow so they can pass through rolling_agg
and have the same signature. And I don't know if this would work and it seems complicated.
I'm not trying to give you a hard time here. Only trying to learn a little. It doesn't seem like that big of a change to do it with Any so I'm happy to make the change if that's what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that the Box<dyn Any>
is in our private API. Not what we expose publicly for the user facing API.
The users will pass RollingVarArgs
or a RollingMeanArgs
and we will convert it to a Box<Any>
. As I said, I don't want arguments needed for variance being passed to rolling_mean
that doesn't make much sense, so we need some dynamic typing, hence the dyn Any
. Think of it as **kwargs
in python, where a specific implemenation of a function knows which keyword arguments to take out of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like Box<dyn Any>
is even going to work here, at least if we're passing the params down to the actual implementation through RollingOptions
and friends because those types derive Clone and the size of a Box<dyn Any>
isn't known at compile time. At least that's the error I'm getting. Maybe you can get around it with Rc
or Arc
but after looking into it I think ultimately it confuses the issue and still ends up being worse than what I'm proposing.
- We have to get the additional function parameters all the way down many layers of indirection. Since it seems to make the most sense to put these in
RollingOptions
they have to be wrapped in something so everything typechecks. There's also various dispatchers likerolling_agg
which expect specific signatures of the functions they are passed. RollingAggWindow
also needs to accept the same type for each operation, so whatever parameters can only be unwrapped here. Whether you have an any or a generic type representing rolling args, this is the only place you can really check that you have what you need. And it's easier to check that you have the right enum variant than messing around with Any.- We do know ahead of time what sets of arguments are valid for which rolling operations. The enum forces us to be clear about it. Yes, it's internal, and that's why the user-level functions will just take individual arguments, construct the appropriate variant and pass it along.
- It's even recommended in the Rust book to use this pattern to group types together when you know all of the possibilities ahead of time.
So far I haven't seen anything to make me think that what you're proposing is better, or even easier to implement.
But if you really want to do it with run-time dynamic typing rather than just defining a few types of our own, I can give it a try because I do want to help and I believe in this project.
Finally, as a concrete example of how this would work with multiple types, you could have the following, which I believe would let the rolling quantile functions no longer need a special case.
pub enum RollingFnParams {
RollingVarParams { ddof: u8 },
RollingQuantileParams { quantile: f64, interp: QuantileInterpolOptions}
}
Then all you need to do is have new
for each window type do a match for the appropriate variant. If you wanted to allow the rolling windows to take Any
then the handling of different parameters for each rolling function would become far more ad-hoc and error-prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rolling variance args shouldn't be in an enum with rolling quantile args.
What I mean is this:
use std::any::Any;
use std::sync::Arc;
/// Public type
pub struct RollingVarOptions{
ddof: u8
}
/// Public type
pub struct RollingMeanOptions{
mean_args: usize
}
/// Public function
pub fn rolling_var(args: RollingVarOptions) {
let args = Arc::new(args) as Arc<dyn Any>;
rolling_var_private(args)
}
/// Public function
pub fn rolling_mean(args: RollingMeanOptions) {
let args = Arc::new(args) as Arc<dyn Any>;
rolling_var_private(args)
}
/// Private functions
fn rolling_mean_private(args: Arc<dyn Any>) {
let args = args.downcast_ref::<RollingMeanOptions>().unwrap();
// implementation work
}
/// Private functions
fn rolling_var_private(args: Arc<dyn Any>) {
let args = args.downcast_ref::<RollingVarOptions>().unwrap();
// implementation work
}
This should work perfectly and have clear single function structs for the end user. It also reduces binary bloat which is important in this crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it took a little fiddling around with Any
but it seems to do what it's supposed to now.
A few tests started failing due to numerical precision issues. They're being compared exactly, seemingly as a hack around pytest not being able to handle None
in numeric outputs (though it supports NaN
...). I could either change the values to what they are now or change the tests to check None
and numerical precision of floats separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it took a little fiddling around with Any but it seems to do what it's supposed to now.
Nice!
I could either change the values to what they are now or change the tests to check None and numerical precision of floats separately.
No preference on my side. It is up to you. :)
faecd9a
to
37e4011
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great @magarick. I have a small nit that could make the function signature a little bit easier to grok.
@@ -12,9 +12,9 @@ impl< | |||
T: NativeType + IsFloat + std::iter::Sum + AddAssign + SubAssign + Div<Output = T> + NumCast, | |||
> RollingAggWindowNoNulls<'a, T> for MeanWindow<'a, T> | |||
{ | |||
fn new(slice: &'a [T], start: usize, end: usize) -> Self { | |||
fn new(slice: &'a [T], start: usize, end: usize, params: Option<Arc<dyn Any + Sync + Send>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can clean up the signature a little bit by defining:
type DynArgs = Option<Arc<dyn Any + Sync + Send>>
2bf1854
to
bc375ac
Compare
OK. I added the new type and tidied things up. I'm not sure what to do about the doctests failing. The output doesn't look wrong and I didn't change that. Also how about the "too many arguments" complaints from rust. I can't really do anything about that since this PR was meant to add new arguments... |
You can mark the violating function with
These are fixed in main. A rebase should be sufficient. |
There's also the question of what to do if |
Other than the above question about how to handle edge cases where people end up with windows shorter than ddof I think this is ready. Any opinions on how to handle or should we just ignore it for now, assuming this is unlikely to be an issue? |
Can we follow numpy and also return Inf in that case? |
If we guarantee that the functions can only take floats then yes. It looks like int inputs get cast to floats anyway. So if you're ok adding the |
Yes, that's ok. |
In that case, I think this will do it. |
f2f7399
to
bf279ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @magarick. Looks great. 👍
This is a first attempt at adding a
ddof
parameter torolling_var
/rolling_std
as per #7999To do so, it introduces rolling function parameters that get passed around. I'm not sure this is the greatest way to do it...