refactor(rust, python): Remove old cut/qcut #9763
That would be a breaking change. It should return a |
For Python, I could change the default back, but honestly, returning a DataFrame wasn't the right thing to do in the first place. On the Rust side, though, I'm not sure how feasible it is, since the underlying functions now return a Series. Do you want replacement cut and qcut functions added back to the "Algo" crate that just mimic the old format and arguments (and possibly warn people not to use them)?
I am talking about the Python side. On the Rust side it is fine; we only need an expression there.
I think a |
For the Series, it does convert back to a DataFrame and never returns a struct: only a categorical Series or a DataFrame. I think changing the default is OK, though, since the function's documentation always said it was experimental and subject to change. The new default is also more likely what people are looking for.
Ok, no strong opinion on my part. Going in, thanks!
@ritchie46 You were right about this PR being a breaking change.
@gkns1 It should be able to work almost identically to the old version if you want it to. Is there something in particular you need?
It's just the fact that it now returns a Series by default instead of a DataFrame. It's an easy fix, but I first had to find out what happened when a build was failing. Failing builds are a risk I'm willing to take by using very new software, but a flag or highlight in the release notes would help find it much faster.
Yes, you are right. Sorry about that. Somehow I forgot the breaking flag.
May I have a question that where |
As discussed, this removes the old cut and qcut. It also changes the default behavior to return Series instead of DataFrames but lets the new functions return "old style" data as well. Finally, it makes NaNs bin to null instead of accidentally putting them in a bin.
One thing to note is that the new algorithm isn't parallel yet but it could (and should) be. I tried a few things but couldn't get it to work.
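The NaN-to-null behavior mentioned in the description can be illustrated with a small pure-Python sketch. This is not the code from this PR and does not use the polars API; `cut_sketch` and its label format are hypothetical names chosen for illustration, and the right-closed intervals and sorted break points are assumptions. It only shows the idea of checking for NaN before the bin lookup, since a comparison-based search would otherwise place NaN in some bin.

```python
import math
from bisect import bisect_left
from typing import Optional, Sequence


def cut_sketch(
    values: Sequence[Optional[float]], breaks: Sequence[float]
) -> list[Optional[str]]:
    """Assign each value to a right-closed bin label (hypothetical helper).

    Assumes `breaks` is sorted ascending. None and NaN both map to None
    (standing in for null) instead of being placed in a bin.
    """
    edges = [-math.inf, *breaks, math.inf]
    labels = [f"({lo}, {hi}]" for lo, hi in zip(edges, edges[1:])]
    out: list[Optional[str]] = []
    for v in values:
        if v is None or math.isnan(v):
            # Every comparison with NaN is False, so bisect_left would
            # return index 0 and put NaN in the first bin. Check first.
            out.append(None)
        else:
            # bisect_left gives the index of the first break >= v,
            # which is the bin whose right edge closes over v.
            out.append(labels[bisect_left(breaks, v)])
    return out


print(cut_sketch([0.5, 1, 2, float("nan"), None, 10], [1, 3]))
# ['(-inf, 1]', '(-inf, 1]', '(1, 3]', None, None, '(3, inf]']
```

Without the explicit check, `bisect_left([1, 3], float("nan"))` returns 0, so the NaN would be labeled `(-inf, 1]`.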