Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(rust, python): Remove old cut/qcut #9763

Merged
merged 5 commits into from
Jul 11, 2023
Merged

Conversation

magarick
Copy link
Contributor

@magarick magarick commented Jul 7, 2023

As discussed, this removes the old cut and qcut. It also changes the default behavior to return Series instead of DataFrames but lets the new functions return "old style" data as well. Finally, it makes NaNs bin to null instead of accidentally putting them in a bin.

One thing to note is that the new algorithm isn't parallel yet but it could (and should) be. I tried a few things but couldn't get it to work.

@github-actions github-actions bot added python Related to Python Polars refactor Code improvements that do not change functionality rust Related to Rust Polars labels Jul 7, 2023
@ritchie46
Copy link
Member

It also changes the default behavior to return Series instead of DataFrames but lets the new functions return "old style" data as well.

That would be a breaking change. It should return a DataFrame for the Series.cut/ Series.qcut variant.

@magarick
Copy link
Contributor Author

It also changes the default behavior to return Series instead of DataFrames but lets the new functions return "old style" data as well.

That would be a breaking change. It should return a DataFrame for the Series.cut/ Series.qcut variant.

For python I could change the default back but honestly returning a data frame wasn't the right thing to do in the first place. On the rust side though I'm not sure how feasible it is since the underlying functions now return a series. Do you want replacement cut and cut functions added back to the "Algo" crate that just mimic the old format and arguments (and possibly warn people not to use them)?

@ritchie46
Copy link
Member

I am talking on the python side. On the rust side it is fine, we only need an expression on the rust side.

For python I could change the default back but honestly returning a data frame wasn't the right thing to do in the first place.

I think a DataFrame is easier to grok than a struct. Conversion between the two is free anyway. So I think for a Series it is a good default. For an expression it isn't possible and the struct is good.

@magarick
Copy link
Contributor Author

I am talking on the python side. On the rust side it is fine, we only need an expression on the rust side.

For python I could change the default back but honestly returning a data frame wasn't the right thing to do in the first place.

I think a DataFrame is easier to grok than a struct. Conversion between the two is free anyway. So I think for a Series it is a good default. For an expression it isn't possible and the struct is good.

For the series it does convert back to a DataFrame and never returns a struct. Only a categorical series or DF. I think changing the default is ok though since the function's documentation always said it was experimental and subject to change. The new default is more likely what people are looking for too.

@ritchie46
Copy link
Member

Ok, no strong opinion on my part. Going in, thanks!

@ritchie46 ritchie46 merged commit f5f0630 into pola-rs:main Jul 11, 2023
@magarick magarick deleted the more-cut branch July 12, 2023 20:28
@gkns1
Copy link

gkns1 commented Jul 13, 2023

@ritchie46 You were right about this PR being a breaking change.
Even if cut was experimental, In the future could changes like that be marked as breaking or highlighted in release notes?
Polars is awesome, but patches with breaking changes make it hard to adopt and keep up with new releases. Thanks!

@magarick
Copy link
Contributor Author

@gkns1 It should be able to work almost identically to the old version if you want it to. Is there something in particular you need?

@gkns1
Copy link

gkns1 commented Jul 13, 2023

It's just the fact that it now returns a series by default instead of dataframe. It's an easy fix, but I had to first find out what happened when a build was failing. Failing builds is a risk I'm willing to take by using very new software, but a flag/highlight in release notes would help find it much faster.

@ritchie46
Copy link
Member

It's just the fact that it now returns a series by default instead of dataframe. It's an easy fix, but I had to first find out what happened when a build was failing. Failing builds is a risk I'm willing to take by using very new software, but a flag/highlight in release notes would help find it much faster.

Yes, you are right. Sorry about that. Somehow forgot the breaking flag.

c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
@lidh15
Copy link

lidh15 commented Aug 8, 2023

May I have a question that where cut/qcut are now?
previously we use use polars_algo::{cut, qcut}; and now they were removed, what should I do after upgrading polars to 0.31.1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Related to Python Polars refactor Code improvements that do not change functionality rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants