fill_nan is not a valid aggregation in group_by #11293

Closed
2 tasks done
paladin158 opened this issue Sep 25, 2023 · 2 comments
Labels
accepted Ready for implementation bug Something isn't working python Related to Python Polars

Comments

@paladin158

paladin158 commented Sep 25, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 0, 0]})

df.group_by("a").agg((pl.col("b") / pl.col("b").std()).fill_nan(0).sum())

Log output

The predicate '[(col("b").cast(Float64)) / (col("b").std())].is_not_nan()' in 'when->then->otherwise' is not a valid aggregation and might product a different number of rows than the group_by operation would. This behavior is experimental and may be subject to change

Issue description

I don't understand why this warning appears. The results are correct, but I am not sure what the warning implies.

Expected behavior

No warning message.

Installed versions

0.19.3
@paladin158 paladin158 added bug Something isn't working python Related to Python Polars labels Sep 25, 2023
@paladin158 paladin158 changed the title fill_nan is not a valid aggregation fill_nan is not a valid aggregation in group_by Sep 25, 2023
@taozuoqiao

I would avoid when->then->otherwise related expressions in the agg context, and prefer something like

(
    df.with_columns(
        pl.col("b").truediv(pl.col("b").std().over("a")).fill_nan(0).alias("b")
    )
    .group_by("a")
    .agg(pl.col("b").sum())
)

@stinodego stinodego added the accepted Ready for implementation label Oct 26, 2023
@github-project-automation github-project-automation bot moved this to Ready in Backlog Oct 26, 2023
@stinodego
Contributor

Related to #10055

fill_nan does a when->then->otherwise behind the scenes, apparently.
