Can't exclude datetime[ns] by targeting pl.DateTime #5300
Selecting has the same issue:

import pandas as pd
import polars as pl

(
    pl.from_pandas(pd.date_range("2021-01-01", "2021-01-02"))
    .to_frame()
    .with_column(pl.lit(1))
    .select(pl.col(pl.Datetime))
)
# Empty dataframe |
The default time unit is "μs"; an unparametrized pl.Datetime returns

>>> pl.Series(name="foo", values=[], dtype=pl.Datetime)
shape: (0,)
Series: 'foo' [datetime[μs]]
[
]

However, converting from pandas results in a different time unit:

>>> pl.from_pandas(pd.date_range("2021-01-01", "2021-01-02"))
shape: (2,)
Series: '' [datetime[ns]]
[
    2021-01-01 00:00:00
    2021-01-02 00:00:00
]

and hence the exclude does not pick up this type. You can fix your first example by doing:

import pandas as pd
import polars as pl

(
    pl.from_pandas(pd.date_range("2021-01-01", "2021-01-02"))
    .to_frame()
    .with_column(pl.lit(1))
    .select(pl.all().exclude(pl.Datetime("ns")))  # <<<--- note the "ns" here!
) |
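The mismatch above can be sketched without polars at all: a minimal stand-in model where a parametrized dtype only compares equal when its time unit matches (the class and attribute names here are illustrative, not polars internals):

```python
# Illustrative model only (NOT polars internals): a dtype parametrized
# by time unit compares equal only when the unit also matches.
class Datetime:
    def __init__(self, time_unit="us"):
        self.time_unit = time_unit

    def __eq__(self, other):
        return isinstance(other, Datetime) and self.time_unit == other.time_unit

    def __hash__(self):
        return hash(("Datetime", self.time_unit))

# Exclusion based on exact dtype equality misses a different time unit:
print(Datetime("us") == Datetime())  # True  -- the default unit is "us"
print(Datetime("ns") == Datetime())  # False -- so excluding Datetime() skips "ns"

# Matching by class instead would catch every time unit:
print(isinstance(Datetime("ns"), Datetime))  # True
```

This is why the "ns" fix works: the exclude pattern must carry the exact time unit that came in from pandas.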
Ah! That's interesting! Hm. I think I need to sleep on whether I'm fine with this the way it is 😅 |
Having slept on this, I think that it is currently quite difficult to exclude all datetime columns. Given a situation where you don't know the incoming datetime's time unit, I have to either exhaustively exclude all combinations of pl.Datetime parametrizations, or target all of the other dtypes instead.
Or we could allow an unparametrized pl.Datetime to match every time unit. I prefer some variation of this second suggestion. |
The motivation is that I have several datetime columns whose time unit I don't control. |
You can provide a list of data types, so for the three time units you could easily check:

select(pl.all().exclude([pl.Datetime("ms"), pl.Datetime("ns"), pl.Datetime("us")]))

That would be unwieldy with all possible timezones, although I guess in practice you won't have a mixture of timezones (or at least I hope so). Alternatively, if you are ok with determining this in eager mode:

non_datetimes = [c for c, tp in df.schema.items() if not isinstance(tp, pl.Datetime)]

I do agree neither of these solutions is very elegant. |
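The eager-mode approach boils down to filtering a schema mapping by dtype class rather than by dtype equality. A self-contained sketch with stand-in dtype classes (illustrative names, not the real polars types):

```python
# Stand-in dtype classes for illustration (not the real polars types).
class Datetime:
    def __init__(self, time_unit="us"):
        self.time_unit = time_unit

class Int64:
    pass

class Utf8:
    pass

# A schema maps column names to dtypes, much like df.schema in polars.
schema = {"ts_ns": Datetime("ns"), "ts_us": Datetime("us"), "n": Int64(), "s": Utf8()}

# isinstance matches every Datetime regardless of time unit,
# which exact dtype equality would not.
non_datetimes = [c for c, tp in schema.items() if not isinstance(tp, Datetime)]
print(non_datetimes)  # -> ['n', 's']
```

The design trade-off is that this requires the schema to be known, which is why it only works eagerly.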
I feel excluding all datetime columns, regardless of time unit, should be simpler than this. |
I have some ideas here... should have some time in the evening over the next few days to take a look (and certainly at the weekend if things get busy during the week). Agree that there needs to be a simple way to match "all Datetimes" - or, equally, "all Durations". |
I was also thinking about accepting a wildcard, e.g. pl.Datetime("*"). |
I agree, but then we have to change the exclude logic. Edit: just seen the response by @ritchie46: it would be even better if we could avoid the wildcard, as it is additional syntax and one more thing to remember. But I'm not sure it is feasible. |
We would only set that wildcard in the exclude logic if no timezone is given. A user does not have to know. |
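The proposal above amounts to a matching rule inside exclude: when the user gives no timezone, treat the pattern's timezone as a wildcard. A hedged sketch of that rule (illustrative code, not the polars implementation):

```python
# Illustrative sketch of the proposed matching rule; not polars code.
class Datetime:
    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone

def exclude_matches(pattern, dtype):
    """Return True if `dtype` should be excluded by `pattern`.

    If the pattern carries no timezone, it is internally promoted to a
    wildcard, so the user never has to know about "*".
    """
    if not isinstance(dtype, Datetime):
        return False
    zone = "*" if pattern.time_zone is None else pattern.time_zone
    return zone == "*" or zone == dtype.time_zone

pat = Datetime("ns")  # user gives no timezone...
print(exclude_matches(pat, Datetime("ns", "UTC")))  # True -- ...so any zone matches
print(exclude_matches(pat, Datetime("ns", None)))   # True
```

This keeps the wildcard an internal detail, which is the point of the comment above.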
I think this issue should be closed due to inactivity |
FYI: following #6425 you can now exclude all datetimes:

from polars.datatypes import DATETIME_DTYPES
df.select(pl.exclude(DATETIME_DTYPES))

...though wildcard support for timezones is still outstanding. A number of other "official" dtype groups are available as well. |
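The dtype-group approach boils down to testing schema dtypes for membership in a precomputed collection that covers every parametrization. A sketch of the idea with stand-in types (the group contents here are illustrative, not the actual polars DATETIME_DTYPES):

```python
# Stand-in types for illustration; not the real polars dtypes or groups.
class Datetime:
    def __init__(self, time_unit="us"):
        self.time_unit = time_unit

    def __eq__(self, other):
        return isinstance(other, Datetime) and self.time_unit == other.time_unit

    def __hash__(self):
        return hash(("Datetime", self.time_unit))

class Int64:
    pass

# A "dtype group" is simply a collection covering all parametrizations,
# so a membership test catches every time unit at once.
DATETIME_DTYPES = {Datetime("ms"), Datetime("us"), Datetime("ns")}

schema = {"ts": Datetime("ns"), "n": Int64()}
kept = [c for c, tp in schema.items() if tp not in DATETIME_DTYPES]
print(kept)  # -> ['n']
```

A precomputed group sidesteps the equality problem without new syntax, which is why it landed before the wildcard did.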
Leaving this open for the time being, with the suggestion being to be able to replace this:

from polars.datatypes import DATETIME_DTYPES
df.select(pl.exclude(DATETIME_DTYPES))

with

df.select(pl.exclude(pl.Datetime("*"))) |
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
I was trying to drop a datetime[ns] column today, so I did pl.all().exclude(pl.Datetime), but this does not drop it. It does drop datetime[μs] columns.
Reproducible example
Expected behavior
Works for the regular polars-created type
Installed versions