Fix issue 108 narwhals pandas polars support #130
d341df9 to 4effc0f
ef0699b to 71d0a6e
@artiom-matvei, it looks like you are modifying a lot of files. Have you concluded that this is possible/easy/desirable?
71d0a6e to 0719887
I started with … To be honest, I don't think it is desirable, since we can simply convert everything to Polars, as was done before. Also, the user only interacts with us by passing a dataframe at the entry point; if we want to hand back the same dataframe type, we can convert our dataframes at the exit. Furthermore, the current version feels pretty patchy and would need additional refactoring to improve the implementation. @vincentarelbundock
That makes sense. Would it be possible to implement a more lightweight version of this: use …? We would still do everything in Polars internally, and we would still return a class that inherits from a Polars data frame. The benefit is that we wouldn't have to require both Polars and Pandas as dependencies. Pandas is very heavy, with lots of compiled code, etc. In contrast, … Another benefit is that this would let us accept other data frame types automatically, including DuckDB, PyArrow, etc.
Unfortunately, Narwhals cannot be used to convert between dataframe types. The way it removes dependencies on other dataframe libraries is by …
It is not a compatibility layer between different types of dataframes. The way to go would be to convert the whole project to use `nw`, but this can be done in steps.
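We only need … and … In the example below, I create data frames in three formats: Polars, Pandas, and DuckDB. Then, I use … to ingest each one into Polars:

```python
import narwhals as nw
import pandas as pd  # not used directly, but `to_pandas()` needs pandas installed
import polars as pl
import duckdb

df_polars = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
# DuckDB can query the Polars frame by its Python variable name.
df_duckdb = duckdb.sql("SELECT * FROM df_polars")
df_pandas = df_polars.to_pandas()


# `narwhalify` wraps the native input in a Narwhals object before calling the
# function; `to_arrow()` materialises a PyArrow table, which Polars can read.
@nw.narwhalify
def ingest(df):
    return pl.DataFrame(df.to_arrow())


ingest(df_polars)
ingest(df_pandas)
ingest(df_duckdb)
```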
You are right. We can improve on this by doing what MarcoGorelli suggested: … then it'll use the PyCapsule Interface (and, depending on the input, might not need to go via PyArrow).
Sounds great, as long as it works with all the data frame formats. Are there any non-Arrow ones where this fails?
Your function does not seem to work on DuckDB |
0719887 to 9306fce
I might also change the tests' import structure so they don't use relative import paths, as described in this SO post: https://stackoverflow.com/a/16985066/6916564