Fix issue 108 narwhals pandas polars support #130

artiom-matvei · 2024-10-18T21:58:24Z

I might also change the tests import structure to not use a relative import path like in this SO post: https://stackoverflow.com/a/16985066/6916564

vincentarelbundock · 2024-10-21T15:42:15Z

@artiom-matvei looks like you are modifying a lot of files. Have you concluded that this is possible/easy/desirable?

artiom-matvei · 2024-10-21T16:23:59Z

I started with predictions() but then it required some small changes in other parts, and then other parts so it became a pretty big change.

To be honest, I don't think it is desirable as we can simply convert everything to polars as was done before.

Also, the user only interacts with us by passing a dataframe at the entry point. If we want to return the same dataframe type to the user, we can convert our dataframes at the exit.

Furthermore, the current version feels pretty patchy and would need additional refactoring to improve the implementation. @vincentarelbundock

vincentarelbundock · 2024-10-21T19:28:08Z

That makes sense.

Would it be possible to implement a more lightweight version of this: Use narwhals to convert the data frame to Polars on entry.

We would still do everything in Polars internally, and we would still return a class that inherits from Polars data frame.

The benefit of this would be that we wouldn't have to require both Polars and Pandas as dependencies. Pandas is very heavy, with lots of compiled code, etc. In contrast, narwhals is 0-dep, so it would be nice to use it as a thin ingestion layer.

Another benefit would be that this would let us accept other data frame types automatically, including DuckDB, PyArrow, etc.

artiom-matvei · 2024-10-22T17:09:34Z

Narwhals cannot be used to convert between dataframe types unfortunately.

It removes the dependency on specific dataframe libraries by exposing a common API and, at runtime, translating common API operations into package-specific operations.

It is not a compatibility layer between different types of dataframes.

See discussion on discord

The way to go would be to convert the whole project to use nw, but this can be done in steps.

vincentarelbundock · 2024-10-22T17:19:47Z

We only need narwhals to convert in one direction: Any Format -> Polars.

And narwhals can absolutely do that. I'm just thinking about writing a simple ingest() helper, and using that to convert any user-supplied DF to Polars.

In the example below, I create data frames in 3 formats: Polars, Pandas, and DuckDB. Then, I use narwhals to convert each of them to identical Polars DFs:

import narwhals as nw
import pandas as pd
import polars as pl
import duckdb

# The same data in three formats: Polars, DuckDB, and Pandas.
df_polars = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
# DuckDB resolves df_polars from the Python variable of that name.
df_duckdb = duckdb.sql("SELECT * FROM df_polars")
df_pandas = df_polars.to_pandas()

# narwhalify wraps the native input in a narwhals frame before calling ingest().
@nw.narwhalify
def ingest(df):
    return pl.DataFrame(df.to_arrow())

ingest(df_polars)
ingest(df_pandas)
ingest(df_duckdb)

artiom-matvei · 2024-10-22T17:33:14Z

You are right. We can improve on it by doing what MarcoGorelli suggested:
yup - you don't need to use df.to_arrow() explicitly btw, it would be more efficient to do

def ingest(df):
    return nw.from_arrow(df, native_namespace=pl).to_native()

, then it'll use the PyCapsule Interface (and, depending on the input, might not need to go via PyArrow)

vincentarelbundock · 2024-10-22T17:37:34Z

Sounds great. As long as it works with all the data frame formats... Are there any non-Arrow ones where this fails?

vincentarelbundock · 2024-10-22T17:50:20Z

Your function does not seem to work on DuckDB

artiom-matvei added 2 commits October 18, 2024 15:37

doc style homogenizing comparisons.py

bce44ae

wip issue in uncertainty during narwhals migration

4effc0f

artiom-matvei force-pushed the fix_issue_108_narwhals_pandas_polars_support branch from d341df9 to 4effc0f Compare October 18, 2024 22:01

artiom-matvei added 3 commits October 19, 2024 15:45

narwhals and relative import modif

5c460d2

wip

7501f6c

wip nw migration with tests pass

fbbb9a9

artiom-matvei force-pushed the fix_issue_108_narwhals_pandas_polars_support branch from ef0699b to 71d0a6e Compare October 21, 2024 15:39

artiom-matvei force-pushed the fix_issue_108_narwhals_pandas_polars_support branch from 71d0a6e to 0719887 Compare October 21, 2024 15:47

Merge branch 'main' into fix_issue_108_narwhals_pandas_polars_support

9306fce

artiom-matvei force-pushed the fix_issue_108_narwhals_pandas_polars_support branch from 0719887 to 9306fce Compare October 23, 2024 00:20

vincentarelbundock closed this Oct 26, 2024
