narwhals = Pandas + polars + ... #533

Open
juanitorduz opened this issue Jul 1, 2024 · 8 comments

@juanitorduz
Contributor

Use https://github.com/narwhals-dev/narwhals to support pandas and polars!

This seems to be a very cool alternative to support various backends. See for example koaning/scikit-lego#671

@MarcoGorelli

Hey, just wanted to stop by and say thanks for your interest! Feel free to book some time on https://calendly.com/marcogorelli if you'd like to chat about how Narwhals could help PyFixest.

@s3alfisc
Member

s3alfisc commented Jul 2, 2024

Hi both (@MarcoGorelli and @juanitorduz) - I've now thought about it for 15 minutes and I think narwhals might be a great solution for PyFixest! Thanks for offering to chat, @MarcoGorelli, I'll book an appointment =)

Just some background on pyfixest and how it works with data frames: most of the data manipulation happens via the formulaic library, which requires a pd.DataFrame as input. A typical flow looks like this:

%load_ext autoreload
%autoreload 2

import polars as pl
import pandas as pd
import pyfixest as pf

from formulaic import model_matrix
import narwhals as nw

data = pl.DataFrame(pf.get_data())

def feols(data):

    if isinstance(data, pl.DataFrame):
        data = data.to_pandas()

    # model_matrix requires a pandas DataFrame and returns a pandas DataFrame
    Y, X = model_matrix("Y ~ X1", data = data, output = "pandas")

    # some more pandas manipulations
    Y.dropna(inplace = True)
    X.dropna(inplace = True)

    return Y.to_numpy(), X.to_numpy()

With narwhals, it could look like this:

def feols_nw(data, use_polars = False):

    data = nw.from_native(data)

    # model_matrix requires a pandas DataFrame and returns a pandas DataFrame
    Y, X = model_matrix("Y ~ X1", data = data.to_pandas(), output = "pandas")

    # some more pandas manipulations, done before wrapping because
    # dropna(inplace=True) is a pandas method, not a narwhals one
    Y.dropna(inplace = True)
    X.dropna(inplace = True)

    if use_polars:
        # another copy? potentially costly?
        Y = nw.from_native(Y)
        X = nw.from_native(X)

    return Y.to_numpy(), X.to_numpy()

@MarcoGorelli

Hey! Thanks for your explanation - if formulaic specifically requires pandas input/output, then that might be a good candidate for Narwhalification :) I'll take a look, thanks!

    # another copy? potentially costly?
    Y = nw.from_native(Y)

Just to clarify, from_native just wraps a dataframe in a narwhals.DataFrame. It's a virtually free operation that only takes a few microseconds and doesn't make any copies - Narwhals only translates syntax.

@juanitorduz
Contributor Author

Naive question: It seems formulaic supports pyarrow.Table. Could this be a shortcut for Polars integration? https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.to_arrow.html

@MarcoGorelli

totally!

@baggiponte
Contributor

Ciao! I am not sure if I have the time, but I'd be glad to support this narwhalification (h/t @FBruzzesi too).

@s3alfisc
Member

That would be amazing @baggiponte! A key step would be for this PR to be merged into formulaic; after that, we'd mostly have to port the pandas code in the model_matrix function to narwhals =)

@s3alfisc
Member

There's a nice new vignette in the narwhals docs on how to handle data frame conversion via narwhals without having to rewrite the entire pandas backend. Maybe we could start by using narwhals to handle the conversion from polars to pandas (when users supply polars data frames)? If we additionally rewrite the methods in the FeolsCompressed class from polars to narwhals, we could then drop the polars and pyarrow dependencies.
