Support: `formulaic`, `scikit-learn`, and matrix input #35

vincentarelbundock · 2023-09-18T03:23:06Z

https://github.com/matthewwardrop/formulaic

We probably need an additional argument for the formula used to create `y` and `X` for scikit-learn models.

vincentarelbundock · 2023-09-18T03:41:09Z

import pandas  # needed for df.to_pandas() below
import polars as pl
from formulaic import model_matrix
from sklearn.linear_model import LinearRegression

df = pl.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/thornton_hiv.csv")

# Build the outcome (y) and the design matrix (X) from a formula
y, X = model_matrix("got ~ distvct + tinc * age", df.to_pandas())

lr = LinearRegression()
lr.fit(X, y)

# The model spec attached to X records the variables and formula that were used
X.model_spec.variables

X.model_spec.formula

vincentarelbundock · 2023-09-18T13:33:17Z

Do we care about this, given that scikit-learn does not report standard errors?

vincentarelbundock · 2024-11-10T03:23:44Z

https://ramhiser.com/post/2018-04-16-building-scikit-learn-pipeline-with-pandas-dataframe/

artiom-matvei · 2024-11-10T20:09:46Z

Is this to add support for models from scikit-learn?
For something like:

############## Important line is the last one
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np
import seaborn as sns
from marginaleffects import *

# Set seed for reproducibility
np.random.seed(123)

# Load and preprocess the data
penguins = sns.load_dataset('penguins')
data = penguins[['species', 'bill_length_mm', 'bill_depth_mm']].dropna()
data['species'] = data['species'].astype('category')

# Scale the features
scaler = StandardScaler()
data[['bill_length_mm', 'bill_depth_mm']] = scaler.fit_transform(data[['bill_length_mm', 'bill_depth_mm']])

# Prepare features and target
X = data[['bill_length_mm', 'bill_depth_mm']].values
y = data['species'].cat.codes  # Convert categories to numeric codes

# Map species to codes
species_mapping = dict(zip(data['species'].cat.categories, range(len(data['species'].cat.categories))))
print("Species mapping:", species_mapping)

# Fit the multinomial logistic regression model
model_py = LogisticRegression(multi_class='multinomial', solver='lbfgs', C=1e10, fit_intercept=True, random_state=123, max_iter=1000)
model_py.fit(X, y)

############## Important line is the last one
predictions(model_py)

vincentarelbundock · 2024-11-10T21:04:25Z

Yes, that's right. The idea would be to write a new model class similar to this: https://github.com/vincentarelbundock/pymarginaleffects/blob/main/marginaleffects/model_pyfixest.py

With these differences:

To instantiate the model class, the user has to supply a scikit-learn pipeline object that accepts a data frame and returns two matrices: `y` and `X`.
When instantiated, the class fits the given model.
The `get_predict()` method then takes a `newdata` data frame, runs it through the data preparation pipeline, and makes predictions for the resulting `X`.
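
The three points above could be sketched roughly as follows. This is not the marginaleffects API: the class name `SklearnModelSketch`, the `make_prepare` helper, and the convention that the preparation step returns `y=None` when the target column is absent are all assumptions made for illustration, and a plain callable stands in for the pipeline object.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def make_prepare(features, target, scaler):
    """Return a hypothetical prepare(df) -> (y, X) callable."""
    def prepare(df):
        X = scaler.transform(df[features])
        # newdata may not contain the outcome column
        y = df[target].to_numpy() if target in df.columns else None
        return y, X
    return prepare


class SklearnModelSketch:
    """Hypothetical wrapper: fits on instantiation, predicts on newdata."""

    def __init__(self, estimator, prepare, data):
        self.prepare = prepare
        self.data = data
        y, X = prepare(data)
        self.estimator = estimator.fit(X, y)

    def get_predict(self, newdata=None):
        # Default to the training data when no newdata is given
        newdata = self.data if newdata is None else newdata
        _, X = self.prepare(newdata)
        return self.estimator.predict_proba(X)


# Synthetic two-class data with well-separated clusters
rng = np.random.default_rng(123)
features = ["a", "b"]
pts = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
train = pd.DataFrame(pts, columns=features)
train["label"] = [0] * 40 + [1] * 40

scaler = StandardScaler().fit(train[features])
model = SklearnModelSketch(
    LogisticRegression(), make_prepare(features, "label", scaler), train
)

newdata = pd.DataFrame({"a": [0.0, 5.0], "b": [0.0, 5.0]})
probs = model.get_predict(newdata)
```

Keeping the preparation step as a single object means the same transformation is applied at fit time and at prediction time, which is what lets `get_predict()` accept a raw data frame.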

You could give this a shot if you want. I think this is a really fun one.

vincentarelbundock · 2024-12-18T16:51:29Z

Done here: #145

vincentarelbundock changed the title ~~Support: formulaic and scikit-learn~~ Support: formulaic, scikit-learn, and matrix input Oct 27, 2024

vincentarelbundock mentioned this issue Nov 10, 2024

Specifying model without statsmodels.formulas.api seems to not work in both pandas and polars #140

Closed

vincentarelbundock closed this as completed Dec 18, 2024
