Support: formulaic, scikit-learn, and matrix input
#35
Comments
import pandas
import polars as pl
from formulaic import model_matrix
from sklearn.linear_model import LinearRegression
df = pl.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/thornton_hiv.csv")
y, X = model_matrix("got ~ distvct + tinc * age", df.to_pandas())
lr = LinearRegression()
lr.fit(X, y)
X.model_spec.variables
X.model_spec.formula
Do we care about this, given that scikit-learn does not report standard errors?
Is this to add support for models from scikit-learn?
############## Important line is the last one
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np
import seaborn as sns
from marginaleffects import *
# Set seed for reproducibility
np.random.seed(123)
# Load and preprocess the data
penguins = sns.load_dataset('penguins')
data = penguins[['species', 'bill_length_mm', 'bill_depth_mm']].dropna()
data['species'] = data['species'].astype('category')
# Scale the features
scaler = StandardScaler()
data[['bill_length_mm', 'bill_depth_mm']] = scaler.fit_transform(data[['bill_length_mm', 'bill_depth_mm']])
# Prepare features and target
X = data[['bill_length_mm', 'bill_depth_mm']].values
y = data['species'].cat.codes # Convert categories to numeric codes
# Map species to codes
species_mapping = dict(zip(data['species'].cat.categories, range(len(data['species'].cat.categories))))
print("Species mapping:", species_mapping)
# Fit the multinomial logistic regression model
model_py = LogisticRegression(multi_class='multinomial', solver='lbfgs', C=1e10, fit_intercept=True, random_state=123, max_iter=1000)
model_py.fit(X, y)
############## Important line is the last one
predictions(model_py)
Yes, that's right. The idea would be to write a new model class similar to this: https://github.com/vincentarelbundock/pymarginaleffects/blob/main/marginaleffects/model_pyfixest.py With these differences:
You could give this a shot if you want. I think this is a really fun one.
Done here: #145
https://github.com/matthewwardrop/formulaic
Probably need another argument for the formula used to create y and X in scikit-learn.
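One way to picture the "another argument for the formula" idea is a thin wrapper that stores the formula next to the estimator. The sketch below is only an illustration under stated assumptions: `FormulaSklearnModel` and `build_matrices` are hypothetical names, not part of marginaleffects, formulaic, or scikit-learn, and the builder is assumed to take `(formula, data)` and return `(y, X)` in the way `formulaic.model_matrix` is used earlier in this thread.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class FormulaSklearnModel:
    """Hypothetical adapter: keeps the formula alongside a fitted estimator."""

    formula: str
    estimator: Any  # anything exposing fit(X, y) and predict(X)
    # Assumed signature: (formula, data) -> (y, X)
    build_matrices: Callable[[str, Any], Tuple[Any, Any]]

    def fit(self, data: Any) -> "FormulaSklearnModel":
        # Build y and X from the stored formula, then fit the estimator.
        y, X = self.build_matrices(self.formula, data)
        self.estimator.fit(X, y)
        return self

    def predict(self, newdata: Any) -> Any:
        # Rebuild X from the same formula so new rows get the same columns.
        _, X = self.build_matrices(self.formula, newdata)
        return self.estimator.predict(X)
```

Because the formula travels with the model, code that needs to rebuild the design matrix for modified data (counterfactual rows, for example) can do so without the caller passing `X` again. With formulaic specifically, reusing the `X.model_spec` object shown above may be a safer way to re-encode new data than calling the builder again; that choice is left open here.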