Pandera dataframe in Pydantic model .dict() and .json() compatibility #966
hi @derinwalters this is currently unexplored territory, would appreciate clarification on the use cases here. Are you familiar with how to create custom pydantic types, i.e. how one extends a type within a pydantic model?
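As background on that question, here is a minimal sketch, assuming pydantic v1's `__get_validators__` hook, of how a custom type can run pandera validation when a model field is assigned. The `SchemaValidatedDataFrame` name and wiring are illustrative only, not pandera's actual integration.

```python
# Minimal sketch (assuming pydantic v1) of a custom type that runs pandera
# validation; SchemaValidatedDataFrame is an illustrative name, not part of
# pandera's or pydantic's public API.
import pandas as pd
import pandera as pa
import pydantic
from pandera.typing import Series


class SimpleSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)


class SchemaValidatedDataFrame:
    @classmethod
    def __get_validators__(cls):
        # pydantic v1 calls each yielded validator in turn
        yield cls._validate

    @classmethod
    def _validate(cls, value):
        # delegate to pandera; returns the validated pandas DataFrame
        return SimpleSchema.validate(pd.DataFrame(value))


class Model(pydantic.BaseModel):
    df: SchemaValidatedDataFrame


Model(df=pd.DataFrame({"str_col": ["a", "b"]}))  # validated via SimpleSchema
```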
so looking at pydantic docs, this will work:

```python
from devtools import debug

import pandas as pd
import pandera as pa
import pydantic
from pandera.typing import DataFrame, Series


class SimpleSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)


class PydanticModel(pydantic.BaseModel):
    x: int
    df: DataFrame[SimpleSchema]

    class Config:
        json_encoders = {pd.DataFrame: lambda x: x.to_dict(orient="records")}


valid_df = pd.DataFrame({"str_col": ["hello", "world"]})

myinst = PydanticModel(x=1, df=valid_df)
debug(myinst)
debug(myinst.dict())
debug(myinst.json())
```

Output:
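A caveat on the example above: in pydantic v1, `json_encoders` only applies during `.json()` serialization, so `.dict()` still returns the DataFrame object unchanged. If a fully plain dictionary is needed, one workaround (continuing the snippet above) is to round-trip through `.json()`:

```python
import json

# .dict() leaves the DataFrame untouched; json_encoders only kicks in for .json()
plain = json.loads(myinst.json())
# plain == {"x": 1, "df": [{"str_col": "hello"}, {"str_col": "world"}]}
```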
@cosmicBboy thank you so much for your suggestion. Leveraging the Config json_encoders seems like just the thing. I will give this a try and report back. The use case is a hierarchical data class that I store in MongoDB and process locally. Recently I transitioned from a monolithic Pandas dataframe to lists of Pydantic class dictionaries, where I convert to Pandas for manipulation. However, this incurs an extra to-and-from conversion cost that never really seemed ideal. I don't remember exactly how, but last week I stumbled across Pandera and thought to myself "this is exactly what I was looking for!", and so here I am kicking the tires.
Yep! This is pretty much the reason I built pandera, though at the time I wasn't aware of pydantic and was doing the same thing with the schema library.
I think the proposed solution works well enough for what I was asking. Thanks! I'm having a bit of trouble, though, figuring out how to properly validate columns of list-like and dictionary-like elements, which is rather straightforward in a row-wise pydantic approach, and will continue working on that. Looks like you're also already working on providing a default value option in #502, which is great.
Great!
There's this issue #260, but for now I'd recommend custom checks:

```python
class SimpleSchema(pa.SchemaModel):
    list_col: Series[object]
    dict_col: Series[object]

    @pa.check("list_col")
    def check_list(cls, series):
        return series.map(lambda x: isinstance(x, list))  # check any other property about this column

    @pa.check("dict_col")
    def check_dict(cls, series):
        return series.map(lambda x: isinstance(x, dict))  # check any other property about this column
```
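Continuing that snippet, usage could look like the following; the example data is illustrative:

```python
import pandas as pd

# each element of list_col is a list and each element of dict_col is a dict,
# so both custom checks pass
df = pd.DataFrame({
    "list_col": [[1, 2], ["a"]],
    "dict_col": [{"k": 1}, {"k": 2}],
})

validated = SimpleSchema.validate(df)  # raises SchemaError if a check fails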
In reading through the Pandera documentation, it's not clear to me how to intermingle Pandera dataframes within a Pydantic model and still be able to use the .dict() and .json() methods successfully. I followed the steps on https://pandera.readthedocs.io/en/stable/pydantic_integration.html#using-pandera-schemas-in-pydantic-models and love how seamless it is. However, the .dict() method keeps the Pandera type and .json() fails altogether. The solution provided by Pandera's to_format is close, but I want to keep the validated dataframe intact while I perform operations and only convert the format later (not right away). Is there a way to do this?
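For context, a rough sketch of the to_format approach mentioned above, assuming pandera's data format conversion config used together with @pa.check_types; because the conversion happens as part of validation, the validated output is no longer a DataFrame, which is exactly the behavior the question is trying to avoid:

```python
# Sketch only, under the assumptions stated above; schema and function names
# are illustrative.
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series


class RecordSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)

    class Config:
        # convert the validated frame to a list of records on output
        to_format = "dict"
        to_format_kwargs = {"orient": "records"}


@pa.check_types
def load() -> DataFrame[RecordSchema]:
    return pd.DataFrame({"str_col": ["hello", "world"]})


records = load()  # already converted; no DataFrame left to operate on
```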