Is your feature request related to a problem? Please describe.
Currently, the pandas_dtype_strategies function in #314 doesn't handle categorical data types. To be feature-complete, we'd want to support them, with the caveat that pandera doesn't currently support PandasDtype enums with additional metadata, such as CategoricalDtype with its categories and ordered information.
Describe the solution you'd like
When constructing a field_element_strategy, programmatically fetch any Check.isin checks and get the allowed_values fields from those checks. Try to infer the pandas datatype of the underlying categorical value:
categories = []
category_pandas_dtype = None
if pandas_dtype.is_category:
    for check in checks:
        if check is Check.isin:
            # get categories and infer pandas dtype
            # and remove isin checks from the checks list
elements = pandas_dtype_strategy(category_pandas_dtype, categories=categories)
...
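To make the collection step concrete, here is a minimal, library-free sketch. FakeCheck, collect_categories, and infer_category_type are hypothetical names used only for illustration and are not part of pandera's API. The sketch also assumes that when several isin checks are present, a valid value must satisfy all of them, so the categories are taken as the intersection of their allowed values.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FakeCheck:
    """Hypothetical stand-in for a pandera Check (not the real class)."""

    name: str
    allowed_values: Tuple[Any, ...] = ()


def collect_categories(
    checks: Sequence[FakeCheck],
) -> Tuple[List[Any], List[FakeCheck]]:
    """Split checks into (categories, remaining non-isin checks).

    Assumption: a value must pass every isin check, so categories are the
    intersection of all allowed_values, kept in first-seen order.
    """
    categories: Optional[List[Any]] = None
    remaining: List[FakeCheck] = []
    for check in checks:
        if check.name == "isin":
            if categories is None:
                # first isin check: drop duplicates, keep order
                categories = list(dict.fromkeys(check.allowed_values))
            else:
                # later isin checks narrow the allowed set
                categories = [v for v in categories if v in check.allowed_values]
        else:
            remaining.append(check)
    return (categories or []), remaining


def infer_category_type(categories: Sequence[Any]) -> type:
    """Return the single Python type shared by all categories, else object."""
    kinds = {type(value) for value in categories}
    return kinds.pop() if len(kinds) == 1 else object
```

The inferred Python type would still have to be mapped to a pandas dtype. An empty category list could be treated as the "no categories" error case.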
It seems like pandera could generate strategies using these annotations, and raise an error, as it does now, if there's no categories annotation. We might have to contribute to hypothesis first, though, since their pandas series strategy implementation seems to require creating numpy arrays with a specific dtype rather than pandas extension arrays.
And then handle the categories argument in pandas_dtype_strategy. Since series/index/dataframe strategies cast the generated data to the correct data type, this workaround should work for now.
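The cast-based workaround can be illustrated with plain pandas. This is only a sketch under stated assumptions: sample_categorical_series is a hypothetical helper, and random.choice stands in for whatever element strategy would draw values from the categories (in hypothesis, something like sampled_from). The point is that values generated as ordinary objects can be cast to a categorical dtype afterwards.

```python
import random
from typing import Any, Sequence

import pandas as pd


def sample_categorical_series(
    categories: Sequence[Any],
    size: int,
    ordered: bool = False,
    seed: int = 0,
) -> pd.Series:
    """Draw raw values from the categories, then cast to a categorical dtype.

    Hypothetical helper for illustration; not part of pandera or hypothesis.
    """
    rng = random.Random(seed)
    # Step 1: generate plain Python values, with no categorical dtype involved.
    raw = [rng.choice(list(categories)) for _ in range(size)]
    # Step 2: cast the generated data to the target categorical dtype.
    dtype = pd.CategoricalDtype(categories=list(categories), ordered=ordered)
    return pd.Series(raw, dtype=object).astype(dtype)
```

Because every raw value is drawn from the categories, the cast produces no missing values. A value outside the categories would become NaN after astype.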