Less strict numerical type #466
Comments
Currently there's no way of specifying a "number" column, since pandera adheres to pandas data types (and Python in general doesn't have a concrete generic number type), although with @jeffzi's work on #369 you could make custom data types like this. For now I'd recommend specifying a float, since floats can represent the integers you are likely to encounter. |
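To make the float recommendation concrete, here is a minimal pandas-only sketch (it does not use pandera, and the variable names are illustrative). It shows that an integer column casts to `float64` without loss for ordinary values, and that the "superset" claim stops holding for integers above 2**53.

```python
import pandas as pd

# An integer column casts cleanly to float64 for ordinary values.
ints = pd.Series([1, 2, 3])
as_float = ints.astype("float64")

# Caveat: float64 has a 53-bit significand, so integers above 2**53
# are not all exactly representable and can change when converted.
big = 2**53 + 1
roundtrip = int(float(big))

print(as_float.tolist())   # values are now floats
print(big, roundtrip)      # the two numbers differ
```

In a pandera schema this would presumably correspond to declaring the column as float with coercion enabled, for example `pa.Column(float, coerce=True)`; treat that as an assumption to check against your pandera version.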
Oh, I guess another way of doing this would be to specify a custom check:
import pandas as pd
import pandera as pa
from pandas.api.types import is_number
# Use a distinct name so the Check does not shadow the imported is_number
is_number_check = pa.Check(lambda s: s.map(is_number), name="is_number")
schema = pa.DataFrameSchema({
    "column": pa.Column(checks=is_number_check)
})
schema(pd.DataFrame({"column": [1, 2, "a"]}))
# Output
SchemaError: <Schema Column(name=column, type=None)> failed element-wise validator 0:
<Check is_number>
failure cases:
   index failure_case
0      2            a |
We could even have a built-in |
I think we should add a built-in Number type that includes all kinds of integers and floats, because we have huge datasets and element-wise checks based on mapping would not perform well. @cosmicBboy |
The higher-level data types are still TBD, but Number will most likely be one of them. In the meantime, the more performant thing to do would be:
import pandas as pd
import pandera as pa
from pandas.api.types import is_numeric_dtype
# Series-level check: inspects the column dtype once instead of every element
is_number = pa.Check(is_numeric_dtype, name="is_number")
schema = pa.DataFrameSchema({"column": pa.Column(checks=is_number)})
schema(pd.DataFrame({"column": [1, 2, "a"]}))
# Output
SchemaError: <Schema Column(name=column, type=None)> failed series validator 0:
<Check is_number>
Note that it won't be as informative an error message (no indication of which element caused the check to fail). |
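One way to combine the two suggestions in this thread is to try the cheap dtype check first and only fall back to the element-wise scan when it fails, so large clean columns stay fast while bad columns still report the offending values. The sketch below uses pandas only; `non_numeric_elements` is a hypothetical helper name, not part of pandas or pandera.

```python
import pandas as pd
from pandas.api.types import is_number, is_numeric_dtype


def non_numeric_elements(series: pd.Series) -> pd.Series:
    """Return the elements of `series` that are not numbers (empty if none)."""
    if is_numeric_dtype(series):
        # Fast path: the dtype alone shows every element is numeric.
        return series.iloc[0:0]
    # Slow path: only reached for object (or other non-numeric) columns.
    mask = series.map(is_number).astype(bool)
    return series[~mask]


mixed = pd.Series([1, 2, "a"])
failures = non_numeric_elements(mixed)
print(failures)  # the non-numeric entries, keyed by their original index
```

A function like this could in principle back a custom `pa.Check` that returns the boolean mask, though how well that fits serialization or data synthesis is not something this sketch addresses.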
@cosmicBboy I propose adding the enhancement tag to this issue. |
Adjusted the tags. A PR is welcome after the fix for #369 is done. |
@cosmicBboy If adding a Number type will take time, could you add a built-in check that is serializable and suitable for data synthesis? |
hey @quancore you can register checks into the Let me know if you need any help with the strategy implementation! |
After #369 and #559 what is the preferred solution here? Still #466 (comment) or #466 (comment)? |
The second (#466 (comment)) seems most efficient as it uses |
Is there any type that represents a numerical column (includes int, float etc.)?