pandera not working with pyspark dataframes #1033
george-moussa asked this question in Q&A
I'm trying to use pandera with a pyspark dataframe. It says here that it's working.

import pandera as pa

schema = pa.DataFrameSchema({
    "state": pa.Column(str),
    "city": pa.Column(str),
    "price": pa.Column(int, pa.Check.in_range(min_value=5, max_value=20)),
})
print(schema(df))

I get the error:

TypeError: expected pd.DataFrame, got <class 'pyspark.sql.dataframe.DataFrame'>

Does pandera not support 'pyspark.sql.dataframe.DataFrame'?
Answered by cosmicBboy on Nov 22, 2022
Hi @george-moussa, you have to use the pyspark.pandas API. pyspark.sql is currently not supported.