Incompatibilities between pandas and PySpark #28
thepushkarp
started this conversation in Development
Replies: 1 comment
-
Most of the differences arise from the way the two DataFrames are built and evaluated. Some of the ways they differ:

- A pandas DataFrame lives in memory on a single machine and is evaluated eagerly; a PySpark DataFrame is partitioned across a cluster and builds a lazy query plan that only runs when an action such as `collect()` or `count()` is triggered.
- pandas DataFrames are mutable and carry a row index; PySpark DataFrames are immutable and have no implicit index or guaranteed row order.
- As a result, many operations have different names and semantics even when they look similar (for instance, pandas' `head()` returns another DataFrame, while PySpark's `head()` returns `Row` objects).

Essentially, the two are very different data types despite the similar names. PySpark also provides a native compatibility layer that mimics pandas syntax, the Pandas API on Spark (`pyspark.pandas`); a small sketch of it follows below.
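A rough illustration of that API (not from the original comment; the file path, column names, and Spark version assumptions are mine). It assumes Spark >= 3.2, where `pyspark.pandas` ships with PySpark:

```python
import pyspark.pandas as ps

# pandas-like syntax, but the work is distributed across the Spark cluster
psdf = ps.read_csv("data.csv")                  # hypothetical input file
psdf["total"] = psdf["price"] * psdf["qty"]     # hypothetical columns
print(psdf.groupby("category")["total"].sum().head())

# convert between the two worlds when a pandas-style API is not enough
sdf = psdf.to_spark()       # pandas-on-Spark DataFrame -> native Spark DataFrame
psdf2 = sdf.pandas_api()    # Spark 3.3+; Spark 3.2 named this to_pandas_on_spark()
```

Under the hood these pandas-style calls are translated into Spark query plans, so not every pandas feature is covered and behavior can still differ at the edges.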
-
pandas and PySpark DataFrames do not offer the same functions for the same operations, because they read and process data differently. We should document these incompatibilities so that we know where prior (and future) implementations may have to change; a small example of such an incompatibility is sketched below.
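For instance (a minimal, hypothetical sketch not taken from this thread; the column name and values are made up), even computing a column mean looks different in the two libraries:

```python
import pandas as pd
from pyspark.sql import SparkSession, functions as F

# pandas: eager, in-memory; Series.mean() returns a plain Python float right away
pdf = pd.DataFrame({"x": [1, 2, 3]})
mean_pandas = pdf["x"].mean()

# PySpark: lazy, distributed; the aggregation is expressed as a column expression
# and only executes when an action such as first()/collect() is called
spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pdf)
mean_spark = sdf.select(F.mean("x")).first()[0]

assert mean_pandas == mean_spark == 2.0
```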