[SPARK-43363][SQL][PYTHON] Make to call astype to the category type only when the arrow type is not provided

### What changes were proposed in this pull request?

Calls `astype` to convert a categorical series to its categories' dtype only when the Arrow type is not provided.

### Why are the changes needed?

Now that the minimum version of pyarrow is `1.0.0`, the workaround for pandas' categorical type can be skipped when the Arrow type is provided. The code comment on the workaround already anticipated its removal:

> Note: This can be removed once minimum pyarrow version is >= 0.16.1

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #41041 from ueshin/issues/SPARK-43363/categorical_type.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
ueshin authored and HyukjinKwon committed May 5, 2023
1 parent cb26ad8 commit 7b5d7af
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions python/pyspark/sql/pandas/serializers.py
@@ -226,8 +226,7 @@ def create_array(s, t):
                 s = _check_series_convert_timestamps_internal(s, self._timezone)
             elif t is not None and pa.types.is_map(t):
                 s = _convert_dict_to_map_items(s)
-            elif is_categorical_dtype(s.dtype):
-                # Note: This can be removed once minimum pyarrow version is >= 0.16.1
+            elif t is None and is_categorical_dtype(s.dtype):
                 s = s.astype(s.dtypes.categories.dtype)
             try:
                 array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)