Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-1689: [Python] Implement zero-copy conversions for DictionaryArray #1237

Closed

Conversation

Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Oct 22, 2017

This PR closes ARROW-1689.
I want to add the zero-copy option after #1233 merged.

@Licht-T Licht-T changed the title ARROW-1689: [Python] Implement Categorical Block Zero-Copy ARROW-1689: [Python] Implement CategoricalBlock Index Zero-Copy Oct 22, 2017
@njwhite
Copy link
Contributor

njwhite commented Oct 22, 2017

Thanks - I wrote a test for this behaviour when working on my PR!

diff --git a/python/pyarrow/tests/test_convert_pandas.py b/python/pyarrow/tests/test_convert_pandas.py
index 8dd38582..21a5c82d 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -216,6 +216,19 @@ class TestPandasConversion(unittest.TestCase):
         result = pa.array([0, 1, 2]).to_pandas(zero_copy_only=True)
         npt.assert_array_equal(result, [0, 1, 2])
 
+    def test_zero_copy_dictionaries(self):
+        arr = pa.DictionaryArray.from_arrays(
+            np.array([0, 0]),
+            np.array(['A']))
+
+        result = arr.to_pandas(zero_copy_only=True)
+        values = pd.Categorical(['A', 'A'])
+
+        tm.assert_series_equal(
+            pd.Series(result),
+            pd.Series(values),
+           check_names=False)
+
     def test_zero_copy_failure_on_object_types(self):
         with self.assertRaises(pa.ArrowException):
             pa.array(['A', 'B', 'C']).to_pandas(zero_copy_only=True)

@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 26, 2017

@njwhite Thank you very much!
@wesm Now reflected the master branch change into this PR!

@wesm wesm force-pushed the feature-categorical-index-zerocopy branch from cb4980e to 87db87a Compare October 26, 2017 18:52
@wesm
Copy link
Member

wesm commented Oct 26, 2017

Rebased and fixed flakes. The test case is failing, I will leave you to fix

@wesm
Copy link
Member

wesm commented Oct 27, 2017

Looking at this now to see if I can fix the test case

Licht-T and others added 6 commits October 27, 2017 15:10
Change-Id: I477e3babec8bfdde952c4ff03193a23c304ef565
… no PyObject* is available to set as zero-copy ndarray base

Change-Id: Ib3e8da682906ea941ef9b0294ad5175365e1aabd
@wesm wesm force-pushed the feature-categorical-index-zerocopy branch from 87db87a to 53342e8 Compare October 27, 2017 19:42
@wesm
Copy link
Member

wesm commented Oct 27, 2017

Had to resort to a little bit of dark magic (the PyCapsule API) to handle zero-copy dictionaries. When the dictionary values are not zero-copyable (the test case had strings, I changed this to be integers), this still raises

@wesm
Copy link
Member

wesm commented Oct 27, 2017

+1, will await builds

@wesm wesm changed the title ARROW-1689: [Python] Implement CategoricalBlock Index Zero-Copy ARROW-1689: [Python] Implement zero-copy conversions for DictionaryArray Oct 27, 2017
@Licht-T
Copy link
Contributor Author

Licht-T commented Oct 28, 2017

@wesm Wow! Thanks for your great idea! I didn't get such idea!

Copy link
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@xhochy xhochy closed this in 74a934a Oct 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants