ARROW-31: Python: prototype user object model, add PyList conversion path with type inference #19
Closed
Added rudimentary strings, doubles, and nested list conversion. That's enough for this patch!
Preview:

    In [1]: import arrow

    In [2]: data = [[1, 2, 3], None, [], [4, 5]]

    In [3]: arr = arrow.from_pylist(data)

    In [4]: arr
    Out[4]: <arrow.array.ListArray at 0x7f44437e49a8>

    In [5]: arr.type
    Out[5]: DataType(list<int64>)

    In [6]: arr.null_count
    Out[6]: 1

    In [7]: len(arr)
    Out[7]: 4
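To make the numbers in the preview easier to follow, here is a minimal pure-Python sketch of how a nested list can be split into the pieces an Arrow list array is built from: an offsets sequence, per-slot validity flags, and a flat child values sequence. The function name `flatten_list_column` is hypothetical and is not part of this patch; it only illustrates the layout, not the actual conversion code.

```python
from typing import Any, List, Optional, Tuple


def flatten_list_column(
    data: List[Optional[List[Any]]],
) -> Tuple[List[int], List[bool], List[Any]]:
    """Split a column of Python lists into offsets, validity flags, and child values.

    Hypothetical helper for illustration only; not the function added in this patch.
    """
    offsets = [0]    # offsets[i]..offsets[i + 1] delimits slot i in `values`
    validity = []    # False marks a null slot
    values = []      # all child elements, concatenated
    for item in data:
        if item is None:
            validity.append(False)   # null slot contributes no child values
        else:
            validity.append(True)
            values.extend(item)
        offsets.append(len(values))
    return offsets, validity, values


data = [[1, 2, 3], None, [], [4, 5]]
offsets, validity, values = flatten_list_column(data)
null_count = validity.count(False)
```

For the preview data this gives four slots and one null, matching `len(arr)` and `arr.null_count` above. Note that the null slot and the empty list both span zero child values; only the validity flag tells them apart.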
…k toward converting Python lists to Arrow arrays
Rebased.
wesm added a commit to wesm/arrow that referenced this pull request on Sep 2, 2018

I adapted this functionality from Apache Kudu (incubating). There are no real unit tests, yet, but you can now run `ctest` after building to run all tests that have been created with `ADD_PARQUET_TEST`.

Author: Wes McKinney <[email protected]>

Closes apache#19 from wesm/googletest-infra and squashes the following commits:

758328f [Wes McKinney] BLD: disable fixed OSX deployment target. Compile gtest with -fPIC
61cc5bb [Wes McKinney] Remove 'set -e' from setup_build_env.sh
6435970 [Wes McKinney] Fix setup_build_env.sh script
a54a219 [Wes McKinney] Add googletest to thirdparty and add ADD_PARQUET_TEST cmake helper and support scripts for using ctest after make
wesm added a commit to wesm/arrow that referenced this pull request on Sep 4, 2018

I adapted this functionality from Apache Kudu (incubating). There are no real unit tests, yet, but you can now run `ctest` after building to run all tests that have been created with `ADD_PARQUET_TEST`.

Author: Wes McKinney <[email protected]>

Closes apache#19 from wesm/googletest-infra and squashes the following commits:

758328f [Wes McKinney] BLD: disable fixed OSX deployment target. Compile gtest with -fPIC
61cc5bb [Wes McKinney] Remove 'set -e' from setup_build_env.sh
6435970 [Wes McKinney] Fix setup_build_env.sh script
a54a219 [Wes McKinney] Add googletest to thirdparty and add ADD_PARQUET_TEST cmake helper and support scripts for using ctest after make

Change-Id: I7e86dfc9ae2590d8053ffe3dc687f78008faf3b2
zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request on Mar 27, 2020

[C++] Use one buffer for merge instead of multiple slices
kou pushed a commit that referenced this pull request on May 10, 2020

This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest ...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
     20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
projjal pushed a commit to projjal/arrow that referenced this pull request on Oct 29, 2020

…ression (apache#19)

Closes apache#8530 from pprudhvi/stringalloc

Authored-by: Prudhvi Porandla <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
icexelloss pushed a commit to icexelloss/arrow that referenced this pull request on Oct 28, 2022

* ARROW-18008: Added a use_threads option to run_query
* ARROW-18008: pylint
kou pushed a commit that referenced this pull request on Dec 18, 2024

…zone (#45051)

### Rationale for this change

If the timezone database is present on the system, but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen for example on Ubuntu 24.04 where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:

```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #40633

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Depends on ARROW-7. Pretty mundane stuff but got to start somewhere. I'm going to do a little more in this patch (handle normal lists of strings and lists of other supported Python types) before merging.
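As a rough illustration of what type inference over a Python list involves, the sketch below walks the values, skips `None`, and unifies the types it sees, recursing into nested lists. It is an assumption-laden toy, not the code in this patch: the names `infer_type` and `unify` are hypothetical, type names are plain strings, and promoting mixed `int64`/`double` values to `double` is a choice made here for the example only.

```python
from typing import Any, List


def unify(a: str, b: str) -> str:
    """Combine two inferred type names, or raise if they conflict (hypothetical helper)."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"int64", "double"}:
        return "double"  # assumption for this sketch: ints widen to doubles
    if a.startswith("list<") and b.startswith("list<"):
        # Unify the element types, e.g. list<null> with list<int64>
        return "list<" + unify(a[5:-1], b[5:-1]) + ">"
    raise TypeError(f"cannot unify {a} and {b}")


def infer_type(values: List[Any]) -> str:
    """Infer a type name for a Python list; None entries are treated as nulls."""
    inferred = "null"
    for v in values:
        if v is None:
            continue
        if isinstance(v, bool):      # check bool before int: bool subclasses int
            current = "bool"
        elif isinstance(v, int):
            current = "int64"
        elif isinstance(v, float):
            current = "double"
        elif isinstance(v, str):
            current = "string"
        elif isinstance(v, list):
            current = "list<" + infer_type(v) + ">"
        else:
            raise TypeError(f"unsupported value: {v!r}")
        inferred = unify(inferred, current)
    return inferred


nested = infer_type([[1, 2, 3], None, [], [4, 5]])
strings = infer_type(["a", None, "b"])
```

An empty inner list infers as `list<null>` on its own, so the unification step is what lets it merge with `list<int64>` from its siblings.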