ARROW-31: Python: prototype user object model, add PyList conversion path with type inference #19

Closed
wants to merge 21 commits into from

wesm
Member

@wesm wesm commented Mar 7, 2016

Depends on ARROW-7. Pretty mundane stuff but got to start somewhere. I'm going to do a little more in this patch (handle normal lists of strings and lists of other supported Python types) before merging.

@wesm
Member Author

wesm commented Mar 7, 2016

Added rudimentary strings, doubles, and nested list conversion. That's enough for this patch!

@wesm
Member Author

wesm commented Mar 7, 2016

preview:

In [1]: import arrow

In [2]: data = [[1, 2, 3], None, [], [4, 5]]

In [3]: arr = arrow.from_pylist(data)

In [4]: arr
Out[4]: <arrow.array.ListArray at 0x7f44437e49a8>

In [5]: arr.type
Out[5]: DataType(list<int64>)

In [6]: arr.null_count
Out[6]: 1

In [7]: len(arr)
Out[7]: 4
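
To illustrate the kind of type inference the preview above relies on, here is a minimal pure-Python sketch. It is not the code from this patch: the helper names (`unify`, `infer_type`, `type_name`), the tuple representation of list types, and the handling of `bool` are all assumptions made for illustration. It walks the input, skips `None`, and lets an empty list defer to whatever element type its siblings provide.

```python
# Illustrative sketch only; not the implementation from this patch.
NULL = "null"  # placeholder type for "nothing seen yet" (assumption)


def type_name(t):
    """Render a type as text, e.g. ("list", "int64") -> "list<int64>"."""
    return f"list<{type_name(t[1])}>" if isinstance(t, tuple) else t


def unify(a, b):
    """Combine two inferred types, letting NULL defer to the other side."""
    if a == NULL:
        return b
    if b == NULL or a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Both are list types: unify their element types.
        return ("list", unify(a[1], b[1]))
    raise TypeError(f"mixed types: {type_name(a)} and {type_name(b)}")


def infer_type(values):
    """Infer the element type of a Python list, skipping None entries."""
    inferred = NULL
    for value in values:
        if value is None:
            continue  # nulls do not constrain the type
        if isinstance(value, bool):  # checked first: bool subclasses int
            current = "bool"
        elif isinstance(value, int):
            current = "int64"
        elif isinstance(value, float):
            current = "double"
        elif isinstance(value, str):
            current = "string"
        elif isinstance(value, list):
            current = ("list", infer_type(value))
        else:
            raise TypeError(f"unsupported value: {value!r}")
        inferred = unify(inferred, current)
    return inferred


data = [[1, 2, 3], None, [], [4, 5]]
print(type_name(infer_type(data)))   # list<int64>
print(sum(v is None for v in data))  # 1 (top-level null count)
print(len(data))                     # 4
```

In this sketch mixed element types (for example an int next to a string) raise `TypeError` rather than being promoted; whether the patch promotes or rejects such input is not stated in this thread.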

@wesm
Member Author

wesm commented Mar 7, 2016

Rebased.

@asfgit asfgit closed this in 9afb667 Mar 7, 2016
@wesm wesm deleted the ARROW-31 branch March 7, 2016 23:02
wesm added a commit to wesm/arrow that referenced this pull request Sep 2, 2018
I adapted this functionality from Apache Kudu (incubating). There are no real unit tests yet, but you can now run `ctest` after building to run all tests created with `ADD_PARQUET_TEST`.

Author: Wes McKinney <[email protected]>

Closes apache#19 from wesm/googletest-infra and squashes the following commits:

758328f [Wes McKinney] BLD: disable fixed OSX deployment target. Compile gtest with -fPIC
61cc5bb [Wes McKinney] Remove 'set -e' from setup_build_env.sh
6435970 [Wes McKinney] Fix setup_build_env.sh script
a54a219 [Wes McKinney] Add googletest to thirdparty and add ADD_PARQUET_TEST cmake helper and support scripts for using ctest after make
wesm added a commit to wesm/arrow that referenced this pull request Sep 4, 2018
wesm added a commit to wesm/arrow that referenced this pull request Sep 6, 2018
wesm added a commit to wesm/arrow that referenced this pull request Sep 7, 2018
wesm added a commit to wesm/arrow that referenced this pull request Sep 8, 2018
zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request Mar 27, 2020
[C++] Use one buffer for merge instead of multiple slices
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, this result is hard to see because of the large number of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```
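
When a log like the one above is buried in other output, the failing entries can be pulled out with a small filter. The following is a rough sketch that assumes the result-line layout shown in this log (`N/M Test #N: name ....   Passed|***Failed   T sec`); the function name `failed_tests` is hypothetical, and other CTest status strings (multi-word ones in particular) are not handled.

```python
import re

# Matches result lines such as:
#   "20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec"
# The layout is taken from the log above; other status formats are not covered.
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+"
    r"\s*(?:\*{3})?(?P<status>\w+)\s+(?P<secs>[\d.]+)\s+sec"
)


def failed_tests(ctest_output):
    """Return (number, name, seconds) for each result line not marked Passed."""
    failures = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") != "Passed":
            failures.append(
                (int(match.group("num")), match.group("name"), float(match.group("secs")))
            )
    return failures


sample = """\
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
"""
print(failed_tests(sample))  # [(20, 'arrow-utility-test', 5.65)]
```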

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
projjal pushed a commit to projjal/arrow that referenced this pull request Oct 29, 2020
…ression (apache#19)

Closes apache#8530 from pprudhvi/stringalloc

Authored-by: Prudhvi Porandla <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
icexelloss pushed a commit to icexelloss/arrow that referenced this pull request Oct 28, 2022
* ARROW-18008: Added a use_threads option to run_query

* ARROW-18008: pylint
kou pushed a commit that referenced this pull request Dec 18, 2024
…zone (#45051)

### Rationale for this change

If the timezone database is present on the system but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen, for example, on Ubuntu 24.04, where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```
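
As a rough way to check from Python whether a timezone name such as "US/Pacific" resolves on a given machine, the standard-library `zoneinfo` module can be used. This is only an approximation and not part of the change described here: the helper name `timezone_available` is hypothetical, and `zoneinfo` may also fall back to the `tzdata` package from PyPI, so its answer may not match the lookup the ORC library performs.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def timezone_available(name):
    """Return True if Python can load `name` from the timezone data it finds."""
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        # Not found in the system database (or the tzdata fallback),
        # or the key is malformed.
        return False
    return True


# Result depends on the packages installed on the machine running this.
print(timezone_available("US/Pacific"))
```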

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #40633

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>