ARROW-23: Add a logical Column data structure #15

Closed
wesm wants to merge 7 commits from ARROW-23

wesm (Member) commented Mar 4, 2016

I also added global const instances of common primitive types

wesm (Member, Author) commented Mar 4, 2016

This is a requirement for ARROW-24. See also ARROW-39. My goal is to build enough scaffolding to be able to read Parquet files; we should examine the design of these helper data structures holistically after we get a few things working.

asfgit closed this in 9c2b954 on Mar 4, 2016
wesm deleted the ARROW-23 branch on March 4, 2016 23:01
wesm added a commit to wesm/arrow that referenced this pull request Sep 2, 2018
There actually was a legitimate bug fixed here for malformed Parquet files, but we are not yet in a position to write a decent test for it until PARQUET-497. I will make a note on that JIRA.

I also set our Travis CI build to fail on future compiler warnings.

This also closes apache#15.

Author: Wes McKinney <[email protected]>

Closes apache#40 from wesm/PARQUET-455 and squashes the following commits:

a348063 [Wes McKinney] Compiler warnings fail the build
271d71e [Wes McKinney] Fix OS X / Clang compiler warnings
wesm added a commit to wesm/arrow that referenced this pull request Sep 4, 2018
There actually was a legitimate bug fixed here for malformed Parquet files, but we are not yet in a position to write a decent test for it until PARQUET-497. I will make a note on that JIRA.

I also set our Travis CI build to fail on future compiler warnings.

This also closes apache#15.

Author: Wes McKinney <[email protected]>

Closes apache#40 from wesm/PARQUET-455 and squashes the following commits:

a348063 [Wes McKinney] Compiler warnings fail the build
271d71e [Wes McKinney] Fix OS X / Clang compiler warnings

Change-Id: I8b89f98506f59bb5b207b6cf1b0f2ea5d17c2585
wesm added a commit to wesm/arrow that referenced this pull request Sep 6, 2018
wesm added a commit to wesm/arrow that referenced this pull request Sep 7, 2018
wesm added a commit to wesm/arrow that referenced this pull request Sep 8, 2018
xuechendi pushed a commit to xuechendi/arrow that referenced this pull request Dec 17, 2019
kszucs added a commit that referenced this pull request Mar 12, 2020
See it in action: kszucs#16 (comment)

The main drawback is that it is much slower than ursabot, but we can optimize it by:
- porting crossbow to depend only on pygithub instead of libgit2 (this will consume the rate limit, but should fit in)
- using caching or Docker

Theoretically, CROSSBOW_GITHUB_TOKEN is set as a GitHub Actions secret; see https://issues.apache.org/jira/browse/INFRA-19954
We can trigger a build once this is merged into master.

Closes #6571 from kszucs/master and squashes the following commits:

7a604a8 <Krisztián Szűcs> note that the license is BSD2
8586eb7 <Krisztián Szűcs> add license reference
def8724 <Krisztián Szűcs> RAT
a96e7e2 <Krisztián Szűcs> flake8
6f5da63 <Krisztián Szűcs> add requirements to docker whitelist
6678c2e <Krisztián Szűcs> update archery dependencies
33f65d4 <Krisztián Szűcs> revert removing the rest of the workflows
a82b879 <Krisztián Szűcs> test dep
06a7716 <Krisztián Szűcs> responses test dep
ba25229 <Krisztián Szűcs> fix archery workflow syntax
9352ee0 <Krisztián Szűcs> run archery unittests
deb857f <Krisztián Szűcs> checkout@v2 and fetch tags
215495a <Krisztián Szűcs> fix result path
748832f <Krisztián Szűcs> message formatter
ea1b7c8 <Krisztián Szűcs> no dry run
6c83b0c <Krisztián Szűcs> dry run
4789ac5 <Krisztián Szűcs> response ormatter
1b0b15d <Krisztián Szűcs> cleanup
2270a35 <Krisztián Szűcs> validate
035024f <Krisztián Szűcs> validate callback
e791c62 <Krisztián Szűcs> diag
641227f <Krisztián Szűcs> diab
b22b204 <Krisztián Szűcs> token
d95e86b <Krisztián Szűcs> path to event payload
3e9a279 <Krisztián Szűcs> pygithub
ca1592d <Krisztián Szűcs> typo
3c1358e <Krisztián Szűcs> triger event handler
55e65fa <Krisztián Szűcs> crossbow command
92568eb <Krisztián Szűcs> first draft of bot
99ea0c2 <Krisztián Szűcs> cat
3c0f16d <Krisztián Szűcs> remove all other workflows
1f8f21d <Krisztián Szűcs> diag event handling
2f613dd <Krisztián Szűcs> Check event handling (#15)

Authored-by: Krisztián Szűcs <[email protected]>
Signed-off-by: Krisztián Szűcs <[email protected]>
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```
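
With 51 result lines, the single failure is easy to miss. As a minimal sketch (not part of this PR), the failing tests can be pulled out of `ctest` output with a regular expression. The pattern below is an assumption based only on the line layout shown above, and `failed_tests` is a hypothetical helper name.

```python
import re

# Matches ctest result lines such as
# "20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec"
# (layout assumed from the log above, not from a ctest format specification).
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>Passed|\*\*\*Failed)\s+(?P<secs>[\d.]+)\s+sec"
)


def failed_tests(ctest_output):
    """Return the names of tests whose result line is not 'Passed'."""
    failed = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") != "Passed":
            failed.append(match.group("name"))
    return failed


sample = """\
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
"""
print(failed_tests(sample))
```

`ctest` already prints a "The following tests FAILED" summary; a parser like this is only useful when post-processing saved logs.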

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request May 8, 2021
This patch allows using a custom libhdfs3 directory.

Signed-off-by: Yuan Zhou <[email protected]>
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request May 13, 2021
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request May 17, 2021
zhouyuan added a commit to zhouyuan/arrow that referenced this pull request Jun 9, 2021
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request Feb 8, 2022
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request Mar 3, 2022
rui-mo pushed a commit to rui-mo/arrow-1 that referenced this pull request Mar 23, 2022
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request Oct 23, 2022
…nts (apache#15)

* ARROW-17966: Updated to latest Substrait version.  Switched from optional enum args to proper options.  Added check for minimum Substrait version

* ARROW-17966: Add version to python substrait examples.  Fix version handling to check major version and not just minor

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Update cpp/src/arrow/engine/substrait/extension_set.cc

Co-authored-by: Benjamin Kietzman <[email protected]>

* ARROW-17966: Display the available choices when a user enters a valid substrait option that Acero doesn't support

* ARROW-17966: Simplify parsing boilerplate per review comments

* ARROW-17966: Gracefully error if the user does not supply any preferences for an option

* ARROW-17966: Prefer range loops where possible

* ARROW-17966: Rebase cleanup

* ARROW-17966: Minor fix to failing unit tests: remove enum="unspecified"

* ARROW-17966: Minor lint fix

* ARROW-17966: Cmake format

Co-authored-by: Benjamin Kietzman <[email protected]>
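
The commit list mentions checking the major version and not just the minor one. A minimal sketch of why that matters, assuming versions are `(major, minor)` pairs: the function names and the minimum version `(0, 20)` are hypothetical and are not taken from the Arrow sources.

```python
# Hypothetical minimum supported version as (major, minor); not Arrow's real value.
MINIMUM = (0, 20)


def minor_only_check(version, minimum=MINIMUM):
    """Buggy variant: looks at the minor component alone."""
    return version[1] >= minimum[1]


def meets_minimum(version, minimum=MINIMUM):
    """Compare (major, minor) lexicographically, so the major version dominates."""
    return tuple(version) >= tuple(minimum)


# A 1.0 producer is newer than 0.20, but the minor-only check rejects it.
print(minor_only_check((1, 0)), meets_minimum((1, 0)))
```

Tuple comparison in Python is lexicographic, which is the behavior a major-then-minor check needs.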
icexelloss pushed a commit to icexelloss/arrow that referenced this pull request Oct 28, 2022
felipecrv added a commit to felipecrv/arrow that referenced this pull request Apr 5, 2024
…ormatting buffer

With ASAN, this reproduces the issue.

    [ RUN      ] Formatting.Timestamp
    =================================================================
    ==4191383==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff1804c48f at pc 0x5608edffe39d bp 0x7fff1804c110 sp 0x7fff1804c108
    WRITE of size 1 at 0x7fff1804c48f thread T0
        #0 0x5608edffe39c in arrow::internal::detail::FormatOneChar(char, char**) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:132:67
        #1 0x5608ee035c00 in arrow::internal::detail::FormatYYYY_MM_DD(arrow_vendored::date::year_month_day, char**) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:351:5
        #2 0x5608ee05e8a0 in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<std::chrono::duration<long, std::ratio<1l, 1000l> >, arrow::StringAppender&>(std::chrono::duration<long, std::ratio<1l, 1000l> >, long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:521:5
        #3 0x5608ee05d60f in decltype(std::declval<arrow::internal::StringFormatter<arrow::TimestampType, void>&>()(std::chrono::duration<long, std::ratio<1l, 1l> >{}, std::declval<long&>(), std::declval<arrow::StringAppender&>())) arrow::util::VisitDuration<arrow::internal::StringFormatter<arrow::TimestampType, void>&, long&, arrow::StringAppender&>(arrow::TimeUnit::type, arrow::internal::StringFormatter<arrow::TimestampType, void>&, long&, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/time.h:60:14
        #4 0x5608ee05d122 in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<arrow::StringAppender&>(long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:527:12
        #5 0x5608edfeffb3 in void arrow::AssertFormatting<arrow::internal::StringFormatter<arrow::TimestampType, void>, long>(arrow::internal::StringFormatter<arrow::TimestampType, void>&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting_util_test.cc:52:3
        #6 0x5608edfece95 in arrow::Formatting_Timestamp_Test::TestBody() /home/felipeo/code/arrow/cpp/src/arrow/util/formatting_util_test.cc:540:5
        #7 0x7fd95d7901de in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd901de) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #8 0x7fd95d784905 in testing::Test::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84905) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #9 0x7fd95d784a84 in testing::TestInfo::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84a84) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #10 0x7fd95d785038 in testing::TestSuite::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd85038) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #11 0x7fd95d78573e in testing::internal::UnitTestImpl::RunAllTests() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd8573e) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #12 0x7fd95d7907a6 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd907a6) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #13 0x7fd95d784b4b in testing::UnitTest::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84b4b) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #14 0x5608ee54506d in RUN_ALL_TESTS() /usr/include/gtest/gtest.h:2490:46
        #15 0x5608ee544fb9 in main /home/felipeo/code/arrow/cpp/src/arrow/util/logging_test.cc:129:10
        #16 0x7fd93de29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
        #17 0x7fd93de29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
        #18 0x5608ed7b4f64 in _start (/home/felipeo/code/arrow/cpp/ninja/debug/arrow-utility-test+0x1224f64) (BuildId: 81cfdc36b7a960a7249ecd5884beaa869140ab89)

    Address 0x7fff1804c48f is located in stack of thread T0 at offset 399 in frame
        #0 0x5608ee05dc9f in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<std::chrono::duration<long, std::ratio<1l, 1000l> >, arrow::StringAppender&>(std::chrono::duration<long, std::ratio<1l, 1000l> >, long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:486
raulcd pushed a commit that referenced this pull request Apr 11, 2024
### Rationale for this change

Installing the R duckdb package fails with an error:

```
#15 18.13 > remotes::install_github('duckdb/duckdb-r', build = FALSE)
#15 18.27 Error: Failed to install 'unknown package' from **GitHub:**
#15 18.27   Line starting 'Roxyg ...' is malformed!
```

Some searching seems to suggest that this is because R cannot process UTF-8 characters in DESCRIPTION files if the `LANG` is set to `C`.

### What changes are included in this PR?

`LANG` is set to `C.UTF-8` in the Dockerfile for this CI job.
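
As a loose analogy only (this is Python, not R, and it does not reproduce R's DESCRIPTION parser), the sketch below shows the general failure mode: UTF-8 bytes containing a non-ASCII character cannot be decoded under an ASCII-only assumption, which is roughly what a `C` locale implies. The sample line and the `try_decode` helper are hypothetical.

```python
def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical metadata line with one non-ASCII character, stored as UTF-8.
line = "Author: Zoë Example"
raw = line.encode("utf-8")

print(try_decode(raw, "ascii"))  # ASCII-only assumption: decoding fails
print(try_decode(raw, "utf-8"))  # UTF-8 assumption: decoding succeeds
```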

### Are these changes tested?

The change only affects a test

### Are there any user-facing changes?

No
* GitHub Issue: #41145

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Apr 15, 2024
…ache#41152)

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…ache#41152)

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…ache#41152)

kou pushed a commit that referenced this pull request Dec 18, 2024
…zone (#45051)

### Rationale for this change

If the timezone database is present on the system but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen, for example, on Ubuntu 24.04, where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
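
The general pattern, sketched here in Python and not in the C++ used by the actual fix, is to wrap the batch iteration so that a failure becomes an error value the caller can inspect. `read_all`, `flaky_batches`, and the error text are hypothetical and do not mirror Arrow's or ORC's API.

```python
def flaky_batches():
    """Hypothetical batch source: yields one batch, then fails like a missing timezone."""
    yield [1, 2, 3]
    raise RuntimeError("timezone 'US/Pacific' not found (hypothetical message)")


def read_all(batches):
    """Collect batches, converting any exception into an error string.

    Returns (collected_batches, error), where error is None on success.
    """
    collected = []
    try:
        for batch in batches:
            collected.append(batch)
    except Exception as exc:  # broad on purpose: nothing may escape the boundary
        return collected, f"IOError: {exc}"
    return collected, None


batches, error = read_all(flaky_batches())
print(batches, error)
```

The exception is handled at the iteration boundary, so callers see an ordinary error instead of a process abort.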

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #40633

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>