This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add snappy and use in arrow #19

Merged
merged 1 commit into from
Jun 24, 2019

jeroen (Contributor) commented Jun 24, 2019

No description provided.

jeroen force-pushed the arrow-snappy branch 3 times, most recently from 53bb7f6 to 237811e on June 24, 2019 14:38
jeroen merged commit 9691eeb into master on Jun 24, 2019
jeroen deleted the arrow-snappy branch on June 24, 2019 16:32
jeroen (Contributor, Author) commented Jun 24, 2019

I don't understand why this doesn't build in Azure Pipelines. It seems they have a conflicting version of Boost preinstalled, or something like that...

wesm pushed a commit to wesm/arrow that referenced this pull request Jun 25, 2019
Following r-windows/rtools-backports#7 and r-windows/rtools-packages#19, R Windows builds can now support Snappy compression. This patch tweaks the necessary files and unskips tests, in addition to some other PKGBUILD script cleanup.

Passing build here: https://ci.appveyor.com/project/nealrichardson/arrow/builds/25507388

Author: Neal Richardson

Closes apache#4681 from nealrichardson/r-snappy and squashes the following commits:

0996a30 <Neal Richardson> Add license info for rtools-backports
1af0841 <Neal Richardson> Revert "Only run mine for now"
a5d967f <Neal Richardson> Get snappy from backports after all
a1c6390 <Neal Richardson> -lsnappy
a415449 <Neal Richardson> Only run mine for now
b911568 <Neal Richardson> More comments; unskip tests
d7a7419 <Neal Richardson> Add snappy to PKGBUILD; prune some default cmake flags and add comment
alamb pushed a commit to apache/arrow-rs that referenced this pull request Apr 20, 2021
Following r-windows/rtools-backports#7 and r-windows/rtools-packages#19, R Windows builds can now support Snappy compression. This patch tweaks the necessary files and unskips tests, in addition to some other PKGBUILD script cleanup.

Passing build here: https://ci.appveyor.com/project/nealrichardson/arrow/builds/25507388

Author: Neal Richardson

Closes #4681 from nealrichardson/r-snappy and squashes the following commits:

0996a30e2 <Neal Richardson> Add license info for rtools-backports
1af08413c <Neal Richardson> Revert "Only run mine for now"
a5d967f0c <Neal Richardson> Get snappy from backports after all
a1c6390b6 <Neal Richardson> -lsnappy
a41544914 <Neal Richardson> Only run mine for now
b911568df <Neal Richardson> More comments; unskip tests
d7a74192a <Neal Richardson> Add snappy to PKGBUILD; prune some default cmake flags and add comment
lidavidm pushed a commit to apache/arrow-adbc that referenced this pull request Jun 1, 2022
Following r-windows/rtools-backports#7 and r-windows/rtools-packages#19, R Windows builds can now support Snappy compression. This patch tweaks the necessary files and unskips tests, in addition to some other PKGBUILD script cleanup.

Passing build here: https://ci.appveyor.com/project/nealrichardson/arrow/builds/25507388

Author: Neal Richardson

Closes #4681 from nealrichardson/r-snappy and squashes the following commits:

0996a30e2 <Neal Richardson> Add license info for rtools-backports
1af08413c <Neal Richardson> Revert "Only run mine for now"
a5d967f0c <Neal Richardson> Get snappy from backports after all
a1c6390b6 <Neal Richardson> -lsnappy
a41544914 <Neal Richardson> Only run mine for now
b911568df <Neal Richardson> More comments; unskip tests
d7a74192a <Neal Richardson> Add snappy to PKGBUILD; prune some default cmake flags and add comment
lidavidm added a commit to apache/arrow-adbc that referenced this pull request Jun 1, 2022
* Update readme and add license in root.

* ARROW-202: Integrate with appveyor ci for windows

For now this only achieves a successful compilation on Windows; tests don't
run yet.

Author: Uwe L. Korn

Closes #213 from xhochy/ARROW-202 and squashes the following commits:

d5088a6 [Uwe L. Korn] Correctly reference Kudu in LICENSE and NOTICE
72a583b [Uwe L. Korn] Differentiate Boost libraries based on build type
6c75699 [Uwe L. Korn] Add license header
e33b08c [Uwe L. Korn] Pick up shared Boost libraries correctly
5da5f5d [Uwe L. Korn] ARROW-202: Integrate with appveyor ci for windows

* ARROW-557: [Python] Add option to explicitly opt in to HDFS tests, do not implicitly skip

Without the flag, all HDFS tests are skipped:

```
$ py.test pyarrow/tests/test_hdfs.py
================================== test session starts ==================================
platform linux2 -- Python 2.7.11, pytest-2.9.0, py-1.4.31, pluggy-0.3.1
rootdir: /home/wesm/code/arrow/python, inifile:
collected 15 items

pyarrow/tests/test_hdfs.py sssssssssssssss
```

With `--hdfs`, they run:

```
$ py.test pyarrow/tests/test_hdfs.py --hdfs -v
================================== test session starts ==================================
platform linux2 -- Python 2.7.11, pytest-2.9.0, py-1.4.31, pluggy-0.3.1 -- /home/wesm/anaconda3/envs/py27/bin/python
cachedir: .cache
rootdir: /home/wesm/code/arrow/python, inifile:
collected 15 items

pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_close PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_download_upload PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_file_context_manager PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_ls PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_mkdir PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_orphaned_file PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_read_multiple_parquet_files SKIPPED
pyarrow/tests/test_hdfs.py::TestLibHdfs::test_hdfs_read_whole_file PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_close PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_download_upload PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_file_context_manager PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_ls PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_mkdir PASSED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_read_multiple_parquet_files SKIPPED
pyarrow/tests/test_hdfs.py::TestLibHdfs3::test_hdfs_read_whole_file PASSED
```

The `py.test pyarrow --only-hdfs` option will run only the HDFS tests.

Author: Wes McKinney

Closes #353 from wesm/ARROW-557 and squashes the following commits:

52e03db [Wes McKinney] Add conftest.py file, hdfs group to opt in to HDFS tests with --hdfs

* ARROW-1104: Integrate in-memory object store into arrow

This supersedes https://github.com/apache/arrow/pull/467

This is ready for review. Next steps are
- Integration with the arrow CI
- Write docs on how to use the object store

There is one remaining compilation error: it doesn't find Python.h for one of the Travis configurations. If anybody has an idea of what is going on, let me know.

Author: Philipp Moritz
Author: Robert Nishihara

Closes #742 from pcmoritz/plasma-store-2 and squashes the following commits:

c100a453 [Philipp Moritz] fixes
d67160c5 [Philipp Moritz] build dlmalloc with -O3
16d1f716 [Philipp Moritz] fix test hanging
0f321e16 [Philipp Moritz] try to fix tests
80f9df40 [Philipp Moritz] make format
4c474d71 [Philipp Moritz] run plasma_store from the right directory
85aa1710 [Philipp Moritz] fix mac tests
61d421b5 [Philipp Moritz] fix formatting
4497e337 [Philipp Moritz] fix tests
00f17f24 [Philipp Moritz] fix licenses
81437920 [Philipp Moritz] fix linting
5370ae06 [Philipp Moritz] fix plasma protocol
a137e783 [Philipp Moritz] more fixes
b36c6aaa [Philipp Moritz] fix fling.cc
214c426c [Philipp Moritz] fix eviction policy
e7badc48 [Philipp Moritz] fix python extension
6432d3fa [Philipp Moritz] fix formatting
b21f0814 [Philipp Moritz] fix remaining comments about client
27f9c9e8 [Philipp Moritz] fix formatting
7b08fd2a [Philipp Moritz] replace ObjectID pass by value with pass by const reference and fix const correctness
ca80e9a6 [Philipp Moritz] remove plain pointer in plasma client, part II
627b7c75 [Philipp Moritz] fix python extension name
30bd68b7 [Philipp Moritz] remove plain pointer in plasma client, part I
77d98227 [Philipp Moritz] put all the object code into a common library
0fdd4cd5 [Philipp Moritz] link libarrow.a and remove hardcoded optimization flags
8daea699 [Philipp Moritz] fix includes according to google styleguide
65ac7433 [Philipp Moritz] remove offending c++ flag from c flags
7003a4a4 [Philipp Moritz] fix valgrind test by setting working directory
217ff3d8 [Philipp Moritz] add valgrind heuristic
9c703c20 [Philipp Moritz] integrate client tests
9e5ae0e1 [Philipp Moritz] port serialization tests to gtest
0b8593db [Robert Nishihara] Port change from Ray. Change listen backlog size from 5 to 128.
b9a5a06e [Philipp Moritz] fix includes
ed680f97 [Philipp Moritz] reformat the code
f40f85bd [Philipp Moritz] add clang-format exceptions
d6e60d26 [Philipp Moritz] do not compile plasma on windows
f936adb7 [Philipp Moritz] build plasma python client only if python is available
e11b0e86 [Philipp Moritz] fix pthread
74ecb199 [Philipp Moritz] don't link against Python libraries
b1e0335a [Philipp Moritz] fix linting
7f7e7e78 [Philipp Moritz] more linting
79ea0ca7 [Philipp Moritz] fix clang-tidy
99420e8f [Philipp Moritz] add rat exceptions
6cee1e25 [Philipp Moritz] fix
c93034fb [Philipp Moritz] add Apache 2.0 headers
63729130 [Philipp Moritz] fix malloc?
99537c94 [Philipp Moritz] fix compiler warnings
cb3f3a38 [Philipp Moritz] compile C files with CMAKE_C_FLAGS
e649c2af [Philipp Moritz] fix compilation
04c2edb3 [Philipp Moritz] add missing file
51ab9630 [Philipp Moritz] fix compiler warnings
9ef7f412 [Philipp Moritz] make the plasma store compile
e9f9bb4a [Philipp Moritz] Initial commit of the plasma store. Contributors: Philipp Moritz, Robert Nishihara, Richard Shin, Stephanie Wang, Alexey Tumanov, Ion Stoica @ RISElab, UC Berkeley (2017) [from https://github.com/ray-project/ray/commit/b94b4a35e04d8d2c0af4420518a4e9a94c1c9b9f]

* ARROW-1151: [C++] Add branch prediction to RETURN_NOT_OK

Also added some missing status checks to builder-benchmark

Author: Wes McKinney

Closes #782 from wesm/ARROW-1151 and squashes the following commits:

9b488a0e [Wes McKinney] Try to fix snappy warning
06276119 [Wes McKinney] Restore check macros used in libplasma
83b3f36d [Wes McKinney] Add branch prediction to RETURN_NOT_OK

* ARROW-1154: [C++] Import miscellaneous computational utility code from parquet-cpp

I will make a corresponding PR to parquet-cpp to ensure that this code migration is complete enough.

Author: Wes McKinney

Closes #785 from wesm/ARROW-1154 and squashes the following commits:

08b54c98 [Wes McKinney] Fix variety of compiler warnings
ddc7354b [Wes McKinney] Fixes to get PARQUET-1045 working
f5cd0259 [Wes McKinney] Import miscellaneous computational utility code from parquet-cpp

* ARROW-1185: [C++] Status class cleanup, warn_unused_result attribute and Clang warning fixes

This was tedious, but overdue. The Status class in Arrow was originally imported from Apache Kudu, where it had been modified from the standard version used in Google projects. I simplified the implementation to bring it more in line with the Status implementation used in TensorFlow.

This also addresses ARROW-111 by providing an attribute so that Clang warns if a Status is ignored.

Author: Wes McKinney

Closes #814 from wesm/status-cleaning and squashes the following commits:

7b7e6517 [Wes McKinney] Bring Status implementation somewhat more in line with TensorFlow and other Google codebases, remove unused posix code. Add warn_unused_result attribute and fix clang warnings

* ARROW-1630: [Serialization] Support Python datetime objects

An additional pair of eyes would be helpful; somewhat strangely, the tests are passing for some datetime objects and not for others.

Author: Philipp Moritz

Closes #1153 from pcmoritz/serialize-datetime and squashes the following commits:

f3696ae4 [Philipp Moritz] add numpy to LICENSE.txt
a94bca7d [Philipp Moritz] put PyDateTime_IMPORT higher up
0ae645e9 [Philipp Moritz] windows fixes
cbd1b222 [Philipp Moritz] get rid of gmtime_r
f3ea6699 [Philipp Moritz] use numpy datetime code to implement time conversions
e644f4f5 [Philipp Moritz] linting
f38cbd46 [Philipp Moritz] fixes
6e549c47 [Philipp Moritz] serialize datetime

* ARROW-1559: [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel

I only intended to implement selective categorical conversion in `to_pandas()`, but it seems that a lot is missing to do this in a clean fashion.

Author: Wes McKinney

Closes #1266 from xhochy/ARROW-1559 and squashes the following commits:

50249652 [Wes McKinney] Fix MSVC linker issue
b6cb1ece [Wes McKinney] Export CastOptions
4ea3ce61 [Wes McKinney] Return NONE Datum in else branch of functions
4f969c6b [Wes McKinney] Move deprecation suppression after flag munging
7f557cc0 [Wes McKinney] Code review comments, disable C4996 warning (equivalent to -Wno-deprecated) in MSVC builds
84717461 [Wes McKinney] Do not compute hash table threshold on each iteration
ae8f2339 [Wes McKinney] Fix double to int64_t conversion warning
c1444a26 [Wes McKinney] Fix doxygen warnings
2de85961 [Wes McKinney] Add test cases for unique, dictionary_encode
383b46fd [Wes McKinney] Add Array methods for Unique, DictionaryEncode
0962f06b [Wes McKinney] Add cast method for Column, chunked_array and column factory functions
62c3cefd [Wes McKinney] Datum stubs
27151c47 [Wes McKinney] Implement Cast for chunked arrays, fix kernel implementation. Change kernel API to write to a single Datum
1bf2e2f4 [Wes McKinney] Fix bug with column using wrong type
eaadc3e5 [Wes McKinney] Use macros to reduce code duplication in DoubleTableSize
6b4f8f3c [Wes McKinney] Fix datetime64->date32 casting error raised by refactor
2c77a19e [Wes McKinney] Some Decimal->Decimal128 renaming. Add DecimalType base class
c07f91b3 [Wes McKinney] ARROW-1559: Add unique kernel

* ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests

This PR adds a workaround for reading the metadata layout for C++ dictionary-encoded vectors.

I added tests that validate against the C++/Java integration suite. In order to make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether the DictionaryBatch vector should replace or append to the existing dictionary.
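The replace-versus-append rule for the `isDelta` flag can be sketched in Python as below. This is a minimal illustration, not the Arrow JS reader's code: `DictionaryMemo`, `apply_batch`, and `decode` are hypothetical names, and treating a delta batch for a not-yet-seen dictionary id as a fresh dictionary is an assumption.

```python
from typing import Dict, List, Sequence


class DictionaryMemo:
    """Hypothetical reader-side store of dictionaries, keyed by dictionary id."""

    def __init__(self) -> None:
        self._dicts: Dict[int, List[object]] = {}

    def apply_batch(self, dict_id: int, values: Sequence[object], is_delta: bool) -> None:
        if is_delta and dict_id in self._dicts:
            # Delta batch: new entries are appended after the existing ones,
            # so previously issued indices stay valid.
            self._dicts[dict_id].extend(values)
        else:
            # Non-delta batch (or first batch for this id): replace outright.
            self._dicts[dict_id] = list(values)

    def decode(self, dict_id: int, indices: Sequence[int]) -> List[object]:
        """Materialize dictionary-encoded indices into their values."""
        dictionary = self._dicts[dict_id]
        return [dictionary[i] for i in indices]
```

Appending on a delta keeps earlier indices stable, which is what allows a writer to extend a dictionary mid-stream without re-sending it.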

I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests updating the format in the future. I saved the generated Arrow files in with the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change!

This PR is marked WIP because I added an [integration test](https://github.com/apache/arrow/commit/6e98874d9f4bfae7758f8f731212ae7ceb3f1321#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that validates that the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. While debugging, I noticed a number of other differences in the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n's [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove/ignore the metadata, this test should pass too.

cc @TheNeuralBit

Author: Paul Taylor
Author: Wes McKinney

Closes #1294 from trxcllnt/generate-js-test-files and squashes the following commits:

f907d5a7 [Paul Taylor] fix aggressive closure-compiler mangling in the ES5 UMD bundle
57c7df45 [Paul Taylor] remove arrow files from perf tests
5972349c [Paul Taylor] update performance tests to use generated test data
14be77f4 [Paul Taylor] fix Date64Vector TypedArray, enable datetime integration tests
5660eb34 [Wes McKinney] Use openjdk8 for integration tests, jdk7 for main Java CI job
019e8e24 [Paul Taylor] update closure compiler with full support for ESModules, and remove closure-compiler-scripts
48111290 [Paul Taylor] Add support for reading Arrow buffers < MetadataVersion 4
c72134a5 [Paul Taylor] compile JS source in integration tests
c83a700d [Wes McKinney] Hack until ARROW-1837 resolved. Constrain unsigned integers max to signed max for bit width
fd3ed475 [Wes McKinney] Uppercase hex values
224e041c [Wes McKinney] Remove hard-coded file name to prevent primitive JSON file from being clobbered
0882d8e9 [Paul Taylor] separate JS unit tests from integration tests in CI
1f6a81b4 [Paul Taylor] add missing mkdirp for test json data
19136fbf [Paul Taylor] remove test data files in favor of auto-generating them in CI
9f195682 [Paul Taylor] Generate test files when the test run if they don't exist
0cdb74e0 [Paul Taylor] Add a cli arg to integration_test.py generate test JSON files for JS
cc744564 [Paul Taylor] resolve LICENSE.txt conflict
33916230 [Paul Taylor] move js license to top-level license.txt
d0b61f49 [Paul Taylor] add validate package script back in, make npm-release.sh suitable for ASF release process
7e3be574 [Paul Taylor] Copy license.txt and notice.txt into target dirs from arrow root.
c8125d2d [Paul Taylor] Update readme to reflect new Table.from signature
49ac3398 [Paul Taylor] allow unrecognized cli args in gulpfile
3c52587e [Paul Taylor] re-enable node_js job in travis
cb142f11 [Paul Taylor] add npm release script, remove unused package scripts
d51793dd [Paul Taylor] run tests on src folder for accurate jest coverage statistics
c087f482 [Paul Taylor] generate test data in build scripts
1d814d00 [Paul Taylor] excise test data csvs
14d48964 [Paul Taylor] stringify Struct Array cells
1f004968 [Paul Taylor] rename FixedWidthListVector to FixedWidthNumericVector
be73c918 [Paul Taylor] add BinaryVector, change ListVector to always return an Array
02fb3006 [Paul Taylor] compare iterator results in integration tests
e67a66a1 [Paul Taylor] remove/ignore test snapshots (getting too big)
de7d96a3 [Paul Taylor] regenerate test arrows from master
a6d3c83e [Paul Taylor] enable integration tests
44889fbe [Paul Taylor] report errors generating test arrows
fd68d510 [Paul Taylor] always increment validity buffer index while reading
562eba7d [Paul Taylor] update test snapshots
d4399a8a [Paul Taylor] update integration tests, add custom jest vector matcher
8d44dcd7 [Paul Taylor] update tests
6d2c03d4 [Paul Taylor] clean arrows folders before regenerating test data
4166a9ff [Paul Taylor] hard-code reader to Arrow spec and ignore field layout metadata
c60305d6 [Paul Taylor] refactor: flatten vector folder, add more types
ba984c61 [Paul Taylor] update dependencies
5eee3eaa [Paul Taylor] add integration tests to compare how JS reads cpp vs. java arrows
d4ff57aa [Paul Taylor] update test snapshots
407b9f5b [Paul Taylor] update reader/table tests for new generated arrows
85497069 [Paul Taylor] update cli args to execute partial test runs for debugging
eefc256d [Paul Taylor] remove old test arrows, add new generated test arrows
0cd31ab9 [Paul Taylor] add generate-arrows script to tests
3ff71384 [Paul Taylor] Add bool, date, time, timestamp, and ARROW-1693 workaround in reader
4a34247c [Paul Taylor] export Row type
141194e7 [Paul Taylor] use fieldNode.length as vector length
c45718e7 [Paul Taylor] support new DictionaryBatch isDelta flag
9d8fef97 [Paul Taylor] split DateVector into Date32 and Date64 types
8592ff3c [Paul Taylor] update generated format flatbuffers

* ARROW-1703: [C++] Vendor exact version of jemalloc we depend on

Author: Uwe L. Korn

Closes #1334 from xhochy/ARROW-1703 and squashes the following commits:

7282583f [Uwe L. Korn] ARROW-1703: [C++] Vendor exact version of jemalloc we depend on

* ARROW-2798: [Plasma] Use hashing function that takes into account all UniqueID bytes

Currently, the hashing of UniqueID in Plasma is too simple, and this has caused a problem. In some cases (for example, in Ray, where a UniqueID is composed of a task ID and an index), UniqueIDs may look like "ffffffffffffffffffff00", "ffffffffffffffffffff01", "ffffffffffffffffffff02", and so on. The current hashing method only copies the first few bytes of a UniqueID, so most of the hashed IDs are the same. When these IDs are put into the Plasma store, lookups become very slow: the store keeps IDs in an unordered_map, and when many keys hash to the same value, searching degrades to something like a linear scan of a list.

The same change has already been merged into Ray; see https://github.com/ray-project/ray/pull/2174.

I have also compared the performance of the new and original hashing methods while continuously putting many objects, and the new hashing method does not appear to cost more time.
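A minimal Python sketch of why prefix-only hashing degrades here. It assumes 20-byte IDs that share every byte except the last, and it uses 64-bit FNV-1a as a stand-in for the MurmurHash-based function the patch adopts; `prefix_hash`, `full_hash`, `make_ids`, and `distinct_hashes` are hypothetical helpers.

```python
from typing import Callable, List

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def prefix_hash(uid: bytes, width: int = 8) -> int:
    """Old-style hash: reinterpret only the first `width` bytes as an integer."""
    return int.from_bytes(uid[:width], "little")


def full_hash(uid: bytes) -> int:
    """64-bit FNV-1a over every byte (stand-in for the MurmurHash in the patch)."""
    h = FNV64_OFFSET
    for byte in uid:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def make_ids(n: int, size: int = 20) -> List[bytes]:
    """Hypothetical IDs sharing a constant prefix and differing only in the last byte."""
    assert 0 < n <= 256
    prefix = b"\xff" * (size - 1)
    return [prefix + bytes([i]) for i in range(n)]


def distinct_hashes(ids: List[bytes], hash_fn: Callable[[bytes], int]) -> int:
    """Number of distinct hash values, i.e. how well the IDs spread across buckets."""
    return len({hash_fn(uid) for uid in ids})


ids = make_ids(100)
print(distinct_hashes(ids, prefix_hash))  # 1: every ID collides
print(distinct_hashes(ids, full_hash))    # 100: every ID gets its own hash
```

With `prefix_hash`, every ID lands in the same bucket, so each lookup in a hash map keyed this way must compare against all colliding entries.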

Author: songqing

Closes #2220 from songqing/oid-hashing and squashes the following commits:

5c803aa0 <songqing> modify murmurhash LICENSE
8b8aa3e1 <songqing> add murmurhash LICENSE
d8d5f93f <songqing> lint fix
426cd1e2 <songqing> lint fix
4767751d <songqing> Use hashing function that takes into account all UniqueID bytes

* ARROW-2634: [Go] Add Go license details to LICENSE.txt

Author: Wes McKinney

Closes #2221 from wesm/ARROW-2634 and squashes the following commits:

c65a8193 <Wes McKinney> Add Go license details to LICENSE.txt

* ARROW-3050: [C++] Adopt HiveServer2 client codebase from
cloudera/hs2client. Add Thrift to thirdparty toolchain

This patch incorporates patches developed at cloudera/hs2client (Apache 2.0) by
the following authors:

* 12  Wes McKinney
*  2  Thomas Tauber-Marshall
*  2  陈晓发
*  2  Matthew Jacobs
*  1  Miki Tebeka
*  1  Tim Armstrong
*  1  henryr

Closes #2444

Change-Id: I88aed528a9f4d2069a4908f6a09230ade2fbe50a

* ARROW-1325: [R] Initial R package that builds against the arrow C++ library

This is very minimal in functionality: it just provides a simple R package that calls a function from the Arrow C++ library.

Author: Romain Francois
Author: Wes McKinney

Closes #2489 from romainfrancois/r-bootstrap and squashes the following commits:

89f14b4ba <Wes McKinney> Add license addendums
9e3ffb4d2 <Romain Francois> skip using rpath linker option
79c50011d <Romain Francois> follow up from @wesm comments on #2489
a1a5e7c33 <Romain Francois> + installation instructions
fb412ca1d <Romain Francois> not checking for headers on these files
2848fd168 <Romain Francois> initial R :package: with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()

* ARROW-3187: [C++] Add support for using glog (Google logging library)

1. `glog` provides richer information.
2. `glog` can print a useful call stack when crashing, which is very helpful for debugging.
3. Make logging pluggable between `glog` and the original logging via a macro. Users can enable/disable `glog` using the CMake option `ARROW_USE_GLOG`.

Author: Yuhong Guo
Author: Wes McKinney

Closes #2522 from guoyuhong/glog and squashes the following commits:

b359640d4 <Yuhong Guo> Revert some useless changes.
38560c06e <Yuhong Guo> Change back the test code to fix logging-test
e3203a598 <Wes McKinney> Some fixes, run logging-test
4a9d1728b <Wes McKinney> Fix Flatbuffers download url
f36430836 <Yuhong Guo> Add test code to only include glog lib and init it without other use.
c8269fd88 <Yuhong Guo> Change ARROW_JEMALLOC_LINK_LIBS setting to ARROW_LINK_LIBS
34e6841f8 <Yuhong Guo> Add pthread
48afa3484 <Yuhong Guo> Address comment
12f9ba7e9 <Yuhong Guo> Disable glog from ARROW_BUILD_TOOLCHAIN
62f20002d <Yuhong Guo> Add -pthread to glog
673dbebe5 <Yuhong Guo> Try to fix ci FAILURE
69c1e7979 <Yuhong Guo> Add pthread for glog
fbe9cc932 <Yuhong Guo> Change Thirdpart to use EP_CXX_FLAGS
6f4d1b8fc <Yuhong Guo> Add lib64 to lib path suffix.
84532e338 <Yuhong Guo> Add glog to Dockerfile
ccc03cb12 <Yuhong Guo> Fix a bug
7bacd53ef <Yuhong Guo> Add LICENSE information.
9a3834caa <Yuhong Guo> Enable glog and fix building error
2b1f7e00e <Yuhong Guo> Turn glog off.
7d92091a6 <Yuhong Guo> Hide glog symbols from libarrow.so
a6ff67110 <Yuhong Guo> Support offline build of glog
14865ee93 <Yuhong Guo> Try to fix MSVC building failure
53cecebef <Yuhong Guo> Change log level to enum and refine code
09c6af7b9 <Yuhong Guo> Enable glog in plasma

* ARROW-3182: [Gandiva] Integrate gandiva to arrow build. Update licenses to apache license.

Fix clang-format, cpplint warnings, -Wconversion warnings and other warnings
with -DBUILD_WARNING_LEVEL=CHECKIN. Fix some build toolchain issues, Arrow
target dependencies. Remove some unused CMake code.

* ARROW-3536: [C++] Add UTF8 validation functions

The baseline UTF8 decoder is adapted from Bjoern Hoehrmann's DFA-based implementation.
The common case of runs of ASCII chars benefits from a fast path handling 8 bytes at a time.
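A simplified Python sketch of the two ideas above. Note the substitution: instead of Hoehrmann's table-driven DFA, it checks each multi-byte sequence directly against the RFC 3629 byte ranges, which should accept the same set of well-formed UTF-8 inputs. The ASCII fast path tests 8 bytes at once against the mask `0x8080808080808080`, mirroring the description; `validate_utf8` is a hypothetical name, not Arrow's API.

```python
ASCII_MASK = 0x8080808080808080  # high bit of each of 8 bytes


def validate_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (RFC 3629)."""
    i, n = 0, len(data)
    while i < n:
        # Fast path: skip 8 bytes at a time while none has its high bit set.
        if i + 8 <= n and (int.from_bytes(data[i:i + 8], "little") & ASCII_MASK) == 0:
            i += 8
            continue
        b = data[i]
        if b < 0x80:
            i += 1
            continue
        # Lead byte decides the sequence length and the allowed range
        # of the first continuation byte (rejects overlongs/surrogates).
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F
        else:
            return False  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + need >= n:
            return False  # truncated sequence
        if not (lo <= data[i + 1] <= hi):
            return False
        for k in range(i + 2, i + need + 1):
            if not (0x80 <= data[k] <= 0xBF):
                return False
        i += need + 1
    return True
```

Pure-ASCII input never reaches the per-byte branches except for a tail shorter than 8 bytes, which is where the speedup for ASCII-heavy strings comes from.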

Benchmark results (on a Ryzen 7 machine with gcc 7.3):
```
-----------------------------------------------------------------------------
Benchmark                                      Time           CPU Iterations
-----------------------------------------------------------------------------
BM_ValidateTinyAscii/repeats:1                 3 ns          3 ns  245245630   3.26202GB/s
BM_ValidateTinyNonAscii/repeats:1              7 ns          7 ns  104679950   1.54295GB/s
BM_ValidateSmallAscii/repeats:1               10 ns         10 ns   66365983   13.0928GB/s
BM_ValidateSmallAlmostAscii/repeats:1         37 ns         37 ns   18755439   3.69415GB/s
BM_ValidateSmallNonAscii/repeats:1            68 ns         68 ns   10267387   1.82934GB/s
BM_ValidateLargeAscii/repeats:1             4140 ns       4140 ns     171331   22.5003GB/s
BM_ValidateLargeAlmostAscii/repeats:1      24472 ns      24468 ns      28565   3.80816GB/s
BM_ValidateLargeNonAscii/repeats:1         50420 ns      50411 ns      13830   1.84927GB/s
```

The case of tiny strings is probably the most important for the use case of CSV type inference.

PS: benchmarks on the same machine with clang 6.0:
```
-----------------------------------------------------------------------------
Benchmark                                      Time           CPU Iterations
-----------------------------------------------------------------------------
BM_ValidateTinyAscii/repeats:1                 3 ns          3 ns  213945214   2.84658GB/s
BM_ValidateTinyNonAscii/repeats:1              8 ns          8 ns   90916423   1.33072GB/s
BM_ValidateSmallAscii/repeats:1                7 ns          7 ns   91498265   17.4425GB/s
BM_ValidateSmallAlmostAscii/repeats:1         34 ns         34 ns   20750233   4.08138GB/s
BM_ValidateSmallNonAscii/repeats:1            58 ns         58 ns   12063206   2.14002GB/s
BM_ValidateLargeAscii/repeats:1             3999 ns       3999 ns     175099   23.2937GB/s
BM_ValidateLargeAlmostAscii/repeats:1      21783 ns      21779 ns      31738   4.27822GB/s
BM_ValidateLargeNonAscii/repeats:1         55162 ns      55153 ns      12526   1.69028GB/s
```

Author: Antoine Pitrou

Closes #2916 from pitrou/ARROW-3536-utf8-validation and squashes the following commits:

9c9713b78 <Antoine Pitrou> Improve benchmarks
e6f23963a <Antoine Pitrou> Use a larger state table allowing for single lookups
29d6e347c <Antoine Pitrou> Help clang code gen
e621b220f <Antoine Pitrou> Use memcpy for safe aligned reads, and improve speed of non-ASCII runs
89f6843d9 <Antoine Pitrou> ARROW-3536:  Add UTF8 validation functions

* ARROW-3800: [C++] Vendor a string_view backport

Vendor the `std::string_view` backport from https://github.com/martinmoene/string-view-lite

Author: Antoine Pitrou

Closes #2974 from pitrou/ARROW-3800-string-view-backport and squashes the following commits:

4353414b6 <Antoine Pitrou> ARROW-3800:  Vendor a string_view backport

* ARROW-3738: [C++] Parse ISO8601-like timestamps in CSV columns

Second granularity is allowed (we might want to add support for fractions of seconds, e.g. in the "YYYY-MM-DD[T ]hh:mm:ss.ssssss" format).

Timestamp conversion also participates in CSV type inference, since it's unlikely to produce false positives (e.g. a semantically "string" column that would be entirely made of valid timestamp strings).
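A minimal Python sketch of second-granularity parsing plus the all-or-nothing inference rule described above. The accepted forms (`YYYY-MM-DD` and `YYYY-MM-DD[T ]hh:mm:ss`, interpreted as UTC) are an assumption for illustration rather than the exact set Arrow's CSV reader accepts, and `parse_timestamp`, `infer_column_type`, and the returned type labels are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional

# Date, optionally followed by 'T' or ' ' and hh:mm:ss (assumed subset of ISO 8601).
_TS_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})(?:[T ]([0-9]{2}):([0-9]{2}):([0-9]{2}))?"
)


def parse_timestamp(s: str) -> Optional[int]:
    """Return seconds since the Unix epoch (UTC), or None if `s` doesn't parse."""
    m = _TS_RE.fullmatch(s)
    if m is None:
        return None
    parts = [int(g) if g is not None else 0 for g in m.groups()]
    try:
        # datetime() rejects out-of-range fields such as Feb 30 or hour 24.
        dt = datetime(*parts, tzinfo=timezone.utc)
    except ValueError:
        return None
    return int(dt.timestamp())


def infer_column_type(values: List[str]) -> str:
    """Infer a timestamp column only if every value parses; otherwise keep strings."""
    if values and all(parse_timestamp(v) is not None for v in values):
        return "timestamp[s]"
    return "string"
```

Requiring every value in the column to parse is what keeps false positives unlikely: a single free-text cell is enough to fall back to strings.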

Author: Antoine Pitrou

Closes #2952 from pitrou/ARROW-3738-csv-timestamps and squashes the following commits:

005a6e3f7 <Antoine Pitrou> ARROW-3738:  Parse ISO8601-like timestamps in CSV columns

* ARROW-2653: [C++] Refactor hash table support

1. Get rid of all macros and sprinkled out hash table handling code

2. Improve performance by more careful selection of hash functions
   (and a better collision resolution strategy)

Integer hashing benefits from a very fast specialization.
Small string hashing benefits from a fast specialization with fewer branches
and less computation.
Generic string hashing falls back on hardware CRC32 or Murmur2-64, which probably offers sufficient
performance given the typical distribution of string key lengths.

3. Add some tests and benchmarks
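The memo-table pattern that such a hash table typically serves (for example, the Unique and dictionary-encode kernels) can be sketched in Python with open addressing. Assumptions: the Fibonacci multiplicative hash for integers and the CPython-style perturbed probe sequence are illustrative choices, not necessarily what the patch implements, and `IntMemoTable`, `hash_int64`, and `dictionary_encode` are hypothetical names.

```python
from typing import List, Optional, Tuple

MASK64 = (1 << 64) - 1
FIB_MULT = 0x9E3779B97F4A7C15  # odd 64-bit constant derived from the golden ratio


def hash_int64(x: int) -> int:
    """Multiplicative ("Fibonacci") hash of a value reduced to 64 bits."""
    return ((x & MASK64) * FIB_MULT) & MASK64


class IntMemoTable:
    """Open-addressing table mapping integers to dense insertion indices."""

    def __init__(self, capacity: int = 8) -> None:
        assert capacity >= 4 and (capacity & (capacity - 1)) == 0, "power of two"
        self._slots: List[Optional[Tuple[int, int]]] = [None] * capacity
        self.uniques: List[int] = []  # distinct values in first-seen order

    def _probe(self, value: int) -> int:
        """Return the slot holding `value`, or the empty slot where it belongs."""
        mask = len(self._slots) - 1
        h = hash_int64(value)
        pos, perturb = h & mask, h
        while True:
            slot = self._slots[pos]
            if slot is None or slot[0] == value:
                return pos
            # Fold higher hash bits in; once perturb reaches zero this becomes
            # pos * 5 + 1 (mod 2**k), which visits every slot.
            perturb >>= 5
            pos = (pos * 5 + 1 + perturb) & mask

    def get_or_insert(self, value: int) -> int:
        pos = self._probe(value)
        slot = self._slots[pos]
        if slot is not None:
            return slot[1]
        index = len(self.uniques)
        self._slots[pos] = (value, index)
        self.uniques.append(value)
        # Keep the load factor at or below 1/2 so probing always finds a gap.
        if 2 * len(self.uniques) > len(self._slots):
            self._grow()
        return index

    def _grow(self) -> None:
        old = self._slots
        self._slots = [None] * (2 * len(old))
        for slot in old:
            if slot is not None:
                self._slots[self._probe(slot[0])] = slot


def dictionary_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Return (dictionary, indices) such that dictionary[indices[i]] == values[i]."""
    table = IntMemoTable()
    indices = [table.get_or_insert(v) for v in values]
    return table.uniques, indices
```

For example, `dictionary_encode([5, 3, 5, 7, 3, 3, -1])` yields the dictionary `[5, 3, 7, -1]` with indices `[0, 1, 0, 2, 1, 1, 3]`; the Unique kernel result is simply the dictionary part.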

Author: Antoine Pitrou

Closes #3005 from pitrou/ARROW-2653 and squashes the following commits:

0c2dcc3de <Antoine Pitrou> ARROW-2653:  Refactor hash table support

* ARROW-4017: [C++] Move vendored libraries in dedicated directory

Also update mapbox::variant to v1.1.5 (I'm not sure which version was previously vendored).

Author: Antoine Pitrou

Closes #3184 from pitrou/ARROW-4017-vendored-libraries and squashes the following commits:

fe69566d7 <Antoine Pitrou> ARROW-4017:  Move vendored libraries in dedicated directory

* ARROW-3819: [Packaging] Update conda variant files to conform with feedstock after compiler migration

Crossbow builds:
- [kszucs/crossbow/build-403](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-403)
- [kszucs/crossbow/build-404](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-404)
- [kszucs/crossbow/build-405](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-405)
- [kszucs/crossbow/build-406](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-406)
- [kszucs/crossbow/build-407](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-407)

Author: Krisztián Szűcs

Closes #3368 from kszucs/conda_forge_migration and squashes the following commits:

e0a5a6422 <Krisztián Szűcs>  use --croot
3749a2ff9 <Krisztián Szűcs>  git on osx; set FEEDSTOCK_ROOT
ca7217d7f <Krisztián Szűcs>  support channel sources from variant files
33cba7118 <Krisztián Szűcs>  fix conda path on linux
2505828b7 <Krisztián Szűcs> fix task names
0c4a10bc3 <Krisztián Szűcs> conda recipes for python 3.7; compiler migration

* ARROW-4198: [Gandiva] Added support to cast timestamp

Added Howard Hinnant's date project as a third-party library.
Used the system timezone database for timezone information.
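For context on what a timestamp cast involves, here is a sketch of the core conversion from days since the Unix epoch to a civil (year, month, day) date, following the public-domain `civil_from_days` algorithm published by the same author as the date library. It handles UTC only; the timezone lookup against the system database is not shown, and `DateFromMillis` is a hypothetical helper, not a Gandiva function.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Days since 1970-01-01 -> (year, month, day) in the proleptic Gregorian calendar.
constexpr std::tuple<int64_t, unsigned, unsigned> CivilFromDays(int64_t z) {
  z += 719468;  // shift the epoch to 0000-03-01 so leap days fall at year end
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;  // 400-year eras
  const unsigned doe = static_cast<unsigned>(z - era * 146097);                 // [0, 146096]
  const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
  const unsigned mp = (5 * doy + 2) / 153;                                     // [0, 11], March = 0
  const unsigned d = doy - (153 * mp + 2) / 5 + 1;                             // [1, 31]
  const unsigned m = mp < 10 ? mp + 3 : mp - 9;                                // [1, 12]
  const int64_t y = static_cast<int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
  return std::make_tuple(y, m, d);
}

// Hypothetical helper: millisecond timestamp (UTC) -> calendar date.
// Floor division keeps pre-1970 timestamps on the correct day.
constexpr std::tuple<int64_t, unsigned, unsigned> DateFromMillis(int64_t ms) {
  constexpr int64_t kMillisPerDay = 86400000;
  int64_t days = ms / kMillisPerDay;
  if (ms % kMillisPerDay < 0) --days;
  return CivilFromDays(days);
}
```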

Author: Antoine Pitrou <[email protected]>
Author: shyam <[email protected]>

Closes #3352 from shyambits2004/timestamp and squashes the following commits:

882a5cf6 <Antoine Pitrou> Tweak wording of vendored date library README
7f524805 <Antoine Pitrou> Small tweaks to license wording for the date library
9ee8eff4 <shyam> ARROW-4198 :  Added support to cast timestamp

* ARROW-4546: Update LICENSE.txt with parquet-cpp licenses

- Ported parquet-cpp external license references
- Removed spurious duplicates (boost, mapbox)

Author: François Saint-Jacques <[email protected]>

Closes #3692 from fsaintjacques/ARROW-4546-parquet-license and squashes the following commits:

a5aa81e48 <François Saint-Jacques> ARROW-4546: Update LICENSE with parquet-cpp licenses

* ARROW-4690: Building TensorFlow compatible wheels for Arrow

This includes a Dockerfile that can be used to create wheels based on Ubuntu 14.04 which are compatible with TensorFlow.

TODO before this can be merged:
- [x] write documentation on how to build this
- [x] do more testing

Author: Philipp Moritz <[email protected]>

Closes #3766 from pcmoritz/ubuntu-wheels and squashes the following commits:

f708c29b <Philipp Moritz> remove tensorflow import check
599ce2e7 <Philipp Moritz> fix manylinux1 build instructions
f1fbedf8 <Philipp Moritz> remove tensorflow hacks
bf47f579 <Philipp Moritz> improve wording
4fb1d38b <Philipp Moritz> add documentation
078be98b <Philipp Moritz> add licenses
0ab0bccb <Philipp Moritz> cleanup
c7ab1395 <Philipp Moritz> fix
eae775d5 <Philipp Moritz> update
2820363e <Philipp Moritz> update
ed683309 <Philipp Moritz> update
e8c96ecf <Philipp Moritz> update
8a3b19e8 <Philipp Moritz> update
0fcc3730 <Philipp Moritz> update
fd387797 <Philipp Moritz> update
78dcf42d <Philipp Moritz> update
7726bb6a <Philipp Moritz> update
82ae4828 <Philipp Moritz> update
f44082ea <Philipp Moritz> update
deb30bfd <Philipp Moritz> update
50e40320 <Philipp Moritz> update
58f6c121 <Philipp Moritz> update
5e8ca589 <Philipp Moritz> update
5fa73dd5 <Philipp Moritz> update
595d0fe1 <Philipp Moritz> update
79006722 <Philipp Moritz> add libffi-dev
9ff5236d <Philipp Moritz> update
ca972ad0 <Philipp Moritz> update
60805e22 <Philipp Moritz> update
7a66ba35 <Philipp Moritz> update
1b56d1f1 <Philipp Moritz> zlib
eedef794 <Philipp Moritz> update
3ae2b5ab <Philipp Moritz> update
df297e1c <Philipp Moritz> add python build script
358e4f85 <Philipp Moritz> update
65afcebe <Philipp Moritz> update
11ccfc7e <Philipp Moritz> update
f1784245 <Philipp Moritz> update
b3039c8b <Philipp Moritz> update
9064c3ca <Philipp Moritz> update
c39f92a9 <Philipp Moritz> install tensorflow
ec4e2210 <Philipp Moritz> unicode
773ca2b6 <Philipp Moritz> link python
b690d64a <Philipp Moritz> update
5ce7f0d6 <Philipp Moritz> update
a9302fce <Philipp Moritz> install python-dev
f12e0cfe <Philipp Moritz> multibuild python 2.7
9342006b <Philipp Moritz> add git
ab2ef8e7 <Philipp Moritz> fix cmake install
cef997b5 <Philipp Moritz> install cmake and ninja
5d560faf <Philipp Moritz> add build-essential
adf2f705 <Philipp Moritz> add curl
f8d66963 <Philipp Moritz> remove xz
e439356e <Philipp Moritz> apt update
79fe557e <Philipp Moritz> add docker image for ubuntu wheel

* ARROW-4611: [C++] Rework CMake logic

This change refactors much of our CMake logic to make use of built-in CMake paths and remove custom logic. It also switches to the use of more modern dependency management via CMake targets instead of plain text variables.

This includes the following fixes:

- Use CMake's standard find features, e.g. respecting the `*_ROOT` variables: https://issues.apache.org/jira/browse/ARROW-4383
- Add a Dockerfile for Fedora: https://issues.apache.org/jira/browse/ARROW-4730
- Add a Dockerfile for Ubuntu Xenial: https://issues.apache.org/jira/browse/ARROW-4731
- Add a Dockerfile for Ubuntu Bionic: https://issues.apache.org/jira/browse/ARROW-4849
- Add a Dockerfile for Debian Testing: https://issues.apache.org/jira/browse/ARROW-4732
- Change the clang-7 entry to use system packages without any dependency on conda(-forge): https://issues.apache.org/jira/browse/ARROW-4733
- Support `double-conversion<3.1`: https://issues.apache.org/jira/browse/ARROW-4617
- Use google benchmark from toolchain: https://issues.apache.org/jira/browse/ARROW-4609
- Use the `compilers` metapackage to install the correct binutils when using conda, otherwise system binutils to fix https://issues.apache.org/jira/browse/ARROW-4485
- RapidJSON throws compiler errors with GCC 8+: https://issues.apache.org/jira/browse/ARROW-4750
- Handle `EXPECT_OK` collision: https://issues.apache.org/jira/browse/ARROW-4760
- Activate flight build in ci/docker_build_cpp.sh: https://issues.apache.org/jira/browse/ARROW-4614
- Build Gandiva in the docker containers: https://issues.apache.org/jira/browse/ARROW-4644

Author: Uwe L. Korn <[email protected]>

Closes #3688 from xhochy/build-on-fedora and squashes the following commits:

88e11fcfb <Uwe L. Korn> ARROW-4611:  Rework CMake logic

* ARROW-4900: [C++] polyfill __cpuidex on mingw-w64

Author: Jeroen Ooms <[email protected]>

Closes #3923 from jeroen/cpuidex and squashes the following commits:

59429f02 <Jeroen Ooms> Mention mingw-w64 polyfill in LICENSE.txt
28619330 <Jeroen Ooms> run clang-format
9e780465 <Jeroen Ooms> polyfill for __cpuidex on mingw-w64

* ARROW-5252: [C++] Use standard-compliant std::variant backport

Replace mapbox::variant with Michael Park's variant implementation.

Author: Antoine Pitrou <[email protected]>

Closes #4259 from pitrou/ARROW-5252-variant-backport and squashes the following commits:

03dbc0e14 <Antoine Pitrou> ARROW-5252:  Use standard-compliant std::variant backport

* ARROW-5648: [C++] Avoid using codecvt

Some antiquated C++ build chains lack the standard <codecvt> header.
Use a small vendored UTF8 implementation instead.
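As an illustration of how small such a replacement can be, here is a sketch of UTF-32 to UTF-8 encoding without `<codecvt>`. `AppendUtf8` and `Utf32ToUtf8` are hypothetical names; this is not the vendored implementation Arrow adopted, and it throws on surrogates and on values above U+10FFFF.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding (1 to 4 bytes) of one Unicode scalar value to `out`.
inline void AppendUtf8(char32_t cp, std::string* out) {
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    throw std::invalid_argument("not a Unicode scalar value");
  }
  if (cp < 0x80) {  // 0xxxxxxx
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {  // 110xxxxx 10xxxxxx
    out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
    out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Encode a whole UTF-32 string, the job std::wstring_convert + codecvt_utf8 used to do.
inline std::string Utf32ToUtf8(const std::u32string& in) {
  std::string out;
  out.reserve(in.size());
  for (char32_t cp : in) AppendUtf8(cp, &out);
  return out;
}
```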

Author: Antoine Pitrou <[email protected]>

Closes #4616 from pitrou/ARROW-5648-simple-utf8 and squashes the following commits:

54b1b2f68 <Antoine Pitrou> ARROW-5648:  Avoid using codecvt

* ARROW-4800: [C++] Introduce a Result<T> class

- Mostly an adaptation of StatusOr from google/asylo (both header and unittests).
- Demonstrate usage in ipc/writer*
- If this PR is accepted I can do a follow-up PR to port over useful testing utilities.
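A minimal sketch of the value-or-error idea behind `Result<T>`, assuming C++17 `std::variant`. `SimpleStatus`, `SimpleResult` and `Half` are hypothetical names for illustration only; the class adapted from asylo's `StatusOr` has a richer API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a status type: an empty message means success.
struct SimpleStatus {
  std::string message;
  bool ok() const { return message.empty(); }
};

// Holds either a T (success) or a non-OK SimpleStatus (failure).
template <typename T>
class SimpleResult {
 public:
  SimpleResult(T value) : storage_(std::move(value)) {}
  SimpleResult(SimpleStatus status) : storage_(std::move(status)) {
    // A result built from an OK status would carry neither value nor error.
    assert(!std::get<SimpleStatus>(storage_).ok());
  }

  bool ok() const { return std::holds_alternative<T>(storage_); }

  SimpleStatus status() const {
    return ok() ? SimpleStatus{} : std::get<SimpleStatus>(storage_);
  }

  // std::get throws std::bad_variant_access if this holds an error.
  const T& value() const { return std::get<T>(storage_); }

 private:
  std::variant<T, SimpleStatus> storage_;
};

// One return type carries either the value or the error,
// instead of a Status return value plus an out-parameter.
inline SimpleResult<int> Half(int x) {
  if (x % 2 != 0) return SimpleStatus{"odd input"};
  return x / 2;
}
```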

Author: Micah Kornfield <[email protected]>
Author: emkornfield <[email protected]>

Closes #4501 from emkornfield/error_or and squashes the following commits:

82e48c453 <Micah Kornfield> fix linter.  Add unittest.
aad79b183 <Micah Kornfield> rename to Return
1d7dbfbcd <Micah Kornfield> Use bkietz's suggestion.  cleanup test
d8e80431c <Micah Kornfield> fix compile errors
cc626079c <Micah Kornfield> try non-anonymous namespace
86e43ac89 <Micah Kornfield> export before
8a4b3ccf3 <Micah Kornfield> try explicit instantiation for msvc
f12f6d027 <Micah Kornfield> Revert "remove ARROW_EXPORT from test and try add link to gtest_main"
9581b05b1 <Micah Kornfield> remove ARROW_EXPORT from test and try add link to gtest_main
7a21e577a <Micah Kornfield> try exporting private test classes for appveyor
0b44389da <Micah Kornfield> fix format
de9d2d0d9 <Micah Kornfield> remove duplicate code.  fix format
504fcd7bf <emkornfield> Update cpp/src/arrow/error_or.h
31d9906c5 <Micah Kornfield> use vendored variant
aa540da09 <Micah Kornfield> fix append
6f459a5f9 <Micah Kornfield> address review comments
7a1e54de4 <Micah Kornfield> Add Arrow export
2886733fb <Micah Kornfield> use ARROW_RETURN_NOT_OK
f7ed04f00 <Micah Kornfield> address comments
3e2b3691a <Micah Kornfield> follow recommendation of docs for macro
d5e43d034 <Micah Kornfield> ARROW-4800: Introduce an ErrorOr class

* ARROW-5683: [R] Add snappy to Rtools Windows builds

Following https://github.com/r-windows/rtools-backports/pull/7 and https://github.com/r-windows/rtools-packages/pull/19, R Windows builds can now support Snappy compression. This patch tweaks the necessary files and unskips tests, in addition to some other PKGBUILD script cleanup.

Passing build here: https://ci.appveyor.com/project/nealrichardson/arrow/builds/25507388

Author: Neal Richardson <[email protected]>

Closes #4681 from nealrichardson/r-snappy and squashes the following commits:

0996a30e2 <Neal Richardson> Add license info for rtools-backports
1af08413c <Neal Richardson> Revert "Only run mine for now"
a5d967f0c <Neal Richardson> Get snappy from backports after all
a1c6390b6 <Neal Richardson> -lsnappy
a41544914 <Neal Richardson> Only run mine for now
b911568df <Neal Richardson> More comments; unskip tests
d7a74192a <Neal Richardson> Add snappy to PKGBUILD; prune some default cmake flags and add comment

* ARROW-5725: [Crossbow] Port conda recipes to azure pipelines

- [x] artifact uploading
- [x] osx build
- [x] win build (using appveyor, because of https://github.com/conda-forge/conda-forge.github.io/issues/703)
- [x] linux build
- [x] package gandiva

Author: Krisztián Szűcs <[email protected]>

Closes #4649 from kszucs/crossbow-azure and squashes the following commits:

20f6cecd3 <Krisztián Szűcs> update conda-win artifact patterns
0c470bb6a <Krisztián Szűcs> readme fixes
b6a86076b <Krisztián Szűcs> configure output folder for the artifacts
68d88a833 <Krisztián Szűcs> combine status api and checks api
111964957 <Krisztián Szűcs> fix artifact patterns
95cb44217 <Krisztián Szűcs> use FETCH_HEAD in the CI templates
cbb9c9ce7 <Krisztián Szűcs> rat
8f58839e1 <Krisztián Szűcs> use the default python on osx
f75efae18 <Krisztián Szűcs> use pip module for installing dependencies
2a598945d <Krisztián Szűcs> tabulate win template
9db3db1dd <Krisztián Szűcs> use pip3
2aa497748 <Krisztián Szűcs> azure template for docker tests
750f624c1 <Krisztián Szűcs> asset uploading script
e0d8fb9b2 <Krisztián Szűcs> git commit additional log
7fbce5df8 <Krisztián Szűcs> use appveyor for the win packages; upload assets scripts
d6c4ce9fa <Krisztián Szűcs> touch done_canary
611222e28 <Krisztián Szűcs> docker shm
ba0e88cce <Krisztián Szűcs> update old templates; query cxx include paths
0d76f1364 <Krisztián Szűcs> win
0c8464a4b <Krisztián Szűcs> parquet-cpp depend on exact arrow-cpp version
aecc2b19e <Krisztián Szűcs> displayName order
c42ebf595 <Krisztián Szűcs> quoting gandiva flags
8abd34779 <Krisztián Szűcs> move displayName after the script
bdf705ff0 <Krisztián Szűcs> OSX configuration
a874d1f99 <Krisztián Szűcs> gandiva flags
f50af1b51 <Krisztián Szűcs> path gymnastics
5cd9fa0b5 <Krisztián Szűcs> use pyarrow as recipe root
4b005892f <Krisztián Szűcs> try to fix assertion error
9ef81c567 <Krisztián Szűcs> use feedstock_root
0e826ac43 <Krisztián Szűcs> fix recipe directories
adae7c0f3 <Krisztián Szűcs> build all three recipes
7b60c9d07 <Krisztián Szűcs> pass arrow_version
ce740d799 <Krisztián Szűcs> fixing build_steps.sh path
df31ff7dc <Krisztián Szűcs> trying to fix feedstock and recipe roots
501d55341 <Krisztián Szűcs> set config
b2425e650 <Krisztián Szűcs> fix working directory
53e8eb24f <Krisztián Szűcs> don't use azure templates
b2fd21a24 <Krisztián Szűcs> use variables ]
2037f78fc <Krisztián Szűcs> port conda recipes to azure pipelines

* ARROW-5934: [Python] Bundle arrow's LICENSE with the wheels

Setuptools can handle multiple license files, but files with the same name overwrite each other, so move the content of python/LICENSE.txt to the top level LICENSE.txt. I've investigated a solution to concatenate these, but it would require extending multiple distutils/setuptools/wheel commands in a really error-prone fashion.
- Distribute the top level LICENSE.txt file with the wheels
- Distribute the top level LICENSE.txt with the source distribution; however, `python setup.py bdist` leaves the LICENSE.txt file behind as garbage in the repository. It probably cannot clean up properly because of the relative directory reference.

Add third-party licenses because of the static linkage in the wheels:
- lz4: https://github.com/lz4/lz4/blob/dev/lib/LICENSE#L11-L13
- zstd: https://github.com/facebook/zstd/blob/dev/LICENSE#L13-L15
- rapidjson: because of transitive dependencies https://github.com/Tencent/rapidjson/blob/master/license.txt#L21
- uriparser: https://github.com/uriparser/uriparser/blob/master/COPYING#L15-L18
- double-conversion: https://github.com/google/double-conversion/blob/3f9cd30e1bca91c0a036ad8b2b7eb8e2d2290dd2/LICENSE#L8-L11
- snappy: https://github.com/google/snappy/blob/master/COPYING#L10-L13
- brotli: https://github.com/google/brotli/blob/master/LICENSE#L10
- jemalloc is already added
- protobuf is already added
- gflags: https://github.com/gflags/gflags/blob/master/COPYING.txt#L10-L13
- c-ares: https://github.com/c-ares/c-ares/blob/master/LICENSE.md
- glog: https://github.com/google/glog/blob/master/COPYING#L10-L13
- boost is already added
- flatbuffers is already added
- grpc: https://github.com/grpc/grpc/blob/master/NOTICE.txt
- gtest: do we redistribute any binaries from it?
- gbenchmark: do we redistribute any binaries from it?
- apache thrift added
- apache orc added
- zlib is redistributed as a shared library with all wheels (in the future we should switch to static linkage)
- openssl and libcrypto are redistributed as shared libraries with the windows wheels, so added the openssl license preceding version 3

bzip2 doesn't require including its license with binary redistributions: https://github.com/asimonov-im/bzip2/blob/master/LICENSE

Authored-by: Krisztián Szűcs <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>

Closes #4880 from kszucs/wheel-license and squashes the following commits:

b839964a2 <Krisztián Szűcs> openssl
f220c6609 <Krisztián Szűcs> zlib
064444d25 <Krisztián Szűcs> distribute notice as well
bee17ac0e <Krisztián Szűcs> orc notice
25f738cf8 <Krisztián Szűcs> Thrift and ORC
54a643bd8 <Krisztián Szűcs> grpc
baa77962f <Krisztián Szűcs> glog
0a3070e5a <Krisztián Szűcs> c-ares
749574f00 <Krisztián Szűcs> gflags
52d7e19a8 <Krisztián Szűcs> typo
0697927c6 <Krisztián Szűcs> brotli
51b9264fe <Krisztián Szűcs> re2
5418a0e2c <Krisztián Szűcs> snappy and double-conversion
5647ab923 <Krisztián Szűcs> lz4 and rapidjson
29fa7e046 <Krisztián Szűcs> wheel licensing

* ARROW-6258: [R] Add macOS build scripts

When installing the R package from source on macOS, if the configure script cannot find libarrow with pkgconfig, and if `apache-arrow` has not been installed via Homebrew (neither of which is the case on CRAN), an "autobrew" step happens: a script is downloaded and sourced, which uses a fork of Homebrew to download and install binary dependencies for bundling with the R package.

This patch alters the configure script to let you `FORCE_AUTOBREW`, which is useful for testing, and it will use a local `autobrew` script file, if found, rather than downloading it. The patch also adds the `autobrew` script and the `apache-arrow.rb` brew formula to the `r/tools` directory, alongside the similar script that downloads the Arrow C++ binary on Windows. The two scripts are copied exactly from their "upstream" versions (noted on L18 of each file), with two minor modifications: (1) `autobrew` will use a local `apache-arrow.rb` formula if the file exists, and (2) the formula adds the `head` reference so you can `brew install --build-from-source --HEAD apache-arrow.rb` and pull the latest master branch of `apache/arrow` from GitHub.

See this in action at https://github.com/nealrichardson/arrow-r-nightly/blob/34d27bf482fa1d9f490003a8396fabdff5beea37/.travis.yml. Ordinarily I would add a Travis-CI job to `apache/arrow` for this, but I know we're anxious not to delay build times further, so I'll run this job nightly. Nightly runs will solve https://issues.apache.org/jira/browse/ARROW-5134, and it will also allow us to host an R package repository with nightly binaries for macOS (and Windows too, using the existing Appveyor config + deploy to bintray).

To install a binary from that repository on macOS, `install.packages("arrow", repos="https://dl.bintray.com/nealrichardson/arrow-r")`.

One TODO: get @jeroen 's approval to include these scripts here under the Apache license and add a citation to LICENSE.txt.

Closes #5095 from nealrichardson/force-autobrew and squashes the following commits:

499296d37 <Neal Richardson> Add license information for autobrew
a63765bd7 <Neal Richardson> :rat:
1a8a77700 <Neal Richardson> Add autobrew scripts
f48e6ba1e <Neal Richardson> Check for local autobrew script
284fd871b <Neal Richardson> Add FORCE_AUTOBREW arg to r/configure to support testing

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: François Saint-Jacques <[email protected]>

* ARROW-6454: [LICENSE] Add LLVM's license due to static linkage

Also adding to the conda-forge recipe https://github.com/conda-forge/arrow-cpp-feedstock/pull/96, which we should keep until the next release.

Closes #5250 from kszucs/llvm-license and squashes the following commits:

f732908ad <Krisztián Szűcs> add LLVM's license

Authored-by: Krisztián Szűcs <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>

* ARROW-4649: [C++/CI/R] Add nightly job that tests the homebrew formula

This doesn't just test `brew install apache-arrow --HEAD` as the ticket suggested--it brings the Homebrew formula under our source control and tests against that. This will enable us to know when we need to update the Homebrew formula due to changes in dependencies or cmake configuration.

I've been testing out this approach on a different repository: https://travis-ci.org/nealrichardson/arrow-brew/builds/568531245

Closes #5360 from nealrichardson/homebrew-nightly and squashes the following commits:

4a50a3779 <Neal Richardson> Shuffle again
ea3f9b5fc <Neal Richardson> Move autobrew script again and fix URL to match regular homebrew
851ac6eae <Neal Richardson> Sort hunks :muscle:
1d6512b2a <Neal Richardson> Move homebrew formulae inside dev/tasks. Update autobrew package version in release script
ff1489ea7 <Neal Richardson> Use regular snapshot version for homebrew formula
79e9191f0 <Neal Richardson> Remove autoconf from homebrew formula because new jemalloc no longer requires it
8816279b4 <Neal Richardson> Fix autobrew audit check
b25ac6de6 <Neal Richardson> Fix licensing/:rat: for ci/apache-arrow.rb
89dea4ff0 <Neal Richardson> Parametrize homebrew task and also test 'autobrew'
d61290a87 <Neal Richardson> Remove bottle shas
a133385c1 <Neal Richardson> Fix test (hopefully?)
6d00fe444 <Neal Richardson> Fix regular expression
34af2a911 <Neal Richardson> Add missing test assertion
40fc370c6 <Neal Richardson> Attempt to set the homebrew formula arrow version in the release script
ea4368e49 <Neal Richardson> Re-alphabetize the seds in 00-prepare.sh
262c2415b <Neal Richardson> Rename homebrew task yml; add to LICENSE
163c67a9d <Neal Richardson> Add homebrew formula and crossbow nightly task

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>

* ARROW-6678: [C++][Parquet] Binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant

I have added a simple base64 implementation (Zlib license) to arrow/vendored from

https://github.com/ReneNyffenegger/cpp-base64
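Why base64 fixes the problem: the encoded output uses only 64 ASCII characters plus `=`, so arbitrary binary metadata becomes valid UTF-8. A minimal encoder sketch using the RFC 4648 alphabet with padding; `Base64Encode` is a hypothetical name and this is not the vendored cpp-base64 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode arbitrary bytes (including invalid UTF-8) as padded base64 (pure ASCII).
inline std::string Base64Encode(const std::string& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  auto octet = [&](std::size_t i) {
    return static_cast<uint32_t>(static_cast<unsigned char>(in[i]));
  };
  std::string out;
  out.reserve((in.size() + 2) / 3 * 4);
  std::size_t i = 0;
  for (; i + 2 < in.size(); i += 3) {  // each full 3-byte group -> 4 characters
    const uint32_t n = (octet(i) << 16) | (octet(i + 1) << 8) | octet(i + 2);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  const std::size_t rem = in.size() - i;  // 0, 1 or 2 trailing bytes
  if (rem == 1) {
    const uint32_t n = octet(i) << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  } else if (rem == 2) {
    const uint32_t n = (octet(i) << 16) | (octet(i + 1) << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}
```

Readers of the metadata then have to base64-decode the value, which is why a Python unit test needed updating in this change.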

Closes #5493 from wesm/ARROW-6678 and squashes the following commits:

c058e8694 <Wes McKinney> Simplify, add MSVC exports
06f75cd5b <Wes McKinney> Fix Python unit test that needs to base64-decode now
eabb121ba <Wes McKinney> Fix LICENSE.txt, add iwyu export
b3a584a29 <Wes McKinney> Add vendored base64 C++ implementation and ensure that Thrift KeyValue in Parquet metadata is UTF-8

Authored-by: Wes McKinney <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>

* ARROW-6679: [RELEASE] Add license info for the autobrew scripts

cf. https://github.com/jeroen/autobrew/blob/gh-pages/LICENCE.txt

Closes #5501 from nealrichardson/autobrew-license and squashes the following commits:

3e790f5b8 <Neal Richardson> MIT license for autobrew

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>

* ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions

## Projecting ideas from ursabot

### Parametric docker images

The images are better parameterized now, meaning that we can build more variants of the same service. A couple of examples:

```console
UBUNTU=16.04 docker-compose build ubuntu-cpp
ARCH=arm64v8 UBUNTU=18.04 docker-compose build ubuntu-cpp
PYTHON=3.6 docker-compose build conda-python
ARCH=arm32v7 PYTHON=3.6 PANDAS=0.25 docker-compose build conda-python-pandas
```

Each variant has its own docker image, following this naming schema:
`{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest`
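To make the schema concrete, a sketch of a helper that assembles an image name from its parts. `ImageName` and the `org` placeholder are hypothetical; the real names come from the docker-compose configuration, not from C++ code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Builds {org}/{arch}-{platform}-{platform-version}[-{variant}-{variant-version}...]:latest
inline std::string ImageName(
    const std::string& org, const std::string& arch, const std::string& platform,
    const std::string& platform_version,
    const std::vector<std::pair<std::string, std::string>>& variants = {}) {
  std::string name = org + "/" + arch + "-" + platform + "-" + platform_version;
  for (const auto& [variant, version] : variants) {
    name += "-" + variant + "-" + version;  // each variant appends one name/version pair
  }
  return name + ":latest";
}
```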

### Use `*_build.sh` and `*_test.sh` for each job

The docker images provide the environment, and each language backend should usually implement two scripts: a `build.sh` and a `test.sh`. This way dependent builds, like the docker Python, R or C GLib builds, are able to reuse the build script of their ancestor without running its tests.

With small enough scripts, if the environment is properly set up, even the non-docker builds should be reproducible locally. GitHub Actions supports bash scripts across all three platforms, so we can reuse the same `*_build.sh` and `*_test.sh` scripts to execute the builds either in docker, on the CI or locally.

## Using GitHub Actions for running the builds

Regardless of which CI we choose, isolating the different platforms requires some sort of virtualisation. Currently linux (and windows, but I have not tried it yet) has lightweight containerisation, so we should keep the linux builds isolated in docker containers. The rest of the platforms (windows and macOS) should be executed on the CI system.

GitHub Actions supports all three major platforms: linux, windows and macOS. I've added cross-platform builds for a couple of languages, like Rust and Go; the rest are work in progress.

### Workflow

A workflow should define all builds of a language, mostly because the path filters can be defined at the workflow level. For example, the python builds should be triggered if either a cpp/** or a python/** file changes, which can be covered in the same workflow file.

## Feature parity with the current builds

Reaching feature parity with all of the builds below is not a goal for this PR; the difficult ones should at least have a tracking JIRA ticket.

### Travis-CI

- [x] **Lint, Release tests**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`
- [x] **C++ unit tests w/ conda-forge toolchain, coverage**: without coverage
  - `C++ / AMD64 Conda C++`
- [x] **Python 3.6 unit tests, conda-forge toolchain, coverage**: without coverage
  - `Python / AMD64 Conda Python 3.6`
- [x] **[OS X] C++ w/ Xcode 9.3**:
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- [x] **[OS X] Python w/ Xcode 9.3**:
  - `Python / AMD64 MacOS 10.14 Python 3`: with Xcode 10.3
- [x] **Java OpenJDK8 and OpenJDK11**:
  - `Java / AMD64 Debian Java JDK 8 Maven 3.5.2`
  - `Java / AMD64 Debian Java JDK 11 Maven 3.6.2`
- [x] **Protocol / Flight Integration Tests**:
  - `Dev / Protocol Test`
- [x] **NodeJS**: without running lint and coverage
  - `NodeJS / AMD64 Debian NodeJS 11`
- [x] **C++ & GLib & Ruby w/ gcc 5.4**:
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
  - `C GLib / AMD64 Ubuntu 18.04 C GLib`
  - `Ruby / AMD64 Ubuntu 18.04 Ruby`
- [x] **[OS X] C++ & GLib & Ruby w/ XCode 10.2 & Homebrew**
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
  - `C GLib / AMD64 MacOS 10.14 C Glib`: with Xcode 10.3
  - `Ruby / AMD64 MacOS 10.14 Ruby`: with Xcode 10.3
- [x] **Go**: without coverage
  - `Go / AMD64 Debian Go 1.12`
- [x] **R (with and without libarrow)**:
  - `R / AMD64 Conda R 3.6`: with libarrow
  - `R / AMD64 Ubuntu 18.04 R 3.6`: with libarrow

### Appveyor

- ~JOB=Build, GENERATOR=Ninja, CONFIGURATION=Release, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017~
- ~JOB=Toolchain, GENERATOR=Ninja, CONFIGURATION=Release, ARROW_S3=ON, ARROW_BUILD_FLIGHT=ON, ARROW_BUILD_GANDIVA=ON~
- ~JOB=Build_Debug, GENERATOR=Ninja, CONFIGURATION=Debug~
- ~JOB=MinGW32, MINGW_ARCH=i686, MINGW_PACKAGE_PREFIX=mingw-w64-i686, MINGW_PREFIX=c:\msys64\mingw32, MSYSTEM=MINGW32, USE_CLCACHE=false~
- ~JOB=MinGW64, MINGW_ARCH=x86_64, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64, MINGW_PREFIX=c:\msys64\mingw64, MSYSTEM=MINGW64, USE_CLCACHE=false~
- [x] **JOB=Rust, TARGET=x86_64-pc-windows-msvc, USE_CLCACHE=false**:
  - `Rust / AMD64 Windows 2019 Rust nightly-2019-09-25`
- [x] **JOB=C#, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, USE_CLCACHE=false**
  - `C# / AMD64 Windows 2019 C# 2.2.103`
- [x] **JOB=Go, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64 ...**:
  - `Go / AMD64 Windows 2019 Go 1.12`
- ~JOB=R with libarrow, USE_CLCACHE=false, TEST_R_WITH_ARROW=TRUE, RWINLIB_LOCAL=%APPVEYOR_BUILD_FOLDER%\libarrow.zip~

### Github Actions

- [x] **Windows MSVC C++ / Build (Visual Studio 16 2019)**:
  - `C++ / AMD64 Windows 2019 C++`: without tests
- [x] **Windows MSVC C++ / Build (Visual Studio 15 2017)**:
  - `C++ / AMD64 Windows 2016 C++`: without tests
- [x] **Linux docker-compose / Test (C++ w/ clang-7 & system packages)**: all have llvm for gandiva but the compiler is set to gcc
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- [x] **Linux docker-compose / Test (Rust)**: without rustfmt
  - `Rust / AMD64 Debian Rust nightly-2019-09-25`
- [x] **Linux docker-compose / Test (Lint, Release tests)**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`

### Nightly Crossbow tests

The packaging builds are out of the scope of this PR, but the nightly **dockerized test** tasks are in.

Nightly tests:
- [x] docker-r
- [x] docker-r-conda
- [x] docker-r-sanitizer
- [x] docker-rust
- [x] docker-cpp
- [x] docker-cpp-cmake32
- [x] docker-cpp-release
- [x] docker-cpp-static-only
- [x] docker-c_glib
- [x] docker-go
- [x] docker-python-2.7
- [x] docker-python-3.6
- [x] docker-python-3.7
- [x] docker-python-2.7-nopandas
- [x] docker-python-3.6-nopandas
- [x] docker-java
- [x] docker-js
- [x] docker-docs
- [x] docker-lint
- [x] docker-iwyu: included in the lint
- [x] docker-clang-format: included in the lint
- [x] docker-pandas-master
- [x] docker-dask-integration
- [x] docker-hdfs-integration
- [x] docker-spark-integration
- [x] docker-turbodbc-integration

## TODOs left:

- [x] Fix the Apidoc generation for c_glib
- [x] Fix the JNI test for Gandiva and ORC
- [x] Test that crossbow tests are passing
- ~Optionally restore the travis configuration to incrementally decommission old builds~

## Follow-up JIRAs:

- [Archery] Consider porting the docker tool of ursabot to archery
- [Archery] Consider to use archery with or instead of the pre-commit hooks
- [Archery] Create a wrapper script in archery for docker compose in order to run the containers with the host's user and group
- [C++] GCC 5.4.0 has compile errors; reproduce with `UBUNTU=16.04 docker-compose run ubuntu-cpp`
- [C++][CI] Test the ported fuzzit integration image
- [C++][CI] Turn off unnecessary features in the integration tests (spark/turbodbc/dask/hdfs)
- [C++][CI] Revisit ASAN UBSAN settings in every C++ based image
- [CI] Consider re-adding the removed debian testing image
- [Go][CI] Pre-install the go dependencies in the dockerfile using go get
- [JS][CI] Pre-install the JS dependencies in the dockerfile
- [Rust][CI] Pre-install the rust dependencies in the dockerfile
- [Java][CI] Pre-install the java dependencies in the dockerfile
- [Ruby][CI] Pre-install the ruby dependencies in the dockerfile and remove it from the test script
- [C#][CI] Pre-install the C# dependencies in the dockerfile
- [R][CI] Fix the r-sanitizer build https://issues.apache.org/jira/browse/ARROW-6957
- [GLIB][MacOS] Fail to execute lua examples (fails to load 'lgi.corelgilua51' despite lgi being installed)
- [C++][CMake] Automatically set ARROW_GANDIVA_PC_CXX_FLAGS for conda and OSX sdk (see cpp_build.sh)
- [C++][CI] Hiveserver2 integration test fails to connect to impala container
- [CI][Spark] Support specific Spark versions in the integration test, including the latest
- [JS][CI] Move nodejs linting from js_build.sh to archery
- [Python][CI] create a docker image for python ASV benchmarks and fix the script
- [CI] Find a short but related prefix for the env vars used for the docker-compose file to prevent collisions
- [C#] the docker container fails to run because of the ubuntu host versions, see https://github.com/dotnet/core/issues/3509
- [C++][Windows] Enable more features on the windows GHA build
- [Doc] document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts to a sourceable bash functions or to archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub actions
- [C++][CI] Running Gandiva tests fails on Fedora:
    Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
    ```
    Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
    : CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
    LLVM ERROR: inconsistency in registered CommandLine options
    /build/cpp/src/gandiva
    ```
- [JS][CI] NodeJS build fails on Github Actions Windows node
    ```
    > NODE_NO_WARNINGS=1 gulp build
    # 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
    # operable program or batch file.
    # npm ERR! code ELIFECYCLE
    # npm ERR! errno 1
    # npm ERR! [email protected] build: `NODE_NO_WARNINGS=1 gulp build`
    # npm ERR! Exit status 1
    # npm ERR!
    # npm ERR! Failed at the [email protected] build script.
    # npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
    ```

Closes #5589 from kszucs/docker-refactor and squashes the following commits:

5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.

Authored-by: Krisztián Szűcs <[email protected]>
Signed-off-by: Krisztián Szűcs <[email protected]>

* ARROW-6633: [C++] Vendor double-conversion library

Since this is both a mandatory dependency and a small-ish library, vendor it to make the build chain simpler. Also, make its use private, because of Windows DLL exports. This incurs a small (~8-10%) performance hit on specific workloads.

Closes #5832 from pitrou/ARROW-6633-vendor-double-conversion and squashes the following commits:

b1e12c7bc <Antoine Pitrou> Remove unneeded code
85877648e <Antoine Pitrou> Add license
3b89b191e <Antoine Pitrou> Make use of double-conversion private.
9e1da51d1 <Antoine Pitrou> ARROW-6633:  Vendor double-conversion library

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: François Saint-Jacques <[email protected]>

* ARROW-7178: [C++] Vendor forward compatible std::optional

Add a version of std::optional vendored from
https://github.com/martinmoene/optional-lite using git tag v3.2.0

Closes #5849 from gawain-bolton/ARROW-7178_vendor_forward_compatible_std_optional and squashes the following commits:

825213c4e <gawain.bolton> Amend LICENSE.txt for cpp/src/arrow/vendored/optional.hpp
1bb25a962 <gawain.bolton> ARROW-7178:  Vendor forward compatible std::optional

Authored-by: gawain.bolton <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>

* ARROW-7169: [C++] Vendor uriparser library

The library is only used internally (the types and headers are not exposed).
Vendoring it makes it easier for core functionality to depend on it.

Closes #5865 from pitrou/ARROW-7169-vendor-uriparser and squashes the following commits:

83bb7c22a <Antoine Pitrou> ARROW-7169:  Vendor uriparser library

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Krisztián Szűcs <[email protected]>

* ARROW-6341: [Python] Implement low-level bindings for Dataset

Closes #5237 from kszucs/ARROW-6341 and squashes the following commits:

45121f77a <Krisztián Szűcs> Fix tests for MockFs
069d8f55e <Krisztián Szűcs> Don't expose SimpleDataSource
48b5556ef <Krisztián Szűcs> Test projected partitions
27dbe56ab <Krisztián Szűcs> Don't deprecate RandomFileAccess just remove in favor of CRandomFileAccess
64ca71245 <Krisztián Szűcs> Execute instead of scan
bd0d1d2e7 <Krisztián Szűcs> more type_name
f2ab5ebfe <Krisztián Szűcs> type_name
3988865ca <Krisztián Szűcs> Rebase again
b8949a3ce <Krisztián Szűcs> Result iterator api
7553caa96 <Krisztián Szűcs> Clang format
a1d82546f <Krisztián Szűcs> Remove ScanContext
6260734a3 <Krisztián Szűcs> Fix api changes
e6a562356 <Krisztián Szűcs> Expose root_partition setter; resolve a couple of review issues
c9ba0fb93 <Krisztián Szűcs> Fix review comments
f589ecb23 <Krisztián Szűcs> Removed todo notes
9b38a40cf <Krisztián Szűcs> Docstring additions
3210e9160 <Krisztián Szűcs> Fixing review issues
4384b74cf <Krisztián Szűcs> Enable PYARROW_BUILD_DATASET
f52e735a0 <Krisztián Szűcs> Remove DataFragment and ScanOptions
13eaf46a0 <Krisztián Szűcs> Remove move workaround
620ba6ffa <Krisztián Szűcs> schema as property
e9f77bd6b <Krisztián Szűcs> Expose root_partition
01510bcf8 <Krisztián Szűcs> Some docstrings
5beb0d26c <Krisztián Szűcs> Pxd definition fixes
f89dc4913 <Krisztián Szűcs> Data fragments
c9881c858 <Krisztián Szűcs> Downcast data fragments
032a4358c <Krisztián Szűcs> More expressions and testing
1a4a8544a <Krisztián Szűcs> Fix import errors if dataset is not enabled
2da1b5c76 <Krisztián Szűcs> Please the linters
d1bc74efe <Krisztián Szűcs> HivePartitionScheme
bf5dd17f4 <Krisztián Szűcs> Release the gil for std::move
a76dc6c3c <Krisztián Szűcs> Remove the move headers from flight
8cdfe1054 <Krisztián Szűcs> Expose more methods
53b64910e <Krisztián Szűcs> Expose Scalar/Comparison/Boolean expressions
444ae58a0 <Krisztián Szűcs> Expressions and scalar wrapping
01029a666 <Krisztián Szűcs> Test parquet data discovery
2e416ea8f <Krisztián Szűcs> Expressions
d14cf502b <Krisztián Szűcs> PartitionScheme
bd6e1d656 <Krisztián Szűcs> FileSystemDataSourceDiscovery
0c0e3752f <Krisztián Szűcs> Working scanner
18cfd949b <Krisztián Szűcs> Resolve issues with handling iterator results
43c3d2b…