WIP: Dummy PR to check maint-13.0.0 status #36616

Closed
wants to merge 35 commits into from

Commits on Jul 11, 2023

  1. GH-36284: [Python][Parquet] Support write page index in Python API (#36290)
    
    ### Rationale for this change
    
    Support `write_page_index` in Parquet Python API
    
    ### What changes are included in this PR?
    
    support `write_page_index` in properties
    
    ### Are these changes tested?
    
    Currently not
    
    ### Are there any user-facing changes?
    
    User can generate page index here.
    
    * Closes: #36284
    
    Lead-authored-by: mwish <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Co-authored-by: mwish <[email protected]>
    Co-authored-by: Alenka Frim <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    4 people authored and raulcd committed Jul 11, 2023
    92c5d74
  2. GH-36599: [MATLAB] Bump libmexclass version to 3465900 (#36600)

    ### Rationale for this change
    
    We recently made some improvements to `libmexclass` that include creating a class named `libmexclass.proxy.Identifier`, which represents proxy ids. This class will be helpful when we create `arrow.array.Array` objects from existing proxy ids, which used to be represented as scalar `uint64` values.
    
    ### What changes are included in this PR?
    
    1. Bumps the libmexclass version the MATLAB Interface depends on to [#3465900](mathworks/libmexclass@3465900) 
    
    ### Are these changes tested?
    
    No tests needed.
    
    ### Are there any user-facing changes?
    
    No.
    
    * Closes: #36599
    
    Authored-by: Sarah Gilmore <[email protected]>
    Signed-off-by: Kevin Gurney <[email protected]>
    sgilmore10 authored and raulcd committed Jul 11, 2023
    b6a3465
  3. GH-36568: [Go] Include Timestamp Zone in ValueStr (#36569)

    ### Rationale for this change
    While trying to fix an issue with Snowflake ADBC timestamp handling, I came across this as part of the problem.
    
    ### What changes are included in this PR?
    Adds the timezone string indicator to the output of `ValueStr` for a timestamp array.
    
    * Closes: #36568
    
    Authored-by: Matt Topol <[email protected]>
    Signed-off-by: Matt Topol <[email protected]>
    zeroshade authored and raulcd committed Jul 11, 2023
    608669f
  4. GH-36311: [C++] Fix integer overflows in utf8_slice_codeunits (#36575)

    ### Rationale for this change
    
    The default value for the `SliceOptions::stop` is `INT64_MAX`, which isn't considered in several internal calculations - resulting in integer overflows and unexpected behavior when `stop` isn't provided.
    
    Also note that running the included tests without the fixes should result in ubsan errors (it did for me, at least).
    
    ### What changes are included in this PR?
    
    - Adds some logic to `SliceCodeunitsTransform` that handles potential overflows
    - Adds tests for cases where the `start` param is positive/negative and `stop` is the maximum value
    
    **Update**
    Discovered that `utf8_slice_codeunits` deviates from Python array behavior when `stop=None` and `step < 0`, so further changes were made:
    - Handles `INT64_MIN` for `SliceOptions::stop` on C++ side, adds more tests.
    - Updates Python bindings for `SliceOptions` so that the default value when `stop=None` (`sys.maxsize`) is negated when `step < 0`
    - Adds `None` as a possible `stop` value in Python tests
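
The sign of the `stop` sentinel matters because of how Python slicing clamps its bounds. The sketch below uses plain Python strings only; `default_stop` and `slice_with_sentinel` are hypothetical helpers written for illustration, not the pyarrow bindings. It suggests why a `sys.maxsize` default would need to be negated when `step < 0`:

```python
import sys


def default_stop(step):
    """Sentinel standing in for stop=None (hypothetical helper).

    Going forward the slice should run to the end, so a huge positive
    value works. Going backward it should run past index 0, which
    requires a huge negative value instead.
    """
    return sys.maxsize if step > 0 else -sys.maxsize


def slice_with_sentinel(s, start, stop=None, step=1):
    """Slice `s`, replacing stop=None with an integer sentinel."""
    if stop is None:
        stop = default_stop(step)
    return s[start:stop:step]


text = "abcdef"
reference = text[2::-1]                         # built-in behavior for stop=None
fixed = slice_with_sentinel(text, 2, step=-1)   # negated sentinel
naive = text[2:sys.maxsize:-1]                  # un-negated sentinel
```

With the negated sentinel the reversed slice matches the built-in result, while the un-negated sentinel yields an empty string.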
    
    ### Are these changes tested?
    
    Yes (tests are included)
    
    ### Are there any user-facing changes?
    
    In theory, altering the behavior of `utf8_slice_codeunits` when `stop=None` and `step < 0` could be considered a breaking change. That said, the current implementation produces incorrect results whenever `None` is used, so it probably isn't one in practice.
    
    * Closes: #36311
    
    Authored-by: benibus <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    benibus authored and raulcd committed Jul 11, 2023
    0f74731

Commits on Jul 13, 2023

  1. GH-36271: [R] Split out R6 classes and convenience functions (#36394)

    * Closes: #36271
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Jul 13, 2023
    dc6954a
  2. GH-36598: [C++][MinGW] Fix build failure with Protobuf 23.4 (#36606)

    ### Rationale for this change
    
    There are 2 problems:
    
    * `FindProtobuf.cmake` provided by CMake is incomplete with Protobuf 23.4.
    * Misses `-DPROTOBUF_USE_DLLS` for building Substrait related files.
    
    ### What changes are included in this PR?
    
    * We need to use `protobuf-config.cmake` provided by Protobuf instead of `FindProtobuf.cmake` provided by CMake because `FindProtobuf.cmake` misses `absl::status` dependency.
    * Accept Protobuf 23.4.
    * Use `PROTOBUF_USE_DLLS` when we build Substrait related files.
    * Use `Boost_INCLUDE_DIRS` instead of `Boost_INCLUDE_DIR` because `Boost_INCLUDE_DIR` isn't defined in `BoostConfig.cmake`.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: #36598
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    kou authored and raulcd committed Jul 13, 2023
    a7a6034
  3. GH-36482: [C++][CI] Fix sporadic test failures in AsofJoinBasicTest (#36499)
    
    ### What changes are included in this PR?
    
    The key hasher is invalidated before the first invocation of `GetKey` (via `GetLatestKey`) after a new batch arrives. In the pre-PR code, this invalidation happens within `Advance`, which is called from `AdvanceAndMemoize` only after `GetLatestKey` is called. The change adds synchronization between the input-receiving- and processing- threads, because avoiding that would require a more complicated and brittle change, e.g., one that involves detecting in the processing thread when a new batch was added to the queue in order to invalidate the key hasher at that time.
    
    ### Are these changes tested?
    
    Yes, by existing tests.
    
    ### Are there any user-facing changes?
    
    No.
    
    **This PR contains a "Critical Fix".**
    * Closes: #36482
    
    Authored-by: Yaron Gvili <[email protected]>
    Signed-off-by: Weston Pace <[email protected]>
    rtpsw authored and raulcd committed Jul 13, 2023
    f5a4f12
  4. GH-36629: [CI][Python] Skip dask tests due to our non-nanosecond changes in arrow->pandas conversion (#36630)
    
    ### Rationale for this change
    
    Due to the changes on #33321 a dask test started failing.
    
    ### What changes are included in this PR?
    
    Skip the test in the meantime
    
    ### Are these changes tested?
    
    Yes, with crossbow
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36629
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    raulcd committed Jul 13, 2023
    e77e13a
  5. GH-36641: [C++] Remove reference to acero from non-acero file (#36650)

    ### Rationale for this change
    
    Files in modules which do not depend on the acero module should not reference files inside the acero module.
    
    ### What changes are included in this PR?
    
    There were no changes to the body of any functions.  I simply moved functions around so that the acero include was no longer needed.  There were some conflicts that arose between the class `bit_util` and the namespace `bit_util` and so I got rid of the class in favor of the namespace as that is more similar to how we handle `bit_util` elsewhere.
    
    ### Are these changes tested?
    
    Sort of.  I would like to add an AVX2 CI system as well.  I'm not confident any of the CI builds are building with AVX2 enabled.  Also, even if we had an AVX2 CI system, it would not have caught this issue, since the code only needed definitions from the acero header and was not relying on any actual compiled symbols.  However, I think setting up tests to catch this sort of invalid include is beyond the scope of this PR.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: #36641
    
    Lead-authored-by: Weston Pace <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    2 people authored and raulcd committed Jul 13, 2023
    629c6a1
  6. GH-36659: [Python] Fix pyarrow.dataset.Partitioning.__eq__ when comparing with other type (#36661)
    
    ### Rationale for this change
    
    Ensure that `part == other` doesn't crash when `other` is not a Partitioning instance
    
    Small follow-up on #36462
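
As a rough illustration of the pattern, here is a minimal sketch of a type-guarded `__eq__`. The `Partitioning` class below is a hypothetical pure-Python stand-in, not the pyarrow class, and returning `NotImplemented` is an assumption about one reasonable way to do it; the actual fix may simply return `False`.

```python
class Partitioning:
    """Hypothetical stand-in for illustration; not pyarrow.dataset.Partitioning."""

    def __init__(self, field_names):
        self.field_names = list(field_names)

    def __eq__(self, other):
        # Guard first: comparing against an unrelated type must not
        # touch attributes that only Partitioning instances have.
        if not isinstance(other, Partitioning):
            return NotImplemented
        return self.field_names == other.field_names


part = Partitioning(["year", "month"])
same = Partitioning(["year", "month"])
```

When both operands return `NotImplemented`, Python falls back to an identity comparison, so `part == "year"` evaluates to `False` instead of raising.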
    
    * Closes: #36659
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    jorisvandenbossche authored and raulcd committed Jul 13, 2023
    b4c3b41

Commits on Jul 17, 2023

  1. GH-36456: [R] Link to correct version of OpenSSL when using autobrew (#36551)
    
    ### Rationale for this change
    
    The r-binary-packages job (which uses autobrew) and the autobrew nightly jobs are failing because they are linking to a different version of OpenSSL than the package was built against. I believe this occurred because Arrow and its dependencies are built against the autobrew headers which included openssl. The `ssl` and `crypto` libraries weren't explicitly linked, so I think whatever LibreSSL fork MacOS installs by default was getting linked. This was perhaps compatible using the version of autobrew for High Sierra/the version of LibreSSL on High Sierra but was not compatible with the version of autobrew for Big Sur/the version of LibreSSL on Big Sur.
    
    ### What changes are included in this PR?
    
    This PR explicitly adds OpenSSL 1.1 to the autobrew formulas and explicitly adds `-lssl -lcrypto` to the PKG_LIBS (1.1 because that's what was in the corresponding homebrew formula).
    
    ### Are these changes tested?
    
    Existing nightly tests cover these changes.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: #36456
    
    Lead-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    2 people authored and raulcd committed Jul 17, 2023
    1d5f4cd
  2. GH-36707: [C++] Use ARROW_PACKAGE_PREFIX for OPENSSL_ROOT_DIR too (#36710)
    
    ### Rationale for this change
    
    In general, a CMake package uses `${PACKAGE}_ROOT` variable to detect `PACKAGE` but `FindOpenSSL.cmake` uses `OPENSSL_ROOT_DIR` not `OpenSSL_ROOT`.
    
    ### What changes are included in this PR?
    
    Set `OPENSSL_ROOT_DIR` explicitly.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: #36707
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Jul 17, 2023
    453965d
  3. GH-36686: [C++] Pass CMAKE_OSX_SYSROOT to external projects (#36706)

    ### Rationale for this change
    
    If we use different macOS SDKs for Apache Arrow C++ and the bundled projects, it will cause problems such as build errors.
    
    ### What changes are included in this PR?
    
    Pass `CMAKE_OSX_SYSROOT` explicitly to external projects.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: #36686
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
    kou authored and raulcd committed Jul 17, 2023
    137e50d

Commits on Jul 18, 2023

  1. GH-36669: [Go] Guard against garbage in C Data structures (#36670)

    ### Rationale for this change
    
    Prevent hard to debug crashes when using Go code with other code via C Data Interface.
    
    ### What changes are included in this PR?
    
    In the C Stream Interface implementation, jump through a trampoline that zeroes the out parameters before letting Go see them.
    
    Note that this can only guard against the issue when the C Stream Interface is used. 
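
The trampoline idea can be sketched in Python with `ctypes`. This is an illustration only: `FakeArray` is a hypothetical two-field struct rather than the real C Data Interface layout, and `zeroing_trampoline` and `end_of_stream` are made-up names. The actual change lives in the Go and C code.

```python
import ctypes


class FakeArray(ctypes.Structure):
    """Hypothetical stand-in for a C Data Interface struct (not the real layout)."""

    _fields_ = [("length", ctypes.c_int64), ("release", ctypes.c_void_p)]


def zeroing_trampoline(callback, out):
    """Zero the out-parameter, then forward the call to the real callback."""
    ctypes.memset(ctypes.addressof(out), 0, ctypes.sizeof(out))
    return callback(out)


def end_of_stream(out):
    # A producer that signals "no more data" by leaving `out` untouched.
    return 0


# Simulate caller-side garbage in memory that was never initialized.
garbage = FakeArray(length=123456, release=0xDEADBEEF)
status = zeroing_trampoline(end_of_stream, garbage)
```

After the call, the consumer sees an all-zero struct rather than stale values, even though the producer wrote nothing.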
    
    Also, fix other issues in the C Data Interface tests with invalid pointers and uninitialized memory that were turned up by the new test here (because it calls `runtime.GC` very frequently).
    
    ### Are these changes tested?
    
    Yes
    
    ### Are there any user-facing changes?
    
    No
    
    **This PR contains a "Critical Fix".**
    * Closes: #36669
    
    Lead-authored-by: David Li <[email protected]>
    Co-authored-by: Matt Topol <[email protected]>
    Signed-off-by: David Li <[email protected]>
    2 people authored and raulcd committed Jul 18, 2023
    2fff4d1
  2. GH-36687: [R] Add correct branch name to autobrew formulae to facilitate local testing (#36689)
    
    ### Rationale for this change
    
    It is currently not possible to recreate an autobrew build locally by following the instructions in the comments. This fixes the local copies of the upstream formulas and the instructions so that future debuggers can recreate an autobrew build.
    
    ### What changes are included in this PR?
    
    The branch `master` no longer exists but is still the default value. This PR adds the revised default branch name ("main").
    
    ### Are these changes tested?
    
    No nightly test covers this because this value would be overwritten to test specific commits anyway.
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: #36687
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    paleolimbot authored and raulcd committed Jul 18, 2023
    3cded6e
  3. GH-36746: [R] Update NEWS.md for 12.0.1.1 release (#36747)

    ### What changes are included in this PR?
    
    Update NEWS.md in the R package to include 12.0.1.1 release
    
    ### Are these changes tested?
    
    No
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36746
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Jul 18, 2023
    ae32f8f
  4. GH-36744: [Python][Packaging] Add upper pin for cython<3 to pyarrow build dependencies (#36743)
    
    ### Rationale for this change
    
    Although we already fixed some Cython 3 build issues (#34726), some new ones have been introduced, which we are seeing now that Cython 3 is released (#36730).
    
    Adding an upper pin (<3) for the release, so we have more time (the full 14.0 release cycle) to iron out issues.
    * Closes: #36744
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    jorisvandenbossche authored and raulcd committed Jul 18, 2023
    c7483af

Commits on Jul 19, 2023

  1. GH-36756: [CI][Python] Install Cython < 3.0 on verify-release-candidate script (#36757)
    
    ### Rationale for this change
    
    Some of our verification tasks fail for 13.0.0
    
    ### What changes are included in this PR?
    
    Pin Cython to be less than 3.0
    
    ### Are these changes tested?
    
    Archery
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36756
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
    raulcd committed Jul 19, 2023
    6aff232

Commits on Jul 24, 2023

  1. GH-36812: [C#] Fix C API support to work with .NET desktop framework (#36813)
    
    The C API support in the C# library has been modified to work correctly on .NET 4.7.2.
    The tests have been modified to work correctly on .NET 4.7.2, though that platform is disabled by default as the Python interop seems to cause a hang when unloading the xUnit AppDomain.
    
    **This PR contains a "Critical Fix".**
    * Closes: #36812
    
    Authored-by: Curt Hagenlocher <[email protected]>
    Signed-off-by: Weston Pace <[email protected]>
    CurtHagenlocher authored and raulcd committed Jul 24, 2023
    bc75d67
  2. GH-36805: [R] Update NEWS.md for 13.0.0 (#36806)

    ### Rationale for this change
    
    Update NEWS.md for 13.0.0
    * Closes: #36805
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Jul 24, 2023
    0adeffa
  3. MINOR: [R] Bump versions following 12.0.1.1 release (#36801)

    ### Rationale for this change
    
    Bumping version numbers after 12.0.1.1 release (this is a manual process for CRAN-only releases)
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Jul 24, 2023
    b78a291

Commits on Jul 25, 2023

  1. GH-36839: [CI][Docs] Update test-ubuntu-default-docs to use GitHub actions instead of Azure (#36840)
    
    ### Rationale for this change
    
    Currently `test-ubuntu-default-docs` has been failing on Azure for the 13.0.0 RC0 and we had to use GitHub actions to generate the documentation.
    Using the same base action for both preview-docs, test and packaging will improve maintainability.
    
    ### What changes are included in this PR?
    
    Move `test-ubuntu-default-docs` to use GH actions instead of Azure.
    
    ### Are these changes tested?
    
    Yes, with archery related tasks.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36839
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
    raulcd committed Jul 25, 2023
    8997754
  2. GH-36832: [Packaging][RPM] Remove needless Requires (#36833)

    ### Rationale for this change
    
    `arrowXX-libs` doesn't use `gflags` but it depends on `gflags`.
    
    ### What changes are included in this PR?
    
    Remove needless explicit `Requires`.
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    Yes.
    * Closes: #36832
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    kou authored and raulcd committed Jul 25, 2023
    6375be6
  3. GH-36688: [C#] Fix dereference error (#36691)

    * Closes: #36688
    
    Authored-by: Curt Hagenlocher <[email protected]>
    Signed-off-by: Weston Pace <[email protected]>
    CurtHagenlocher authored and raulcd committed Jul 25, 2023
    f0f27d7

Commits on Jul 28, 2023

  1. PARQUET-2323: [C++] Use bitmap to store pre-buffered column chunks (#36649)
    
    ### Rationale for this change
    
    In https://issues.apache.org/jira/browse/PARQUET-2316 we allow partial buffer in parquet File Reader by storing prebuffered column chunk index in a hash set, and make a copy of this hash set for each rowgroup reader.
    
    In extreme conditions where numerous columns are prebuffered and multiple rowgroup readers are created for the same row group, the hash set would incur significant overhead.

    Using a bitmap instead (with one bit per column chunk indicating whether it's prebuffered or not) would be a reasonable mitigation, taking 4KB for 32K columns.
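
The size estimate can be checked with a small sketch. `ColumnBitmap` is a hypothetical class written for illustration in pure Python; the actual change uses Arrow's C++ bitmap buffer, whose API is not shown here.

```python
class ColumnBitmap:
    """One bit per column chunk: 1 means prebuffered (illustrative only)."""

    def __init__(self, num_columns):
        self.num_columns = num_columns
        # Round up to whole bytes: 8 columns per byte.
        self.bits = bytearray((num_columns + 7) // 8)

    def set(self, column):
        """Mark a column chunk as prebuffered."""
        self.bits[column >> 3] |= 1 << (column & 7)

    def is_set(self, column):
        """Return True if the column chunk was marked as prebuffered."""
        return bool(self.bits[column >> 3] & (1 << (column & 7)))


bitmap = ColumnBitmap(32 * 1024)  # 32K columns
bitmap.set(0)
bitmap.set(12345)
size_in_bytes = len(bitmap.bits)
```

For 32K columns the backing store is 32768 / 8 = 4096 bytes, and copying it per rowgroup reader is a flat byte copy rather than a rehash of every entry.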
    
    ### What changes are included in this PR?
    
    Switch from a hash set to a bitmap buffer.
    
    ### Are these changes tested?
    
    Yes, passed unit tests on partial prebuffer.
    
    ### Are there any user-facing changes?
    
    No.
    
    Lead-authored-by: jp0317 <[email protected]>
    Co-authored-by: Jinpeng <[email protected]>
    Co-authored-by: Gang Wu <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    2 people authored and raulcd committed Jul 28, 2023
    7ea78a9
  2. GH-36913: [C++] Skip empty buffer concatenation to fix UBSan error (#36914)
    
    ### Rationale for this change
    
    This is a trivial fix for a UBSan error in calls to `ConcatenateBuffers` with an empty buffer that has a null data pointer.
    
    ### What changes are included in this PR?
    
    Conditional call to `std::memcpy` based on whether the buffer's length is 0.
    
    ### Are these changes tested?
    
    Test added in buffer_test.cc.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36913
    
    Lead-authored-by: Elliott Brossard <[email protected]>
    Co-authored-by: Elliott Brossard <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    2 people authored and raulcd committed Jul 28, 2023
    4d22851
  3. GH-36928: [Java] Make it run well with the netty newest version 4.1.96 (#36926)
    
    When I used `netty arrow memory 13.0.0` with `netty 4.1.96.Final` in Spark, the error below occurred because `netty 4.1.96.Final` has reverted some modifications. To ensure that `netty arrow memory 13.0.0` works well with `netty 4.1.96.Final`, I suggest making similar modifications here.
    1. Compilation errors are as follows:
    https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/47657403
    <img width="955" alt="image" src="https://github.com/apache/arrow/assets/15246973/e7ee2da9-97c0-474c-a62d-5821858e361f">
    
    2. Some modifications have been reverted in `netty 4.1.96.Final` as follows:
    <img width="884" alt="image" src="https://github.com/apache/arrow/assets/15246973/0226685a-cfa3-4b8b-b114-23ad8d027c05">
    <img width="907" alt="image" src="https://github.com/apache/arrow/assets/15246973/a6ea21a0-8531-42b6-ab9d-25eaab1c7fde">
    https://netty.io/news/2023/07/27/4-1-96-Final.html
    netty/netty#13510
    * Closes: #36928
    
    Authored-by: panbingkun <[email protected]>
    Signed-off-by: David Li <[email protected]>
    panbingkun authored and raulcd committed Jul 28, 2023
    17a6885

Commits on Aug 1, 2023

  1. GH-36947: [CI] Move free up disk space to the Jinja macros to be able to reuse it on docs job (#36948)
    
    ### Rationale for this change
    
    Try to get rid of some failures on docs generation on release and reuse existing code.
    
    ### What changes are included in this PR?
    
    Move step to a macro to be able to reuse it
    
    ### Are these changes tested?
    
    Archery tasks
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #36947
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    raulcd committed Aug 1, 2023
    464d012

Commits on Aug 9, 2023

  1. GH-36892: [C++] Fix performance regressions in FieldPath::Get (#37032)

    ### Rationale for this change
    
    #35197 appears to have introduced significant performance regressions in `FieldPath::Get` - indicated [here](https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/), in a benchmark that uses a wide (10K column) dataframe.
    
    ### What changes are included in this PR?
    
    - Adds basic benchmarks for `FieldPath::Get` across various input types, as they didn't previously exist
    - Addresses several performance issues. These came in the form of extremely high upfront costs for the `RecordBatch` and `ArrayData` overloads specifically
    - Some minor refactoring of `NestedSelector`
    
    ### Are these changes tested?
    
    Yes (covered by existing tests)
    
    ### Are there any user-facing changes?
    
    No
    
    * Closes: #36892
    
    Lead-authored-by: benibus <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
    2 people authored and raulcd committed Aug 9, 2023
    f5a9a59

Commits on Aug 16, 2023

  1. GH-37019: [R] Documentation for read_parquet() et al needs updating (#37020)
    
    ### Rationale for this change
    
    Docs were out of date with the code after previous changes to the returned object type
    
    ### What changes are included in this PR?
    
    Update docs to reflect correct return type
    
    ### Are these changes tested?
    
    No
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #37019
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
    thisisnic authored and raulcd committed Aug 16, 2023
    594b746
  2. GH-36969: [R] Disable GCS by default when doing a bundled build on gcc-13 (#37147)
    
    ### Rationale for this change
    
    Currently a naive `install.packages("arrow")` will result in a failed build if gcc-13 is the compiler. This is because we include GCS by default on this type of build (bundled). CRAN's check farm includes at least one system where gcc-13 is the compiler and so we can't error or suggest a user workaround.
    
    ### What changes are included in this PR?
    
    This PR explicitly sets the relevant environment variable if the compiler version string contains "g++" and "13.XX.XX". This is admittedly crude; however, the alternative of updating Abseil results in a cascading set of changes that may break other parts of Arrow. Few if any actual users will build the Arrow R package from source using gcc-13, so this has a much lower footprint (and a workaround: you can just set the ARROW_GCS environment variable + custom abseil location yourself before building if you do, in fact, want to attempt this).
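
The version-string check described above might look roughly like the following sketch. `should_disable_gcs` is a hypothetical name, the sample version strings are illustrative, and the real logic lives in the R package's build scripts rather than in Python.

```python
import re


def should_disable_gcs(compiler_version_output):
    """Return True if the text looks like g++ at major version 13.

    Mirrors the "contains g++ and 13.XX.XX" heuristic; deliberately
    crude, as the commit message itself acknowledges.
    """
    lines = compiler_version_output.splitlines()
    first_line = lines[0] if lines else ""
    has_gxx = "g++" in first_line
    has_v13 = re.search(r"\b13\.\d+\.\d+\b", first_line) is not None
    return has_gxx and has_v13


gcc13 = should_disable_gcs("g++ (GCC) 13.1.1 20230429")
gcc11 = should_disable_gcs("g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
clang13 = should_disable_gcs("Apple clang version 13.0.0 (clang-1300.0.29.3)")
```

Requiring both substrings keeps the check from firing on other compilers that happen to be at version 13.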
    
    ### Are these changes tested?
    
    Tested via crossbow (see below).
    
    ### Are there any user-facing changes?
    
    No.
    * Closes: #36969
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Jacob Wujciak-Jens <[email protected]>
    paleolimbot authored and raulcd committed Aug 16, 2023
    73660ff
  3. GH-37197: [Java][CI][Packaging] Free some disk space on the java-jars GitHub job (#37198)
    
    ### Rationale for this change
    The java-jars job was failing on the maintenance branch for the release because the disk ran out of space.
    
    ### What changes are included in this PR?
    
    Add a step to do some cleanup for the job.
    
    ### Are these changes tested?
    
    Yes, I tested it on the maintenance branch having the job successfully run and via crossbow.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #37197
    
    Authored-by: Raúl Cumplido <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
    raulcd committed Aug 16, 2023
    0a98984

Commits on Aug 17, 2023

  1. 43dd768
  2. c28a20a
  3. b7d2f7f