WIP: Dummy PR to check maint-13.0.0 status #36616
Conversation
…36290)

### Rationale for this change

Support `write_page_index` in the Parquet Python API.

### What changes are included in this PR?

Support `write_page_index` in the writer properties.

### Are these changes tested?

Currently not.

### Are there any user-facing changes?

Users can now generate the page index from Python.

* Closes: #36284

Lead-authored-by: mwish <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
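For illustration, a minimal Python sketch of the new option (this assumes the keyword is exposed through `pyarrow.parquet.write_table` as described above; the file name is only an example):

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Ask the writer to also emit the Parquet column/offset index pages.
# The keyword name follows this PR's description; it is off by default.
pq.write_table(table, "example.parquet", write_page_index=True)
```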
### Rationale for this change

We recently made some improvements to `libmexclass` that include a new class named `libmexclass.proxy.Identifier`, which represents proxy ids. This class will be helpful when we create `arrow.array.Array` objects from existing proxy ids, which used to be represented as scalar `uint64` values.

### What changes are included in this PR?

1. Bumps the libmexclass version the MATLAB Interface depends on to [#3465900](mathworks/libmexclass@3465900)

### Are these changes tested?

No tests needed.

### Are there any user-facing changes?

No.

* Closes: #36599

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
### Rationale for this change

While trying to fix an issue with Snowflake ADBC timestamp handling, I came across this as part of the problem.

### What changes are included in this PR?

Adds the timezone string indicator to the output of `ValueStr` for a timestamp array.

* Closes: #36568

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
### Rationale for this change

The default value for `SliceOptions::stop` is `INT64_MAX`, which isn't accounted for in several internal calculations - resulting in integer overflows and unexpected behavior when `stop` isn't provided. Also note that running the included tests without the fixes should result in ubsan errors (it did for me, at least).

### What changes are included in this PR?

- Adds logic to `SliceCodeunitsTransform` that handles potential overflows
- Adds tests for cases where the `start` param is positive/negative and `stop` is the maximum value

**Update**

Discovered that `utf8_slice_codeunits` deviates from Python slicing behavior when `stop=None` and `step < 0`, so further changes were made:

- Handles `INT64_MIN` for `SliceOptions::stop` on the C++ side, and adds more tests
- Updates the Python bindings for `SliceOptions` so that the default value when `stop=None` (`sys.maxsize`) is negated when `step < 0`
- Adds `None` as a possible `stop` value in the Python tests

### Are these changes tested?

Yes (tests are included).

### Are there any user-facing changes?

In theory, altering the behavior of `utf8_slice_codeunits` when `stop=None` and `step < 0` could be considered a breaking change. That being said, the current implementation produces incorrect results whenever `None` is used, so it probably isn't one in practice...

* Closes: #36311

Authored-by: benibus <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
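A minimal sketch of the Python-facing behavior being fixed (assumes a pyarrow build that includes this change; the expected outputs are what Python string slicing would give):

```python
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["hello", "world"])

# stop omitted (None): previously the INT64_MAX sentinel could leak into
# internal length calculations and overflow.
print(pc.utf8_slice_codeunits(arr, start=1))  # expect: "ello", "orld"

# stop=None with a negative step should now follow Python slicing,
# i.e. behave like s[1::-1] for each string.
print(pc.utf8_slice_codeunits(arr, start=1, stop=None, step=-1))  # expect: "eh", "ow"
```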
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also:
* Closes: #36271

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change

There are two problems:

* `FindProtobuf.cmake` provided by CMake is incomplete with Protobuf 23.4.
* `-DPROTOBUF_USE_DLLS` is missing when building Substrait related files.

### What changes are included in this PR?

* We need to use `protobuf-config.cmake` provided by Protobuf instead of `FindProtobuf.cmake` provided by CMake, because `FindProtobuf.cmake` misses the `absl::status` dependency.
* Accept Protobuf 23.4.
* Use `PROTOBUF_USE_DLLS` when we build Substrait related files.
* Use `Boost_INCLUDE_DIRS` instead of `Boost_INCLUDE_DIR`, because `Boost_INCLUDE_DIR` isn't defined in `BoostConfig.cmake`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #36598

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
…36499)

### What changes are included in this PR?

The key hasher is invalidated before the first invocation of `GetKey` (via `GetLatestKey`) after a new batch arrives. In the pre-PR code, this invalidation happens within `Advance`, which is called from `AdvanceAndMemoize` only after `GetLatestKey` is called. The change adds synchronization between the input-receiving and processing threads, because avoiding that would require a more complicated and brittle change, e.g., one that involves detecting in the processing thread when a new batch was added to the queue in order to invalidate the key hasher at that time.

### Are these changes tested?

Yes, by existing tests.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".**

* Closes: #36482

Authored-by: Yaron Gvili <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
…ges in arrow->pandas conversion (#36630)

### Rationale for this change

Due to the changes in #33321, a dask test started failing.

### What changes are included in this PR?

Skip the test in the meantime.

### Are these changes tested?

Yes, with crossbow.

### Are there any user-facing changes?

No.

* Closes: #36629

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
### Rationale for this change

Files in modules which do not depend on the acero module should not reference files inside the acero module.

### What changes are included in this PR?

There were no changes to the body of any functions. I simply moved functions around so that the acero include was no longer needed. There were some conflicts between the class `bit_util` and the namespace `bit_util`, so I got rid of the class in favor of the namespace, as that is more similar to how we handle `bit_util` elsewhere.

### Are these changes tested?

Sort of. I would like to add an AVX2 CI system as well; I'm not confident any of the CI builds are building with AVX2 enabled. Also, even if we had an AVX2 CI system it would not have caught this issue, since the code only needed definitions from the acero header and was not relying on any actual compiled symbols. However, I think setting up tests to catch this sort of invalid include is beyond the scope of this PR.

### Are there any user-facing changes?

No.

* Closes: #36641

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
…ring with other type (#36661)

### Rationale for this change

Ensure that `part == other` doesn't crash when `other` is not a Partitioning instance.

Small follow-up on #36462.

* Closes: #36659

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
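For reference, a minimal Python sketch of the comparison this protects (assumes a pyarrow build with the `Partitioning` equality support from #36462 plus this follow-up):

```python
import pyarrow as pa
import pyarrow.dataset as ds

part = ds.partitioning(pa.schema([("year", pa.int32())]), flavor="hive")

# Comparing against something that is not a Partitioning should simply
# return False / True rather than raising or crashing.
print(part == "year=2023")  # False
print(part != object())     # True
```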
…36551)

### Rationale for this change

The r-binary-packages job (which uses autobrew) and the autobrew nightly jobs are failing because they are linking to a different version of OpenSSL than the package was built against. I believe this occurred because Arrow and its dependencies are built against the autobrew headers, which include openssl; however, the `ssl` and `crypto` libraries weren't explicitly linked, so I think whatever LibreSSL fork macOS installs by default was getting linked. This was perhaps compatible with the version of autobrew/LibreSSL for High Sierra but is not compatible with the version of autobrew/LibreSSL for Big Sur.

### What changes are included in this PR?

This PR explicitly adds OpenSSL 1.1 to the autobrew formulas and explicitly adds `-lssl -lcrypto` to the PKG_LIBS (1.1 because that's what was in the corresponding homebrew formula).

### Are these changes tested?

Existing nightly tests cover these changes.

### Are there any user-facing changes?

No.

* Closes: #36456

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…6710)

### Rationale for this change

In general, a CMake package uses the `${PACKAGE}_ROOT` variable to detect `PACKAGE`, but `FindOpenSSL.cmake` uses `OPENSSL_ROOT_DIR`, not `OpenSSL_ROOT`.

### What changes are included in this PR?

Set `OPENSSL_ROOT_DIR` explicitly.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #36707

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

If we use a different macOS SDK for Apache Arrow C++ and for the bundled external projects, it can cause problems such as build errors.

### What changes are included in this PR?

Pass `CMAKE_OSX_SYSROOT` explicitly to external projects.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #36686

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
#36926)

When I used `netty arrow memory 13.0.0` and `netty 4.1.96.Final` in Spark, the following error occurred. Because `netty 4.1.96.Final` has reverted some modifications, I suggest making similar modifications here to ensure that `netty arrow memory 13.0.0` works well with `netty 4.1.96.Final`.

1. Compilation errors are as follows: https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/47657403
<img width="955" alt="image" src="https://github.com/apache/arrow/assets/15246973/e7ee2da9-97c0-474c-a62d-5821858e361f">

2. Some modifications have been reverted in `netty 4.1.96.Final`, as follows:
<img width="884" alt="image" src="https://github.com/apache/arrow/assets/15246973/0226685a-cfa3-4b8b-b114-23ad8d027c05">
<img width="907" alt="image" src="https://github.com/apache/arrow/assets/15246973/a6ea21a0-8531-42b6-ab9d-25eaab1c7fde">

https://netty.io/news/2023/07/27/4-1-96-Final.html
netty/netty#13510

* Closes: #36928

Authored-by: panbingkun <[email protected]>
Signed-off-by: David Li <[email protected]>
… to reuse it on docs job (#36948)

### Rationale for this change

Try to get rid of some failures in docs generation on release and reuse existing code.

### What changes are included in this PR?

Move the step to a macro so it can be reused.

### Are these changes tested?

Via Archery tasks.

### Are there any user-facing changes?

No.

* Closes: #36947

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

#35197 appears to have introduced significant performance regressions in `FieldPath::Get` - indicated [here](https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/), in a benchmark that uses a wide (10K column) dataframe.

### What changes are included in this PR?

- Adds basic benchmarks for `FieldPath::Get` across various input types, as they didn't previously exist
- Addresses several performance issues. These came in the form of extremely high upfront costs for the `RecordBatch` and `ArrayData` overloads specifically
- Some minor refactoring of `NestedSelector`

### Are these changes tested?

Yes (covered by existing tests).

### Are there any user-facing changes?

No.

* Closes: #36892

Lead-authored-by: benibus <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
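`FieldPath::Get` is a C++ internal, but for context, index-based nested field lookup is the kind of operation the new benchmarks cover. A rough Python sketch using `pyarrow.compute.struct_field` (which, as I understand it, resolves fields through the same FieldRef/FieldPath machinery) on a wide struct, loosely mirroring the wide-dataframe benchmark:

```python
import pyarrow as pa
import pyarrow.compute as pc

# A "wide" struct with many child fields (the regressed benchmark used a
# 10K-column dataframe; 1000 fields keeps this sketch quick).
n_fields = 1000
struct = pa.StructArray.from_arrays(
    [pa.array([i, i + 1]) for i in range(n_fields)],
    names=[f"f{i}" for i in range(n_fields)],
)

# Look up a single child field by index.
print(pc.struct_field(struct, [n_fields - 1]))
```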
Revision: f5a9a59 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-tests-9

Revision: f5a9a59 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-packaging-10

Revision: f5a9a59 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-packaging-java-jars-0
…37020)

### Rationale for this change

The docs were out of date with the code after previous changes to the returned object type.

### What changes are included in this PR?

Update the docs to reflect the correct return type.

### Are these changes tested?

No.

### Are there any user-facing changes?

No.

* Closes: #37019

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
…c-13 (#37147)

### Rationale for this change

Currently a naive `install.packages("arrow")` will result in a failed build if gcc-13 is the compiler. This is because we include GCS by default on this type of build (bundled). CRAN's check farm includes at least one system where gcc-13 is the compiler, so we can't error or suggest a user workaround.

### What changes are included in this PR?

This PR explicitly sets the relevant environment variable if the compiler version string contains "g++" and "13.XX.XX". This is admittedly crude; however, the alternative of updating Abseil results in a cascading set of changes that may break other parts of Arrow. Few if any actual users will build the Arrow R package from source using gcc-13, so this has a much lower footprint (and a workaround: you can set the ARROW_GCS environment variable + custom Abseil location yourself before building if you do, in fact, want to attempt this).

### Are these changes tested?

Tested via crossbow (see below).

### Are there any user-facing changes?

No.

* Closes: #36969

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
… GitHub job (#37198)

### Rationale for this change

The java-jars job was failing on the maintenance branch for the release due to running out of disk space.

### What changes are included in this PR?

Add a step to do some cleanup for the job.

### Are these changes tested?

Yes, I tested it on the maintenance branch, where the job ran successfully, and via crossbow.

### Are there any user-facing changes?

No.

* Closes: #37197

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
Revision: 0a98984 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-tests-10

Revision: 0a98984 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-packaging-11

Revision: 0a98984 Submitted crossbow builds: ursacomputing/crossbow @ maint-13-nightly-verification-6

@github-actions crossbow submit java-jars

Revision: 0a98984 Submitted crossbow builds: ursacomputing/crossbow @ actions-675fc98ef4

@github-actions crossbow submit homebrew-r-*

Revision: 0a98984 Submitted crossbow builds: ursacomputing/crossbow @ actions-48118f4f0a
Closing the temporary PR for maint-13.0.0 as it has been released.
DO NOT MERGE.

This PR is to track some crossbow jobs to validate the status of the maintenance branch before creating the first RC.