From 2dc5d03bc7cb4adfaf177ae53c419b2e3cb9d829 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 9 Jan 2024 11:54:11 +0100 Subject: [PATCH 01/12] Version 15.0.0 blog post skeleton --- _posts/2024-01-10-15.0.0-release.md | 102 ++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 _posts/2024-01-10-15.0.0-release.md diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md new file mode 100644 index 000000000000..d0cb0fdac0f4 --- /dev/null +++ b/_posts/2024-01-10-15.0.0-release.md @@ -0,0 +1,102 @@ +--- +layout: post +title: "Apache Arrow 15.0.0 Release" +date: "2024-01-10 00:00:00" +author: pmc +categories: [release] +--- + + + +The Apache Arrow team is pleased to announce the 15.0.0 release. This covers +over 3 months of development work and includes [**XXX resolved issues**][1] +from [**YYY distinct contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) +to learn how to get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 14.0.0 release, Curt Hagenlocher, Xuwei Fu, James Duong and Felipe Oliveira Carvalho +have been invited to be committers. +Jonathan Keane and Raúl Cumplido have joined the Project Management Committee (PMC). + +As per our tradition of rotating the PMC chair once a year +Andy Grove was elected as the new PMC chair and VP. + +Thanks for your contributions and participation in the project! + +## Columnar Format Notes + +## C Data Interface notes + +## Arrow Flight RPC notes + +## C++ notes + +### Compute layer + +### Datasets + +### Gandiva + +### Parquet + +### Substrait + +### Miscellaneous + +## C# notes + +## Go Notes + +### Bug Fixes + +### Enhancements + +## Java notes + +## JavaScript notes + +## Python notes + +## R notes + +For more on what’s in the 15.0.0 R package, see the [R changelog][4]. + +## Ruby and C GLib notes + +### Ruby + +### C GLib + +## Rust notes + +The Rust projects have moved to separate repositories outside the +main Arrow monorepo. For notes on the latest release of the Rust +implementation, see the latest [Arrow Rust changelog][5]. + +[1]: https://github.com/apache/arrow/milestone/56?closed=1 +[2]: {{ site.baseurl }}/release/15.0.0.html#contributors +[3]: {{ site.baseurl }}/release/15.0.0.html#changelog +[4]: {{ site.baseurl }}/docs/r/news/ +[5]: https://github.com/apache/arrow-rs/tags From a502f827b16f6d3df06d232b097c9d0175e5eb2f Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Mon, 15 Jan 2024 11:56:21 +0900 Subject: [PATCH 02/12] Add Ruby and C GLib changes --- _posts/2024-01-10-15.0.0-release.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index d0cb0fdac0f4..d442ca39654e 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -87,8 +87,12 @@ For more on what’s in the 15.0.0 R package, see the [R changelog][4]. ### Ruby +- Add `Arrow::Table#each_raw_record` and `Arrow::RecordBatch#each_raw_record` ([GH-37137](https://github.com/apache/arrow/issues/37137), [GH-37600](https://github.com/apache/arrow/issues/37600)) + ### C GLib +- Follow C++ changes. + ## Rust notes The Rust projects have moved to separate repositories outside the From 78570b2d04e4e95911c7bbafedd0bea1f09b08b9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:00:22 +0100 Subject: [PATCH 03/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: Matt Topol --- _posts/2024-01-10-15.0.0-release.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index d442ca39654e..851e0e4660f0 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -45,10 +45,11 @@ Andy Grove was elected as the new PMC chair and VP. Thanks for your contributions and participation in the project! -## Columnar Format Notes - ## C Data Interface notes +New format strings have been added for ListView, LargeListView, BinaryView and StringView array types. + + ## Arrow Flight RPC notes ## C++ notes From e1b4e1c3b5e0ce48746da2ab58c1d59df2ed547c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:00:36 +0100 Subject: [PATCH 04/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: David Li --- _posts/2024-01-10-15.0.0-release.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 851e0e4660f0..8e95a3fb7901 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -52,6 +52,10 @@ New format strings have been added for ListView, LargeListView, BinaryView and S ## Arrow Flight RPC notes +Flight SQL is now considered stable (GH-39037). The Flight SQL specification was clarified regarding how the result set schema of a prepared statement is affected by bound parameters (GH-37061). + +The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and bind parameters (GH-33475), follows the Flight RPC spec when fetching data (GH-34532), and can reuse credentials across metadata and data connections (GH-38576). On macOS it will also use the system keychain to be consistent with other platforms (GH-39014). Applications can also retrieve the underlying Flight RPC metadata from the JDBC driver (GH-38024, GH-38022). + ## C++ notes ### Compute layer From 05bd47717343269fb056b99d3f84451a3495952e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:00:51 +0100 Subject: [PATCH 05/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: Matt Topol --- _posts/2024-01-10-15.0.0-release.md | 31 +++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 8e95a3fb7901..e02c94231ec2 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -76,6 +76,37 @@ The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and ### Bug Fixes +#### Arrow + +* Ensured reliability of AuthenticateBasicToken behind proxies ([GH-38198](https://github.com/apache/arrow/issues/38198)) +* Ensured release callback is properly called on C Data imported arrays/batches ([GH-38281](https://github.com/apache/arrow/issues/38281)) +* Fixed rounding errors in decimal256 string functions ([GH-38395](https://github.com/apache/arrow/issues/38395)) +* Added `ValueLen` to Binary and String array interface ([GH-38458](https://github.com/apache/arrow/issues/38458)) +* Fixed Decimal128 rounding issues ([GH-38477](https://github.com/apache/arrow/issues/38477)) +* Fixed memory leak in IPC LZ4 decompressor ([GH-38728](https://github.com/apache/arrow/issues/38728)) +* Addressed Data race in `GetToTimeFunc` for fixed timestamp data types ([GH-38795](https://github.com/apache/arrow/issues/38795)) +* Fixed "index out of range" error for empty resultsets of FlightSQL driver ([GH-39238](https://github.com/apache/arrow/issues/39238)) + +#### Parquet + +* Fixed issue with max definition levels when writing a Parquet file under certain circumstances ([GH-38503](https://github.com/apache/arrow/issues/38503)) +* File writer now properly tracks the number of rows written beyond the last row group ([GH-38516](https://github.com/apache/arrow/issues/38516)) + +### Enhancements + +#### Arrow + +* Added an Avro OCF reader for converting Avro files directly to Arrow record batches ([GH-36760](https://github.com/apache/arrow/issues/36760)) +* Added support for StringView ([GH-38718](https://github.com/apache/arrow/issues/38718)) and C Data ABI StringViews ([GH-39013](https://github.com/apache/arrow/issues/39013)) +* GC Checks were enabled for CI running integration tests ([GH-38824](https://github.com/apache/arrow/issues/38824)) + +#### Parquet + +* Implemented Float16 logical type for Parquet files ([GH-37582](https://github.com/apache/arrow/issues/37582)) +* Added proper boolean RLE encoding/decoding ([GH-38462](https://github.com/apache/arrow/issues/38462)) + +### Bug Fixes + ### Enhancements ## Java notes From e97a1e3b9793b730f9e3964c12bcc75b69ebdcac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:01:07 +0100 Subject: [PATCH 06/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: David Li --- _posts/2024-01-10-15.0.0-release.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index e02c94231ec2..3656430c9731 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -111,6 +111,14 @@ The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and ## Java notes +**We expect a breaking change in the next release, Arrow 16.0.0.** Support for Java 9 modules is coming, but that will require changing the JVM flags used to launch your application (GH-38998). Arrow 15.0.0 is not affected. + +A bill-of-materials (BOM) package was added to make it easier to depend on multiple Arrow libraries (GH-38264). + +The JDBC adapter (separate from the JDBC driver) now supports 256-bit decimals (GH-39484) and throws more informative exceptions (GH-39355). + +Various improvements were made to utilities for working with vectors (GH-38662, GH-38614, GH-38511, GH-38254, GH-38246). + ## JavaScript notes ## Python notes From 235627f9f8963d6832552aaeb4d1913e2c115bef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:02:16 +0100 Subject: [PATCH 07/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: Alenka Frim --- _posts/2024-01-10-15.0.0-release.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 3656430c9731..86406b2e9597 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -123,6 +123,30 @@ Various improvements were made to utilities for working with vectors (GH-38662, ## Python notes +Compatibility notes: +* Legacy `ParquetDataset` custom implementation has been removed and only the new dataset API is now in use [GH-31303](https://github.com/apache/arrow/issues/31303). + +New features: +* PyArrow version 14.0.0 included a new specification for Arrow PyCapsules and related dunder methods [GH-35531](https://github.com/apache/arrow/pull/37797) and now a public `RecordBatchReader` constructor from stream object implementing the PyCapsule Protocol has been added [GH-[39217](https://github.com/apache/arrow/issues/39217)](https://github.com/apache/arrow/issues/39217) together with some additional documentation [GH-[39196](https://github.com/apache/arrow/issues/39196)](https://github.com/apache/arrow/issues/39196). +* DLPack protocol support (producer) was added to the Arrow C++ and is exposed in Python through `__dlpack__` and `__dlpack_device__` dunder methods [GH-33984](https://github.com/apache/arrow/issues/33984). +* Python now exposes enabling CRC checksum for read and write operations in Paquet [GH-37242](https://github.com/apache/arrow/issues/37242). CRC checksum are optional and can detect data corruption. +* `CacheOptions` are now configurable from Python as part of the `pyarrow.dataset.ParquetFragmentScanOptions` [GH-36441](https://github.com/apache/arrow/issues/36441). +* Parquet metadata to indicate sort order of the data are now exposed in `RowGroupMetaData` [GH-35331](https://github.com/apache/arrow/issues/35331). + +Other improvements: +* Append parameter from `FileOutputStream` is exposed for the `OSFile` class [GH-38857](https://github.com/apache/arrow/issues/38857). +* File size can be passed to `make_fragment` in the pyarrow datasets (`pyarrow.dataset.FileFormat`and `pyarrow.dataset.ParquetFileFormat`) [GH-37857](https://github.com/apache/arrow/issues/37857). +* Support for mask parameter is added to `FixedSizeListArray.from_arrays` [GH-34316](https://github.com/apache/arrow/issues/34316) +* `to/from_struct_array` are added to the `pyarrow.Table` class [GH-33500](https://github.com/apache/arrow/issues/33500). +* GIL is released in `.nbytes` which is improving performance when calculating the data size [GH-39096](https://github.com/apache/arrow/issues/39096). +* Usage of pandas internals `DatetimeTZBlock` has been removed [GH-38341](https://github.com/apache/arrow/issues/38341). +* `DataType` instance can be passed to `MapType.from_arrays` constructor [GH-39515](https://github.com/apache/arrow/issues/39515). + +Relevant bug fixes: +* `S3FileSystem` equals `None` segfault has been fixed [GH-38535](https://github.com/apache/arrow/issues/38535). +* No-op kernel is added for `dictionary_encode(dictionary)` [GH-34890](https://github.com/apache/arrow/issues/34890). +* PrettyPrint for Timestamp type now adds "Z" at the end of the print string when tz is defined in order to add minimum information about the values being stored in UTC [GH-30117](https://github.com/apache/arrow/issues/30117). + ## R notes For more on what’s in the 15.0.0 R package, see the [R changelog][4]. From 8ae3324ec5feafe4396fee3574537e3cbb847ae6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 16 Jan 2024 16:28:38 +0100 Subject: [PATCH 08/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: Dominik Moritz --- _posts/2024-01-10-15.0.0-release.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 86406b2e9597..2a2cc9ef31d4 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -121,6 +121,20 @@ Various improvements were made to utilities for working with vectors (GH-38662, ## JavaScript notes +This release comes with new features and APIs. We also removed `getByteLength` to reduce bundle sizes. + +New Features with API changes +* [GH-39017: [JS] Add typeId as attribute](https://github.com/apache/arrow/pull/39018) +* [GH-39257: [JS] LargeBinary](https://github.com/apache/arrow/pull/39258) +* [GH-15060: [JS] Add LargeUtf8 type](https://github.com/apache/arrow/pull/35780) +* [GH-39259: [JS] Remove getByteLength](https://github.com/apache/arrow/pull/39260) +* [GH-39435: [JS] Add Vector.nullable](https://github.com/apache/arrow/pull/39436) +* [GH-39255: [JS] Allow customization of schema when passing vectors to table constructor](https://github.com/apache/arrow/pull/39256) +* [GH-37983: [JS] Allow nullable fields in table when constructed from vector with nulls](https://github.com/apache/arrow/pull/39254) + +Package changes +* [GH-39289: [JS] Add types to exports](https://github.com/apache/arrow/pull/39475) + ## Python notes Compatibility notes: From decc706d6156096151f74cf02428647aa6a36102 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 17 Jan 2024 12:13:06 +0100 Subject: [PATCH 09/12] Update _posts/2024-01-10-15.0.0-release.md Co-authored-by: Curt Hagenlocher --- _posts/2024-01-10-15.0.0-release.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 2a2cc9ef31d4..50d7052e5788 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -72,6 +72,24 @@ The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and ## C# notes +Removal of build targets: + * Remove out-of-support versions of .NET and update C# README [GH-31579](https://github.com/apache/arrow/pull/39165) + +New features: + * Better support for decimal values which exceed the range of the BCL's System.Decimal [GH-38351](https://github.com/apache/arrow/pull/38481), [GH-38483](https://github.com/apache/arrow/pull/38508) + * Expose ArrayDataConcentrator.Concatenate publicly [GH-38153](https://github.com/apache/arrow/pull/38154) + * Add ToString methods to Arrow classes [GH-36566](https://github.com/apache/arrow/pull/30717) + * Implement common interfaces for structure arrays and record batches [GH-38757](https://github.com/apache/arrow/pull/38759) + * Make primitive arrays support IReadOnlyList [GH-38348](https://github.com/apache/arrow/pull/38680), [GH-39223](https://github.com/apache/arrow/pull/39224) + * Add ToList to Decimal128Array and Decimal256Array [GH-37359](https://github.com/apache/arrow/pull/37383) + * Support additional types Interval, Utf8View, BinaryView and ListView [GH-38316](https://github.com/apache/arrow/pull/39043), [GH-39341](https://github.com/apache/arrow/pull/39342) + * Support creating FlightClient with Grpc.Core.Channel [GH-39335](https://github.com/apache/arrow/pull/39348) + +Fixes and improved compatibility: + * Make dictionaries in file and memory implementations work correctly and support integration tests [GH-32662](https://github.com/apache/arrow/pull/39146) + * Support blank column names and enable more integration tests [GH-36588](https://github.com/apache/arrow/pull/39167) + + ## Go Notes ### Bug Fixes From 2e1f3e6ec6503cc91005bdcd5b0ee600406d229b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 17 Jan 2024 12:15:08 +0100 Subject: [PATCH 10/12] Update _posts/2024-01-10-15.0.0-release.md --- _posts/2024-01-10-15.0.0-release.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 50d7052e5788..34bf4e5e3813 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -181,6 +181,19 @@ Relevant bug fixes: ## R notes +### New features: + +* Bindings for `base::prod` have been added so you can now use it in your dplyr pipelines (i.e., `tbl |> summarize(prod(col))`) without having to pull the data into R [GH-38601](https://github.com/apache/arrow/pull/38601). +* Calling `dimnames` or `colnames` on `Dataset` objects now returns a useful result rather than just `NULL` [GH-38377](https://github.com/apache/arrow/pull/38377). +* The `code()` method on Schema objects now takes an optional `namespace` argument which, when `TRUE`, prefixes names with `arrow::` which makes the output more portable [GH-38144](https://github.com/apache/arrow/pull/38144). + +### Other improvements: + +* To make debugging problems easier when using arrow with AWS S3 (e..g, `s3_bucket`, `S3FileSystem`), the debug log level for S3 can be set with the `AWS_S3_LOG_LEVEL` environment variable. See `?S3FileSystem` for more information. [GH-38267](https://github.com/apache/arrow/pull/38267) +* An error is now thrown instead of warning and pulling the data into R when any of `sub`, `gsub`, `stringr::str_replace`, `stringr::str_replace_all` are passed a length > 1 vector of values in `pattern` [GH-39219](https://github.com/apache/arrow/pull/39219). +* Missing documentation was added to `?open_dataset` documenting how to use the ND-JSON support added in arrow 13.0.0 [GH-38258](https://github.com/apache/arrow/pull/38258). +* Using arrow with duckdb (i.e., `to_duckdb()`) no longer results in warnings when quitting your R session. [GH-38495](https://github.com/apache/arrow/pull/38495) + For more on what’s in the 15.0.0 R package, see the [R changelog][4]. ## Ruby and C GLib notes From f57e62694b1d6269605885a742dd64a90ab4d95c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Mon, 22 Jan 2024 11:33:12 +0100 Subject: [PATCH 11/12] Add Parquet notes and fix issue links --- _posts/2024-01-10-15.0.0-release.md | 37 +++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 34bf4e5e3813..69c669ba04cb 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -52,24 +52,40 @@ New format strings have been added for ListView, LargeListView, BinaryView and S ## Arrow Flight RPC notes -Flight SQL is now considered stable (GH-39037). The Flight SQL specification was clarified regarding how the result set schema of a prepared statement is affected by bound parameters (GH-37061). +Flight SQL is now considered stable ([GH-39037](https://github.com/apache/arrow/issues/39037)). The Flight SQL specification was clarified regarding how the result set schema of a prepared statement is affected by bound parameters ([GH-37061](https://github.com/apache/arrow/issues/37061)). -The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and bind parameters (GH-33475), follows the Flight RPC spec when fetching data (GH-34532), and can reuse credentials across metadata and data connections (GH-38576). On macOS it will also use the system keychain to be consistent with other platforms (GH-39014). Applications can also retrieve the underlying Flight RPC metadata from the JDBC driver (GH-38024, GH-38022). +The JDBC Arrow Flight SQL driver now supports mTLS authentication ([GH-38460](https://github.com/apache/arrow/issues/38460)) and bind parameters ([GH-33475](https://github.com/apache/arrow/issues/33475)), follows the Flight RPC spec when fetching data ([GH-34532](https://github.com/apache/arrow/issues/34532)), and can reuse credentials across metadata and data connections ([GH-38576](https://github.com/apache/arrow/issues/38576)). On macOS it will also use the system keychain to be consistent with other platforms ([GH-39014](https://github.com/apache/arrow/issues/39014)). Applications can also retrieve the underlying Flight RPC metadata from the JDBC driver (GH-38024, GH-38022). ## C++ notes -### Compute layer +For C++ notes refer to the full changelog. -### Datasets +### Parquet -### Gandiva +#### New features: +* Support row group filtering for nested paths for C++ and Parquet ([GH-39064](https://github.com/apache/arrow/issues/39064)) +* Implement Parquet Float16 logical type ([GH-36036](https://github.com/apache/arrow/issues/36036)) +* Expose sorting_columns in RowGroupMetaData for Parquet files ([GH-35331](https://github.com/apache/arrow/issues/35331)) +* Support decompressing concatenated gzip members (stream) ([GH-38271](https://github.com/apache/arrow/issues/38271)) -### Parquet +#### API change: +* Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter ([GH-38887](https://github.com/apache/arrow/issues/38887)) +* Change parquet TypedComparator operation to const methods ([GH-38874](https://github.com/apache/arrow/issues/38874)) +* Remove deprecated AppendRowGroup(int64_t num_rows) ([GH-39208](https://github.com/apache/arrow/issues/39208)) +* Add api to get RecordReader from RowGroupReader ([GH-37002](https://github.com/apache/arrow/issues/37002)) -### Substrait +#### Bug fixes: +* Add more closed file checks for ParquetFileWriter to Prevent from used-after-close ([GH-38390](https://github.com/apache/arrow/issues/38390)) + +#### Performance enhancement: +* Faster Scalar BYTE_STREAM_SPLIT encoding/decoding ([GH-38542](https://github.com/apache/arrow/issues/38542)) +* Faster reading Parquet FLBA (GH-39124, GH-39413) +* Using bloom_filter_length in parquet 2.10 to optimize bloom filter read ([GH-38860](https://github.com/apache/arrow/issues/38860)) ### Miscellaneous +* Upgrade ORC to 1.9.2 ([GH-39340](https://github.com/apache/arrow/issues/39430)) + ## C# notes Removal of build targets: @@ -129,11 +145,11 @@ Fixes and improved compatibility: ## Java notes -**We expect a breaking change in the next release, Arrow 16.0.0.** Support for Java 9 modules is coming, but that will require changing the JVM flags used to launch your application (GH-38998). Arrow 15.0.0 is not affected. +**We expect a breaking change in the next release, Arrow 16.0.0.** Support for Java 9 modules is coming, but that will require changing the JVM flags used to launch your application ([GH-38998](https://github.com/apache/arrow/issues/38998)). Arrow 15.0.0 is not affected. -A bill-of-materials (BOM) package was added to make it easier to depend on multiple Arrow libraries (GH-38264). +A bill-of-materials (BOM) package was added to make it easier to depend on multiple Arrow libraries ([GH-38264](https://github.com/apache/arrow/issues/38264)). -The JDBC adapter (separate from the JDBC driver) now supports 256-bit decimals (GH-39484) and throws more informative exceptions (GH-39355). +The JDBC adapter (separate from the JDBC driver) now supports 256-bit decimals ([GH-39484](https://github.com/apache/arrow/issues/39484)) and throws more informative exceptions ([GH-39355](https://github.com/apache/arrow/issues/39355)). Various improvements were made to utilities for working with vectors (GH-38662, GH-38614, GH-38511, GH-38254, GH-38246). @@ -164,6 +180,7 @@ New features: * Python now exposes enabling CRC checksum for read and write operations in Paquet [GH-37242](https://github.com/apache/arrow/issues/37242). CRC checksum are optional and can detect data corruption. * `CacheOptions` are now configurable from Python as part of the `pyarrow.dataset.ParquetFragmentScanOptions` [GH-36441](https://github.com/apache/arrow/issues/36441). * Parquet metadata to indicate sort order of the data are now exposed in `RowGroupMetaData` [GH-35331](https://github.com/apache/arrow/issues/35331). +* Parquet Support write and validate Page CRC ([GH-37242](https://github.com/apache/arrow/issues/37242)) Other improvements: * Append parameter from `FileOutputStream` is exposed for the `OSFile` class [GH-38857](https://github.com/apache/arrow/issues/38857). From 596e2e531068fdbc4ab13658ef5871d4b98b3d1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Mon, 22 Jan 2024 11:35:14 +0100 Subject: [PATCH 12/12] Add data about commits and contributors --- _posts/2024-01-10-15.0.0-release.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_posts/2024-01-10-15.0.0-release.md b/_posts/2024-01-10-15.0.0-release.md index 69c669ba04cb..746ed5294ddc 100644 --- a/_posts/2024-01-10-15.0.0-release.md +++ b/_posts/2024-01-10-15.0.0-release.md @@ -26,8 +26,9 @@ limitations under the License. The Apache Arrow team is pleased to announce the 15.0.0 release. This covers -over 3 months of development work and includes [**XXX resolved issues**][1] -from [**YYY distinct contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) +over 3 months of development work and includes [**344 resolved issues**][1] +on [**536 distinct commits**][2] from [**101 distinct contributors**][2]. +See the [Install Page](https://arrow.apache.org/install/) to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights