-
Notifications
You must be signed in to change notification settings - Fork 108
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add nanoarrow 0.6.0 release post (#545)
This post adds an update for the nanoarrow 0.6.0 release! --------- Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: David Li <[email protected]>
- Loading branch information
1 parent
c7f9fe1
commit 6a9752d
Showing
1 changed file
with
289 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,289 @@ | ||
--- | ||
layout: post | ||
title: "Apache Arrow nanoarrow 0.6.0 Release" | ||
date: "2024-10-07 00:00:00" | ||
author: pmc | ||
categories: [release] | ||
--- | ||
<!-- | ||
{% comment %} | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to you under the Apache License, Version 2.0 | ||
(the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
{% endcomment %} | ||
--> | ||
|
||
The Apache Arrow team is pleased to announce the 0.6.0 release of | ||
Apache Arrow nanoarrow. This release covers 114 resolved issues from | ||
10 contributors. | ||
|
||
## Release Highlights | ||
|
||
- Run End Encoding support | ||
- StringView support | ||
- IPC Write support | ||
- DLPack/device support | ||
- IPC/Device available from CMake/Meson as feature flags | ||
|
||
See the | ||
[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.6.0/CHANGELOG.md) | ||
for a detailed list of contributions to this release. | ||
|
||
## Breaking Changes | ||
|
||
Most changes included in the nanoarrow 0.6.0 release will not break downstream | ||
code; however, two changes with respect to packaging and distribution may require | ||
users to update the code used to bring nanoarrow in as a dependency. | ||
|
||
In nanoarrow 0.5.0 and earlier, the bundled single-file amalgamation was included in | ||
the `dist/` subdirectory or could be generated using a specially-crafted CMake | ||
command. The nanoarrow 0.6.0 release removes the pre-compiled includes and migrates | ||
the code used to generate it to Python. This setup is less confusing for contributors | ||
(whose editors would frequently jump into the wrong `nanoarrow.h`) and is a less confusing | ||
use of CMake. Users can generate the `dist/` subdirectory as it previously existed | ||
with: | ||
|
||
``` shell | ||
python ci/scripts/bundle.py \ | ||
--source-output-dir=dist \ | ||
--include-output-dir=dist \ | ||
--header-namespace= \ | ||
--with-device \ | ||
--with-ipc \ | ||
--with-testing \ | ||
--with-flatcc | ||
``` | ||
|
||
Second, the Arrow IPC and ArrowDeviceArray implementations previously lived in the `extensions/` | ||
subdirectory of the repository. This was helpful during the initial development of these | ||
features; however, the nanoarrow 0.6.0 release added the requisite feature coverage and testing | ||
such that the appropriate home for them is now the main `src/` directory. As such, one | ||
can now build nanoarrow with IPC and/or device support using: | ||
|
||
``` shell | ||
cmake -S . -B build -DNANOARROW_IPC=ON -DNANOARROW_DEVICE=ON | ||
``` | ||
|
||
## Features | ||
|
||
### Float16, StringView, and REE Support | ||
|
||
The nanoarrow 0.6.0 release adds support for Arrow's float16 (half float), string view, | ||
and run-end encoding support. The C library supports building float16 arrays using | ||
`ArrowArrayAppendDouble()` and supports building string view and binary view arrays | ||
using `ArrowArrayAppendString()` and/or `ArrowArrayAppendBytes()` and supports consuming | ||
using `ArrowArrayViewGetStringUnsafe()` and/or `ArrowArrayViewGetBytesUnsafe()`. R and | ||
Python users can request a string view or float16 type when building an array, and | ||
conversion back to R/Python strings is suppored. | ||
|
||
``` python | ||
# pip install nanoarrow | ||
# conda install nanoarrow -c conda-forge | ||
import nanoarrow as na | ||
|
||
na.Array(["abc", "def", None], na.string_view()) | ||
#> nanoarrow.Array<string_view>[3] | ||
#> 'abc' | ||
#> 'def' | ||
#> None | ||
na.Array([1, 2, 3], na.float16()) | ||
#> nanoarrow.Array<half_float>[3] | ||
#> 1.0 | ||
#> 2.0 | ||
#> 3.0 | ||
``` | ||
|
||
``` r | ||
# install.packages("nanoarrow") | ||
library(nanoarrow) | ||
|
||
as_nanoarrow_array(c("abc", "def", NA), schema = na_string_view()) |> | ||
convert_array() | ||
#> [1] "abc" "def" NA | ||
as_nanoarrow_array(c(1, 2, 3), schema = na_half_float()) |> | ||
convert_array() | ||
#> [1] 1 2 3 | ||
``` | ||
|
||
Creating/consuming run-end encoding arrays by element is not yet | ||
supported in C, R, or Python; however, arrays can be built or consumed by assembling | ||
the correct array/buffer structure in C. | ||
|
||
Thank you to [cocoa-xu](https://github.com/cocoa-xu) for adding float16 and run-end encoding | ||
support and thank you to [WillAyd](https://github.com/WillAyd) for adding string view support! | ||
|
||
### IPC Write Support | ||
|
||
The nanoarrow library has supported reading | ||
[Arrow IPC streams](https://arrow.apache.org/docs/format/Columnar.html) | ||
since 0.4.0; however, could not write streams of its own. The nanoarrow 0.6.0 release adds | ||
support for stream writing from C using the `ArrowIpcWriter` and stream writing | ||
from R and Python: | ||
|
||
```python | ||
import io | ||
import nanoarrow as na | ||
from nanoarrow import ipc | ||
|
||
out = io.BytesIO() | ||
with ipc.StreamWriter.from_writable(out) as writer: | ||
writer.write_stream(ipc.InputStream.example()) | ||
|
||
out.seek(0) | ||
na.ArrayStream.from_readable(out).read_all() | ||
#> nanoarrow.Array<non-nullable struct<some_col: int32>>[3] | ||
#> {'some_col': 1} | ||
#> {'some_col': 2} | ||
#> {'some_col': 3} | ||
``` | ||
|
||
``` r | ||
library(nanoarrow) | ||
|
||
tf <- tempfile() | ||
nycflights13::flights |> write_nanoarrow(tf) | ||
|
||
read_nanoarrow(tf) |> tibble::as_tibble() | ||
#> # A tibble: 336,776 × 19 | ||
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time | ||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> | ||
#> 1 2013 1 1 517 515 2 830 819 | ||
#> 2 2013 1 1 533 529 4 850 830 | ||
#> 3 2013 1 1 542 540 2 923 850 | ||
#> 4 2013 1 1 544 545 -1 1004 1022 | ||
#> 5 2013 1 1 554 600 -6 812 837 | ||
#> 6 2013 1 1 554 558 -4 740 728 | ||
#> 7 2013 1 1 555 600 -5 913 854 | ||
#> 8 2013 1 1 557 600 -3 709 723 | ||
#> 9 2013 1 1 557 600 -3 838 846 | ||
#> 10 2013 1 1 558 600 -2 753 745 | ||
#> # ℹ 336,766 more rows | ||
#> # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>, | ||
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, | ||
#> # hour <dbl>, minute <dbl>, time_hour <dttm> | ||
``` | ||
|
||
As a result of the IPC write support, nanoarrow now joins the Arrow IPC integration tests | ||
to ensure compatability across implementations. With the exception of | ||
[arrow-rs due to a bug in the Rust flatbuffers implementation](https://github.com/apache/arrow-rs/issues/5052), | ||
nanoarrow is now tested against all participating Arrow implementations with every commit. | ||
|
||
A huge thank you to [bkietz](https://github.com/bkietz) for implementing this support and | ||
the tests (which included multiple bugfixes and identification of inconsistencies of | ||
flatbuffer verification in C, Rust, and C++!). | ||
|
||
### DLPack/CUDA Support | ||
|
||
The nanoarrow 0.6.0 release includes improved support for the | ||
[Arrow C Device data interface](https://arrow.apache.org/docs/format/CDeviceDataInterface.html). | ||
In particular, the CUDA device implementation was improved to more efficiently coordinate | ||
synchronization when copying arrays to/from the GPU and migrated to use the driver API | ||
for wider compatibility. The nanoarrow Python bindings have limited support for creating | ||
`ArrowDeviceArray` wrappers that implement the | ||
[`__arrow_c_device_array__` protocol](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#export-protocol) | ||
from anything that implements DLPack: | ||
|
||
``` python | ||
# Currently requires: | ||
# export NANOARROW_PYTHON_CUDA=/usr/local/cuda | ||
# pip install --force-reinstall --no-binary=":all:" nanoarrow | ||
import nanoarrow as na | ||
from nanoarrow import device | ||
import cupy as cp | ||
|
||
device.c_device_array(cp.array([1, 2, 3])) | ||
#> <nanoarrow.device.CDeviceArray> | ||
#> - device_type: CUDA <2> | ||
#> - device_id: 0 | ||
#> - array: <nanoarrow.c_array.CArray int64> | ||
#> - length: 3 | ||
#> - offset: 0 | ||
#> - null_count: 0 | ||
#> - buffers: (0, 133980798058496) | ||
#> - dictionary: NULL | ||
#> - children[0]: | ||
|
||
darray = device.c_device_array(cp.array([1, 2, 3])) | ||
cp.from_dlpack(darray.array.view().buffer(1)) | ||
#> array([1, 2, 3]) | ||
``` | ||
|
||
Thank you to [AlenkaF](https://github.com/AlenkaF), [shwina](https://github.com/shwina), | ||
and [danepitkin](https://github.com/danepitkin) for their contributions to and | ||
review of this feature! | ||
|
||
### Build System Support for IPC/Device | ||
|
||
Lastly, the CMake build system was refactored to enable `FetchContent` to | ||
work in an even wider variety of | ||
[develop/build/install scenarios](https://github.com/apache/arrow-nanoarrow/tree/main/examples/cmake-scenarios). In most cases, CMake-based projects should be able | ||
to add the nanoarrow C library with device and/or IPC support as a dependency with: | ||
|
||
``` cmake | ||
include(FetchContent) | ||
# If required: | ||
# set(NANOARROW_IPC ON) | ||
# set(NANOARROW_DEVICE ON) | ||
fetchcontent_declare(nanoarrow | ||
URL "https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/nanoarrow-0.6.0/apache-arrow-0.6.0.tar.gz") | ||
fetchcontent_makeavailable(nanoarrow) | ||
add_executable(some_target ...) | ||
target_link_libraries(some_target | ||
PRIVATE | ||
nanoarrow::nanoarrow | ||
# If needed | ||
# nanoarrow::nanoarrow_ipc | ||
# nanoarrow::nanoarrow_device | ||
) | ||
``` | ||
|
||
Linking against nanoarrow installed via `cmake --install` and located | ||
via `find_package()` is also supported. | ||
|
||
Users of the Meson build system can install the latest nanoarrow with: | ||
|
||
``` shell | ||
mkdir subprojects | ||
meson wrap install nanoarrow | ||
``` | ||
|
||
...and declared as a dependency with: | ||
|
||
``` shell | ||
nanoarrow_dep = dependency('nanoarrow') | ||
example_exec = executable('example_meson_minimal_app', | ||
'src/app.cc', | ||
dependencies: [nanoarrow_dep]) | ||
``` | ||
|
||
## Contributors | ||
|
||
This release consists of contributions from 10 contributors in addition | ||
to the invaluable advice and support of the Apache Arrow community. | ||
|
||
```console | ||
$ git shortlog -sn apache-arrow-nanoarrow-0.6.0.dev..apache-arrow-nanoarrow-0.6.0 | grep -v "GitHub Actions" | ||
64 Dewey Dunnington | ||
19 William Ayd | ||
16 Benjamin Kietzman | ||
5 Cocoa | ||
2 Abhishek Singh | ||
1 Ashwin Srinath | ||
1 Dane Pitkin | ||
1 Jacob Wujciak-Jens | ||
1 Matt Topol | ||
1 Tao Zuhong | ||
``` |