From 0abd3a3cba9b16613361d9208dd9ab8d54a50286 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Fri, 4 Oct 2024 16:51:09 -0500 Subject: [PATCH 1/5] draft post --- _posts/2024-10-07-nanoarrow-0.6.0-release.md | 129 +++++++++++++++++++ 1 file changed, 129 insertions(+) create mode 100644 _posts/2024-10-07-nanoarrow-0.6.0-release.md diff --git a/_posts/2024-10-07-nanoarrow-0.6.0-release.md b/_posts/2024-10-07-nanoarrow-0.6.0-release.md new file mode 100644 index 000000000000..5b12670e8339 --- /dev/null +++ b/_posts/2024-10-07-nanoarrow-0.6.0-release.md @@ -0,0 +1,129 @@ +--- +layout: post +title: "Apache Arrow nanoarrow 0.6.0 Release" +date: "2024-10-07 00:00:00" +author: pmc +categories: [release] +--- + + +The Apache Arrow team is pleased to announce the 0.6.0 release of +Apache Arrow nanoarrow. This release covers 114 resolved issues from +10 contributors. + +## Release Highlights + +- Run End Encoding support +- StringView support +- IPC Write support +- DLPack/device support +- IPC/Device available from CMake/Meson as feature flags + +See the +[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.6.0/CHANGELOG.md) +for a detailed list of contributions to this release. + +## Breaking Changes + +Most changes included in the nanoarrow 0.6.0 release will not break downstream +code; however, several changes in the C library are breaking changes to previous +behaviour. + +- Check changelog + revdep ADBC + +## Features + +### REE Support + +- C library only, build by buffer + +### StringView Support + +- C demo +- Python demo +- R demo + +### IPC Write Support + +- C demo +- Python demo +- R demo + +### DLPack/CUDA Support + + + +### Build System Support for IPC/Device + +Lastly, the CMake build system was refactored to enable `FetchContent` to +work in an even wider variety of +[develop/build/install scenarios](https://github.com/apache/arrow-nanoarrow/tree/main/examples/cmake-scenarios). 
In most cases, CMake-based projects should be able +to add the nanoarrow C library with device and/or IPC support as a dependency with: + +```cmake +include(FetchContent) + +set(NANOARROW_IPC ON) +set(NANOARROW_DEVICE ON) +fetchcontent_declare(nanoarrow + GIT_REPOSITORY https://github.com/apache/arrow-nanoarrow.git + GIT_TAG apache-arrow-nanoarrow-0.6.0 + GIT_SHALLOW TRUE) +fetchcontent_makeavailable(nanoarrow) + +add_executable(some_target ...) +target_link_libraries(some_target PRIVATE nanoarrow::nanoarrow_ipc nanoarrow::nanoarrow_device) +``` + +- Works in Meson, too + +## Get nanoarrow + +- cxx via CMake/Meson + +The nanoarrow R bindings are distributed as the `nanoarrow` package on +[CRAN](https://cran.r-project.org/). + +The nanoarrow Python bindings are distributed as the `nanoarrow` package on +[PyPI](https://pypi.org/project/nanoarrow/) and [conda-forge](https://anaconda.org/conda-forge/nanoarrow): + +```shell +pip install nanoarrow +conda install nanoarrow -c conda-forge +``` + +## Contributors + +This release consists of contributions from 10 contributors in addition +to the invaluable advice and support of the Apache Arrow community. 
+ +```console +$ git shortlog -sn apache-arrow-nanoarrow-0.6.0.dev..apache-arrow-nanoarrow-0.6.0 | grep -v "GitHub Actions" + 64 Dewey Dunnington + 19 William Ayd + 16 Benjamin Kietzman + 5 Cocoa + 2 Abhishek Singh + 1 Ashwin Srinath + 1 Dane Pitkin + 1 Jacob Wujciak-Jens + 1 Matt Topol + 1 Tao Zuhong +``` From f5af996c0def7a0585367706e1274809dd45fda3 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Mon, 7 Oct 2024 11:24:38 -0500 Subject: [PATCH 2/5] update notes --- _posts/2024-10-07-nanoarrow-0.6.0-release.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/_posts/2024-10-07-nanoarrow-0.6.0-release.md b/_posts/2024-10-07-nanoarrow-0.6.0-release.md index 5b12670e8339..8f8aa4dd1cb3 100644 --- a/_posts/2024-10-07-nanoarrow-0.6.0-release.md +++ b/_posts/2024-10-07-nanoarrow-0.6.0-release.md @@ -47,14 +47,11 @@ code; however, several changes in the C library are breaking changes to previous behaviour. - Check changelog + revdep ADBC +- bundling is now in Python and not distributed in dist/ ## Features -### REE Support - -- C library only, build by buffer - -### StringView Support +### Float16, StringView, and REE Support - C demo - Python demo @@ -68,7 +65,8 @@ behaviour. 
### DLPack/CUDA Support - +- C demo +- Python demo ### Build System Support for IPC/Device @@ -93,6 +91,7 @@ target_link_libraries(some_target PRIVATE nanoarrow::nanoarrow_ipc nanoarrow::na ``` - Works in Meson, too +- Demo or link to above demo of Python bundling ## Get nanoarrow From bc9e021ef795a1c68765ee54e2e3198a1fda484c Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Tue, 22 Oct 2024 15:46:56 -0500 Subject: [PATCH 3/5] update --- _posts/2024-10-07-nanoarrow-0.6.0-release.md | 224 ++++++++++++++++--- 1 file changed, 193 insertions(+), 31 deletions(-) diff --git a/_posts/2024-10-07-nanoarrow-0.6.0-release.md b/_posts/2024-10-07-nanoarrow-0.6.0-release.md index 8f8aa4dd1cb3..8cc53307f8b7 100644 --- a/_posts/2024-10-07-nanoarrow-0.6.0-release.md +++ b/_posts/2024-10-07-nanoarrow-0.6.0-release.md @@ -43,30 +43,186 @@ for a detailed list of contributions to this release. ## Breaking Changes Most changes included in the nanoarrow 0.6.0 release will not break downstream -code; however, several changes in the C library are breaking changes to previous -behaviour. +code; however, two changes with respect to packaging and distribution may require +users to update the code used to bring nanoarrow in as a dependency. -- Check changelog + revdep ADBC -- bundling is now in Python and not distributed in dist/ +In nanoarrow 0.5.0 and earlier, the bundled single-file amalgamation was included in +the `dist/` subdirectory or could be generated using a specially-crafted CMake +command. The nanoarrow 0.6.0 release removes the pre-compiled includes and migrates +the code used to generate it to Python. This setup is less confusing for contributors +(whose editors would frequently jump into the wrong `nanoarrow.h`) and is a less confusing +use of CMake. 
Users can generate the `dist/` subdirectory as it previously existed +with: + +``` shell +python ci/scripts/bundle.py \ + --source-output-dir=dist \ + --include-output-dir=dist \ + --header-namespace= \ + --with-device \ + --with-ipc \ + --with-testing \ + --with-flatcc +``` + +Second, the Arrow IPC and ArrowDeviceArray implementations previously lived in the `extensions/` +subdirectory of the repository. This was helpful during the initial development of these +features; however, the nanoarrow 0.6.0 release added the requisite feature coverage and testing +such that the appropriate home for them is now the main `src/` directory. As such, one +can now build nanoarrow with IPC and/or device support using: + +``` shell +mkdir build && cd build +cmake .. -DNANOARROW_IPC=ON -DNANOARROW_DEVICE=ON +``` ## Features ### Float16, StringView, and REE Support -- C demo -- Python demo -- R demo +The nanoarrow 0.6.0 release adds support for Arrow's float16 (half float), string view, +and run-end encoding. The C library supports building float16 arrays using +`ArrowArrayAppendDouble()` and supports building string view and binary view arrays +using `ArrowArrayAppendString()` and/or `ArrowArrayAppendBytes()` and supports consuming +using `ArrowArrayViewGetStringUnsafe()` and/or `ArrowArrayViewGetBytesUnsafe()`. R and +Python users can request a string view or float16 type when building an array, and +conversion back to R/Python strings is supported. 
+ +``` python +# pip install nanoarrow +# conda install nanoarrow -c conda-forge +import nanoarrow as na + +na.Array(["abc", "def", None], na.string_view()) +#> nanoarrow.Array[3] +#> 'abc' +#> 'def' +#> None +na.Array([1, 2, 3], na.float16()) +#> nanoarrow.Array[3] +#> 1.0 +#> 2.0 +#> 3.0 +``` + +``` r +# install.packages("nanoarrow") +library(nanoarrow) + +as_nanoarrow_array(c("abc", "def", NA), schema = na_string_view()) |> + convert_array() +#> [1] "abc" "def" NA +as_nanoarrow_array(c(1, 2, 3), schema = na_half_float()) |> + convert_array() +#> [1] 1 2 3 +``` + +Support for creating/consuming run-end encoding arrays by element is not yet +support in C, R, or Python; however, arrays can be built or consumed by assembling +the correct array/buffer structure in C. + +Thank you to [cocoa-xu](https://github.com/cocoa-xu) for adding float16 and run-end encoding +support and thank you to [WillAyd](https://github.com/WillAyd) for adding string view support! ### IPC Write Support -- C demo -- Python demo -- R demo +The nanoarrow library has supported reading +[Arrow IPC streams](https://arrow.apache.org/docs/format/Columnar.html) +since 0.4.0; however, it could not write streams of its own. 
The nanoarrow 0.6.0 release adds +support for stream writing from C using the `ArrowIpcWriter` and stream writing +from R and Python: + +```python +import io +import nanoarrow as na +from nanoarrow import ipc + +out = io.BytesIO() +with ipc.StreamWriter.from_writable(out) as writer: + writer.write_stream(ipc.InputStream.example()) + +out.seek(0) +na.ArrayStream.from_readable(out).read_all() +#> nanoarrow.Array>[3] +#> {'some_col': 1} +#> {'some_col': 2} +#> {'some_col': 3} +``` + +``` r +library(nanoarrow) + +tf <- tempfile() +nycflights13::flights |> write_nanoarrow(tf) + +read_nanoarrow(tf) |> tibble::as_tibble() +#> # A tibble: 336,776 × 19 +#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time +#> +#> 1 2013 1 1 517 515 2 830 819 +#> 2 2013 1 1 533 529 4 850 830 +#> 3 2013 1 1 542 540 2 923 850 +#> 4 2013 1 1 544 545 -1 1004 1022 +#> 5 2013 1 1 554 600 -6 812 837 +#> 6 2013 1 1 554 558 -4 740 728 +#> 7 2013 1 1 555 600 -5 913 854 +#> 8 2013 1 1 557 600 -3 709 723 +#> 9 2013 1 1 557 600 -3 838 846 +#> 10 2013 1 1 558 600 -2 753 745 +#> # ℹ 336,766 more rows +#> # ℹ 11 more variables: arr_delay , carrier , flight , +#> # tailnum , origin , dest , air_time , distance , +#> # hour , minute , time_hour +``` + +As a result of the IPC write support, nanoarrow now joins the Arrow IPC integration tests +to ensure compatibility across implementations. With the exception of +[arrow-rs due to a bug in the Rust flatbuffers implementation](https://github.com/apache/arrow-rs/issues/5052), +nanoarrow is now tested against all participating Arrow implementations with every commit. + +A huge thank you to [bkietz](https://github.com/bkietz) for implementing this support and +the tests (which included multiple bugfixes and identification of inconsistencies of +flatbuffer verification in C, Rust, and C++!). 
### DLPack/CUDA Support -- C demo -- Python demo +The nanoarrow 0.6.0 release includes improved support for the +[Arrow C Device data interface](https://arrow.apache.org/docs/format/CDeviceDataInterface.html). +In particular, the CUDA device implementation was improved to more efficiently coordinate +synchronization when copying arrays to/from the GPU and migrated to use the driver API +for wider compatibility. The nanoarrow Python bindings have limited support for creating +`ArrowDeviceArray` wrappers that implement the +[`__arrow_c_device_array__` protocol](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#export-protocol) +from anything that implements DLPack: + +``` python +# Currently requires: +# export NANOARROW_PYTHON_CUDA=/usr/local/cuda +# pip install --force-reinstall --no-binary=":all:" nanoarrow +import nanoarrow as na +from nanoarrow import device +import cupy as cp + +device.c_device_array(cp.array([1, 2, 3])) +#> +#> - device_type: CUDA <2> +#> - device_id: 0 +#> - array: +#> - length: 3 +#> - offset: 0 +#> - null_count: 0 +#> - buffers: (0, 133980798058496) +#> - dictionary: NULL +#> - children[0]: + +darray = device.c_device_array(cp.array([1, 2, 3])) +cp.from_dlpack(darray.array.view().buffer(1)) +#> array([1, 2, 3]) +``` + +Thank you to [AlenkaF](https://github.com/AlenkaF), [shwina](https://github.com/shwina), +and [danepitkin](https://github.com/danepitkin) for their contributions to and +review of this feature! ### Build System Support for IPC/Device @@ -75,37 +231,43 @@ work in an even wider variety of [develop/build/install scenarios](https://github.com/apache/arrow-nanoarrow/tree/main/examples/cmake-scenarios). 
In most cases, CMake-based projects should be able to add the nanoarrow C library with device and/or IPC support as a dependency with: -```cmake +``` cmake include(FetchContent) -set(NANOARROW_IPC ON) -set(NANOARROW_DEVICE ON) +# If required: +# set(NANOARROW_IPC ON) +# set(NANOARROW_DEVICE ON) fetchcontent_declare(nanoarrow - GIT_REPOSITORY https://github.com/apache/arrow-nanoarrow.git - GIT_TAG apache-arrow-nanoarrow-0.6.0 - GIT_SHALLOW TRUE) + URL "https://www.apache.org/dyn/closer.lua?action=download&filename=arrow/nanoarrow-0.6.0/apache-arrow-0.6.0.tar.gz") fetchcontent_makeavailable(nanoarrow) add_executable(some_target ...) -target_link_libraries(some_target PRIVATE nanoarrow::nanoarrow_ipc nanoarrow::nanoarrow_device) +target_link_libraries(some_target + PRIVATE + nanoarrow::nanoarrow + # If needed + # nanoarrow::nanoarrow_ipc + # nanoarrow::nanoarrow_device + ) ``` -- Works in Meson, too -- Demo or link to above demo of Python bundling - -## Get nanoarrow +Linking against nanoarrow installed via `cmake --install` and located +via `find_package()` is also supported. -- cxx via CMake/Meson +Users of the Meson build system can install the latest nanoarrow with: -The nanoarrow R bindings are distributed as the `nanoarrow` package on -[CRAN](https://cran.r-project.org/). 
+``` shell +mkdir subprojects +meson wrap install nanoarrow +``` -The nanoarrow Python bindings are distributed as the `nanoarrow` package on -[PyPI](https://pypi.org/project/nanoarrow/) and [conda-forge](https://anaconda.org/conda-forge/nanoarrow): +...and declared as a dependency with: -```shell -pip install nanoarrow -conda install nanoarrow -c conda-forge +``` shell +nanoarrow_dep = dependency('nanoarrow') +example_exec = executable('example_meson_minimal_app', + 'src/app.cc', + dependencies: [nanoarrow_dep]) ``` ## Contributors From 444664cedc157f4a1ac32d76070ae9eb48a492c8 Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Tue, 22 Oct 2024 21:07:33 -0500 Subject: [PATCH 4/5] Update _posts/2024-10-07-nanoarrow-0.6.0-release.md Co-authored-by: Sutou Kouhei --- _posts/2024-10-07-nanoarrow-0.6.0-release.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/_posts/2024-10-07-nanoarrow-0.6.0-release.md b/_posts/2024-10-07-nanoarrow-0.6.0-release.md index 8cc53307f8b7..7e789cb9d5f8 100644 --- a/_posts/2024-10-07-nanoarrow-0.6.0-release.md +++ b/_posts/2024-10-07-nanoarrow-0.6.0-release.md @@ -72,8 +72,7 @@ such that the appropriate home for them is now the main `src/` directory. As suc can now build nanoarrow with IPC and/or device support using: ``` shell -mkdir build && cd build -cmake .. -DNANOARROW_IPC=ON -DNANOARROW_DEVICE=ON +cmake -S . 
-B build -DNANOARROW_IPC=ON -DNANOARROW_DEVICE=ON ``` ## Features From 70dc0d784979768eb18264ffd8313300cf29666a Mon Sep 17 00:00:00 2001 From: Dewey Dunnington Date: Tue, 22 Oct 2024 21:07:48 -0500 Subject: [PATCH 5/5] Update _posts/2024-10-07-nanoarrow-0.6.0-release.md Co-authored-by: David Li --- _posts/2024-10-07-nanoarrow-0.6.0-release.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2024-10-07-nanoarrow-0.6.0-release.md b/_posts/2024-10-07-nanoarrow-0.6.0-release.md index 7e789cb9d5f8..3496763d5f21 100644 --- a/_posts/2024-10-07-nanoarrow-0.6.0-release.md +++ b/_posts/2024-10-07-nanoarrow-0.6.0-release.md @@ -116,8 +116,8 @@ as_nanoarrow_array(c(1, 2, 3), schema = na_half_float()) |> #> [1] 1 2 3 ``` -Support for creating/consuming run-end encoding arrays by element is not yet -support in C, R, or Python; however, arrays can be built or consumed by assembling +Creating/consuming run-end encoding arrays by element is not yet +supported in C, R, or Python; however, arrays can be built or consumed by assembling the correct array/buffer structure in C. Thank you to [cocoa-xu](https://github.com/cocoa-xu) for adding float16 and run-end encoding