This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

v0.10.0

@jorgecarleitao jorgecarleitao released this 12 Mar 21:02
· 460 commits to main since this release

Arrow2 0.10.0 is out! 🚀🚀🚀🚀🚀

Continuing to break new ground, this is one of the most feature-rich releases of this crate so far!

Thank you to everyone for the impressive work over the past 2.5 months that makes arrow2 so feature-rich, safe, fast, and easy to use! 🙇

Here are the main headlines:

Copy on Write

So far, whenever we applied a transformation to an array, we had to create a new array. When multiple operations were chained (e.g. c1 x 2 + 1), this led to the following compute pattern:

1. allocate new region
2. compute
3. allocate new region
4. compute

This was identified by @sundy-li on #741 and addressed by @ritchie46 on #794.

Users can now re-use arrays held behind an Arc, analogous to std::sync::Arc::get_mut. As expected, if the array is shared in multiple places, the call returns None and users still need to allocate a new region (mutation requires exclusive ownership).

This is being used in Polars to further re-use allocated regions and therefore reduce both memory pressure and wasted compute cycles allocating new regions.
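The idea can be illustrated with the standard library alone. The following is a minimal sketch, not arrow2's actual API: it uses a plain `Arc<Vec<i32>>` as a stand-in for an array, and the function name `times_two_plus_one` is hypothetical. It applies the `c1 x 2 + 1` example in place when the `Arc` is uniquely owned, and falls back to a new allocation when it is shared.

```rust
use std::sync::Arc;

/// Computes `v * 2 + 1` for every value.
/// Re-uses the existing allocation when `values` is uniquely owned,
/// and allocates a new region when it is shared.
/// (Hypothetical helper; a `Vec<i32>` stands in for an Arrow array.)
fn times_two_plus_one(mut values: Arc<Vec<i32>>) -> Arc<Vec<i32>> {
    if let Some(buffer) = Arc::get_mut(&mut values) {
        // Exclusive access: mutate in place, no new allocation.
        for v in buffer.iter_mut() {
            *v = *v * 2 + 1;
        }
        return values;
    }
    // Shared: `get_mut` returned `None`, so a new region is required.
    Arc::new(values.iter().map(|v| v * 2 + 1).collect::<Vec<i32>>())
}

fn main() {
    // Uniquely owned: the underlying buffer is re-used.
    let unique = Arc::new(vec![1, 2, 3]);
    let before = unique.as_slice().as_ptr();
    let result = times_two_plus_one(unique);
    assert_eq!(*result, vec![3, 5, 7]);
    assert_eq!(result.as_slice().as_ptr(), before);

    // Shared: the original is left untouched and a new buffer is allocated.
    let shared = Arc::new(vec![1, 2, 3]);
    let other_owner = Arc::clone(&shared);
    let result = times_two_plus_one(shared);
    assert_eq!(*other_owner, vec![1, 2, 3]);
    assert_eq!(*result, vec![3, 5, 7]);
}
```

In the uniquely owned case both operations happen in the original buffer, which is what removes the extra "allocate new region" steps from the pattern above.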

Support for ODBC

This release adds support for reading from, and writing to, any ODBC driver.

This builds on top of the superb odbc-api created by @pacman82, which allows this crate to use the columnar format provided by the ODBC specification.

Given a performant ODBC driver, this is expected to be the fastest way to load data into the Arrow format, as many operations are simple memcopies.

Check out the example and guide for details on how to use it!

async support for writing to Arrow's IPC

Until now, we had limited support for writing Arrow IPC asynchronously. @dexterduck closed this gap on #878, offering complete async support for both Arrow files and Arrow streams, including implementations of futures::Stream and futures::Sink for them!

Migrated to std::simd

After some back and forth with the working group of the portable SIMD project, this release replaces packed_simd2 with std::simd. This resulted in no performance difference, but allows us to leverage the great work that is happening on std::simd.

Serde support for metadata

A common pain point in using arrow2's logical types is that they are quite rich, making them sometimes difficult to visualize or represent in e.g. JSON. @houqp addressed this with #858, which adds compatibility with Serde for schema-related structs in this crate (PhysicalType, DataType, Field, Schema).

Support for Arrow C stream interface

Arrow has an experimental specification for exchanging iterators of Arrow arrays over an FFI boundary. This release now fully supports this interface.
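For readers unfamiliar with the interface, the sketch below mirrors the `ArrowArrayStream` struct from the Arrow C stream interface specification as a `#[repr(C)]` Rust type. It is an illustration only and not arrow2's own definition: `ArrowSchema` and `ArrowArray` are opaque placeholders here, and the helpers `empty` and `is_released` are hypothetical names introduced for this example.

```rust
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

/// Opaque placeholder for the C data interface schema struct.
#[repr(C)]
pub struct ArrowSchema {
    _opaque: [u8; 0],
}

/// Opaque placeholder for the C data interface array struct.
#[repr(C)]
pub struct ArrowArray {
    _opaque: [u8; 0],
}

/// Layout of the stream struct: four callbacks plus producer-private data.
#[repr(C)]
pub struct ArrowArrayStream {
    /// Writes the schema shared by all arrays; returns 0 on success.
    pub get_schema:
        Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    /// Writes the next array; a released array signals the end of the stream.
    pub get_next:
        Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    /// Returns a description of the last error, if any.
    pub get_last_error: Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    /// Frees the producer's resources; a null callback marks a released stream.
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    /// Opaque pointer owned by the producer.
    pub private_data: *mut c_void,
}

impl ArrowArrayStream {
    /// Hypothetical helper: a stream with no callbacks, i.e. already released.
    pub fn empty() -> Self {
        Self {
            get_schema: None,
            get_next: None,
            get_last_error: None,
            release: None,
            private_data: ptr::null_mut(),
        }
    }

    /// Hypothetical helper: true when the `release` callback is null.
    pub fn is_released(&self) -> bool {
        self.release.is_none()
    }
}

fn main() {
    let stream = ArrowArrayStream::empty();
    assert!(stream.is_released());
}
```

A consumer repeatedly calls `get_next` until it receives a released array, then calls `release` on the stream itself; this pull-based shape is what makes the interface an iterator over arrays.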

Made crate deny(missing_docs)

This makes us developers more conscious about documenting APIs, thereby giving users more context about them. We have also started documenting whether IO-related APIs are CPU-bound or IO-bound, so that users know which ones block async contexts.

Changelog

Full Changelog
