feat: Expose binary_elementwise_into_string_amortized for plugin authors, recommend `apply_into_string_amortized` instead of `apply_to_buffer` (#17903)

Co-authored-by: Bruno Conde Kind <[email protected]>
MarcoGorelli and condekind authored Jul 27, 2024
1 parent 8bbc947 commit 5fc791c
Showing 3 changed files with 45 additions and 19 deletions.
15 changes: 0 additions & 15 deletions crates/polars-core/src/chunked_array/ops/apply.rs
@@ -363,21 +363,6 @@ impl StringChunked {
         });
         StringChunked::from_chunk_iter(self.name(), chunks)
     }
-
-    /// Utility that reuses an string buffer to amortize allocations.
-    /// Prefer this over an `apply` that returns an owned `String`.
-    pub fn apply_to_buffer<'a, F>(&'a self, mut f: F) -> Self
-    where
-        F: FnMut(&'a str, &mut String),
-    {
-        let mut buf = String::new();
-        let outer = |s: &'a str| {
-            buf.clear();
-            f(s, &mut buf);
-            unsafe { std::mem::transmute::<&str, &'a str>(buf.as_str()) }
-        };
-        self.apply_mut(outer)
-    }
 }
 
 impl BinaryChunked {
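The removed `apply_to_buffer` reused one `String` across rows. It passed each result to `apply_mut` as a `&'a str` borrowed from that buffer, so it needed an `unsafe` `transmute` to extend the borrow. A safe way to get the same amortization is to copy each result out of the scratch buffer before the next row clears it. The new `binary_elementwise_into_string_amortized` in this commit follows that pattern with `buf.clear()` and then `mutarr.push_value(&buf)`.

The following is a minimal std-only sketch of that pattern for a single column, and it is not Polars code. `StringColumnSketch` and `apply_into_string_amortized_sketch` are hypothetical names. The layout (one contiguous values buffer, offsets, and a validity vector) is an assumption loosely modelled on Arrow string arrays.

```rust
/// Hypothetical Arrow-like string column (not a Polars type): row `i` occupies
/// `values[offsets[i]..offsets[i + 1]]`, and `validity[i] == false` means null.
#[derive(Debug)]
pub struct StringColumnSketch {
    values: String,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl StringColumnSketch {
    pub fn with_capacity(rows: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0); // the first row starts at byte 0
        Self {
            values: String::new(),
            offsets,
            validity: Vec::with_capacity(rows),
        }
    }

    /// Copies `v` into the shared values buffer.
    pub fn push_value(&mut self, v: &str) {
        self.values.push_str(v);
        self.offsets.push(self.values.len());
        self.validity.push(true);
    }

    /// A null row takes an empty byte range.
    pub fn push_null(&mut self) {
        self.offsets.push(self.values.len());
        self.validity.push(false);
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Hypothetical unary helper: `f` writes each result into one scratch `String`
/// that is cleared (keeping its capacity) before every non-null row.
pub fn apply_into_string_amortized_sketch<F>(
    input: &[Option<&str>],
    mut f: F,
) -> StringColumnSketch
where
    F: FnMut(&str, &mut String),
{
    let mut buf = String::new();
    let mut out = StringColumnSketch::with_capacity(input.len());
    for &opt in input {
        match opt {
            None => out.push_null(),
            Some(v) => {
                buf.clear();
                f(v, &mut buf);
                // Copy the result out before the next row reuses `buf`;
                // no borrow of `buf` escapes, so no `unsafe` is needed.
                out.push_value(&buf);
            }
        }
    }
    out
}
```

`String::clear` keeps the allocated capacity. After the first few rows, the scratch buffer therefore usually stops reallocating, and the remaining per-row cost is the copy into the output buffer.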
41 changes: 39 additions & 2 deletions crates/polars-core/src/chunked_array/ops/arity.rs
@@ -1,12 +1,12 @@
 use std::error::Error;
 
-use arrow::array::{Array, StaticArray};
+use arrow::array::{Array, MutablePlString, StaticArray};
 use arrow::compute::utils::combine_validities_and;
 use polars_error::PolarsResult;
 
 use crate::chunked_array::metadata::MetadataProperties;
 use crate::datatypes::{ArrayCollectIterExt, ArrayFromIter};
-use crate::prelude::{ChunkedArray, CompatLevel, PolarsDataType, Series};
+use crate::prelude::{ChunkedArray, CompatLevel, PolarsDataType, Series, StringChunked};
 use crate::utils::{align_chunks_binary, align_chunks_binary_owned, align_chunks_ternary};
 
 // We need this helper because for<'a> notation can't yet be applied properly
@@ -332,6 +332,43 @@ where
     ChunkedArray::from_chunk_iter(lhs.name(), iter)
 }
 
+/// Apply elementwise binary function which produces string, amortising allocations.
+///
+/// Currently unused within Polars itself, but it's a useful utility for plugin authors.
+#[inline]
+pub fn binary_elementwise_into_string_amortized<T, U, F>(
+    lhs: &ChunkedArray<T>,
+    rhs: &ChunkedArray<U>,
+    mut op: F,
+) -> StringChunked
+where
+    T: PolarsDataType,
+    U: PolarsDataType,
+    F: for<'a> FnMut(T::Physical<'a>, U::Physical<'a>, &mut String),
+{
+    let (lhs, rhs) = align_chunks_binary(lhs, rhs);
+    let mut buf = String::new();
+    let iter = lhs
+        .downcast_iter()
+        .zip(rhs.downcast_iter())
+        .map(|(lhs_arr, rhs_arr)| {
+            let mut mutarr = MutablePlString::with_capacity(lhs_arr.len());
+            lhs_arr
+                .iter()
+                .zip(rhs_arr.iter())
+                .for_each(|(lhs_opt, rhs_opt)| match (lhs_opt, rhs_opt) {
+                    (None, _) | (_, None) => mutarr.push_null(),
+                    (Some(lhs_val), Some(rhs_val)) => {
+                        buf.clear();
+                        op(lhs_val, rhs_val, &mut buf);
+                        mutarr.push_value(&buf)
+                    },
+                });
+            mutarr.freeze()
+        });
+    ChunkedArray::from_chunk_iter(lhs.name(), iter)
+}
+
 /// Applies a kernel that produces `Array` types.
 ///
 /// Intended for kernels that apply on values, this function will filter out any
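`binary_elementwise_into_string_amortized` first aligns the chunks of both inputs. It then emits a null whenever either side is null, and otherwise lets `op` write into a single `buf` that is cleared per row.

The following std-only sketch mirrors that control flow on plain slices so it can run without Polars. `Utf8Sketch` and `binary_into_string_amortized_sketch` are hypothetical names, and the slices stand in for already-aligned chunks. The equal-length `assert_eq!` is an assumption of the sketch, not what `align_chunks_binary` does.

```rust
/// Hypothetical output layout (not a Polars type): row `i` is
/// `values[offsets[i]..offsets[i + 1]]` when `validity[i]` is true, else null.
#[derive(Debug)]
pub struct Utf8Sketch {
    pub values: String,
    pub offsets: Vec<usize>,
    pub validity: Vec<bool>,
}

impl Utf8Sketch {
    pub fn get(&self, i: usize) -> Option<&str> {
        self.validity[i].then(|| &self.values[self.offsets[i]..self.offsets[i + 1]])
    }
}

/// Hypothetical std-only analogue of `binary_elementwise_into_string_amortized`.
pub fn binary_into_string_amortized_sketch<L, R, F>(
    lhs: &[Option<L>],
    rhs: &[Option<R>],
    mut op: F,
) -> Utf8Sketch
where
    L: Copy,
    R: Copy,
    F: FnMut(L, R, &mut String),
{
    // Stand-in for chunk alignment: the sketch simply requires equal lengths.
    assert_eq!(lhs.len(), rhs.len(), "inputs must have the same length");
    let mut buf = String::new();
    let mut out = Utf8Sketch {
        values: String::new(),
        offsets: Vec::with_capacity(lhs.len() + 1),
        validity: Vec::with_capacity(lhs.len()),
    };
    out.offsets.push(0);
    for (&l, &r) in lhs.iter().zip(rhs) {
        match (l, r) {
            // Both sides valid: reuse the scratch buffer, then copy it out.
            (Some(l), Some(r)) => {
                buf.clear();
                op(l, r, &mut buf);
                out.values.push_str(&buf);
                out.validity.push(true);
            }
            // A null on either side yields a null row (empty byte range).
            _ => out.validity.push(false),
        }
        out.offsets.push(out.values.len());
    }
    out
}
```

In a plugin, per the docs change in this commit, the real function comes from `polars::prelude::arity` and takes two `ChunkedArray`s instead of slices. The closure keeps the same `(lhs_val, rhs_val, &mut String)` shape.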
8 changes: 6 additions & 2 deletions docs/user-guide/expressions/plugins.md
@@ -63,11 +63,15 @@ fn pig_latin_str(value: &str, output: &mut String) {
 #[polars_expr(output_type=String)]
 fn pig_latinnify(inputs: &[Series]) -> PolarsResult<Series> {
     let ca = inputs[0].str()?;
-    let out: StringChunked = ca.apply_to_buffer(pig_latin_str);
+    let out: StringChunked = ca.apply_into_string_amortized(pig_latin_str);
     Ok(out.into_series())
 }
 ```
 
+Note that we use `apply_into_string_amortized`, as opposed to `apply_values`, to avoid allocating a new string for
+each row. If your plugin takes in multiple inputs, operates elementwise, and produces a `String` output,
+then you may want to look at the `binary_elementwise_into_string_amortized` utility function in `polars::prelude::arity`.
+
 This is all that is needed on the Rust side. On the Python side we must setup a folder with the same name as defined in
 the `Cargo.toml`, in this case "expression_lib". We will create a folder in the same directory as our Rust `src` folder
 named `expression_lib` and we create an `expression_lib/__init__.py`. The resulting file structure should look something like this:
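Only the signature `pig_latin_str(value: &str, output: &mut String)` appears in this diff. The sketch below supplies an assumed body for illustration, not copied from the Polars docs. It also adds a hypothetical driver, `pig_latin_all`, that calls the function once per value with a single reused buffer. That is the same write-into-`&mut String` shape that `ca.apply_into_string_amortized(pig_latin_str)` accepts, instead of returning a fresh `String` per row.

```rust
use std::fmt::Write;

/// Assumed body for illustration: moves the first character to the end and
/// appends "ay", writing into `output` instead of returning a new `String`.
fn pig_latin_str(value: &str, output: &mut String) {
    let mut chars = value.chars();
    if let Some(first) = chars.next() {
        // `chars.as_str()` is the rest of the string after `first`, so a
        // multi-byte first character is handled without byte slicing.
        write!(output, "{}{}ay", chars.as_str(), first).unwrap();
    }
}

/// Hypothetical driver: one scratch buffer for all rows, results joined by `sep`.
fn pig_latin_all(values: &[&str], sep: char) -> String {
    let mut buf = String::new();
    let mut out = String::new();
    for (i, &v) in values.iter().enumerate() {
        buf.clear(); // reuse the allocation from the previous row
        pig_latin_str(v, &mut buf);
        if i > 0 {
            out.push(sep);
        }
        out.push_str(&buf);
    }
    out
}
```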
@@ -160,7 +164,7 @@ fn append_kwargs(input: &[Series], kwargs: MyKwargs) -> PolarsResult<Series> {
     let ca = input.str().unwrap();
 
     Ok(ca
-        .apply_to_buffer(|val, buf| {
+        .apply_into_string_amortized(|val, buf| {
             write!(
                 buf,
                 "{}-{}-{}-{}-{}",
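The `append_kwargs` example formats each value with `write!` directly into the `buf` supplied by `apply_into_string_amortized`. Writing into a `String` with `write!` requires the `std::fmt::Write` trait to be in scope, and the call returns a `std::fmt::Result` that should be handled.

The sketch below shows that pattern with a hypothetical `SketchKwargs` struct and a hypothetical `format_rows` driver. Its fields and format string are illustrative only and are not the arguments elided from the diff above. It also checks that `String::clear` leaves the capacity unchanged, which is what makes reusing the buffer across rows worthwhile.

```rust
use std::fmt::Write;

/// Hypothetical keyword arguments (illustrative; not the struct from the docs).
struct SketchKwargs {
    sep: char,
    suffix: i64,
}

/// Formats one value into the caller-provided buffer.
fn append_kwargs_into(val: &str, kwargs: &SketchKwargs, buf: &mut String) {
    // Writing into a `String` cannot fail here: `String`'s `fmt::Write` impl
    // always succeeds and these `Display` impls never return an error.
    write!(buf, "{}{}{}", val, kwargs.sep, kwargs.suffix).unwrap();
}

/// Hypothetical driver: formats every value into one reused buffer and hands
/// the caller a borrow that is only valid until the next row is written.
fn format_rows<F: FnMut(&str)>(values: &[&str], kwargs: &SketchKwargs, mut emit: F) {
    let mut buf = String::new();
    for &val in values {
        buf.clear(); // `clear` keeps the allocation for the next row
        append_kwargs_into(val, kwargs, &mut buf);
        emit(buf.as_str());
    }
}
```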