This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Buffer Cannot Exceed Existing Length - Pyo3 #538

Closed
lissahyacinth opened this issue Oct 18, 2021 · 2 comments · Fixed by #564
Labels
bug Something isn't working no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@lissahyacinth

Issue

Arrow2 reliably panics with the assertion error 'the offset of the new Buffer cannot exceed the existing length' when converting multiple Arrow Record Batches through the Rust FFI layer from C. The error does not happen when a chunksize is not set or when only the first batch is used.

I reproduced this using Polars for simplicity, but it can also be reproduced using Rust + pyo3.

Scope

I am unsure whether the problem lies in the FFI layer used, in Arrow2, or in Arrow itself.

MVE - Python via Polars

from polars.polars import PyDataFrame
import pyarrow as pa
import pandas as pd


if __name__ == "__main__":
    a = pd.DataFrame.from_dict({'a': [1,2,3]})
    df = PyDataFrame.from_arrow_record_batches(
        pa.Table.from_pandas(a).to_batches(max_chunksize=1)
    )

MVE - Rust via pyo3

// src/lib.rs
use arrow2::array::ArrayRef;
use arrow2::ffi;
use pyo3::{ffi::Py_uintptr_t, prelude::*};

pub fn array_to_rust(obj: &PyAny) -> PyResult<ArrayRef> {
    // prepare a pointer to receive the Array struct
    let array = Box::new(ffi::Ffi_ArrowArray::empty());
    let schema = Box::new(ffi::Ffi_ArrowSchema::empty());

    let array_ptr = &*array as *const ffi::Ffi_ArrowArray;
    let schema_ptr = &*schema as *const ffi::Ffi_ArrowSchema;

    // make the conversion through PyArrow's private API
    // this changes the pointer's memory and is thus unsafe. In particular, `_export_to_c` can go out of bounds
    obj.call_method1(
        "_export_to_c",
        (array_ptr as Py_uintptr_t, schema_ptr as Py_uintptr_t),
    )?;

    unsafe {
        let field = ffi::import_field_from_c(schema.as_ref()).unwrap();
        let array = ffi::import_array_from_c(array, &field).unwrap();
        Ok(array.into())
    }
}

#[pyfunction]
pub fn convert_table(rbs: Vec<&PyAny>) -> PyResult<()> {
    for rb in rbs {
        array_to_rust(rb).unwrap();
    }
    Ok(())
}

#[pymodule]
fn polars_test(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(convert_table, m)?)?;
    Ok(())
}

Caller in Python

# main.py
import pandas as pd
import pyarrow as pa

from polars_test import convert_table

if __name__ == "__main__":
    convert_table(
        pa.Table.from_pandas(pd.DataFrame.from_dict({'a': [1,2,3]})).to_batches(max_chunksize=1)
    )

Compilation & Running

> maturin develop
> python3 main.py

Error

thread '<unnamed>' panicked at 'the offset of the new Buffer cannot exceed the existing length', /github/home/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b37568/src/buffer/immutable.rs:99:9
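
For readers unfamiliar with the message, here is a minimal sketch of the kind of bounds check that produces it. `ToyBuffer` is a hypothetical stand-in written for this illustration, not arrow2's actual `Buffer`, and the assumed form of the check (`offset + length <= len`) is inferred from the panic message only. The failing scenario at the end is likewise an assumption about how such a check can trip (an offset applied to a buffer that was already narrowed); it is not a confirmed diagnosis of this issue.

```rust
use std::sync::Arc;

/// Hypothetical immutable, sliceable buffer (NOT arrow2's real type).
#[derive(Clone, Debug)]
struct ToyBuffer {
    data: Arc<Vec<i64>>,
    offset: usize,
    length: usize,
}

impl ToyBuffer {
    fn new(values: Vec<i64>) -> Self {
        let length = values.len();
        Self { data: Arc::new(values), offset: 0, length }
    }

    fn len(&self) -> usize {
        self.length
    }

    /// Zero-copy slice. Panics when the requested window does not fit
    /// inside the current view (assumed form of the check).
    fn slice(mut self, offset: usize, length: usize) -> Self {
        assert!(
            offset + length <= self.len(),
            "the offset of the new Buffer cannot exceed the existing length"
        );
        self.offset += offset;
        self.length = length;
        self
    }

    fn values(&self) -> &[i64] {
        &self.data[self.offset..self.offset + self.length]
    }
}

fn main() {
    // Three values, viewed as three batches of one row each.
    let buffer = ToyBuffer::new(vec![1, 2, 3]);

    // Slicing the full buffer at offsets 0 and 2 fits.
    assert_eq!(buffer.clone().slice(0, 1).values(), &[1_i64]);
    assert_eq!(buffer.clone().slice(2, 1).values(), &[3_i64]);

    // Assumed failure mode: the view was already narrowed to one value,
    // and a non-zero offset is then applied on top of it.
    let narrowed = buffer.clone().slice(0, 1);
    let result = std::panic::catch_unwind(move || narrowed.slice(2, 1));
    assert!(result.is_err());
}
```

In this toy model only a non-zero offset can fail on a one-value view, which would be consistent with the first batch converting without error.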

Full Backtrace

```
thread '' panicked at 'the offset of the new Buffer cannot exceed the existing length', /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/buffer/immutable.rs:99:9
stack backtrace:
   0: 0x7feb6aa61450 - std::backtrace_rs::backtrace::libunwind::trace::ha0ad43e8a952bfe7
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1: 0x7feb6aa61450 - std::backtrace_rs::backtrace::trace_unsynchronized::h6830419c0c4130dc
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: 0x7feb6aa61450 - std::sys_common::backtrace::_print_fmt::h8f3516631ffa1ef5
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys_common/backtrace.rs:67:5
   3: 0x7feb6aa61450 - ::fmt::he1640d5f0d93f618
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys_common/backtrace.rs:46:22
   4: 0x7feb6aa7febc - core::fmt::write::h88012e1f01caeebf
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/fmt/mod.rs:1115:17
   5: 0x7feb6aa5fc95 - std::io::Write::write_fmt::h360fa85b30182555
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/io/mod.rs:1665:15
   6: 0x7feb6aa62e8b - std::sys_common::backtrace::_print::ha1f00492f406a015
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys_common/backtrace.rs:49:5
   7: 0x7feb6aa62e8b - std::sys_common::backtrace::print::hd54561b13feb6af3
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys_common/backtrace.rs:36:9
   8: 0x7feb6aa62e8b - std::panicking::default_hook::{{closure}}::h84fe124cd0864662
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:208:50
   9: 0x7feb6aa62961 - std::panicking::default_hook::h5a8e74a76ce290a7
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:225:9
  10: 0x7feb6aa63554 - std::panicking::rust_panic_with_hook::h67c812a4fe9d4c91
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:622:17
  11: 0x7feb6aa45c53 - std::panicking::begin_panic::{{closure}}::h7e980ea83325241f
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:542:9
  12: 0x7feb6aa45ad9 - std::sys_common::backtrace::__rust_end_short_backtrace::h3fcdd3532cf7a2ad
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys_common/backtrace.rs:141:18
  13: 0x7feb6aa45b89 - std::panicking::begin_panic::h8bdd8098de24edf2
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:541:12
  14: 0x7feb6a9aa0fb - arrow2::buffer::immutable::Buffer::slice::h5dbff47599e3d0a9
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/buffer/immutable.rs:99:9
  15: 0x7feb6aa12a52 - arrow2::array::primitive::ffi:: for arrow2::array::primitive::PrimitiveArray>::try_from_ffi::hedbbe86adb428b1e
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/array/primitive/ffi.rs:34:22
  16: 0x7feb6a9f8857 - arrow2::ffi::array::try_from::h691b1eeb72ff4c7f
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/ffi/array.rs:17:33
  17: 0x7feb6a993342 - >::try_from_ffi::{{closure}}::hce9f4699b68afcdc
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/array/struct_.rs:247:20
  18: 0x7feb6a9e8427 - core::iter::adapters::map::map_try_fold::{{closure}}::h7f360dbad46806aa
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/adapters/map.rs:89:28
  19: 0x7feb6a9b401e - core::iter::traits::iterator::Iterator::try_fold::hbb13d2dce945b0a9
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/traits/iterator.rs:1998:21
  20: 0x7feb6a9e75ff - as core::iter::traits::iterator::Iterator>::try_fold::h6c8c8ae535e23ad8
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/adapters/map.rs:115:9
  21: 0x7feb6a9b9532 - as core::iter::traits::iterator::Iterator>::try_fold::h8bc901f036e9c2a7
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/adapters/mod.rs:174:9
  22: 0x7feb6a9bda7f - core::iter::traits::iterator::Iterator::find::he2edf2ffcbd4cc88
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/traits/iterator.rs:2385:9
  23: 0x7feb6a9b909f - as core::iter::traits::iterator::Iterator>::next::h84e16b1e87e4631e
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/adapters/mod.rs:156:9
  24: 0x7feb6a97eecf - as alloc::vec::spec_from_iter_nested::SpecFromIterNested>::from_iter::h40e81978d2d625d1
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/alloc/src/vec/spec_from_iter_nested.rs:23:32
  25: 0x7feb6a983e79 - as alloc::vec::spec_from_iter::SpecFromIter>::from_iter::h50ad4106170e6827
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/alloc/src/vec/spec_from_iter.rs:33:9
  26: 0x7feb6a983861 - as core::iter::traits::collect::FromIterator>::from_iter::h940ed5274700fade
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/alloc/src/vec/mod.rs:2453:9
  27: 0x7feb6a9bdb39 - core::iter::traits::iterator::Iterator::collect::h698d0a6cc0bcae2d
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/traits/iterator.rs:1749:9
  28: 0x7feb6a9e912f - as core::iter::traits::collect::FromIterator>>::from_iter::{{closure}}::h3354fbe3a39a38ee
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/result.rs:1869:53
  29: 0x7feb6a9bdd87 - core::iter::adapters::process_results::h0087c13f2bd84a7a
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/adapters/mod.rs:145:17
  30: 0x7feb6a9e9028 - as core::iter::traits::collect::FromIterator>>::from_iter::ha273392d725ebe54
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/result.rs:1869:9
  31: 0x7feb6a9e7b21 - core::iter::traits::iterator::Iterator::collect::h3d4d405aa78e98aa
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/iter/traits/iterator.rs:1749:9
  32: 0x7feb6a992c0a - >::try_from_ffi::hbc9044912c159a80
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/array/struct_.rs:244:22
  33: 0x7feb6a9fac23 - arrow2::ffi::array::try_from::hc3b6cabefdb7392b
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/ffi/array.rs:26:28
  34: 0x7feb6aa37a59 - arrow2::ffi::import_array_from_c::hade856d8ce1ed4e7
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.6.2/src/ffi/mod.rs:53:5
  35: 0x7feb6a942b5b - polars_test::array_to_rust::h1665c3fc500a342d
        at /home/eden/GitHub/polarsrs/src/lib.rs:22:21
  36: 0x7feb6a942d4d - polars_test::convert_table::h310dd4f5c5e8e236
        at /home/eden/GitHub/polarsrs/src/lib.rs:30:9
  37: 0x7feb6a9497e9 - polars_test::__pyo3_raw_convert_table::{{closure}}::hd098e8d9c2d9a0a8
        at /home/eden/GitHub/polarsrs/src/lib.rs:27:1
  38: 0x7feb6a941f0a - pyo3::callback::handle_panic::{{closure}}::h19606a981283c243
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/callback.rs:247:9
  39: 0x7feb6a947880 - std::panicking::try::do_call::h21e75d3c2a674c81
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:401:40
  40: 0x7feb6a947abd - __rust_try
  41: 0x7feb6a947657 - std::panicking::try::h895cd2593cef5745
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panicking.rs:365:19
  42: 0x7feb6a94cbc0 - std::panic::catch_unwind::h4e70d0ba75bb2bff
        at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/panic.rs:434:14
  43: 0x7feb6a941db8 - pyo3::callback::handle_panic::hb1c9ce1fb80f305f
        at /home/eden/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/callback.rs:245:24
  44: 0x7feb6a942e10 - polars_test::__pyo3_raw_convert_table::h27d87a79f89e9fd4
        at /home/eden/GitHub/polarsrs/src/lib.rs:27:1
  45: 0x5c5417 -
  46: 0x56bddd - _PyEval_EvalFrameDefault
  47: 0x56a0ba - _PyEval_EvalCodeWithName
  48: 0x68d5b7 - PyEval_EvalCode
  49: 0x67cd01 -
  50: 0x67cd7f -
  51: 0x67ce21 -
  52: 0x67ef47 - PyRun_SimpleFileExFlags
  53: 0x6b7242 - Py_RunMain
  54: 0x6b75cd - Py_BytesMain
  55: 0x7feb853090b3 - __libc_start_main
  56: 0x5fb18e - _start
  57: 0x0 -
Traceback (most recent call last):
  File "main.py", line 7, in
    convert_table(
pyo3_runtime.PanicException: the offset of the new Buffer cannot exceed the existing length
```

@jorgecarleitao jorgecarleitao self-assigned this Oct 19, 2021
@jorgecarleitao jorgecarleitao added the bug Something isn't working label Oct 19, 2021
@jorgecarleitao
Owner

Closed by #540, thanks a lot for the report!

@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Oct 25, 2021
@jorgecarleitao
Owner

Re-opening as the fix was reverted.
