Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI listarray lead to undefined behavior. #20

Closed
ritchie46 opened this issue Apr 23, 2021 · 3 comments · Fixed by #287
Closed

FFI listarray lead to undefined behavior. #20

ritchie46 opened this issue Apr 23, 2021 · 3 comments · Fixed by #287

Comments

@ritchie46
Copy link
Contributor

Describe the bug
When sending an array array with child data over FFI (e.g. via pyarrow for instance) we encounter undefined behavior.

I have found SIGILL and SEGFAULTS

To Reproduce
The tests in this arrow-pyarrow-integration and arrow/src/ffi.rs

I ran the the tests in MIRI (great tool!), but I have to admit, I am stuck, and seem to go in circles.

running 1 test
test ffi::tests::test_list ... error: Undefined Behavior: pointer to alloc278966 was dereferenced after this allocation got freed
   --> arrow/src/ffi.rs:126:21
    |
126 |         let child = &*child_ptr;
    |                     ^^^^^^^^^^^ pointer to alloc278966 was dereferenced after this allocation got freed
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `ffi::release_schema` at arrow/src/ffi.rs:126:21
note: inside `<ffi::FFI_ArrowSchema as std::ops::Drop>::drop` at arrow/src/ffi.rs:198:39
   --> arrow/src/ffi.rs:198:39
    |
198 |             Some(release) => unsafe { release(self) },
    |                                       ^^^^^^^^^^^^^
    = note: inside `std::intrinsics::drop_in_place::<ffi::FFI_ArrowSchema> - shim(Some(ffi::FFI_ArrowSchema))` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:187:1
    = note: inside `std::sync::Arc::<ffi::FFI_ArrowSchema>::drop_slow` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1039:18
    = note: inside `<std::sync::Arc<ffi::FFI_ArrowSchema> as std::ops::Drop>::drop` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1571:13
    = note: inside `std::intrinsics::drop_in_place::<std::sync::Arc<ffi::FFI_ArrowSchema>> - shim(Some(std::sync::Arc<ffi::FFI_ArrowSchema>))` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:187:1
    = note: inside `std::intrinsics::drop_in_place::<ffi::ArrowArray> - shim(Some(ffi::ArrowArray))` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:187:1
note: inside `array::ffi::<impl std::convert::TryFrom<ffi::ArrowArray> for array::data::ArrayData>::try_from` at arrow/src/array/ffi.rs:60:5
   --> arrow/src/array/ffi.rs:60:5
    |
60  |     }
    |     ^
note: inside `ffi::tests::test_generic_list::<i32>` at arrow/src/ffi.rs:876:20
   --> arrow/src/ffi.rs:876:20
    |
876 |         let data = ArrayData::try_from(array)?;
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `ffi::tests::test_list` at arrow/src/ffi.rs:899:9
   --> arrow/src/ffi.rs:899:9
    |
899 |         test_generic_list::<i32>()
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at arrow/src/ffi.rs:898:5
   --> arrow/src/ffi.rs:898:5
    |
898 | /     fn test_list() -> Result<()> {
899 | |         test_generic_list::<i32>()
900 | |     }
    | |_____^
    = note: inside `<[closure@arrow/src/ffi.rs:898:5: 900:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:567:5
    = note: inside closure at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:558:30
    = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1546:9
    = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:344:9
    = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ritchie46/.rugged>]─╼ ╾─a280026[<untagged>]─╼ │ ╾──────╼╾──────╼
    0x20 │ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 │ ................
    0x30 │ 01 00 00 00 00 00 00 00 ╾─a279749[<untagged>]─╼ │ ........╾──────╼
    0x40 │ 00 00 00 00 00 00 00 00 ╾─a278656[<untagged>]─╼ │ ........╾──────╼
    0x50 │ ╾─a279863[<untagged>]─╼                         │ ╾──────╼
}
alloc280196 (Rust heap, size: 16, align: 8) {
    00 00 00 00 00 00 00 00 ╾─a275857[<untagged>]─╼ │ ........╾──────╼
}
alloc280316 (Rust heap, size: 56, align: 8) {
    0x00 │ ╾─a279544[<untagged>]─╼ 02 00 00 00 00 00 00 00 │ ╾──────╼........
    0x10 │ 02 00 00 00 00 00 00 00 ╾─a280196[<untagged>]─╼ │ ........╾──────╼
    0x20 │ 02 00 00 00 00 00 00 00 ╾─a279702[<untagged>]─╼ │ ........╾──────╼
    0x30 │ 01 00 00 00 00 00 00 00                         │ ........
}
alloc280344 (Rust heap, size: 96, align: 8) {
    0x00 │ 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 │ ................
    0x10 │ 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 │ ................
    0x20 │ 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 │ ................
    0x30 │ 01 00 00 00 00 00 00 00 ╾─a280196[<untagged>]─╼ │ ........╾──────╼
    0x40 │ ╾─a279702[<untagged>]─╼ 00 00 00 00 00 00 00 00 │ ╾──────╼........
    0x50 │ ╾─a278994[<untagged>]─╼ ╾─a280316[<untagged>]─╼ │ ╾──────╼╾──────╼
}
alloc278656 (fn: ffi::release_schema)
alloc278994 (fn: ffi::release_array)
stup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
    = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
    = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431:14
    = note: inside `test::run_test_in_process` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:589:18
    = note: inside closure at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:486:39
    = note: inside `test::run_test::run_test_inner` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:522:13
    = note: inside `test::run_test` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:555:28
    = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:301:13
    = note: inside `test::run_tests_console` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:289:5
    = note: inside `test::test_main` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:122:15
    = note: inside `test::test_main_static` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:141:5
    = note: inside `main`
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
    = note: inside closure at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
    = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
    = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
    = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
    = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431:14
    = note: inside `std::rt::lang_start_internal` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
    = note: inside `std::rt::lang_start::<()>` at /home/ritchie46/.rustup/toolchains/nightly-2021-03-04-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5
    = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

@jorgecarleitao
Copy link
Member

I have spent my weekend on this problem 😭

I now understand what we are doing wrong, but the solution requires a major design change of the FFI.

This gist is that we currently treat children as independent from the parent and drop them when no longer in use, which is out of spec.

The C data interface requires (MUST) that the parent is responsible for all child deallocations. When we destruct a list array imported from C++, the following currently happens:

  1. the list array drop is called
  2. the child array drop is called
  3. the child array FFI release is called
  4. the list array FFI release is called, calling the child array FFI release (as per spec), causing a double free

We should remove step 3 from this sequence, and let step 4 do its job.

However, for this, we need to figure out a way to share a ref counted owned parent to all child arrays, so that when the last array is deallocated, we can release the parent and all associated child arrays on the same release. Together with this, we must consider all imported child ffi arrays as non-owned (e.g. refs), so that they do not call their own release on drop.

@ritchie46
Copy link
Contributor Author

I have spent my weekend on this problem

Oof, I feel you. Working on this depressed me as well.

However, for this, we need to figure out a way to share a ref counted owned parent to all child arrays, so that when the last array is deallocated, we can release the parent and all associated child arrays on the same release. Together with this, we must consider all imported child ffi arrays as non-owned (e.g. refs), so that they do not call their own release on drop.

Just thinking aloud here. Does it make sense to have some kind of marker (boolean) to indicate that an array is a child, and then make Drop a no-op when that marker is set and thereby eliminating step 3 (I assume that one is called in drop)?

@alamb alamb changed the title FFI listarray lead to UB. FFI listarray lead to undefined behavior. May 3, 2021
@jorgecarleitao
Copy link
Member

I solved this in arrow2 jorgecarleitao/arrow2#67 by using a different struct for an ArrowArray and a ChildArrowArray, on which one has owned refs and the other has unowned refs (and an Arc of the parent for ref count).

roee88 added a commit to roee88/arrow-rs that referenced this issue May 12, 2021
roee88 added a commit to roee88/arrow-rs that referenced this issue May 12, 2021
roee88 added a commit to roee88/arrow-rs that referenced this issue May 12, 2021
alamb pushed a commit that referenced this issue May 17, 2021
* fix: support nested types in FFI

Ported from https://github.com/jorgecarleitao/arrow2

Fix #20
Fix #251

Signed-off-by: roee88 <[email protected]>

* Removed Clone from FFI_ArrowArray

Signed-off-by: roee88 <[email protected]>

* Add nesting to FFI struct test

Signed-off-by: roee88 <[email protected]>
alamb referenced this issue in alamb/arrow-rs Jul 23, 2022
* feat: Implement multi-part upload

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* chore: simplify local file implementation

* chore: Remove pin-project

* feat: make cleanup_upload() top-level

* docs: Add some docs for upload

* chore: fix linting issue

* fix: rename to put_multipart

* feat: Implement multi-part upload for GCP

* fix: Get GCS test to pass

* chore: remove more upload language

* fix: Add guard to test so we don't run with fake gcs server

* chore: small tweaks

* fix: apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* feat: switch to quick-xml

* feat: remove throttle implementation of multipart

* fix: rename from cleanup to abort

* feat: enforce upload not readable until shutdown

* fix: ensure we close files before moving them

* chore: fix lint issue

Co-authored-by: Raphael Taylor-Davies <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
alamb referenced this issue in alamb/arrow-rs Jul 23, 2022
* feat: Implement multi-part upload

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* chore: simplify local file implementation

* chore: Remove pin-project

* feat: make cleanup_upload() top-level

* docs: Add some docs for upload

* chore: fix linting issue

* fix: rename to put_multipart

* feat: Implement multi-part upload for GCP

* fix: Get GCS test to pass

* chore: remove more upload language

* fix: Add guard to test so we don't run with fake gcs server

* chore: small tweaks

* fix: apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* feat: switch to quick-xml

* feat: remove throttle implementation of multipart

* fix: rename from cleanup to abort

* feat: enforce upload not readable until shutdown

* fix: ensure we close files before moving them

* chore: fix lint issue

Co-authored-by: Raphael Taylor-Davies <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
alamb added a commit that referenced this issue Jul 26, 2022
* feat: Add stream upload (multi-part upload) (#20)

* feat: Implement multi-part upload

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* chore: simplify local file implementation

* chore: Remove pin-project

* feat: make cleanup_upload() top-level

* docs: Add some docs for upload

* chore: fix linting issue

* fix: rename to put_multipart

* feat: Implement multi-part upload for GCP

* fix: Get GCS test to pass

* chore: remove more upload language

* fix: Add guard to test so we don't run with fake gcs server

* chore: small tweaks

* fix: apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* feat: switch to quick-xml

* feat: remove throttle implementation of multipart

* fix: rename from cleanup to abort

* feat: enforce upload not readable until shutdown

* fix: ensure we close files before moving them

* chore: fix lint issue

Co-authored-by: Raphael Taylor-Davies <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>

* fmt

* RAT multipart

* Fix build

* fix: merge issue

Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants