Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10149: [Rust] Improved support for externally owned memory regions #8316

Closed
wants to merge 3 commits into from
Closed

ARROW-10149: [Rust] Improved support for externally owned memory regions #8316

wants to merge 3 commits into from

Conversation

jorgecarleitao
Copy link
Member

Background

Currently, a memory region (arrow::buffer::BufferData) always knows its capacity, that it uses to drop itself once it is no longer needed. It also knows whether it needs to be dropped or not via BufferData::owner: bool.

However, this is insufficient for the purposes of supporting the C Data Interface, which requires informing the owner that the region is no longer needed, typically via a function call (release), for reference counting by the owner of the region.

This PR

This PR generalizes BufferData (and renames it to Bytes, which is more natural name for this structure, a-la bytes::Bytes) to support foreign deallocators. Specifically, it accepts two deallocation modes:

  • Native(usize): the current implementation
  • Foreign(Fn): an implementation that calls a function (which can be used to call a FFI)

FYI @pitrou , @nevi-me @paddyhoran @sunchao

Related to #8052 , which IMO is blocked by this functionality.

@github-actions
Copy link

github-actions bot commented Oct 1, 2020

@@ -253,20 +156,6 @@ impl Buffer {
self.len() / mem::size_of::<T>(),
)
}

/// Returns an empty buffer.
pub fn empty() -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not being used, and thus I dropped it.

let a = Box::new(b"hello");

let dealloc = Arc::new(|bytes: &mut Bytes| {
// println!(""); seems to be the only way to validate that this is actually called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps using a closure that mutates a captured variable?
Apparently you need to use either FnMut or a Cell:
https://stackoverflow.com/questions/38677736/passing-a-closure-that-modifies-its-environment-to-a-function-in-rust

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I had to introduce a Mutex, because a FnMut is now mutable, while the data itself is not. I am not very happy with this, but it makes sense as we cannot assume that the C data interface is thread-safe, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this. I do not think that that Fn should be mutable, as it is just performing an FFI call, over which Rust does not need to know about mutability. I am still trying to test it, but I think that something like Arc<dyn Fn(&mut Bytes)> is a better signature,.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use a Cell?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        let b = Cell::new(false);
        let dealloc = Arc::new(|bytes: &mut Bytes| {
            *b.get_mut() = true;
            assert_eq!(bytes.as_slice(), &b"hello"[1..4]);
        });

does not compile because it requires moving b to inside the closure: if we move b to inside the closure (using move), the closure is no longer Fn, but FnMut. If the closure is FnMut, we can no longer wrap it inside an Arc, as Arc is immutable. To make it immutable, we need to wrap it around a Mutex.

The difference between the code we are going here and the example in SO is that we have an immutable function, as the underlying resource that this function acts upon is outside rust. I.e. from rust's perspective, the function is Fn.

One way to test this would be to make the closure to write something to a file, and verify that that was written. I.e. test that the function mutated something outside of Rust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After trying out varying things, I got the following to work:

use std::cell::Cell;
use std::sync::Arc;

pub type VoidFn = Arc<dyn Fn()>;

fn main() {
    let integer = Arc::new(Cell::new(5));
    let inner = integer.clone();
    let closure = Arc::new(move || {
        inner.set(inner.get() + 1);
    });
    execute_closure(closure);
    println!("After closure: {}", integer.get());
}

fn execute_closure(func: VoidFn)
{
    func();
}

I may be missing something though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, by definition a destructor will mutate some state (visible or not), so it seems FnMut may be fine too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is awesome! Thanks a lot for the help and insight @pitrou .

The destructor mutates state. Shouldn't that state be only the state of self? IMO that is the reason rust's Drop requires a mutable ref of self, fn drop(&mut self).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, I am out of my depth here (not a Rust developer :-)).

The existing implementation was not useful to support FFI
as it did not specify how to release memory.
@jorgecarleitao jorgecarleitao marked this pull request as draft October 4, 2020 07:10
@jorgecarleitao
Copy link
Member Author

Closing in favor of #8401

@jorgecarleitao jorgecarleitao deleted the external_bytes branch October 9, 2020 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants