Possible memory leak moving arrays to and from python #333

Closed
mikeWShef opened this issue Jun 28, 2022 · 9 comments

@mikeWShef

mikeWShef commented Jun 28, 2022

Hi, I brought this up on Gitter and people said to post here. It seems like there is a memory leak when moving arrays between Rust and Python.

I have made a minimal example repo here:
https://github.com/mikeWShef/leak

I am explicitly copying the arrays using into_pyarray and as_array, but as far as I know this should not be an issue.

The Jupyter notebook example will quickly fill >30 GB of memory, when really there are at most 6 arrays of 2048**2 float64, so ~200 MB.

When the Rust program ends the memory is cleaned up again, but adding gc.collect() to the Python function doesn't fix the leak.

Tested on Windows 11, Python 3.8, NumPy 1.20.3.

@adamreichold
Member

From a quick glance at your example, I think you are running into a PyO3-specific issue where objects on the Python heap allocated from Rust have their reference counts decreased only when control returns from the Rust module, which would explain your observation that:

this calls the python function from rust in a loop which leaks memory for some reason?

(As you note, you copy the input array for each invocation, so each iteration of the nested for loops allocates one more copy of the input array from the Rust code.)

To verify that this is indeed the issue, you could modify your leek_memory method to move at least the outer loop to the Python side, e.g.

#[pyclass(unsendable)]
struct Looper {
    funcs: Vec<BoxedFunc>,
    input: Array1<f64>,
    output: Array1<f64>,
}

#[pymethods]
impl Looper {
    #[new]
    pub fn new(size: usize) -> Looper {
        let funcs: Vec<BoxedFunc> = vec![];
        Looper {
            funcs,
            input: Array1::zeros(size),
            output: Array1::zeros(size),
        }
    }

    pub fn add_func(&mut self, callable: PyObject) {
        self.funcs.push(Box::new(UserDefinedFunc { callable }));
    }

    pub fn leek_memory(&mut self) {
        for func in &mut self.funcs {
            func.call_func(self.input.view_mut(), self.output.view_mut());
        }
        self.input.assign(&self.output);
    }
}

The details of this scheme of managing memory are explained in the "GIL-bound memory" section of https://pyo3.rs/v0.16.4/memory.html

(Btw. &mut ArrayViewMut1 is a bit of redundant indirection as you can just pass the view that you create in each iteration by value.)
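For example, a minimal sketch of that signature change (the Func trait and call_func name are taken from the snippets in this thread; the exact signature in your repo may differ):

use ndarray::ArrayViewMut1;

// Take the views by value instead of `&mut ArrayViewMut1<f64>`; a view is
// already a cheap (pointer + shape) handle, so the extra `&mut` buys nothing.
trait Func {
    fn call_func(&mut self, input: ArrayViewMut1<f64>, output: ArrayViewMut1<f64>);
}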

@mikeWShef
Author

mikeWShef commented Jun 28, 2022

That does seem to fix it, and thanks for the tip about array views. But isn't the original situation I was in more similar to the example directly below "In general we don't want unbounded memory growth during loops! One workaround is to acquire and release the GIL with each iteration of the loop." on the page you linked? The loop isn't inside the Python::with_gil closure?

Adding the example code from below that text to the class seems to also cause a memory leak (but obviously much slower):


use pyo3::types::PyString;

#[pymethods]
impl Looper {
    ...

    pub fn make_strings(&mut self, number: usize) {
        for _ in 0..number {
            Python::with_gil(|py| -> PyResult<()> {
                let hello: &PyString = py.eval("\"Hello World!\"", None, None)?.extract()?;
                println!("Python says: {}", hello);
                Ok(())
            }).expect("didn't work"); // only one copy of `hello` at a time
        }
    }
    ...
}

It seems like some Python code needs to actually run between iterations of the loop?

For the array example, is there a standard way of doing it which avoids copies/new allocations? I would be keen to make or add an example if there is; I couldn't find one, but it seems like a common task.

@birkenfeld birkenfeld changed the title Possible memory leek moving arrays to and from python Possible memory leak moving arrays to and from python Jun 28, 2022
@adamreichold
Member

adamreichold commented Jun 28, 2022

The loop isn't inside the Python::with_gil closure?

The problem here is that the GIL is already held when the with_gil in your example happens because it was acquired when your leek_memory method was called and is held for the whole duration of the call.
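To spell out the nesting, a minimal sketch (hypothetical code, not from the example repo):

use pyo3::prelude::*;
use pyo3::types::PyList;

// When a #[pymethods]/#[pyfunction] entry point is called from Python, PyO3
// has already acquired the GIL and set up a GILPool for the whole call, so the
// nested `with_gil` below just reuses them: the objects created inside are
// only released when `outer` returns, not at the end of each iteration.
#[pyfunction]
fn outer(_py: Python, n: usize) {
    for _ in 0..n {
        Python::with_gil(|py| {
            let _tmp = PyList::empty(py); // lives until `outer` returns
        });
    }
}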

For the array example is there a standard way of doing it which avoids copies/ new allocations? I would be keen to make / add an example if there is, I couldn't find one but it seems like a common task.

I think in your example the solution would be to not allocate bare ndarray::Array1 but to start with PyArray1 right away, e.g.

#[pymethods]
impl Looper {
    pub fn leek_memory(&mut self, py: Python, size: usize, number: usize) {
        // Allocate the buffers as NumPy arrays once, up front.
        let input = PyArray1::<f64>::zeros(py, size, false);
        let output = PyArray1::<f64>::zeros(py, size, false);
        for _i in 0..number {
            for func in &mut self.funcs {
                func.call_func(input, output);
            }
            output.copy_to(input).unwrap();
        }
    }
}

impl Func for UserDefinedFunc {
    fn call_func(&mut self, input: &PyArray1<f64>, output: &PyArray1<f64>) {
        Python::with_gil(|py| -> PyResult<()> {
            let py_ob = self.callable.call1(py, (input,))?;
            let py_array = py_ob.as_ref(py).downcast::<PyArray1<f64>>()?;
            let rust_array = unsafe { py_array.as_array() };

            let output = unsafe { output.as_array_mut() };

            Zip::from(output)
                .and(rust_array)
                .for_each(|o, &i| *o += i);

            Ok(())
        })
        .expect("calling the Python function failed");
    }
}

This might still leave you with the arrays that your callables allocate for their return values. But those could also be handled by providing a third temp: PyArray1<f64> to the callables as an argument into which they write their results. (Depending on the actual context, it might be preferable to change the contract of the callables so they get output as an additional argument and are expected to add their results to it themselves.)
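For illustration, a rough sketch of that changed contract (whether your callables can work in place like this is an assumption on my part):

impl Func for UserDefinedFunc {
    fn call_func(&mut self, input: &PyArray1<f64>, output: &PyArray1<f64>) {
        Python::with_gil(|py| -> PyResult<()> {
            // Hand `output` to the Python callable as well and expect it to
            // accumulate its result into it in place, so it never has to
            // allocate and return a fresh array.
            self.callable.call1(py, (input, output))?;
            Ok(())
        })
        .expect("calling the Python function failed");
    }
}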

@mikeWShef
Author

The problem here is that the GIL is already held when the with_gil in your example happens because it was acquired when your leek_memory method was called and is held for the whole duration of the call.

OK, that makes sense, though personally my naive reading of the example in the memory management doc would still lead me to write leaky code by default. Is there a way to use that example code so that it doesn't allocate 10 strings on the Python heap, as it promises not to?

Maybe this is beyond an issue now, but the problem in my case with just using PyArrays is that the looper does a lot more than this example including using external libraries that are compatible with Array1's but not PyArray1's. The 'funcs' are also not all user defined; there is a mechanism to use built-in ones (using Array1s to avoid copying) for common behaviour. Maybe the best way is for the looper to store two vecs of funcs, one using PyArray1s and the other using Array1s, and then explicitly copy the values over in the equivalent of 'looper.leak_memory'. I can't see where else the PyArrays could be owned where they wouldn't outlive the GIL. It seems a bit inelegant, though I suppose not more so than copying over for each 'func'. Thanks for your help.

@adamreichold
Member

adamreichold commented Jun 29, 2022

Is there a way to use that example code so that it doesn't allocate 10 strings on the Python heap, as it promises not to?

This is usually the case if this happens on a separate long-lived thread that was started via e.g. Rust's std::thread::spawn. In that case, Python::with_gil is indeed the top-level call and actually acquires the GIL instead of finding it already acquired.
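For example, a minimal sketch of that situation (hypothetical code, not from the example repo):

use pyo3::prelude::*;
use pyo3::types::PyList;
use std::thread;

// On a thread that was not entered from Python, each `with_gil` call below is
// the outermost acquisition, so the objects created inside are released when
// the closure returns instead of piling up for the duration of the loop.
fn background_loop(iterations: usize) {
    thread::spawn(move || {
        for _ in 0..iterations {
            Python::with_gil(|py| {
                let _scratch = PyList::empty(py);
                // ... per-iteration work ...
            }); // `_scratch` is freed here, at the end of each iteration
        }
    })
    .join()
    .unwrap();
}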

The only reliable workaround that I know of for long-running computations, when even GILPool does not help, is to "call myself", meaning to invoke the inner loop body via Python::run or Python::eval. But maybe @davidhewitt has another idea?

Maybe this is beyond an issue now, but the problem in my case with just using PyArrays is that the looper does a lot more than this example including using external libraries that are compatible with Array1's but not PyArray1's.

Ideally, all your functions based on ndarray would work in terms of ArrayView(Mut) or better yet ArrayBase<S,D> with S: Data(Mut)<f64>. In that case, you can always go from PyArray1 to ArrayView(Mut)1 without copying.
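As a rough sketch of what such a signature can look like (the function itself is hypothetical; only the spelling of the bounds matters):

use ndarray::{ArrayBase, Data, DataMut, Ix1};

// Accepts owned `Array1<f64>` as well as `ArrayView1`/`ArrayViewMut1<f64>`,
// and therefore also views obtained from a `PyArray1<f64>` without copying.
fn accumulate<S1, S2>(input: &ArrayBase<S1, Ix1>, output: &mut ArrayBase<S2, Ix1>)
where
    S1: Data<Elem = f64>,
    S2: DataMut<Elem = f64>,
{
    output.zip_mut_with(input, |o, &i| *o += i);
}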

EDIT: The main point is that it is possible to have Rust views into data owned by Python, but making data available to Python invalidates the invariants required by the Rust-side data structures due to pervasive sharing. (At least the possibility of sharing even if it does not actually happen. Just passing a NumPy array to a Python function increases its reference count.)

@mikeWShef
Author

OK, that makes a lot of sense. So anything that gets passed back and forth should just be a view into Python's heap, presumably passed in (e.g. to looper.leak_memory) or made by Rust and returned, and can't be owned by a Rust struct as it would outlive the GIL(?), unless you hold the Py<PyObject>-style reference that needs the GIL to deref anyway and is presumably slightly slower. And I can get ArrayView/Mut from these as I need, so it should all be easily compatible with any external library. The rest of the data (internal buffers, etc.: stuff that is not accessible through Python and outlives the GIL) can be owned ndarray Arrays and it should all just work together. Have I got that right?
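(For illustration, a minimal sketch of the Py<...>-style ownership mentioned above; the Buffers struct is hypothetical:)

use numpy::PyArray1;
use pyo3::prelude::*;

// A Rust struct can own a GIL-independent `Py<...>` handle to a NumPy array;
// the GIL is only needed again to turn it back into a reference or view.
struct Buffers {
    input: Py<PyArray1<f64>>,
}

impl Buffers {
    fn fill(&self, value: f64) {
        Python::with_gil(|py| {
            let input = self.input.as_ref(py); // &PyArray1<f64>, bound to `py`
            let mut view = unsafe { input.as_array_mut() }; // ndarray view, no copy
            view.fill(value);
        });
    }
}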

And just to be clear, this only affects Python memory allocated from Rust, so the user-defined Python functions can be as wasteful as they want without causing a leak?

@adamreichold
Member

Have I got that right?

Appears correct to me.

And just to be clear, this only affects Python memory allocated from Rust, so the user-defined Python functions can be as wasteful as they want without causing a leak?

Yes, I think so. This affects data that is "owned via a shared reference whose lifetime is bound to the GIL", i.e. via a GILPool. Also note that there is work underway on the PyO3 side to improve the overall situation, e.g. PyO3/pyo3#1308

@adamreichold
Member

Will close this for now, as I think all concerned are convinced that this is not actually a memory leak. But please feel free to continue the discussion here or at https://github.com/PyO3/rust-numpy/discussions

@davidhewitt
Member

Thanks @adamreichold for the very helpful information shared here. Just two things I'd like to add:

  • GILPool should always be able to avoid this unbounded memory growth. Writing a loop where you create a new GILPool at the beginning of each loop iteration can help you avoid these memory leaks (see the sketch after this list). You should just be extremely careful that no PyO3 references created inside the loop ever escape it, or you expose yourself to use-after-free.

  • I think this memory behaviour is a huge footgun which I've wanted to remove from PyO3's core for a couple of years already. Given the big impact this will have on PyO3's API, it was necessary to refine the library first (e.g. remove pyproto), and I want to make sure that we get the future API right as well as have a straightforward migration path for existing code. I'm hoping that in PyO3 0.18 (the first version which will not have pyproto) we will be able to release a PyO3 that contains both old and new APIs, so that users can begin a gradual migration.
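A minimal sketch of the per-iteration GILPool pattern from the first bullet (the function is hypothetical; new_pool is unsafe precisely because of the escape hazard mentioned above):

use pyo3::prelude::*;
use pyo3::types::PyList;

#[pyfunction]
fn run_many(py: Python, iterations: usize) {
    for _ in 0..iterations {
        // SAFETY: nothing created from `pool_py` may escape this iteration;
        // every owned reference registered in it is released when `pool` drops.
        let pool = unsafe { py.new_pool() };
        let pool_py = pool.python();
        let _scratch = PyList::empty(pool_py);
        // ... per-iteration work using `pool_py` only ...
        // `pool` is dropped at the end of this iteration, freeing `_scratch`.
    }
}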
