The current memory management design of rbc is prone to memory leaks, and we should design a better way to deal with the allocation of custom objects.
Unless I am mistaken, the current rbc type support is as follows:
- primitive types such as ints and floats are supported; these don't require any memory allocation
- more complex Python types such as list, dict and str are not supported
- the only supported types which require memory allocation are `omnisci_backend.Buffer` and its subclasses, such as `Array`
In the following I will refer only to `Array`, but obviously everything can also be applied to other current and future types which require memory allocation.
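Consider code like the following (an illustrative sketch: `a` and `b` are allocated and never escape, while `c` is returned to the caller):

```python
from rbc.omnisci_backend import Array

@omnisci('float64[](void)')
def func():
    a = Array(10, types.float64)
    b = Array(10, types.float64)
    c = Array(10, types.float64)
    return c
```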
The current strategy works as follows:

- the compiler searches for all the calls to `Array` and records the LLVM `Value` where the result of each allocation is stored. Conceptually, this is equivalent to remembering that the variables `a`, `b`, and `c` must eventually be freed
- it realizes that `c` is returned to the caller and must survive
- it inserts calls to `free(a)` and `free(b)` before returning
(Note: this is the theory. At the moment of writing, it is buggy and doesn't work, see #377)
This strategy works only for very simple cases, but it is fundamentally problematic and cannot be generalized to more complex cases. Solving this in general is basically equivalent to solving the halting problem, so we need a better way.
Normally, virtual machines solve the problem by using a runtime garbage collector and/or reference counting, but I don't think this is reasonable for our use case. Let me write down some ideas/proposals for how to solve the issue.
Proposal 0: improve the current algorithm
I call it proposal 0 because I don't think it's really an option, but I add it for completeness. We could try to improve our current algorithm to catch the most common use cases. Maybe it is even possible to detect unsupported use cases and emit a compile-time error in those cases.
Pros:
- mostly transparent for the end user (when it works)
Cons:
- hard to implement
- compile-time errors which might be hard to track down and/or confusing for the user
- if we fail to write our logic correctly, there is a high risk of causing memory leaks and/or crashes
Proposal 1: explicit memory management
We pretend that the language used to write user functions is Python, but actually it's a statically typed subset of it which is not far from C. So, one option is to stop pretending and let users explicitly `free()` their memory. Something like this:
```python
from rbc.omnisci_backend import Array

@omnisci('float64[](void)')
def func():
    a = Array(10, types.float64)
    b = Array(10, types.float64)
    c = Array(10, types.float64)
    free(a)  # or maybe even `del a`
    free(b)
    return c
```
If we want to be advanced we can even provide some syntax niceties (although I don't know how easy it is to add this functionality to numba):
```python
from rbc.omnisci_backend import Array

@omnisci('float64[](void)')
def func():
    with Array(10, types.float64) as a:
        with Array(10, types.float64) as b:
            c = Array(10, types.float64)
            # a and b are automatically freed
            return c
```
Pros:
- the semantics are very easy to understand and obvious to C/C++ programmers
- easy to implement
Cons:
- error prone: this can lead to leaks and/or crashes. Possibly, it might even crash the omniscidb server itself
- the semantics are confusing for a casual reader: it looks like you are writing Python, but as soon as you introduce the `@omnisci` decorator you suddenly need to manage your memory, which is unexpected
Proposal 2: explicit ownership rules (a.k.a. "The Rust Way")
We could augment our type system with ownership rules. Rust has similar concepts.
Basically, the idea is that each `Array` has one and only one owner, which is responsible for deallocating it. When you pass the value around, you pass a borrowed reference, unless you explicitly give away the ownership. When you return the value, you also return the ownership. The type system can also statically check that you cannot free the object as long as there are borrowed references around.
This is basically proposal 0 "done right", because it no longer depends on heuristics but on type system rules which can be proven correct. To fully work, we need some syntax to transfer the ownership and possibly a way to free the object early (which basically means transferring the ownership to nobody).
Also, some innocent-looking code will not work, e.g.:
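For instance, something along these lines (an illustrative sketch; the exact borrowing semantics are hypothetical):

```python
def func(x):
    a = Array(10, types.float64)
    if x:
        b = a                           # b is a borrowed reference to a
    else:
        b = Array(10, types.float64)    # b owns a fresh Array
    return b
```

The code above would be rejected by the type system because `b` has two different ownership rules in the two branches of the `if`.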
Proposal 3.1: free-it-all strategy (a.k.a. "poor man's GC")
The idea behind this proposal is that we are not interested in a general memory allocation strategy. What we really want is to be able to call UDFs/UDTFs without leaking memory, and we can do that with very well defined lifetime rules:
- user functions can allocate as many `Array`s as they like
- the only way for an `Array` to escape to the "external world" is to be returned to the caller
- all other allocated `Array`s are automatically destroyed when the function exits
This means, for example, that user functions are not allowed to persist `Array`s between calls, e.g. by putting them in some global state. But I don't think that this is possible at all at the moment.
One possible way to implement this is to associate all allocations with an explicit "memory manager" which you get from the outside:
```python
def func(mgr):
    # mgr is the memory manager
    a = Array(mgr, 10, types.float64)
    b = omnisci_np.zeros(mgr, 10, types.float64)
    ...
```
In the example above, all the memory associated with `mgr` will be freed when the caller decides so, and it is impossible to allocate memory if you don't have a manager (because allocation functions expect it). If you need to call a function which internally allocates, you must pass `mgr` around explicitly, as in the call to `omnisci_np.zeros()` above.
The idea is that the manager is created by omniscidb at the point in which it calls UDF/UDTF. Something like this (in some sort of pseudocode):
```cpp
void call_udf(udf_t function) {
    MemoryManager mgr = CreateMemoryManager();
    auto result = function(mgr, other_arguments, ...);
    mgr.keep_alive(result);  // to make sure it is not freed
    mgr.free_everything();
    // do something with result
}
```
@guilhermeleobas suggests that we can reuse the TableManager for that, since there is already a way to pass it to the UDTF.
Pros:
- mostly transparent for the end users, apart from the fact that they have to pass this `mgr` around everywhere
- it is trivial to prove that it works
- it makes it possible to implement very fast allocation strategies. For example, a `MemoryManager` could allocate a fixed chunk of RAM and keep a pointer indicating how much of it has been used; `malloc()` would then be implemented by just incrementing the pointer (and allocating a new chunk of RAM when we reach the end of the current one), which is much faster than the stdlib `malloc()`. (NOTE: this is more or less how the PyPy GC allocates memory, and it's one of the reasons it's so fast)
- easy to implement
Cons:
- you have to pass `mgr` around everywhere
- you cannot keep the standard signature for some of the Array API functions, because you need to pass `mgr` to those which can possibly allocate memory (such as `zeros` in the example above)
Proposal 3.2: implicit free-it-all strategy
This is the same as 3.1, but we hack the code generated by numba to implicitly pass `mgr` around. So, for example, you call `zeros(10, types.float64)` and the call automatically becomes `zeros(mgr, 10, types.float64)`.
Pros:
- completely transparent for the end users
- you can use the standard Array API signatures
Cons:
- harder to implement (I don't even know if numba provides the needed hooks)
- explicit is better than implicit. If the user sees the `mgr`, it is trivial to explain that the memory belongs to it. If it's a hidden implementation detail, the lifetime rules suddenly become magical and/or obscure
Proposal 3.3: global mgr
Same as Proposal 3.2, but instead of passing it around, we rely on a global (or thread-local) `mgr` being reachable. I think this quickly leads to problems if we have concurrent and/or parallel and/or recursive calls, so it should just be avoided.
Conclusions
Personally, I think that the best option is Proposal 3.1, the "free-it-all strategy". The cons don't look too bad to me, and the pros are very important, especially the fact that it's easy to implement, which means fewer bugs.
#380)
- add rbc.stdlib.array_api, which rbc users can import to use the array-like API
- the array api is no longer specific to omnisci, now it works also in normal @rjit functions. This makes it much easier to develop and test
- introduce rbclib. This is the embryo of a "runtime library" to be called by the code generated by rbc. In particular, it provides rbclib_allocate_varlen_buffer, which can be used to allocate arrays for non-omnisci targets
- store the names of the allocation/deallocation array functions in the TargetInfo. This makes it possible to call the appropriate allocate_varlen_buffer function depending on whether we are targeting omnisci or not
- introduce the "debug allocator", which keeps track of allocations/deallocations and detects memory leaks (see e.g. issue #378)