-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: Add sumtype mutation pattern #1036
Conversation
It's possible to mutate a reference to a sumtype "in-place" using a cast to a pointer. This is likely only useful in rare cases, but can prevent an additional copy if one is passing a reference to some value from a larger context around.
I think both cases seem useful (and potentially very unsafe! :)) But maybe show the lambda trick after the sumtype one? You could try to run these within the Debug.assert-memory-balanced macro (see test/memory.carp) to ensure the code doesn't leak. I'm also open to adding some kind of auto generated in-place setter for sumtypes (ideally runtime safe) I'm not sure how it would look but seems like a weird limitation that sumtypes have to copy while structs don't. |
It doesn't work with references but if you use match you can mutate without a copy, right?
Isn't that what |
@TimDeve Right... I keep forgetting stuff that is already implemented :) Yes, match on sumtype values already takes ownership and lets you do internal mutation. I have a sense we could supply in-place mutation on refs also but haven't given it any thought. |
|
||
```clojure | ||
(defn up-to-eleven [val] | ||
(let [val* (the (Ptr (Maybe Int)) (Unsafe.coerce &val))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be:
(let [m* (the (Ptr (Maybe Int)) (Unsafe.coerce val))]
Or more "safely":
(let [reference (the (Ref (Maybe Int)) val)
m* (the (Ptr (Maybe Int)) (Unsafe.coerce reference))]
The only problem I see is that you need to make sure you "delete" the other variants, right? I'm assuming we already do that for @scolsen I'm a bit worried to have this next to something as innocuous as |
In this case you actually don't need to manually call int* Unsafe_coerce__int_MUL__int_MUL_(int* b) { return (int*)b; } We're really only utilizing void Pointer_set__int(int* p, int a) { *p = a; } But, as you pointed out, getting your type wrong will definitely produce an error: (defn counter [init]
(let [x init]
(fn []
(let-do [x* (the (Ptr Int) (Unsafe.coerce x))] ;; `x` here should be prefixed with a ref!
(Pointer.set x* (inc x)) x)))) out/main.c:9273:50: error: cast to 'int *' from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
int* Unsafe_coerce__int_int_MUL_(int b) { return (int*)b; }
^
1 error generated. One can also use a less risky approach at the cost of a manual free by making these functions over pointers and eliminating the translation from Refs to pointers: ;; stateful-closure-ptr.carp
(defn counter [init-ptr]
(fn []
(Pointer.set init-ptr (inc (Pointer.to-value init-ptr)))))
(defn main []
(Debug.assert-balanced
(let-do [dest (Pointer.unsafe-alloc 0)
f (counter dest)]
(f) (f) (f) (println* (Pointer.to-value dest)) (Pointer.free dest))))
carp -x --log-memory stateful-closure-ptr.carp
=> 3 This is similar to "destination passing" style in C, looks quite similar to it, and is generally considered better practice in C--but mucking around w/ pointers internally in a function, which would normally be unsafe in C, is actually OK in Carp since we can leverage the borrow checker to handle frees for us if we perform the casts over refs. So, it's a matter of style. The latter destination passing style looks more imperative, but it also requires additional pointer manipulations, including the manual free. It retains type safety at the cost of memory safety. The style originally proposed, the "ref casting" style, avoids manual memory management by leveraging the borrow checker but does so at the cost of type safety. In other words, you can have stateful closures, with type safety or memory safety, but not both. (unless of course there's another way to do it that does ensure both that I haven't discovered yet!) In any case, I do think breaking this out into a separate "unsafe" section is a great idea, though! @eriksvedang Looks like the stateful-closure case is valid in terms of memory-balance, in general the casts shouldn't affect memory management as the cast itself doesn't allocate (it only adds a new stack-allocated var that points to the existing ref , which is already managed elsewhere, and the stack-allocated var is auto-cleaned up once the function scope ends). ;; stateful-closure.carp
(defn counter [init]
(let [x init]
(fn []
(let-do [x* (the (Ptr Int) (Unsafe.coerce &x))]
(Pointer.set x* (inc x)) x))))
(defn main []
(Debug.assert-balanced
(let-do [f (counter 0)]
(ignore (f)) (println* (f)))))
carp -x --log--memory stateful-closure.carp
;; => 2 ;; the "invalid memory balance" branch doesn't happen |
I've not looked into the stateful closure case yet, just thinking about the sumtype one, it only works in your example because what's in your container is stack allocated, no? Surely this must be leaking (defn up-to-eleven [val]
(let [m* (the (Ptr (Maybe String)) (Unsafe.coerce val))]
(Pointer.set m* (Maybe.Just @"11"))))
(defn main []
(ignore (let-do [volume (Maybe.Just @"1")]
(up-to-eleven &volume)
volume))) Edit: =================================================================
==26==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x4974ed in malloc (/root/app/out/Untitled+0x4974ed)
#1 0x4cb16f in String_copy /root/Carp/core/carp_string.h:64:18
#2 0x51d1a9 in main /root/app/out/main.c:9242:26
#3 0x7f3ed840dcc9 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26cc9)
SUMMARY: AddressSanitizer: 2 byte(s) leaked in 1 allocation(s). |
The same way that this code using (defn main []
(let-do [ok @"not ok"]
(set! ok @"ok")
(println* ok))) Generate this: int main(int argc, char** argv) {
carp_init_globals(argc, argv);
/* let */ {
static String _6 = "not ok";
String *_6_ref = &_6;
String _7 = String_copy(_6_ref);
String ok = _7;
static String _13 = "ok";
String *_13_ref = &_13;
String _14 = String_copy(_13_ref);
String_delete(ok); <--------------- Delete here
ok = _14; // String = String
String _20 = StringCopy_str(ok);
String* _21 = &_20; // ref
IO_println(_21);
String_delete(_20);
}
return 0;
} You need to call destructors on heap allocated stuff when you replace them or you get a leak. |
I'm gonna read up on "destination passing", seems interesting 🙂 |
@TimDeve You are indeed correct about the sumtype case w/ managed interior values like Carp ∎ carp -x --log-memory sumtype-mutate.carp
(Just @"Mutated!")
Invalid memory balance: 1
[RUNTIME ERROR] '"out/Untitled"' exited with return value 1. I was assuming the borrow checker would just handle the original heap allocated values...which makes no sense 😄 Maybe this pattern isn't as useful after all! |
Could wrap it in something that consumes/borrows the value maybe? Edit: (Debug.sanitize-addresses)
(defn up-to-eleven [val]
(let [m* (the (Ptr (Maybe String)) (Unsafe.coerce val))
_ (Pointer.to-value m*)]
(Pointer.set m* (Maybe.Just @"11"))))
(defn main []
(Debug.memory-logged
(ignore (let-do [volume (Maybe.Just @"1")]
(up-to-eleven &volume)
volume)))) |
It works if we get the value at the pointer and call (defn mutate [m]
(let [m* (the (Ptr (Maybe String)) (Unsafe.coerce m))]
(match-ref m
_ (do (Maybe.delete (Pointer.to-value m*)) (Pointer.set m* (Maybe.Just @"Mutated!"))))))
(defn main []
(Debug.assert-balanced
(let-do [m (Maybe.Just @"Mutant")]
(mutate &m) (println* &m))) This doesn't throw a memory balance error, but I haven't checked the emitted C to see if this really makes sense or not yet. EDIT: Ohh nice, no need to call Looks like we could define a macro around that. |
Yeah that's what I was about to say. Reminds me vaguely of the pair destructure macro I had, still need to make a version that work with structs... |
Likewise, the closure needs to change slightly to handle heap allocated values: (defn appender [init]
(let [x init]
(fn []
(let-do [x* (the (Ptr String) (Unsafe.coerce x))]
(Pointer.set x* (append x x)) (Pointer.to-value x*)))))
(defn main []
(Debug.assert-balanced
(let-do [f (appender "he")]
(ignore (f)) (println* (f)))) |
The similarities are palpable! I wonder if there's a more generic name/terminology we can use for what we're accomplishing here? Roughly speaking, all these cases involve coercing a reference value in a parent scope to a pointer, manipulating it, and ensuring its previous allocations are freed properly. |
I guess we could rename our current Not sure if there is something we're missing here and if it's something desirable to have in core... @eriksvedang has probably thought about that type of stuff before. |
I think The other option is just to change the compiler to allow calling
Would be valid (it currently isn't allowed). |
We could leave the compiler untouched and implement this as some kind of |
I'm curious about how that would collide with stack allocated lambdas (which is something I really want). |
Good question. I think it should be ok since the lambda will be in scope so long as the enclosing We could also call this |
@scolsen This is a safe way to do a stateful closure: (deftype (Box a) [val a])
(defn counter [init]
(let [x (Box.init init)]
(fn []
(do
(Box.set-val! &x (inc @(Box.val &x)))
@(Box.val &x)))))
(defn main []
(let-do [f (counter 0)]
(ignore (f))
(ignore (f))
(println* (f)))) ; Prints 3 |
Very nice! Simple and elegant. So there's no need for all this pointer mumbo jumbo. I'm trying to think if there's any cases in which using a I guess the one advantage to the Pointer version is that it avoids one copy on each call (I think, but maybe not?)--but I think cases in which that matters would be pretty rare. |
Yeah, I was gonna say this one does a lot of copies, it's only a problem with arrays really. |
Not in this particular case:
The compiler simply reduced it to the literal 3. |
Very interesting experiments and discussion... The reason lambdas are not allowed to mutate is mainly because they don't capture a shared environment, so two lambdas in the same scope each get their own env struct. Which makes mutation behave in a quite weird way, of course (one lambda can't see the changes from the other one.) The Box trick works around the existing checks anyways so I'm not sure how I feel about it, but it's quite elegant (and ingenious) to be sure 😃 |
one name for the pattern:
could be |
I think the first example would be simpler as: (defn foo []
(let [x 1]
(fn [y]
(do
(Pointer.set (address x) y)
x))))
(defn main []
(println* ((foo) 2)))
|
Along the same vein the stateful counter example can be simplified as: (defn counter [init]
(let [x init]
(fn []
(do (Pointer.set (address x) (inc x)) x))))
(defn main []
(let-do [f (counter 0)]
(ignore (f)) (ignore (f)) (ignore (f)) (println* (f)))) I could've sworn I originally tried this using |
It's generally unsafe to return a captured variable because you'll potentially get double frees, we discussed it in the other conversation, ideally the compiler would not allow it. |
Ahhh that's right--it's fine for the |
I actually don't remember in which thread we discussed it in, was it just between you, Erik and me on Gitter? Did we ever create an issue? |
Ok, I'll create an issue in the morning so we track it. |
Thanks! |
Is this ready to be merged? |
We talked about putting it somewhere else as it is unsafe. |
I started working toward this tonight--I'll likely have something by the end of this week! |
What should we do about this PR @scolsen ? :) |
I'll close this for now, please re-open any time. |
NOTE: This depends on PR #1012
It's possible to mutate a reference to a sumtype "in-place" using a cast
to a pointer. This is likely only useful in rare cases, but can prevent
an additional copy if one is passing a reference to some value from a
larger context around.
Maybe someone wants to double check the C this generates--it may prove to be frivolous, but from an expressive standpoint I find the pattern useful.
EDIT: The real wow-factor motivating case is this is a fancy way of getting around the "lambdas can't
set!
bindings not defined in the lambda:Actually, this even gives one a way to define the stateful closures described in #694 :
In this case the lambda produced by
foo
"remembers" the context given by thelet
and will addx
to itself each time its called.Maybe I should not focus so much on sumtypes here and re-frame this around this case, which seems more interesting?