Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Add sumtype mutation pattern #1036

Closed
wants to merge 1 commit into from

Conversation

scolsen
Copy link
Contributor

@scolsen scolsen commented Nov 30, 2020

NOTE: This depends on PR #1012

It's possible to mutate a reference to a sumtype "in-place" using a cast
to a pointer. This is likely only useful in rare cases, but can prevent
an additional copy if one is passing a reference to some value from a
larger context around.

Maybe someone wants to double check the C this generates--it may prove to be frivolous, but from an expressive standpoint I find the pattern useful.

EDIT: The real wow-factor motivating case is this is a fancy way of getting around the "lambdas can't set! bindings not defined in the lambda:

鲤 (defn foo [] (let [x 1] (fn [y] (set! x y))))
I can’t `set!` the variable `x` at line 1, column 39 in 'REPL' because it's defined outside the lambda. at REPL:1:1.
(defn foo [] (let [x 1] (fn [y] (let-do [x* (the (Ptr Int) (Unsafe.coerce &x))] (Pointer.set x* y) x))))
((foo) 2)
2
=> 0

Actually, this even gives one a way to define the stateful closures described in #694 :

(defn foo [] (let [x 1] (fn [] (let-do [x* (the (Ptr Int) (Unsafe.coerce &x))] (Pointer.set x* (+ x x)) x))))
(defn bar [] (let-do [f (foo)] (ignore (f)) (f)))
(bar)
Compiled to 'out/Untitled' (executable)
4
=> 0
鲤 

In this case the lambda produced by foo "remembers" the context given by the let and will add x to itself each time its called.

Maybe I should not focus so much on sumtypes here and re-frame this around this case, which seems more interesting?

It's possible to mutate a reference to a sumtype "in-place" using a cast
to a pointer. This is likely only useful in rare cases, but can prevent
an additional copy if one is passing a reference to some value from a
larger context around.
@eriksvedang
Copy link
Collaborator

I think both cases seem useful (and potentially very unsafe! :))

But maybe show the lambda trick after the sumtype one?

You could try to run these within the Debug.assert-memory-balanced macro (see test/memory.carp) to ensure the code doesn't leak.

I'm also open to adding some kind of auto generated in-place setter for sumtypes (ideally runtime safe) I'm not sure how it would look but seems like a weird limitation that sumtypes have to copy while structs don't.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

It doesn't work with references but if you use match you can mutate without a copy, right?

(use Array)

(defn main []
 (let-do [maybs (Maybe.Just [1 2 3])]
   (ignore (match maybs
            (Maybe.Just arr) (Maybe.Just (endo-map &inc arr))
            _ maybs))
   (println* "No new alloc?")))

Isn't that what map does?

@eriksvedang
Copy link
Collaborator

@TimDeve Right... I keep forgetting stuff that is already implemented :) Yes, match on sumtype values already takes ownership and lets you do internal mutation. I have a sense we could supply in-place mutation on refs also but haven't given it any thought.


```clojure
(defn up-to-eleven [val]
(let [val* (the (Ptr (Maybe Int)) (Unsafe.coerce &val))]
Copy link
Contributor

@TimDeve TimDeve Nov 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be:

(let [m* (the (Ptr (Maybe Int)) (Unsafe.coerce val))]

Or more "safely":

  (let [reference (the (Ref (Maybe Int)) val)
        m* (the (Ptr (Maybe Int)) (Unsafe.coerce reference))]

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

The only problem I see is that you need to make sure you "delete" the other variants, right? I'm assuming we already do that for ! functions.

@scolsen I'm a bit worried to have this next to something as innocuous as Ref-let, could we split this pattern docs into a "safe" and "unsafe" version (I'm thinking "rust book" vs "rustonmicon")? Ideally we would push people to stick to the safe stuff unless they really need to.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

The only problem I see is that you need to make sure you "delete" the other variants, right? I'm assuming we already do that for ! functions.

@scolsen I'm a bit worried to have this next to something as innocuous as Ref-let, could we split this pattern docs into a "safe" and "unsafe" version (I'm thinking "rust book" vs "rustonmicon")? Ideally we would push people to stick to the safe stuff unless they really need to.

In this case you actually don't need to manually call delete on anything. In terms of borrowing, this is actually relatively safe, since Unsafe.coerce in this case is just casting an existing Ref (which is already managed somewhere else and will remain so) to a Ptr which is exactly what a Ref is under the hood--so it's really a no-op that just enables you to set the value at the address of a Ref. The real risk you run with this technique is accidentally casting to a pointer of a different type:

int* Unsafe_coerce__int_MUL__int_MUL_(int* b) { return (int*)b; }

We're really only utilizing Unsafe.coerce to convince Carp's type system that Ref === Ptr and we can call Pointer operations on a Ref--it has no effect in the emitted c except for setting a new stack-allocated var that points to the original Ref and will be auto-reclaimed once the function scope ends. And Pointer.set just set's the value at the pointer, which will be safe so long as you didn't accidentally cast to a different inhabitant type:

void Pointer_set__int(int* p, int a) { *p = a; }

But, as you pointed out, getting your type wrong will definitely produce an error:

(defn counter [init]                                                                                                                                                                                                                                                   
  (let [x init]                                                                                                                                                                                                                                                        
    (fn []                                                                                                                                                                                                                                                             
      (let-do [x* (the (Ptr Int) (Unsafe.coerce x))] ;; `x` here should be prefixed with a ref!                                                                                                                                                                                                                 
        (Pointer.set x* (inc x)) x))))   
out/main.c:9273:50: error: cast to 'int *' from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
int* Unsafe_coerce__int_int_MUL_(int b) { return (int*)b; }
                                                 ^
1 error generated.

One can also use a less risky approach at the cost of a manual free by making these functions over pointers and eliminating the translation from Refs to pointers:

;; stateful-closure-ptr.carp 
(defn counter [init-ptr]                                                                                                                                                                                                                                               
    (fn []                                                                                                                                                                                                                                                             
      (Pointer.set init-ptr (inc (Pointer.to-value init-ptr)))))                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                       
(defn main []                                                                                                                                                                                                                                                          
  (Debug.assert-balanced                                                                                                                                                                                                                                               
    (let-do [dest (Pointer.unsafe-alloc 0)                                                                                                                                                                                                                             
             f (counter dest)]                                                                                                                                                                                                                                         
      (f) (f) (f) (println* (Pointer.to-value dest)) (Pointer.free dest)))) 

carp -x --log-memory stateful-closure-ptr.carp 
=> 3

This is similar to "destination passing" style in C, looks quite similar to it, and is generally considered better practice in C--but mucking around w/ pointers internally in a function, which would normally be unsafe in C, is actually OK in Carp since we can leverage the borrow checker to handle frees for us if we perform the casts over refs.

So, it's a matter of style. The latter destination passing style looks more imperative, but it also requires additional pointer manipulations, including the manual free. It retains type safety at the cost of memory safety. The style originally proposed, the "ref casting" style, avoids manual memory management by leveraging the borrow checker but does so at the cost of type safety.

In other words, you can have stateful closures, with type safety or memory safety, but not both. (unless of course there's another way to do it that does ensure both that I haven't discovered yet!)

In any case, I do think breaking this out into a separate "unsafe" section is a great idea, though!

@eriksvedang Looks like the stateful-closure case is valid in terms of memory-balance, in general the casts shouldn't affect memory management as the cast itself doesn't allocate (it only adds a new stack-allocated var that points to the existing ref , which is already managed elsewhere, and the stack-allocated var is auto-cleaned up once the function scope ends).

;; stateful-closure.carp
(defn counter [init]                                                                                                                                                                                                                                                   
  (let [x init]                                                                                                                                                                                                                                                        
    (fn []                                                                                                                                                                                                                                                             
      (let-do [x* (the (Ptr Int) (Unsafe.coerce &x))]                                                                                                                                                                                                                  
        (Pointer.set x* (inc x)) x))))                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                       
(defn main []                                                                                                                                                                                                                                                          
  (Debug.assert-balanced                                                                                                                                                                                                                                               
    (let-do [f (counter 0)]                                                                                                                                                                                                                                            
      (ignore (f)) (println* (f))))) 

carp -x --log--memory stateful-closure.carp
;; => 2  ;; the "invalid memory balance" branch doesn't happen

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

I've not looked into the stateful closure case yet, just thinking about the sumtype one, it only works in your example because what's in your container is stack allocated, no?

Surely this must be leaking @"1", what would be freeing it?

(defn up-to-eleven [val]
  (let [m* (the (Ptr (Maybe String)) (Unsafe.coerce val))]
    (Pointer.set m* (Maybe.Just @"11"))))

(defn main []
  (ignore (let-do [volume (Maybe.Just @"1")]
           (up-to-eleven &volume)
           volume)))

Edit:
This is the error I get running it with the Leak Sanitizer on:

=================================================================
==26==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 2 byte(s) in 1 object(s) allocated from:
    #0 0x4974ed in malloc (/root/app/out/Untitled+0x4974ed)
    #1 0x4cb16f in String_copy /root/Carp/core/carp_string.h:64:18
    #2 0x51d1a9 in main /root/app/out/main.c:9242:26
    #3 0x7f3ed840dcc9 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26cc9)

SUMMARY: AddressSanitizer: 2 byte(s) leaked in 1 allocation(s).

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

The same way that this code using set!:

(defn main []
 (let-do [ok @"not ok"]
  (set! ok @"ok")
  (println* ok)))

Generate this:

int main(int argc, char** argv) {
    carp_init_globals(argc, argv);
    /* let */ {
        static String _6 = "not ok";
        String *_6_ref = &_6;
        String _7 = String_copy(_6_ref);
        String ok = _7;
        static String _13 = "ok";
        String *_13_ref = &_13;
        String _14 = String_copy(_13_ref);
        String_delete(ok); <--------------- Delete here
        ok = _14;  // String = String
        String _20 = StringCopy_str(ok);
        String* _21 = &_20; // ref
        IO_println(_21);
        String_delete(_20);
    }
    return 0;
}

You need to call destructors on heap allocated stuff when you replace them or you get a leak.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

I'm gonna read up on "destination passing", seems interesting 🙂

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

@TimDeve You are indeed correct about the sumtype case w/ managed interior values like String! Probably the same thing would happen in the stateful closure as well--the example I'm using works fine b/c Int is not a managed type:

Carp ∎ carp -x --log-memory sumtype-mutate.carp
(Just @"Mutated!")
Invalid memory balance: 1
[RUNTIME ERROR] '"out/Untitled"' exited with return value 1.

I was assuming the borrow checker would just handle the original heap allocated values...which makes no sense 😄

Maybe this pattern isn't as useful after all!

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

Could wrap it in something that consumes/borrows the value maybe?

Edit:
Seems to work...

(Debug.sanitize-addresses)

(defn up-to-eleven [val]
  (let [m* (the (Ptr (Maybe String)) (Unsafe.coerce val))
        _ (Pointer.to-value m*)]
    (Pointer.set m* (Maybe.Just @"11"))))

(defn main []
  (Debug.memory-logged
   (ignore (let-do [volume (Maybe.Just @"1")]
            (up-to-eleven &volume)
            volume))))

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

Could wrap it in something that consumes/borrows the value maybe?

It works if we get the value at the pointer and call delete just as you suggested!:

(defn mutate [m]                                                                                                                                                        
  (let [m* (the (Ptr (Maybe String)) (Unsafe.coerce m))]                                                                                                                
    (match-ref m                                                                                                                                                        
      _ (do (Maybe.delete (Pointer.to-value m*)) (Pointer.set m* (Maybe.Just @"Mutated!"))))))                                                                          
                                     
(defn main []                                                                                                                                                           
  (Debug.assert-balanced                                                                                                                                                
    (let-do [m (Maybe.Just @"Mutant")]                                                                                                                                  
      (mutate &m) (println* &m)))

This doesn't throw a memory balance error, but I haven't checked the emitted C to see if this really makes sense or not yet.

EDIT: Ohh nice, no need to call delete then? I guess the borrow checker handles that once we issue the bind.

Looks like we could define a macro around that.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

Looks like we could define a macro around that.

Yeah that's what I was about to say. Reminds me vaguely of the pair destructure macro I had, still need to make a version that work with structs...

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

Likewise, the closure needs to change slightly to handle heap allocated values:

(defn appender [init]                                                                                                                                                   
  (let [x init]                                                                                                                                                         
    (fn []                                                                                                                                                              
      (let-do [x* (the (Ptr String) (Unsafe.coerce x))]                                                                                                                 
        (Pointer.set x* (append x x)) (Pointer.to-value x*)))))                                                                                                         
                                                                                                                                                                        
(defn main []                                                                                                                                                           
  (Debug.assert-balanced                                                                                                                                                
    (let-do [f (appender "he")]                                                                                                                                         
      (ignore (f)) (println* (f))))

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

Looks like we could define a macro around that.

Yeah that's what I was about to say. Reminds me vaguely of the pair destructure macro I had, still need to make a version that work with structs...

The similarities are palpable! I wonder if there's a more generic name/terminology we can use for what we're accomplishing here? Roughly speaking, all these cases involve coercing a reference value in a parent scope to a pointer, manipulating it, and ensuring its previous allocations are freed properly.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

I guess we could rename our current set! => set because it acts on values and what we're discussing would be set! as it act on references.

Not sure if there is something we're missing here and if it's something desirable to have in core...

@eriksvedang has probably thought about that type of stuff before.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

I think set! still makes sense based on the way it is used in other languages, and it does return Unit. I think the key difference in this case is that this approach enables you to set! local bindings in a different scope/context, which is normally not permitted (expect for the special case of setting the fields of a product type using Foo-set-bar!).

The other option is just to change the compiler to allow calling set! on local bindings from an enclosing scope so that something like:

(defn setter [] 
  (let [x 0] 
    (fn [] (set! x (inc x)))))

Would be valid (it currently isn't allowed).

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

We could leave the compiler untouched and implement this as some kind of indefinite-let or extended-let macro (using terminology from Let Over Lambda)

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

I'm curious about how that would collide with stack allocated lambdas (which is something I really want).

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

I'm curious about how that would collide with stack allocated lambdas (which is something I really want).

Good question. I think it should be ok since the lambda will be in scope so long as the enclosing let is in scope. Actually, at that point we might not even need a mechanism like this...I'm not sure.

We could also call this Unsafe.set! I guess.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

@scolsen This is a safe way to do a stateful closure:

(deftype (Box a) [val a])

(defn counter [init]
  (let [x (Box.init init)]
    (fn []
      (do
        (Box.set-val! &x (inc @(Box.val &x)))
        @(Box.val &x)))))

(defn main []
  (let-do [f (counter 0)]
    (ignore (f))
    (ignore (f))
    (println* (f)))) ; Prints 3

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

@scolsen This is a safe way to do a stateful closure:

(deftype (Box a) [val a])

(defn counter [init]
  (let [x (Box.init init)]
    (fn []
      (do
        (Box.set-val! &x (inc @(Box.val &x)))
        @(Box.val &x)))))

(defn main []
  (let-do [f (counter 0)]
    (ignore (f))
    (ignore (f))
    (println* (f)))) ; Prints 3

Very nice! Simple and elegant. So there's no need for all this pointer mumbo jumbo. I'm trying to think if there's any cases in which using a Ptr instead is advantageous, but I think not? Probably we can document the unsafe versions somewhere for the curious and just document this safe version in patterns.md?

I guess the one advantage to the Pointer version is that it avoids one copy on each call (I think, but maybe not?)--but I think cases in which that matters would be pretty rare.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

Yeah, I was gonna say this one does a lot of copies, it's only a problem with arrays really.

@jacereda
Copy link
Contributor

Yeah, I was gonna say this one does a lot of copies, it's only a problem with arrays really.

Not in this particular case:

0000000100013e06	movl	$0x3, %r8d
0000000100013e0c	xorl	%eax, %eax
0000000100013e0e	movq	%rbx, %rdi
0000000100013e11	movq	%r14, %rcx
0000000100013e14	callq	0x100013e76 ## symbol stub for: ___sprintf_chk

The compiler simply reduced it to the literal 3.

@eriksvedang
Copy link
Collaborator

eriksvedang commented Nov 30, 2020

Very interesting experiments and discussion...

The reason lambdas are not allowed to mutate is mainly because they don't capture a shared environment, so two lambdas in the same scope each get their own env struct. Which makes mutation behave in a quite weird way, of course (one lambda can't see the changes from the other one.)

The Box trick works around the existing checks anyways so I'm not sure how I feel about it, but it's quite elegant (and ingenious) to be sure 😃

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

one name for the pattern:

(let [p* (the (Ptr a) (Unsafe.coerce thing))
        _ (Pointer.to-value p*)]
  (Pointer.set p* other-thing))

could be set-free! :P (liberation for all the values)

@jacereda
Copy link
Contributor

I think the first example would be simpler as:

(defn foo []
  (let [x 1]
    (fn [y]
      (do
        (Pointer.set (address x) y)
        x))))

(defn main []
 (println* ((foo) 2)))

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

I think the first example would be simpler as:

(defn foo []
  (let [x 1]
    (fn [y]
      (do
        (Pointer.set (address x) y)
        x))))

(defn main []
 (println* ((foo) 2)))

Along the same vein the stateful counter example can be simplified as:

(defn counter [init]                                                                                                                                                                                                                                                            
  (let [x init]                                                                                                                                                                                                                                                                 
    (fn []                                                                                                                                                                                                                                                                      
        (do (Pointer.set (address x) (inc x)) x))))                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                
(defn main []                                                                                                                                                                                                                                                                   
  (let-do [f (counter 0)]                                                                                                                                                                                                                                                       
    (ignore (f)) (ignore (f)) (ignore (f)) (println* (f)))) 

I could've sworn I originally tried this using address and no coercion, but I thought it didn't work somehow. I must have just done it incorrectly.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 30, 2020

It's generally unsafe to return a captured variable because you'll potentially get double frees, we discussed it in the other conversation, ideally the compiler would not allow it.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

It's generally unsafe to return a captured variable because you'll potentially get double frees, we discussed it in the other conversation.

Ahhh that's right--it's fine for the Int case since it's an unmanaged type.

@TimDeve
Copy link
Contributor

TimDeve commented Dec 1, 2020

I actually don't remember in which thread we discussed it in, was it just between you, Erik and me on Gitter? Did we ever create an issue?

@scolsen
Copy link
Contributor Author

scolsen commented Dec 1, 2020

I think we chatted about it in the Pointer utility PR #1012 and a bit on gitter, there's also #997 -- but it'd probably be good to have a more direct issue that explains it more succinctly w/ a simpler reproducible case.

@TimDeve
Copy link
Contributor

TimDeve commented Dec 1, 2020

Ok, I'll create an issue in the morning so we track it.

@scolsen
Copy link
Contributor Author

scolsen commented Dec 1, 2020

Ok, I'll create an issue in the morning so we track it.

Thanks!

@eriksvedang
Copy link
Collaborator

Is this ready to be merged?

@TimDeve
Copy link
Contributor

TimDeve commented Dec 6, 2020

We talked about putting it somewhere else as it is unsafe.

@scolsen
Copy link
Contributor Author

scolsen commented Dec 7, 2020

We talked about putting it somewhere else as it is unsafe.

I started working toward this tonight--I'll likely have something by the end of this week!

@eriksvedang
Copy link
Collaborator

What should we do about this PR @scolsen ? :)

@eriksvedang
Copy link
Collaborator

I'll close this for now, please re-open any time.

@eriksvedang eriksvedang closed this Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants