Thoughts on value equality being fast, and data structure mutation being memory efficient in Val. #728
-
I like the look of Val very much and I am very interested to start experimenting with it. I am very familiar with Swift, but day to day I mainly use a combination of C++ and TypeScript, along with an awful lot of React, including for 3D graphics using React Three Fiber. Wherever possible I try to apply the principles of Value Semantics to my code, and I find it complements React especially well. In React there is a Reconciler which is constantly traversing a tree of Components, and for each one it needs to quickly check whether its Props or Hook Dependencies have changed, in which case the component needs to be re-rendered.

I wanted to share some thoughts about Value Semantics in TypeScript (and React) versus Val. To implement Value Semantics in TypeScript you end up creating immutable data structures, where any mutating method must return (a reference to) a new data structure with the mutation applied*. The original data structure is unaltered. Furthermore, since a mutating method typically only mutates a portion of the data structure, this can be much more memory efficient than you would first assume, because most of the content of the new data structure can simply be references to existing immutable data. In this way Object Instance Equality (=== in TypeScript) becomes a proxy for Value Equality. This works well with the React Reconciler, because this is the kind of equality it uses for each Prop and Hook Dependency to decide whether something has changed for it to react to. In TypeScript, Object Instance Equality is equivalent to a location in memory, so I shall refer to this implementation pattern as Location Proxy Value Semantics (LPVS). I have intentionally described this as a proxy for Value Equality rather than an equivalent, because two data structures at different locations could have equal value, and this pattern would not recognize that they did (LPVS limitation A).
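Here is a minimal TypeScript sketch of what I mean (the Tree type and the withLeft helper are hypothetical names, purely for illustration):

```typescript
// LPVS in miniature: an immutable binary tree where a "mutation"
// returns a new root but shares every untouched subtree by reference.
interface Tree {
  readonly value: number;
  readonly left: Tree | null;
  readonly right: Tree | null;
}

// Replace the left subtree; the right subtree is shared, not copied.
function withLeft(tree: Tree, left: Tree | null): Tree {
  return { value: tree.value, left, right: tree.right };
}

const original: Tree = {
  value: 1,
  left: null,
  right: { value: 2, left: null, right: null },
};
const updated = withLeft(original, { value: 3, left: null, right: null });

console.log(updated === original);             // false: "something changed"
console.log(updated.right === original.right); // true: unchanged part is shared
```

Only one new object is allocated here, no matter how large the shared right subtree is.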
This pattern, however, has the advantage that checking value equality of data structures (which the React Reconciler is doing constantly for the Props and Hook Dependencies) becomes very quick, because you don't have to traverse all the data; you can just check its location (LPVS advantage A). Another advantage of LPVS is that mutating a large data structure can be memory efficient, because you only need to allocate additional memory for the (typically small) portion that has changed (LPVS advantage B).

I can see how Val's Mutable Value Semantics (MVS) would address LPVS limitation A, because it would just check whether the bytes of two data structures were the same and not care about where the bytes were located. But I am curious how Val deals with LPVS advantages A and B, which become more apparent as data structures increase in size. In regard to LPVS advantage A, one could imagine an opt-in value_hash member on all Val type instances that could be populated at initialization time, and which could be used once data structures grow above a certain size in order to provide a faster Value Equality comparison. In regard to LPVS advantage B, one could imagine that this value_hash member could also be used as a cache key for shared memory on the heap.

* This is also how you would do it in Java. Some languages exclusively behave this way, e.g. Haskell.
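To make the value_hash idea concrete, here is a TypeScript sketch of the kind of hash-assisted equality I have in mind (HashedValue and valueHash are hypothetical names; nothing like this exists in Val today, and the hash function is purely illustrative):

```typescript
// A value that caches a hash of its contents at construction time,
// so that inequality can usually be detected without a full traversal.
class HashedValue {
  readonly data: ReadonlyArray<number>;
  readonly valueHash: number;

  constructor(data: ReadonlyArray<number>) {
    this.data = data;
    // Simple FNV-1a-style hash over the contents (illustrative only).
    let h = 0x811c9dc5;
    for (const x of data) {
      h = Math.imul(h ^ x, 0x01000193) >>> 0;
    }
    this.valueHash = h;
  }

  equals(other: HashedValue): boolean {
    // Different hashes guarantee different values: O(1) rejection.
    if (this.valueHash !== other.valueHash) return false;
    // Equal hashes may still collide, so fall back to a full comparison.
    return (
      this.data.length === other.data.length &&
      this.data.every((x, i) => x === other.data[i])
    );
  }
}
```

Note that a hash can only prove inequality quickly; when the hashes match, a full comparison is still needed because of collisions.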
-
Hi @RMJay!

I believe the approach you're referring to is how one implements persistent data structures. Implementations of such data structures typically rely on automatic garbage collection, because the (n+1)-th version of a value needs access to the n-th version, regardless of where it was created. That's not a problem in a language like Javascript, since the runtime will take care of all dynamic allocations anyway. However, in a lower-level language like Val (or C++), one has to worry about memory collection.

There are costs to garbage collection that we don't want to pay in Val. Since we're aiming for a zero-cost abstraction language with transparent performance guarantees, we can't afford to let a garbage collector "stop the world" any time it wants to free memory. A more acceptable approach would be to use reference counting, noting that there are still costs at runtime to read/modify an atomic counter. One should also factor in the cost of synthesizing the actual value of a particular version of the data structure if fat pointers are used instead of copy-on-write. There exist various techniques to alleviate this problem, including ahead-of-time optimizations, but none of them can guarantee to be as fast as accessing an element in a struct or contiguous array.

It's interesting to note that the collections of Swift's standard library use copy-on-write. They address limitation A by comparing actual contents if and only if storage pointers are different, meaning that equality can take the fast path quite often in applications with few mutations. We ran some benchmarks to observe the performance of this approach w.r.t. MVS (http://www.jot.fm/issues/issue_2022_02/article2.pdf).

Whether or not we should adopt the same approach for Val is still an open question. If you asked me now, I'd say that we should default to copy-on-write with an opt-out. I might give a different answer once we're able to run benchmarks.
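For illustration, that fast path can be sketched in TypeScript (CowArray is a hypothetical name; Swift's actual copy-on-write machinery is quite different, this only shows the equality-check idea):

```typescript
// Equality with a storage-identity fast path, as in Swift's
// copy-on-write collections: same storage pointer implies equal value.
class CowArray {
  // Backing storage, potentially shared between several values.
  constructor(private storage: readonly number[]) {}

  // A "mutation" produces fresh storage; other owners are unaffected.
  appending(x: number): CowArray {
    return new CowArray([...this.storage, x]);
  }

  equals(other: CowArray): boolean {
    // Fast path: identical storage, no traversal needed.
    if (this.storage === other.storage) return true;
    // Slow path: compare contents element by element.
    return (
      this.storage.length === other.storage.length &&
      this.storage.every((x, i) => x === other.storage[i])
    );
  }
}

const shared = [1, 2, 3];
const a = new CowArray(shared);
const b = new CowArray(shared); // shares storage with a
const c = a.appending(4);       // fresh storage

console.log(a.equals(b)); // true, via the fast path
console.log(a.equals(c)); // false, contents differ
```

In a program with few mutations, most comparisons hit the fast path, which is exactly LPVS advantage A; the slow path only runs when storage has actually diverged.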