Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantics of grow/unsafeGrow #36

Closed
snoyberg opened this issue Jul 31, 2014 · 8 comments
Closed

Semantics of grow/unsafeGrow #36

snoyberg opened this issue Jul 31, 2014 · 8 comments
Assignees
Milestone

Comments

@snoyberg
Copy link

I'd like to clarify the semantics of grow and unsafeGrow, and hopefully update the documentation appropriately. From looking at the code for boxed, unboxed, and storable vectors, it seems that in all cases, grow/unsafeGrow will create a new mutable vector and leave the original mutable vector unchanged. The upshot would be that it would be valid to modify the original vector without affecting the new vector, and vice-versa. Two questions:

  1. Have I misread the code, and in fact that's not the way the current three vector types work?
  2. Do we want to codify this contract in the documentation as a promise for future behavior?
@vincenthz
Copy link

I think this is correct. Having looked at the storable and unboxed vectors couple of weeks ago, this is what I read too. If the documentation is improved as a result, it's probably a good idea to explicitly document that the old vector is copied to the new vector, and this is not a 'realloc' type operation. I think I'm not the only one to think grow = realloc.

@cartazio
Copy link
Contributor

I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.

@snoyberg
Copy link
Author

snoyberg commented Aug 1, 2014

@cartazio If I'm reading you correctly, the correct docs would be:

Even though right now, grow will always leave the original mutable vector intact and unmodified, this is behavior that may change in the future, and should not be relied upon. If you need such functionality, you should instead create a new vector and then copy the data from the old vector to the new one.

@cartazio
Copy link
Contributor

cartazio commented Aug 1, 2014

i'm not sure if thats quite true. I'll try to take a look with a few folks at hac_bos this weekend. I believe the motivation for the *grow operations lies in Stream fusion in the case when the stream has unknown size, and any choice in semantics should reflect that.

@jstimpfle
Copy link

jstimpfle commented Jul 20, 2016

Another perspective: the name grow is misleading. Mutable vectors are implemented using a bare MutableArray instead of a mutable reference thereof. So they couldn't grow in-place in case realloc needed to move.

https://hackage.haskell.org/package/vector-0.11.0.0/docs/src/Data-Vector-Mutable.html#MVector

To clarify: currently the interface is grow :: Vector -> Int -> IO Vector but much more sensible would be grow :: Vector -> Int -> IO () (types simplified).

There are probably good reasons (which I don't understand) for not using a mutable reference. But since in-place grow can't be supported that way, another suggestion would be to just remove the interface altogether and make another data type (C++ std::vector style) which can support growing.

@dolio
Copy link
Contributor

dolio commented Jul 20, 2016

I don't really know the original intention of grow.

However, there is a notion of growing that does not mutate the original vector, but also does not mean that the new vector can be used without affecting the old one (perhaps only in the 'unsafe' case).

It is the opposite of slicing. When you slice a mutable vector, you just adjust the bounds, but keep the same underlying array, so writes to one can affect (pieces of) the other. In principle, a vector type could over-allocate the underlying mutable array, and allow growing by just adjusting the bounds. Nothing we have does this, but it could be done, and it has the signature of grow.

@sgraf812
Copy link

sgraf812 commented Sep 29, 2017

Edit: I just realized the below assumes an implementation which reallocs in-place.

I'm quite late to the game, but there is at least one other reason why grow returns a new MVector reference: It's because the old reference doesn't contain the updated length!

#!/usr/bin/env stack
-- stack --resolver lts-9.6 script --package vector

import Data.Vector.Mutable (IOVector)
import qualified Data.Vector.Mutable as Vector

main = do
  v1 <- Vector.new 1
  print (Vector.length v1)
  v2 <- Vector.grow v1 1
  print (Vector.length v1)
  print (Vector.length v2)

In other words, this program prints

$ stack grow.hs
1
1
2

That's because instead of modeling the start and end indices in MVector as mutable variables, too, they are actually regurlar, unboxed Ints. I suspect the reason for this is that Vector.length and Vector.slice and by extension Vector.tail, etc. should remain pure. Also from the plain perspective of efficiency, unboxed Ints are probably preferrable to mutable references to boxed Ints. So, if there were mutable struct fields (there was some HIW talk about that), we should seriously consider using them here.

There's another perspective to this: We can't allow a shrink operation or negative arguments to grow, as that would invalidate all old references which still store the old length.

TLDR; Even if the implementation used realloc, we still need to return a new reference from grow and make sure that we don't use the old one any more.

@lehins
Copy link
Contributor

lehins commented Jan 17, 2021

Anyone involved in this issue cares to review a PR that clarifies those grow semantics, feel free to comment: #360

It only took 6 and a half years to get to and 30 min to implement 😃

@lehins lehins closed this as completed in 467eac0 Jan 20, 2021
lehins added a commit that referenced this issue Jan 20, 2021
Improve haddock of `grow` functions and clarify their semantics. Fix #36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants