Allocation strategies for vector creation #388
@Shimuuar I guess from your comment in #301 that, perhaps, you intend to assign …
Yes, but I don't think this is a problem. It's a hint for the size of the underlying buffer, which could be larger than the array.
If we add a new field …

```haskell
Data.Vector.Fusion.Bundle.length Bundle{ sSize = Exact n } = n
```

What if someone relies on it?
Oh no. We have

```haskell
Generic.length = Bundle.length . Generic.stream
```

That is, in order to calculate in constant time the length of an actually heap-allocated vector, we rely on the fact that …
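A toy model may make this dependency concrete. The sketch below assumes a simplified bundle made of a size hint and a list of elements; `Vec`, `stream`, `bundleLength` and `streamOverAllocated` are hypothetical stand-ins, not vector's internals. It shows the constant-time shortcut working for an `Exact` hint, and giving a wrong answer if `Exact` were allowed to overstate the length the way a buffer-size hint might.

```haskell
-- Toy model (hypothetical names, not vector's actual internals) of why
-- Generic.length = Bundle.length . Generic.stream is O(1).
data Size = Exact Int | Max Int | Unknown
  deriving (Eq, Show)

-- A bundle: a size hint plus the elements it will produce.
data Bundle a = Bundle { sSize :: Size, sElems :: [a] }

-- A "vector" that stores its own length, as a heap-allocated vector does.
data Vec a = Vec Int [a]

-- Streaming a vector yields an Exact hint taken from the stored length.
stream :: Vec a -> Bundle a
stream (Vec n xs) = Bundle (Exact n) xs

-- Bundle length: O(1) when the hint is Exact, otherwise walk the elements.
bundleLength :: Bundle a -> Int
bundleLength (Bundle (Exact n) _) = n
bundleLength (Bundle _ xs)        = length xs

-- Generic-style length: relies on Exact meaning "exactly n elements".
vlength :: Vec a -> Int
vlength = bundleLength . stream

-- If Exact were reinterpreted as a buffer-size hint that may exceed the
-- real length (here, twice as large), the same shortcut gives a wrong answer.
streamOverAllocated :: Vec a -> Bundle a
streamOverAllocated (Vec n xs) = Bundle (Exact (2 * n)) xs

main :: IO ()
main = do
  let v = Vec 3 "abc"
  print (vlength v)                             -- 3, without traversing
  print (bundleLength (streamOverAllocated v))  -- 6, which is wrong
```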
You're right. That's bad on one hand, but it isn't a big problem. We just need to make the hint more precise. It could be solved by adding one more constructor:

```haskell
  ...
  | Exact Int        -- Vector will have exactly this length
  | Preallocate Int  -- Allocate buffer of this size
```

And if we extend size hints with lower bounds on the size, it could be written as …
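One way to read this proposal: with a separate `Preallocate` constructor, only `Exact` may answer a length query, while both constructors can size the initial buffer. A minimal sketch, where `exactLength`, `upfrontBuffer` and `builtLength` are hypothetical helpers and lists stand in for vectors:

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical extension of the size hint with a separate Preallocate case.
data Size
  = Exact Int        -- vector will have exactly this length
  | Preallocate Int  -- allocate a buffer of this size; the vector may be shorter
  | Max Int          -- at most this many elements
  | Unknown
  deriving (Eq, Show)

-- Only Exact may be used to answer length in O(1).
exactLength :: Size -> Maybe Int
exactLength (Exact n) = Just n
exactLength _         = Nothing

-- Both Exact and Preallocate say how big a buffer to allocate up front.
upfrontBuffer :: Size -> Maybe Int
upfrontBuffer (Exact n)       = Just n
upfrontBuffer (Preallocate n) = Just n
upfrontBuffer _               = Nothing

-- Length of the vector built from a stream: trust the hint only if Exact,
-- otherwise count the elements actually produced.
builtLength :: Size -> [a] -> Int
builtLength sz xs = fromMaybe (length xs) (exactLength sz)

main :: IO ()
main = do
  print (upfrontBuffer (Preallocate 10))          -- Just 10
  print (builtLength (Preallocate 10) "abcdefg")  -- 7
```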
@Shimuuar Correct. I think we agree that we can't keep the type …

```haskell
data Size
  = Exact       { length :: Int }        -- exact size
  | Preallocate { allocatedMax :: Int }  -- allocate the maximum possible size
  | Max         { max :: Int }           -- start with an empty buffer and double up to max
  | Unknown                              -- unbounded doubling
```

I proposed one solution in #301:

```haskell
data Size
  = Exact   { length :: Int }
  | Max     { preAlloc :: Int, max :: Int }
  | Unknown { preAlloc :: Int }
```

We could even try to make everything seem backwards compatible:

```haskell
data Size
  = Exact             { length :: Int }
  | DoublingWithMax   { preAlloc :: Int, max :: Int }
  | DoublingUnbounded { preAlloc :: Int }

pattern Max n <- DoublingWithMax _ n
  where Max n = DoublingWithMax 0 n
pattern Unknown <- DoublingUnbounded _
  where Unknown = DoublingUnbounded 0
{-# COMPLETE Exact, Max, Unknown #-}
```

Another reason why I'm proposing a solution with a minimal bound is the definition of …
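The pattern-synonym approach can be checked as a standalone module. In this sketch the fields are renamed to `exactLen`, `preAlloc` and `maxLen` (hypothetical names chosen to avoid clashing with Prelude's `length` and `max`), the constructor is spelled `DoublingUnbounded` throughout, and `describe` is a hypothetical function written against the old three-constructor API. It assumes GHC 8.2 or later, where `COMPLETE` pragmas are available.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- New representation: every doubling case carries a preallocation size.
data Size
  = Exact { exactLen :: Int }
  | DoublingWithMax { preAlloc :: Int, maxLen :: Int }
  | DoublingUnbounded { preAlloc :: Int }
  deriving (Eq, Show)

-- Old-style constructors recovered as bidirectional pattern synonyms:
-- matching ignores preAlloc, construction sets it to 0.
pattern Max :: Int -> Size
pattern Max n <- DoublingWithMax _ n
  where Max n = DoublingWithMax 0 n

pattern Unknown :: Size
pattern Unknown <- DoublingUnbounded _
  where Unknown = DoublingUnbounded 0

-- Tells the exhaustiveness checker these three patterns cover Size.
{-# COMPLETE Exact, Max, Unknown #-}

-- Code written against the old three-constructor API still compiles.
describe :: Size -> String
describe (Exact n) = "exactly " ++ show n
describe (Max n)   = "at most " ++ show n
describe Unknown   = "unknown"

main :: IO ()
main = do
  print (Max 10)                              -- DoublingWithMax {preAlloc = 0, maxLen = 10}
  putStrLn (describe (DoublingWithMax 4 10))  -- at most 10
  putStrLn (describe (DoublingUnbounded 8))   -- unknown
```

Within one module this works on older compilers too; what needs GHC 8.0 is bundling `Max` and `Unknown` into the export list alongside `Size`, so that importers see them as ordinary constructors.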
Oh, sorry. Bundled pattern synonyms aren't available in GHC 7.10, so we can't make it seem backwards-compatible.
I tried the following size hint:

```haskell
data Size = Size
  { lowerBound :: !Int
  , upperBound :: !SizeHint
  }
```

It seems to work reasonably well. But yes, it's a breaking change.
It turns out we don't exercise `munstream` in the test suite at all. (An easy check is to replace its definition with `undefined` and run the tests.) This is to check the equivalence of all variants, which is necessary for any change to the unstream machinery, such as the ones discussed in haskell#301, haskell#388, and haskell#406.
So after playing a bit with this, I converged on the following very simple design for the vector size hint:

```haskell
data Size = Size
  { lowerBound :: !Int
  , upperBound :: !Int
  }
```

For every stream we have an estimate of the lower and upper bounds on the vector size. We can use … The unstreaming strategy should be simple as well: start from a vector with …
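A sketch of how such a hint might be derived and consumed. The helpers `exact`, `unknown`, `filterSize`, `takeSize`, `grow` and `capacities` are hypothetical, and since the comment does not spell out the full unstreaming strategy, the growth policy here (start at the lower bound, double, never exceed the upper bound) is an assumption:

```haskell
-- Size hint with both bounds (hypothetical helpers below).
data Size = Size { lowerBound :: !Int, upperBound :: !Int }
  deriving (Eq, Show)

-- Streaming an existing vector: both bounds equal its length.
exact :: Int -> Size
exact n = Size n n

-- Nothing known: at least 0, at most the largest representable length.
unknown :: Size
unknown = Size 0 maxBound

-- filter may drop every element, but cannot add any.
filterSize :: Size -> Size
filterSize (Size _ hi) = Size 0 hi

-- take k keeps at most k elements (k >= 0).
takeSize :: Int -> Size -> Size
takeSize k (Size lo hi) = Size (min k lo) (min k hi)

-- Next buffer size: double (from 1 if empty), but never beyond hi.
grow :: Int -> Int -> Int
grow hi c
  | c == 0         = min hi 1
  | c > hi `div` 2 = hi
  | otherwise      = 2 * c

-- Buffer sizes used while writing n elements: start at the lower bound
-- and grow until the buffer holds n elements or reaches the upper bound.
capacities :: Size -> Int -> [Int]
capacities (Size lo hi) n = go lo
  where
    go c
      | c >= n || c >= hi = [c]
      | otherwise         = c : go (grow hi c)

main :: IO ()
main = do
  print (capacities (exact 5) 5)                 -- [5]
  print (capacities (filterSize (exact 6)) 5)    -- [0,1,2,4,6]
  print (capacities (takeSize 100 unknown) 70)   -- [0,1,2,4,8,16,32,64,100]
```

The upper bound is what keeps doubling from overshooting: after a `filter` of a 6-element vector the buffer stops at 6 rather than 8.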
Sounds reasonable to me.
Related to #406, doesn't it break the following?

```haskell
drop maxBound
  $ (`unfoldr` (0::Int))
  $ \x -> if x < 0 then Nothing else Just ((), x+1)
```

We could additionally declare that any vector longer than … Also, we will need to have a flag …
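The breakage can be reproduced at a small scale. The sketch below assumes an `Int8` counter in place of `Int` (purely so it runs instantly) and a hypothetical `Size8` hint whose bounds are also `Int8`; `elems8`, `unfoldrSize` and `dropSize` are illustrative names. The stream yields one element more than `maxBound` can represent, so even the loosest representable upper bound becomes wrong after `drop maxBound`:

```haskell
import Data.Int (Int8)
import Data.List (unfoldr)

-- Scaled-down model: the counter is an Int8, so the stream yields
-- 128 elements (x = 0..127) before x + 1 wraps around to -128.
elems8 :: [()]
elems8 = unfoldr step (0 :: Int8)
  where step x = if x < 0 then Nothing else Just ((), x + 1)

-- Hypothetical lower/upper bound hint whose bounds are limited to Int8.
data Size8 = Size8 { lowerBound :: Int8, upperBound :: Int8 }
  deriving (Eq, Show)

-- The best an unfoldr can claim: anything from 0 up to the largest
-- representable size.
unfoldrSize :: Size8
unfoldrSize = Size8 0 maxBound

-- Size hint for dropping k elements (k >= 0).
dropSize :: Int8 -> Size8 -> Size8
dropSize k (Size8 lo hi) = Size8 (max 0 (lo - k)) (max 0 (hi - k))

main :: IO ()
main = do
  print (length elems8)                   -- 128
  print (length (drop 127 elems8))        -- 1
  print (dropSize maxBound unfoldrSize)   -- Size8 {lowerBound = 0, upperBound = 0}
```

The hint claims at most 0 elements remain while 1 actually does, which is the same mismatch the original `Int` example produces at full scale.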
The main problem is that streams can be of any size, while vectors can't be longer than … And yes, we may want to add …
Uh, I think you ...um... nailed it. So …
Sorry guys, my son was born just a week after all of you had this discussion last time. Since then, life has been a roller coaster. 😉 Anyway, I think I like this idea of having a lower and upper bound for the size estimate. It will be quite a bit of work, though, and it is indeed a breaking change. @Shimuuar I am not sure how far along you have gotten with this, but considering it has been half a year, I don't expect we'll have it done within a month or so, right? The reason I am bringing this up is that I'd like to have a release done in two weeks' time. See my comment in #357.
This is a much more precise encoding, with both lower and upper bounds. It implements the idea discussed in haskell#388 and, for example, avoids the problems from haskell#301. However, benchmark results are at best mixed: benchmark changes range from 0.75 to 17. Investigation of the tridiag benchmark (not the worst, but one of the simplest) showed that the main loop retained Bundles and allocated closures in the inner loop, and so was quite slow. It seems that generating tight loops from vector functions is rather fragile, and, what is worse, we have no way to know whether this problem exists in code in the wild, and no way to measure it.
Currently we have three size hints in Bundle: `Exact`, `Max` and `Unknown`. However, buffer allocation for vectors has only two strategies: doubling for `Unknown` and exact allocation for both `Exact` and `Max`. I think we should have three: unbounded doubling for `Unknown`, doubling with a bound for `Max`, and preallocation for `Exact`.

P.S. It was also proposed to add a lower bound (see the discussion).
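To illustrate the difference, the sketch below lists the buffer sizes each approach would allocate while writing `n` elements. `currentAllocs` and `proposedAllocs` are hypothetical models of the behaviour described above, not vector's code, and starting the doubling at 1 is an assumption:

```haskell
import Data.Maybe (fromMaybe)

-- Mirrors the three size hints discussed in this issue.
data Size = Exact Int | Max Int | Unknown
  deriving (Eq, Show)

-- Buffer sizes allocated while writing n elements when doubling from 1,
-- optionally capped at a maximum.
doubling :: Maybe Int -> Int -> [Int]
doubling cap n = go 1
  where
    limit = fromMaybe maxBound cap
    go c
      | c >= n || c >= limit = [min c limit]
      | otherwise            = c : go (2 * c)

-- Current behaviour as described: Max is allocated just like Exact.
currentAllocs :: Size -> Int -> [Int]
currentAllocs (Exact k) _ = [k]
currentAllocs (Max m)   _ = [m]
currentAllocs Unknown   n = doubling Nothing n

-- Proposed: three distinct strategies.
proposedAllocs :: Size -> Int -> [Int]
proposedAllocs (Exact k) _ = [k]                 -- preallocation
proposedAllocs (Max m)   n = doubling (Just m) n -- doubling bounded by m
proposedAllocs Unknown   n = doubling Nothing n  -- unbounded doubling

main :: IO ()
main = do
  -- e.g. a Max 1000000 hint for a stream that produces only 3 elements
  print (currentAllocs  (Max 1000000) 3)  -- [1000000]
  print (proposedAllocs (Max 1000000) 3)  -- [1,2,4]
  print (proposedAllocs (Max 10) 10)      -- [1,2,4,8,10]
```

Under this model, a stream with a large `Max` hint that produces only a handful of elements costs a full-size buffer today, while bounded doubling allocates in proportion to what is actually written and still never exceeds the bound.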