Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boxed Vectors can crash during compaction #220

Closed
bgamari opened this issue Aug 27, 2018 · 24 comments
Closed

Boxed Vectors can crash during compaction #220

bgamari opened this issue Aug 27, 2018 · 24 comments

Comments

@bgamari
Copy link
Contributor

bgamari commented Aug 27, 2018

Consider the program

import Data.Compact
import Data.Vector as V
 
main :: IO ()
main = do
    compactVec <- compact myVector
    pure ()
 
myVector :: Vector Int
myVector = fromList [1..10]
 

Currently it crashes with,

compact-vector: Data.Vector.Mutable: uninitialised element

Since the (over-sized) accumulator used by fromList is initially filled with bottoms but never shrunk. The compactor consequently traces these bottoms, blowing up the program.

We should try harder to shrink the accumulator down to the proper size after we have finished building the vector

@cartazio
Copy link
Contributor

cartazio commented Aug 28, 2018 via email

@recursion-ninja
Copy link
Contributor

Since the traverse implementation of Data.Vector calls fromList internally, it also causes compact region failures. This makes zipping vectors in parallel before making a compact region impossible because Control.Parallel.Strategies relies heavily on the Traversable typeclass for interacting with structures in parallel.

Can we prioritize fixing this defect?

@recursion-ninja
Copy link
Contributor

See #221 for a pull request addressing the traverse function.

@cartazio
Copy link
Contributor

cartazio commented Sep 26, 2018 via email

cartazio pushed a commit that referenced this issue Oct 9, 2018
…oxed vectors. (gh pr #221, fixes bug #220)

previously traverse used unknown size fromList rather fromListN for constructing Boxed vectors.
In the presence of compact regions the implementation strategy for fromList results in program crashes. Now traverse on Boxed vectors uses the input vector size for constructing the result vector.
cartazio pushed a commit that referenced this issue Nov 14, 2018
…oxed vectors. (gh pr #221, fixes bug #220)

previously traverse used unknown size fromList rather fromListN for constructing Boxed vectors.
In the presence of compact regions the implementation strategy for fromList results in program crashes. Now traverse on Boxed vectors uses the input vector size for constructing the result vector.
@cartazio
Copy link
Contributor

cartazio commented Dec 7, 2018

this should be fixed now

@cartazio cartazio closed this as completed Dec 7, 2018
@alexwl
Copy link
Contributor

alexwl commented Dec 10, 2018

The commit ccf2260 doesn't fix the problem with compaction. The crash is still happening.

For example, this program (ghc-8.4.4, vector-0.12.0.2):

module Main where

import qualified Data.Vector as V
import GHC.Compact

main :: IO ()
main = do
  c <- compact $ V.fromList [1, 2, 3]
  -- c <- compact $ V.fromListN 3 [1,2,3] -- OK
  -- c <- compact $ V.force $ V.fromList [1,2,3] -- OK
  -- c <- compact $ V.fromList $ replicate (2^3) 1 -- OK
  -- c <- compact $ V.fromList $ replicate (2^5) 1 -- OK
  -- c <- compact $ V.fromList $ replicate (2^10) 1 -- OK
  -- c <- compact $ V.fromList $ replicate (2^3+1) 1 -- crashes
  -- c <- compact $ V.fromList $ replicate (2^5+1) 1 -- crashes
  -- c <- compact $ V.fromList $ replicate (2^10+1) 1 -- crashes
  pure ()

crashes with uninitialised element exception:

*** Exception: Data.Vector.Mutable: uninitialised element
CallStack (from HasCallStack):
  error, called at ./Data/Vector/Mutable.hs:188:17 in vector-0.12.0.2-4IpdnxtqTfNJ9xEZNSAM2c:Data.Vector.Mutable

It doesn't crash if

  • a vector is created using V.fromListN function
  • V.force function is called
  • the length of a vector is a power of 2

The problem is that the compaction of a Vector causes an underlying boxed Array to be fully evaluated. This Array may contain bottoms.

V.fromList function assumes that the maximum size of the stream is unknown:

fromList :: Monad m => [a] -> Bundle m v a
{-# INLINE fromList #-}
fromList xs = unsafeFromList Unknown xs

as a result, the underlying Array grows exponentially (2x):
-- | Create a new mutable vector and fill it with elements from the monadic
-- stream. The vector will grow exponentially if the maximum size of the stream
-- is unknown.
vmunstream :: (PrimMonad m, V.Vector v a)
=> MBundle m v a -> m (V.Mutable v (PrimState m) a)
{-# INLINE_FUSED vmunstream #-}
vmunstream s = case upperBound (MBundle.size s) of
Just n -> vmunstreamMax s n
Nothing -> vmunstreamUnknown s

and new elements are filled with error "Data.Vector.Mutable: uninitialised element":
basicUnsafeNew n
= do
arr <- newArray n uninitialised
return (MVector 0 n arr)

@cartazio
Copy link
Contributor

cartazio commented Dec 10, 2018 via email

@cartazio
Copy link
Contributor

cartazio commented Dec 10, 2018 via email

@alexwl
Copy link
Contributor

alexwl commented Dec 10, 2018

I think the biggest problem here is an unhelpful error message: "Data.Vector.Mutable: uninitialised element" (the only way to understand it is to inspect the code of vector library).

For example, when I try to compact a function, the error message is pretty clear:

λ compact (\x -> x)
*** Exception: compaction failed: cannot compact functions

The error message from vector library can be more descriptive, something like this: "Data.Vector.Mutable: uninitialised element. Direct access to the underlying Array is an unsafe operation. Use the force function to remove uninitialised elements from the Array."

@cartazio
Copy link
Contributor

cartazio commented Dec 10, 2018 via email

@alexwl
Copy link
Contributor

alexwl commented Dec 10, 2018

The error "Data.Vector.Mutable: uninitialised element" is coming from the error closure in vector (from an element in an Array).

The compact function fully evaluates its argument before compaction. If there is a bottom value anywhere inside the data structure, then an exception is raised :

λ compact [1,error "message",2]
*** Exception: message

λ compact $ V.fromList [1,error "message",2]
*** Exception: message

alexwl added a commit to alexwl/vector that referenced this issue Dec 12, 2018
…ement' error message

Compaction of an immutable vector may crash with the confusing 'uninitialised element' error message (haskell#220). The workaround is to use the 'force' function.
@recursion-ninja
Copy link
Contributor

recursion-ninja commented Jul 5, 2019

This issue is not fixed!

We hit this issue again.

It took a week of developer time to track down the source. it was a call to fromList that was later used as an argument to compact.


  1. Is it possible for fromList to correctly clean up after the dynamic allocation?
    • This would correct the defect.

  1. Is it possible for a more descriptive error message to be raised when this occurs?
    • This would minimize lost developer time when the defect arises.

  1. Is it possible for the documentation on Hackage to draw explicit attention to this defect and discourage the use of fromList in code where you might invoke compact or the consumer of your library might invoke compact?
    • This would give developers the knowledge to make the Hackage ecosystem more compatible.

@cartazio
Copy link
Contributor

cartazio commented Jul 5, 2019 via email

@cartazio
Copy link
Contributor

cartazio commented Jul 5, 2019 via email

@Boarders
Copy link
Contributor

Boarders commented Jul 9, 2019

One important aspect to this issue is that building with profiling does not help! In particular this error is only hit when building without profiling. I think this is due to the interaction between compact-regions, the garbage collector and stack traces but @bgamari probably has a much better understanding of why this happens. In any case this makes the need for better error messages quite important since it is not possible to get a stack trace to see where the error message was created.

@andrewthad
Copy link
Contributor

The best fix for this would be for someone to implement https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst. This proposal was accepted a while ago, but no one has added the new primop for shrinking arrays yet. If vector used this, there would be no performance penalty, no stream-fusion breakage, and compact regions would work correctly since they would only copy the "live" part of the array.

@recursion-ninja
Copy link
Contributor

recursion-ninja commented Jul 9, 2019

@cartazio @andrewthad, I'm also concerned about good performance and stream fusion. I don't want correcting this interaction with compact regions to negatively impact other users if the vector package.

I have a bandaid solution in place using a custom HLint rule to warn about using fromList instead of fromListN. This keeps call to fromList out of our code (mostly). However, it doesn't address when code from a library we depend on calls fromList to produce a value, which summarily breaks our application on any call to compact with said value. Hence, it's not always possible to (efficiently) sanitize your input as a compact regions user.

I don't know enough about the stream fusion internals of vector to say whether or not an allocation clean up after fromList reaches the end of the input list would negatively impact said fusion. This thinking naively and mutably, it doesn't seem that the following algorithm would be incompatible with stream fusion:

  1. Allocate n elements of space and set the vector's length to n (set n equal to something reasonable, perhaps 8)
  2. Check if you are not at the end of the list. If you are, go to 6.
  3. Check how many elements you have consumed from the list, if it is equal to the vector's length, double the vector's length and memory.
  4. Consume the element at the head of the list and place it the in the vector.
  5. Increment the vector's length.
  6. Go to 1, with the tail of the list.
  7. Do something similar to C's realloc function to truncate the vector's memory so it is equal to it's length.

I think that step 6 is what @andrewthad is suggesting as an option made available from implementing the linked GHC proposal. That is certainly the best, most efficient option if my understanding is correct. However, getting the new functionality into base and updating fromList to utilize the new functionality is a long way off. I'm hoping we can explore more immediate solutions by changing in code the vector package that don't introduce any performance regressions.

@recursion-ninja
Copy link
Contributor

Also, can we reopen this issue?

@cartazio
Copy link
Contributor

cartazio commented Jul 9, 2019 via email

@cartazio
Copy link
Contributor

cartazio commented Jul 9, 2019 via email

@cartazio
Copy link
Contributor

cartazio commented Jul 9, 2019 via email

@recursion-ninja
Copy link
Contributor

recursion-ninja commented Jul 9, 2019

The real issue is that the error handling logic for the compact heap code in the ghc rts and base libs does not do correct error handling.

@cartazio given that the vector package's fromList function exposes a very problematic symptom of the GHC RTS issue, do you recommend linking to this issue in compact and/or ghc so the more general issue can also be resolved, not just the vector package's specific fromList manifestation?

@cartazio
Copy link
Contributor

cartazio commented Jul 9, 2019

yes. its both? I think the compact package just wraps up the stuff in GHC proper. so its a error handling/debugugging support. crashing is fine for haskell stuff, but if its HQ owned stuff, it shouldn't break everyones debuggin.

i digged into what fromList does in vector, and i think the fix (if applicable) would be pretty tricky at the vector layer

-- Bundle.Monadic

-- | Convert the first @n@ elements of a list to a 'Bundle'
fromListN :: Monad m => Int -> [a] -> Bundle m v a
{-# INLINE_FUSED fromListN #-}
fromListN n xs = fromStream (S.fromListN n xs) (Max (delay_inline max n 0))

fromList :: Monad m => [a] -> Bundle m v a
{-# INLINE fromList #-}
fromList xs = unsafeFromList Unknown xs
-- | Convert a list to a 'Bundle' with the given 'Size' hint.
unsafeFromList :: Monad m => Size -> [a] -> Bundle m v a
{-# INLINE_FUSED unsafeFromList #-}
unsafeFromList sz xs = fromStream (S.fromList xs) sz


--
--- which is then used in Bundle
-- | Create a 'Bundle' from a list
fromList :: [a] -> Bundle v a
{-# INLINE fromList #-}
fromList = M.fromList

-- which is then used in Vector.Generic

-- | /O(n)/ Convert a list to a vector
fromList :: Vector v a => [a] -> v a
{-# INLINE fromList #-}
fromList = unstream . Bundle.fromList



-- whjich is then used in Vector 
-- | /O(n)/ Convert a list to a vector
fromList :: [a] -> Vector a
{-# INLINE fromList #-}
fromList = G.fromList

as you can see, unstream is defined in terms of New.unstream, which is quite central


-- | /O(n)/ Construct a vector from a 'Bundle'
unstream :: Vector v a => Bundle v a -> v a
{-# INLINE unstream #-}
unstream s = new (New.unstream s)

{-# RULES

"stream/unstream [Vector]" forall s.
  stream (new (New.unstream s)) = s

"New.unstream/stream [Vector]" forall v.
  New.unstream (stream v) = clone v

"clone/new [Vector]" forall p.
  clone (new p) = p

"inplace [Vector]"
  forall (f :: forall m. Monad m => Stream m a -> Stream m a) g m.
  New.unstream (inplace f g (stream (new m))) = New.transform f g m

"uninplace [Vector]"
  forall (f :: forall m. Monad m => Stream m a -> Stream m a) g m.
  stream (new (New.transform f g m)) = inplace f g (stream (new m))  #-}


it seems that messing with unstream interacts pretty funadmentally with some of the fusion rules. so i'd rather we first improve debugging support first

@enobayram
Copy link

We've just been bitten by this as well and could only figure out what's going on after many many hours of debugging. The fact that something like this can happen means compaction and vector can't be used in production software together.

cartazio pushed a commit that referenced this issue Jan 16, 2020
…ement' error message (#231)

Compaction of an immutable vector may crash with the confusing 'uninitialised element' error message (#220). The workaround is to use the 'force' function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants