Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults and "internal errors" when vector slice overflows #257

Closed
bjornbm opened this issue Sep 24, 2019 · 15 comments · Fixed by #279
Closed

Segfaults and "internal errors" when vector slice overflows #257

bjornbm opened this issue Sep 24, 2019 · 15 comments · Fixed by #279
Labels

Comments

@bjornbm
Copy link

bjornbm commented Sep 24, 2019

I reported this for GHC first. See https://gitlab.haskell.org/ghc/ghc/issues/17233 for full details (could be copied here if clearly a bug in vector). May be related to #188?

@snoyberg
Copy link

Looks to me like the bug is here:

checkSlice :: String -> Int -> Checks -> String -> Int -> Int -> Int -> a -> a
{-# INLINE checkSlice #-}
checkSlice file line kind loc i m n x
= check file line kind loc (checkSlice_msg i m n)
(i >= 0 && m >= 0 && i+m <= n) x

Specifically, i+m <= n is subject to integer overflow. Instead, something like m <= n - i should be applied.

lehins added a commit that referenced this issue Jan 30, 2020
Fix the test case to trigger the sliceCheck
@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

This is great, how many ways can we get the slice wrong? :)

Surprisingly, bug reported here is pretty hard to hit this in a slice function. The uncommon ways you can trigger it is by compiling without optimizations, eg. ghci (how this bug was reported) or by constructing a vector by manually allocating it.

Reason for this is that if a vector is constructed from a stream, or any other way really, this rewrite rule will get triggered and sliceCheck will no longer get triggered!

"slice/new [Vector]" forall i n p.
  slice i n (new p) = new (New.slice i n p)

So, if you put something fun like this instead into slice.hs file:

main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
      v = V.slice 1 (maxBound `div` 8) (V.fromList xs)
  print $ V.length v

You'll get another very nice issue:

$ stack exec -- ghc slice.hs -O1 && ./slice
[1 of 1] Compiling Main             ( slice.hs, slice.o ) [Optimisation flags changed]
Linking slice ...
slice: Out of memory

But what is even greater consequence of this, is that too large of the size parameter to slice can cause it to fail or not, depending on how the vector was constructed!

main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
      mySlice = V.slice 1 8
  v <- do
      mv <- MV.new $ List.length xs
      M.mapM_ (uncurry (MV.write mv)) $ List.zip [0..] xs
      V.freeze mv
  print $ mySlice $ V.fromList xs
  print $ mySlice v

This results in:

$ stack exec -- ghc slice.hs -O1 && ./slice
[1 of 1] Compiling Main             ( slice.hs, slice.o ) [Optimisation flags changed]
Linking slice ...
[2,3,4,5]
slice: ./Data/Vector/Generic.hs:396 (slice): invalid slice (1,8,5)
CallStack (from HasCallStack):
  error, called at ./Data/Vector/Internal/Check.hs:87:5 in vector-0.12.0.3-LfvlcMFJAcY18uD1Y2O5Ig:Data.Vector.Internal.Check

I got a fix partially implemented for the original issue: WIP in https://github.com/haskell/vector/tree/lehins/257-fix-slice-overflow

The follow up problem I described has to do with drop from bundle, which I still need some tests for and a proper fix.

@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

I guess a serious question we collectively should answer, which semantics should we keep?

slice 10 k [0..k] == [10..k]

or

slice 10 k [0..k] == error

Reason why I bring it up is because it seems the former might be more prevalent in the wild and some people might even rely on it. I am slightly leaning towards it, since it reduces number of partial functions and sort of follows drop and take philosophy from stream:

slice :: Monad m => Int -- ^ starting index
-> Int -- ^ length
-> Stream m a
-> Stream m a
{-# INLINE slice #-}
slice i n s = take n (drop i s)

@lehins lehins added the bug label Jan 30, 2020
@cartazio
Copy link
Contributor

cartazio commented Jan 30, 2020 via email

@cartazio
Copy link
Contributor

cartazio commented Jan 30, 2020 via email

@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

For the slow here is what I suggest. (just kidding ;)

Make slice a total function instead of a current mess of potential partiality:

==================================================
slice (List): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: [1,2]
   negative size: []
   negative ix and size: []
   too large ix: []
   too large size: [3,4,5]
   too large ix size: []
==================================================
slice (Vector proposed): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: [1,2]
   negative size: []
   negative ix and size: []
   too large ix: []
   too large size: [3,4,5]
   too large ix size: []
==================================================
slice (Vector.Boxed current - fused): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: [1,2]
   negative size: []
   negative ix and size: []
   too large ix: []
   too large size: [3,4,5]
   too large ix size: []
==================================================
slice (Vector.Primitive current - fused): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: [1,2]
   negative size: 
Primitive.basicUnsafeNew: negative length: -2
   negative ix and size: 
Primitive.basicUnsafeNew: negative length: -1
   too large ix: []
   too large size: [3,4,5]
   too large ix size: []
==================================================
slice (Vector current - unfused): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (-2,2,5)
   negative size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (2,-2,5)
   negative ix and size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (-2,-1,5)
   too large ix: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (6,2,5)
   too large size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (2,6,5)
   too large ix size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (6,6,5)

Code to reproduce above output:

{-# LANGUAGE LambdaCase #-}
module Main where

import Control.Exception
import Control.Monad as M
import Data.List as List
import qualified Data.Vector as V
import qualified Data.Vector.Generic as VG
import qualified Data.Vector.Primitive as VP

sliceList :: Int -> Int -> [a] -> [a]
sliceList i n xs = List.take n (List.drop i xs)

sliceVectorProposed :: VG.Vector v a => Int -> Int -> v a -> v a
sliceVectorProposed i n xs = VG.take n (VG.drop i xs)

tryErrorPrint :: Show a => String -> a -> IO ()
tryErrorPrint prefix doSlice = do
  putStr prefix
  try (pure $! doSlice) >>= \case
    Left (ErrorCall err) -> putStrLn ('\n':err)
    Right result -> print result

printSlices :: (Show a, Show b) => String -> (Int -> Int -> b -> a) -> b -> IO ()
printSlices name sliceWith xs = do
  putStrLn $ replicate 50 '='
  putStrLn $ "slice (" ++ name ++ "): " ++ show xs
  putStrLn $ "   normal: " ++ show (sliceWith 1 3 xs)
  tryErrorPrint "   negative ix: " (sliceWith (-2) 2 xs)
  tryErrorPrint "   negative size: " (sliceWith 2 (-2) xs)
  tryErrorPrint "   negative ix and size: " (sliceWith (-2) (-1) xs)
  tryErrorPrint "   too large ix: " (sliceWith 6 2 xs)
  tryErrorPrint "   too large size: "  (sliceWith 2 6 xs)
  tryErrorPrint "   too large ix size: " (sliceWith 6 6 xs)

main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
  printSlices "List" sliceList xs
  printSlices "Vector proposed" sliceVectorProposed $ V.fromList xs
  printSlices "Vector.Boxed current - fused" (\i n -> V.slice i n . V.fromList) xs
  printSlices "Vector.Primitive current - fused" (\i n -> VP.slice i n . VP.fromList) xs
  printSlices "Vector current - unfused"  V.slice (V.fromList xs)

@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

Of course on the other end of the spectrum is to always throw an error on invalid indices, but I suspect in order to achieve that we'll have to break fusion on slicing. The benefit would be that we would get the behavior that a lot of folks expect (but don't always get), namely this one:

==================================================
slice (Vector current - unfused): [1,2,3,4,5]
   normal: [2,3,4]
   negative ix: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (-2,2,5)
   negative size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (2,-2,5)
   negative ix and size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (-2,-1,5)
   too large ix: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (6,2,5)
   too large size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (2,6,5)
   too large ix size: 
./Data/Vector/Generic.hs:396 (slice): invalid slice (6,6,5)

I am rarely in favor of partial functions, therefore I lean towards fixing the arguments to slice functions so they don't fail. All this conversation applies to mutable variants as well, but naturally not for the streaming ones.

@cartazio
Copy link
Contributor

cartazio commented Jan 30, 2020 via email

@cartazio
Copy link
Contributor

for now / cutting a release:
current semantics fix for now? i do agree that theres good reasons to have different semantics or expose both flavors and or improvements to the underlying fusion framework are probably sensible long term directions, but less bugs now is best :). would like to make sure the next release is "minor bump" in spirit despite the piles of stuff

@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

Sounds good. I'll then turn off fusion of the slice function, which will give us semantics that most people would expect (error on invalid indices) and remove the Segfault and Out of memory bugs. We can bikeshed about other semantics and alternative functions later :)

@cartazio
Copy link
Contributor

i mean, it'd be great if we could hae fusion AND nice things, but that might be more involved?

@cartazio
Copy link
Contributor

looking at the underlying code,

slice :: Vector v a => Int -> Int -> New v a -> New v a
{-# INLINE_FUSED slice #-}
slice i n m = apply (MVector.slice i n) m

MVector codes never fuse, that fusion rule doesn't preclude

"slice/new [Vector]" forall i n p.
  slice i n (new p) = new (New.slice i n p)

working.... if we somemthing something?
like, maybe some trick with seq ?

@cartazio
Copy link
Contributor

by which i mean, we want slice to check correctly whether or not it gets rewritten away ?

@cartazio
Copy link
Contributor

hrmm, i need to read through this all more

@lehins
Copy link
Contributor

lehins commented Jan 30, 2020

If it gets rewritten it can't be checked. In case you wanna have more fun with it like I did last night here are my notes:

slice i n (fromList xs)

slice i n (unstream (Bundle.fromList xs))                    -- INLINE

slice i n (new (New.unstream (Bundle.fromList xs)))          -- INLINE

new (New.slice i n (New.unstream (Bundle.fromList xs)))      -- REWRITE "slice/new [Vector]"

new (New.unstream (Bundle.slice i n (Bundle.fromList xs)))   -- REWRITE "slice/unstream [New]"


new (New (MVector.vunstream (Bundle.slice i n (Bundle.fromList xs)))) -- INLINE

-- INLINE:
new (New (MVector.vmunstream (Bundle.lift (Bundle.slice i n (MBundle.unsafeFromList Unknown xs)))))


-- INLINE:
new (New (MVector.vmunstream (Bundle.lift (Bundle.take n (Bundle.drop i (MBundle.unsafeFromList Unknown xs))))))

-- INLINE:
new (New (MVector.vmunstream (Bundle.lift (Bundle.take n (Bundle.drop i (Bundle.fromStream (S.fromList xs) Unknown))))))



-- EVAL: Bundle.fromStream
Bundle.fromStream (S.fromList xs) Unknown ==
Bundle (Stream step t) (Stream step' t) Nothing Unknown

-- EVAL: Bundle.drop i
Bundle.drop i (Bundle (Stream step t) (Stream step' t) Nothing Unknown) ==
fromStream (S.drop i (Stream step t)) (clampedSubtract Unknown (Exact i)) ==
fromStream (S.drop i (Stream step t)) Unknown

-- EVAL: fromStream
Bundle.take n (fromStream (S.drop i (Stream step t)) Unknown) ==
Bundle.take n (Bundle (S.drop i (Stream step t)) (Stream step' t) Nothing Unknown)

-- EVAL: Bundle.take
Bundle.take n (Bundle (S.drop i (Stream step t)) (Stream step' t) Nothing Unknown) ==
fromStream (S.take n (S.drop i (Stream step t) Unknown)) (smaller (Exact n) Unknown) ==
fromStream (S.take n (S.drop i (Stream step t) Unknown)) (Max n)

-- EVAL: fromStream
fromStream (S.take n (S.drop i (Stream step t) Unknown)) (Max n) ==
Bundle (S.take n (S.drop i (Stream step t) Unknown)) (Stream step' t) Nothing (Max n)

-- EVAL: lift
new (New (MVector.vmunstream (Bundle.lift (Bundle (S.take n (S.drop i (Stream step t) Unknown)) (Stream step' t) Nothing (Max n))))) ===
new (New (MVector.vmunstream (Bundle (S.take n (S.drop i (Stream step t) Unknown)) (Stream step' t) Nothing (Max n))))

-- EVAL: vmunstream
new (New (MVector.vmunstream (Bundle (S.take n (S.drop i (Stream step t) Unknown)) (Stream step' t) Nothing (Max n)))) ==
new (New (vmunstreamMax (Bundle (S.take n (S.drop i (Stream step t) Unknown)) (Stream step' t) Nothing (Max n))))

lehins added a commit that referenced this issue Jan 31, 2020
* Streaming causing bounds to not be checked, thus not failing slicing.
  Which differed from the semantics of `slice` when compiled with -O0
* Out of memory explosion when size supplied to `slice` is too high
cartazio pushed a commit that referenced this issue Jan 31, 2020
* Streaming causing bounds to not be checked, thus not failing slicing.
  Which differed from the semantics of `slice` when compiled with -O0
* Out of memory explosion when size supplied to `slice` is too high
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants