Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of filterArray #144

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 49 additions & 2 deletions Data/Primitive/Array.hs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
{-# LANGUAGE CPP, MagicHash, UnboxedTuples, DeriveDataTypeable, BangPatterns #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ScopedTypeVariables #-}

-- |
-- Module : Data.Primitive.Array
Expand All @@ -24,7 +25,8 @@ module Data.Primitive.Array (
sizeofArray, sizeofMutableArray,
fromListN, fromList,
mapArray',
traverseArrayP
traverseArrayP,
filterArray
) where

import Control.Monad.Primitive
Expand Down Expand Up @@ -68,12 +70,14 @@ import GHC.Exts (runRW#)
import GHC.Base (runRW#)
#endif

import Text.ParserCombinators.ReadP
import Text.ParserCombinators.ReadP (string, skipSpaces, readS_to_P, readP_to_S)

#if MIN_VERSION_base(4,9,0) || MIN_VERSION_transformers(0,4,0)
import Data.Functor.Classes (Eq1(..),Ord1(..),Show1(..),Read1(..))
#endif

import Data.Primitive.Internal.Bit

-- | Boxed arrays
data Array a = Array
{ array# :: Array# a }
Expand Down Expand Up @@ -591,6 +595,49 @@ arrayFromListN n l =
arrayFromList :: [a] -> Array a
arrayFromList l = arrayFromListN (length l) l

filterArray :: forall a. (a -> Bool) -> Array a -> Array a
filterArray f arr = runArray $
newBitArray s >>= check 0 0
where
s = sizeofArray arr
check :: Int -> Int -> MutableBitArray s -> ST s (MutableArray s a)
check i count ba
| i /= s
= do
v <- indexArrayM arr i
if f v
then setBitArray ba i >> check (i + 1) (count + 1) ba
else check (i + 1) count ba
| otherwise
= do
mary <- newArray count (die "filterArray" "invalid")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If count equals the size of the original array, we have a much better option available to us. Reuse the original array. This prevents use runArray though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why we want runArrays and the like.

fill 0 0 ba mary

-- This performs a few bit operations and a conditional
-- jump for every element of the original array. This is
-- not so great if most element are filtered out. We should
-- consider going word by word through the bit array and
-- using countTrailingZeroes. We could even choose
-- a different strategy for each word depending on its
-- popCount.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: I wouldn't be surprised if unordered-containers had some code or ideas we could steal here.

Copy link
Collaborator

@andrewthad andrewthad Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that CTZ (or equivalently CLZ) is going to do best here. I'm suspect that changing strategies with popCount would not be helpful because I cannot think of a strategy that performs better when most of the elements are preserved. However, when all of them corresponding to a word of bits are preserved (meaning: the word is equal to maxBound :: Word), we could use the functions that copy a slice of the array instead. I guess we could also do this when popCount is really high instead of just when it's 64 (or 32 depending on platform), but we'd be doing several copies instead. I wonder where the breakpoint is for this being effective.

fill :: forall s. Int -> Int -> MutableBitArray s -> MutableArray s a -> ST s (MutableArray s a)
fill !i0 !i'0 !ba !mary = go i0 i'0
where
go :: Int -> Int -> ST s (MutableArray s a)
go i i'
| i == s
= return mary
| otherwise
= do
b <- readBitArray ba i
if b
then do
v <- indexArrayM arr i
writeArray mary i' v
go (i + 1) (i' + 1)
else go (i + 1) i'


#if MIN_VERSION_base(4,7,0)
instance Exts.IsList (Array a) where
type Item (Array a) = a
Expand Down
63 changes: 63 additions & 0 deletions Data/Primitive/Internal/Bit.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
{-# LANGUAGE CPP #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Data.Primitive.Internal.Bit
(
MutableBitArray
, newBitArray
, readBitArray
, setBitArray
) where

import Data.Primitive.ByteArray
import Control.Monad.Primitive
import Data.Bits

newtype MutableBitArray s = MBA (MutableByteArray s)

newBitArray :: PrimMonad m => Int -> m (MutableBitArray (PrimState m))
newBitArray n = do
let s = ((n + wordSize - 1) `unsafeShiftR` 3)
mary <- newByteArray s
fillByteArray mary 0 s 0
return (MBA mary)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could refrain from filling the array here, and just use a writeBitArray function that takes the value to write. That's probably better, actually.


readBitArray :: PrimMonad m => MutableBitArray (PrimState m) -> Int -> m Bool
readBitArray (MBA mry) i = do
wd :: Word <- readByteArray mry (whichWord i)
return $! (((wd `unsafeShiftR` whichBit i) .&. 1) == 1)

setBitArray :: PrimMonad m => MutableBitArray (PrimState m) -> Int -> m ()
setBitArray (MBA mry) i = do
let ww = whichWord i
wd :: Word <- readByteArray mry ww
let wd' = wd .|. (1 `unsafeShiftL` (whichBit i))
writeByteArray mry ww wd'

wordSize :: Int
wordSize = finiteBitSize (undefined :: Word)

ctlws :: Int
ctlws
| wordSize == 64 = 6
| wordSize == 32 = 5
| otherwise = countTrailingZeros wordSize

whichWord :: Int -> Int
whichWord i = i `unsafeShiftR` ctlws

whichBit :: Int -> Int
whichBit i = i .&. (wordSize - 1)

{-
-- For debugging
freezeByteArray
:: PrimMonad m => MutableByteArray (PrimState m) -> m ByteArray
freezeByteArray mary = do
s <- getSizeofMutableByteArray mary
cop <- newByteArray s
copyMutableByteArray cop 0 mary 0 s
unsafeFreezeByteArray cop

prant :: MutableBitArray RealWorld -> IO ()
prant (MBA x) = freezeByteArray x >>= print
-}
1 change: 1 addition & 0 deletions primitive.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ Library
Data.Primitive.MutVar

Other-Modules:
Data.Primitive.Internal.Bit
Data.Primitive.Internal.Compat
Data.Primitive.Internal.Operations

Expand Down