-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce allocations for event dispatch #243
base: master
Are you sure you want to change the base?
Conversation
{----------------------------------------------------------------------------- | ||
Type and class instances | ||
------------------------------------------------------------------------------} | ||
newtype ReaderWriterIOT r w m a = ReaderWriterIOT { run :: r -> IORef w -> m a } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually just incurs more boxing than using CPS'ed WriterT
, because whenever we call run
we need to box up an IORef
@@ -107,14 +108,14 @@ insertNodes :: RWS.Tuple BuildR (EvalPW, BuildW) Lazy.Vault -> [SomeNode] -> Que | |||
insertNodes (RWS.Tuple (time,_) _ _) = go | |||
where | |||
go :: [SomeNode] -> Queue SomeNode -> IO (Queue SomeNode) | |||
go [] q = return q | |||
go (node@(P p):xs) q = do | |||
go [] !q = return q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big reduction in allocations, perhaps the biggest in this branch. I think we could replace insertNodes
with foldl'
or foldr
and this might be a bit more obvious.
@@ -39,7 +39,7 @@ newInput = mdo | |||
} | |||
-- Also add the alwaysP pulse to the inputs. | |||
let run :: a -> Step | |||
run a = step ([P pulse, P always], Lazy.insert key (Just a) Lazy.empty) | |||
run a n = step ([P pulse, P always], Lazy.insert key (Just a) Lazy.empty) n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This eta expansion allows run
to be a two-parameter closure, rather than two nested closures.
-- Recursively execute the buildLater calls. | ||
unfold :: BuildR -> BuildW -> BuildIO a -> IO (a, BuildW) | ||
unfold !i w m = do | ||
(a, BuildW (w1, w2, w3, later)) <- RW.runReaderWriterIOT m i | ||
let !w' = w <> BuildW (w1,w2,w3,mempty) | ||
w'' <- case later of | ||
Just m -> snd <$> unfold i w' m | ||
Nothing -> return w' | ||
return (a,w'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily need to make unfold
a top-level binding, but the important thing is that it doesn't mention any variables bound by runBuildIO
. This allows us to inline a bit of runBuildIO
, and to float the recursion loop out to the top-level. Otherwise, each runBuildIO
allocates a new recursion closure
@@ -23,7 +23,7 @@ import Reactive.Banana.Prim.Util | |||
-- | A 'Network' represents the state of a pulse/latch network, | |||
data Network = Network | |||
{ nTime :: !Time -- Current time. | |||
, nOutputs :: !(OrderedBag Output) -- Remember outputs to prevent garbage collection. | |||
, nOutputs :: {-# unpack #-} !(OrderedBag Output) -- Remember outputs to prevent garbage collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoids actually allocating an OrderedBag
when creating an updated Network
after stepping - instead we can just store the underlying pointers that we have access to from some worker/wrapper transformations.
…c-evaluation Perform evaluation steps `Network` using `GraphGC` ### Overview This pull request completely changes the way that dependencies between `Pulse` are tracked and used in order to perform an `Evaluation.step` for a `Network`. We use the machinery provided by `GraphGC` to * track dependencies between `Pulse`, using `insertEdge` and `clearPredecessors`. * traverse the `Pulse` in dependency order with early exit, using `walkSuccessors_`. ### Comments * This should fix many remaining issues with garbage collection for `Pulse`, specifically #261 * I think that in order to fix *all* remaining issues for `Pulse`, we may have to look at garbage collection and `Vault`. * This pull request doesn't do anything for `Latch`. Still, 🥳! ### Obsoletes * #182 * sadly, #243
No pressure of course but it'd be great to get these memory optimizations in somehow :) As mentioned in #268 this PR was made stale. Maybe smaller chunks would be appropriate? Or, well -- I doubt it -- probably just bad timing. Big rewrites of internals don't happen often. |
Huh, if the benchmarks haven't changed, things seem to have gotten substantially worse. I'm seeing |
Fixing the space leaks was worth it, I would say. 😅 Also note that this is "The total number of bytes allocated by the program over the whole run." Which is not quite the same as "The maximum space actually used by your program is the 'bytes maximum residency' figure." The total number of bytes is highly correlated to CPU time, not so much to memory use. @mitchellwrosen , if you want to play around with such lower-level optimizations, switching class IsInteger v where
-- | `toInteger x = toInteger y` implies `x = y`.
toInteger :: v → Integer or something like that would be fine with me. Hm, or maybe class HasUnique v where
toUnique :: v → Unique |
I think to get down to |
Can we put this under CPP or a flag and use IntMap on x64? |
This commit continues my work on low-level optimizations for event dispatch. My plan of attack is to compile all modules with
-ddump-stg
, and then methodically work from the outside in, finding all unnecessarylet
bindings (let
bindings I know that we will always need to force) and trying to eliminate them. The commits here show the reduction in heap allocations forcabal run benchmarks -- -p Boring --stdev Infinity +RTS -s
.This is still WIP, but I thought I'd throw this up for now so others can see what I'm doing!