Tracking down the FastWeak "leak" and solving it for ghc 9.X #515

Open
maralorn opened this issue Jan 1, 2025 · 1 comment

maralorn commented Jan 1, 2025

reflex-platform contains a patch for the ghcjs garbage collector that fixes a "leak" in the implementation of FastWeak/FastWeakBag. Before we can migrate production systems to ghc 9.X, we need to either forward-port that patch (ideally upstreaming it for all ghc backends), determine that it is unnecessary, or find another workaround. @ryantrinkle has started upstreaming it at https://gitlab.haskell.org/ghc/ghc/-/issues/25373, where the core problem is described:

  • Every value pointed to by a waiting finalizer is alive.
  • On GC all dangling Weak pointers will be GC’d, but their finalizers will only be run after the GC phase.
  • Thus, all Weak pointers retained by a finalizer can only be cleared in the next GC step.
  • This leads to problems with chains of Weak pointers:

    Specifically, in the case of a doubly-linked list which is weak in one direction, cleaning up a list of n items takes O(n) full GC+finalization cycles. Since certain workloads (particularly FRP) construct long chains like this frequently, in a steady state, this results in memory effectively leaking - even though it will eventually get cleaned up, garbage can be created much faster than it gets cleaned up.
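
The chain behaviour described above can be probed with base alone. What follows is a minimal sketch, not the reflex data structure: `buildChain` and `finalizedAfterEachGC` are hypothetical names, the keys are plain `IORef`s, and the 10 ms pause is an assumed way to let the finalizer thread run. Each finalizer reads the next key, so if pending finalizers keep their referents alive, the count should grow by roughly one per collection instead of jumping to n at once. The actual numbers depend on the GHC version and RTS, so no expected output is given.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (foldM, forM, void)
import Data.IORef (IORef, mkWeakIORef, modifyIORef', newIORef, readIORef)
import System.Mem (performMajorGC)

-- | Build a chain of n keys and drop every strong reference to it.
-- The finalizer of each key reads the next key, so a pending finalizer
-- keeps the rest of the chain reachable until it has actually run.
buildChain :: IORef Int -> Int -> IO ()
buildChain counter n = do
  end <- newIORef 1
  void (foldM link end [1 .. n])
 where
  link :: IORef Int -> Int -> IO (IORef Int)
  link next _ = do
    key <- newIORef 1
    _ <- mkWeakIORef key $ do
      step <- readIORef next         -- data dependency on the next key
      modifyIORef' counter (+ step)  -- every key holds 1, so this counts finalizers
    pure key

-- | Number of finalizers that have run after each of the given GC rounds.
finalizedAfterEachGC :: Int -> Int -> IO [Int]
finalizedAfterEachGC n rounds = do
  counter <- newIORef 0
  buildChain counter n
  forM [1 .. rounds] $ \_ -> do
    performMajorGC
    threadDelay 10000  -- assumed to be long enough for the finalizer thread to run
    readIORef counter

main :: IO ()
main = finalizedAfterEachGC 5 8 >>= print
```

If the printed list climbs one step per round, the sketch reproduces the O(n) cleanup in isolation; if it reaches n after the first round, the installed GHC does not show the behaviour for this simple shape.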

My goal is to

  1. Write a reproducer for the problem as it occurs in reflex.
  2. Figure out whether we can find a workaround, or demonstrate that a patch like the one above is needed.

maralorn commented Jan 1, 2025

My progress so far:

  1. I am unclear about the relation between WeakBag and FastWeakBag. WeakBag seems more complicated and thus more prone to leaks, but only FastWeakBag uses the ghcjs patch. My best guess is that both "Bags" can leak, but the workaround was hard to implement for WeakBag, so the code was split: FastWeakBag, where the workaround is crucial, and WeakBag, which has a more flexible interface and where we can live with the problem. From that I conclude that reproducing and fixing the problem with FastWeakBag is the crucial point.

  2. I have thus far not been able to reproduce the leak by nesting FastWeakBags. This is my code:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -Wall #-}
module FastWeakLeak(main) where

import Control.Monad (forM_)
import Data.FastWeakBag
import System.Mem

-- | Nesting depth of the chain of bags.
n :: Int
n = 10

main :: IO ()
main = do
  chain <- weakBagChain
  -- Print the reachable depth, then force a major GC, a few times in a row.
  forM_ ns $ \_ -> do
    printChainDepth chain
    putStrLn "Running GC …"
    performMajorGC
 where
  ns = [0 :: Int .. 3]

-- | Walk the chain and report the depth at which an empty bag is found.
printChainDepth :: Chain -> IO ()
printChainDepth c' = do
  putStr "Printing chain: "
  go 0 c'
  where
    go !(k :: Int) (Chain c) = do
      e <- isEmpty c
      if e then
        putStrLn $ "Empty bag at depth " <> show k
       else
        traverse_ c (go (k+1))


-- | A bag whose only elements are further bags, held weakly.
newtype Chain = Chain (FastWeakBag Chain)

-- | Build n nested bags; only the outermost bag is returned to the caller.
weakBagChain :: IO Chain
weakBagChain = go n
 where
  go 0 = Chain <$> empty
  go k = do
    c' <- go (k-1)
    c <- empty
    _ <- insert c' c
    pure $ Chain c

With the result:

Printing chain: Empty bag at depth 10
Running GC …
Printing chain: Empty bag at depth 0
Running GC …
Printing chain: Empty bag at depth 0
Running GC …
Printing chain: Empty bag at depth 0
Running GC …

From inspecting the FastWeakBag code, it is also unclear to me why it would leak in the described way, because the finalizers of the Weak values only reference the children IORef and not the FastWeakBag itself.

So either:

  1. In my reproducer, ghc is sneakily running 10 GCs at a time (but running with n=10000 looks equivalent).
  2. The "leak" in reflex is gone, either because of an earlier refactoring of FastWeakBag or because of a newer ghc 9.6 version.
  3. The "leak" in reflex is more complicated to reproduce and/or only affects WeakBag.
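
The first possibility can be checked directly by asking the RTS how many major collections a single `performMajorGC` triggers. A minimal sketch, assuming the program is started with `+RTS -T` (for example by building with `-with-rtsopts=-T`) so that `GHC.Stats` is populated; `majorGCsDuring` is a hypothetical helper name and not part of reflex:

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- | Number of major collections that happened while running the action,
-- or Nothing when RTS statistics are not enabled (no +RTS -T).
majorGCsDuring :: IO () -> IO (Maybe Int)
majorGCsDuring act = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then act >> pure Nothing
    else do
      before <- major_gcs <$> getRTSStats
      act
      after <- major_gcs <$> getRTSStats
      pure (Just (fromIntegral (after - before)))

main :: IO ()
main = majorGCsDuring performMajorGC >>= print
```

Wrapping each `performMajorGC` call of the reproducer in such a helper would show whether one call corresponds to one major collection or to several.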

My money is on 3, although overall I am still confused.
