Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Event and Behavior to have a representational role #221

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ocharles
Copy link
Collaborator

@ocharles ocharles commented Sep 14, 2021

Fixes #219. The main problem is that in newtype Event a = E (Prim.Event a), we have

type Prim.Event a = Cached Moment (Pulse a)
data Cached m a = Cached (m a)

and that Cached m a definition forces a to be nominal.

This PR works around this nominal role by using a Yoneda-trick, which restores the role to be representational (at the cost of carrying a mapping function around).

@Icelandjack
Copy link

What if you unfold Cached in Event + make Cached a newtype? Then Event a is coercible to Cached Moment (Pulse a) and you should have a representable argument.

newtype Cached m a = Cached (m a)

newtype Event a = E_ (Moment (Pulse a))

Also: Semigroup, Monoid, Num can be derived for Behaviour via Ap Behaviour a and Future/Moment/MomentIO can newtype derive Functor, Applicative, Monad, MonadFix.

Screenshot from 2021-09-15 01-41-14

@ocharles
Copy link
Collaborator Author

@Icelandjack I need to confirm but I think we might need cached to be data for laziness stuff. But if we don't need that, then your idea is great and gives us one less pointer indirection, too!

@mitchellwrosen
Copy link
Collaborator

Last I looked - and my understanding could be way off here - the Cached type is actually not necessary, it just serves as sort of a reminder that some Cached m computation (namely the ones built fron cache, not dontCache) only run the m effects at most once. But this of course only requires MonadIO and MonadFix.

So if I had to speculate wildly, I'd guess Cached could be a newtype.

@ocharles
Copy link
Collaborator Author

So if I had to speculate wildly, I'd guess Cached could be a newtype.

Let's give it a try then! I'm very keen to chase one fewer pointer if possible. I'll look into this. What @Icelandjack proposes is much nicer than my Yoneda hacking, as cute as it is.

@HeinrichApfelmus
Copy link
Owner

Alas, Cache needs to use data because we want to memoize the value ("laziness stuff") which is used to implement observable sharing.

Observable allows us to use an Event value multiple times without duplicating an Event internally. For example, the code

let e1 = … some event …; e1 = e2 in unionWith const e1 e2

uses the same event internally for e1 and e2; this is not automatic. It would be semantically correct if we duplicate the underlying stream of events of e1 in e2, but less performant.

Observable sharing must use unsafePerformIO (or similar), which happens here:

cache :: (MonadFix m, MonadIO m) => m a -> Cached m a
cache m = unsafePerformIO $ do
key <- liftIO $ newIORef Nothing
return $ Cached $ do

The fact that Cache is data ensures that the unsafePerformIO is not called twice for the same value. Subtle, but very important.

The need for observable sharing likely also rules out other solutions, such as a Yoneda embedding like E a ~ (something a → r) → r. The issue is that function arrows do not play well with sharing (though it is not impossible to make it work).

@HeinrichApfelmus
Copy link
Owner

HeinrichApfelmus commented Sep 15, 2021

How important is this issue of type roles? If it is a minor improvement, I would recommend against complicating the representation of Event or likewise in any way.

If I understand correctly, the issue seems to be that Cached uses m a which is only nominal in a, because m could be a type family. However, we only use the special case m = Moment, which is a type synonym where the parameter a is representational. If we can override the type role, I recommend we do that, if not, then this example is probably something we should show to the GHC developers and leave it alone for now.

@ocharles
Copy link
Collaborator Author

How important is this issue of type roles? If it is a minor improvement, I would recommend against complicating the representation of Event or likewise in any way.

Not hugely, though I refer back to my original example. This is code I wrote and expected to work, and was a bit disappointed to find I was going to have to hand-write the Monoid and Semigroup instances.

If I understand correctly, the issue seems to be that Cached uses m a which is only nominal in a, because m could be a type family. However, we only use the special case m = Moment, which is a type synonym where the parameter a is representational. If we can override the type role, I recommend we do that

Yep, your understanding is correct. I'll see if I can make m representational. I wonder also if data CachedMoment a = CachedMoment (Moment a) is something that could be considered

@mitchellwrosen
Copy link
Collaborator

Thanks for the clarification, that is subtle indeed... so subtle that I'm not sure I understand it yet!

Simplifying the machinery a tiny bit, let's say I decorate an IO action with an unsafePerformIO preamble, just like cache:

{-# NOINLINE cache #-}
cache :: IO a -> IO a
cache action =
  unsafePerformIO do
    putStrLn "hello, world!"
    pure action

I should still expect this function to return a thunk that, when forced to WHNF, prints "hello, world" and returns the IO action passed in. So regardless of how many pointers to the returned action exist, the thunk will only be updated in place once.

Sorry for dragging this on, feel free to ignore this comment

@HeinrichApfelmus
Copy link
Owner

I should still expect this function to return a thunk that, when forced to WHNF, prints "hello, world" and returns the IO action passed in. So regardless of how many pointers to the returned action exist, the thunk will only be updated in place once.

If values of type IO a were thunks, this would be correct. But IO gets special treatment from the compiler. I just checked that it works as you expect it to work in the interpreter, but I don't think that it will (always) work in compiled code.

I remember recording a remark on this in the HeinrichApfelmus/optimize-monad-trans repository at one point:

Note that the same reasoning could be applied to the state token for the IO monad. Why doesn't GHC build closures for that? The answer is that the compiler treats the State# type constructor as a special case and doesn't refrain from recomputing IO actions. This is GHC's infamous state hack.

@HeinrichApfelmus
Copy link
Owner

In the documentation on role annotations, a data type that is identical to Cache m a is mentioned specifically. They call it Tricky a b. 😂 I don't suppose that there is a way to tell the compiler that m can never be a type synonym? There could be a StackOverflow question for that. The case where m is a monad parameter actually seems like a fairly common use case to me.

@tstat
Copy link

tstat commented Sep 18, 2021

Does the "state hack" thwart the sharing attempt in this case? I don't see how yet:

As I understand it, the "state hack" is to label all lambdas binding State# as one shot so that a let on the outside will be floated into the lambda, thus eta-expanding the state lambda beyond the let. This comes back to bite us when the we execute that IO action multiple times. A small example would be:

expensive :: Int -> Int
expensive n = n
{-# NOINLINE expensive #-}

stateHack :: Int -> State# RealWorld -> (# State# RealWorld, Int #)
stateHack n =
  let i = expensive n
   in (\s -> (# s, i #))

where stateHack produces the core:

stateHack
  = \ (n_avF :: Int) (s_avH :: State# RealWorld) ->
      (# s_avH, expensive n_avF #)

and we see that our state lambda was eta-expanded outside of our let.

Compare this with a non-state argument:

normal :: Int -> () -> (# (), Int #)
normal n =
  let i = expensive n
   in (\s -> (# s, i #))

and we see that expensive is shared across calls of the returned function, as expected

normal
  = \ (n_avI :: Int) ->
      let {
        i_sFV :: Int
        [LclId]
        i_sFV = expensive n_avI } in
      \ (s_avK :: ()) -> (# s_avK, i_sFV #)

We aren't discussing the usual "state hack" problem of a let expression returning an IO function. We are using unsafePerformIO which itself uses runRW#, and runRW# is not inlined until after all core-to-core passes.

So, let's instead consider a variant of expensive and stateHack where expensive is in IO and we run it with runRW#:

expensiveIO :: Int -> State# RealWorld -> (# State# RealWorld, Int #)
expensiveIO n s = (# s, n #)
{-# NOINLINE expensiveIO #-}

stateHackIO :: Int -> (State# RealWorld -> (# State# RealWorld, Int #))
stateHackIO n =
  -- newAction :: IO (IO Int)
  let newAction :: State# RealWorld -> (# State# RealWorld, State# RealWorld -> (# State# RealWorld, Int #) #)
      newAction st0 = case expensiveIO n st0 of
        (# st1, i #) -> (# st1, \s -> (# s, i #) #)
   in case runRW# newAction of
        (# _, a #) -> a

In this variant of stateHack the returned state lambda is inside a tuple, and expensiveIO n st0 cannot be inlined into the state lambda because the resulting state token, st1, is used in the unboxed tuple that contains the state lambda.

Peeking at the core reveals that we get the desired sharing:

Core
stateHackIO
  = \ (n_axw :: Int) ->
      case runRW#
             @ ('TupleRep '[ 'TupleRep '[], 'LiftedRep])
             @ (# State# RealWorld,
                  State# RealWorld -> (# State# RealWorld, Int #) #)
             (\ (st0_axy [OS=OneShot] :: State# RealWorld) ->
                case expensiveIO n_axw st0_axy of { (# ipv_syX, ipv1_syY #) -> -- execute `expensiveIO` and bind result to `ipv1_syY`
                (# ipv_syX,
                   \ (s_axB :: State# RealWorld) -> (# s_axB, ipv1_syY #) #) -- `ipv1_syY` is captured in the returned IO function
                })
      of
      { (# ipv_sz1, ipv1_sz2 #) ->
      ipv1_sz2
      }

Now consider a cache example using IORef

cache :: IO a -> IO a
cache action =
  unsafePerformIO do
    ref <- newIORef Nothing
    let newAction = do
          readIORef ref >>= \case
            Nothing -> do
              res <- action
              writeIORef ref (Just res)
              pure res
            Just res -> pure res
    pure newAction

We are returning an IO function, that is, a State# bound lambda, and it seems to me that the important question would be: "Does the returned State# lambda eta-expand past the creation of the IORef?" I don't see how it would, and the core I compiled below shows that it did not. I think this is because of the reasons I outlined in the simpler example above, namely, that the returned IO function is inside an unboxed tuple that provides no eta-expansion opportunity, and runRW# is not inlined until CorePrep.

Core
cache
  = \ (@ a_axD) (action_awc :: IO a_axD) ->
      case runRW#
             @ ('TupleRep '[ 'TupleRep '[], 'LiftedRep])
             @ (# State# RealWorld, IO a_axD #)
             (\ (s_aE8 [OS=OneShot] :: State# RealWorld) ->
                case noDuplicate# @ RealWorld s_aE8 of s'_aE9 { __DEFAULT ->
                case newMutVar#
                       @ (Maybe a_axD) @ RealWorld (GHC.Maybe.Nothing @ a_axD) s'_aE9
                of
                { (# ipv_aET, ipv1_aEU #) ->  -- Our IORef is created here
                (# ipv_aET,
                   (\ (s1_aEK :: State# RealWorld) -> -- And here is our State# lambda capturing the IORef
                      case readMutVar# @ RealWorld @ (Maybe a_axD) ipv1_aEU s1_aEK of
                      { (# ipv2_XF8, ipv3_XFa #) ->
                      case ipv3_XFa of {
                        Nothing ->
                          case (action_awc
                                `cast` (GHC.Types.N:IO[0] <a_axD>_R
                                        :: IO a_axD
                                           ~R# (State# RealWorld -> (# State# RealWorld, a_axD #))))
                                 ipv2_XF8
                          of
                          { (# ipv4_XFi, ipv5_XFk #) ->
                          case writeMutVar#
                                 @ RealWorld
                                 @ (Maybe a_axD)
                                 ipv1_aEU
                                 (GHC.Maybe.Just @ a_axD ipv5_XFk)
                                 ipv4_XFi
                          of s2#_aGj
                          { __DEFAULT ->
                          (# s2#_aGj, ipv5_XFk #)
                          }
                          };
                        Just res_axr -> (# ipv2_XF8, res_axr #)
                      }
                      })
                   `cast` (Sym (GHC.Types.N:IO[0] <a_axD>_R)
                           :: (State# RealWorld -> (# State# RealWorld, a_axD #))
                              ~R# IO a_axD) #)
                }
                })
      of
      { (# ipv_aEb, ipv1_aEc #) ->
      ipv1_aEc
      }

This can perhaps be seen even more plainly in the STG:

STG
    \r [action_sIE]
        case
            case noDuplicate# [GHC.Prim.realWorld#] of s'_sIF [Occ=Once] {
            __DEFAULT ->
            case newMutVar# [GHC.Maybe.Nothing s'_sIF] of {
            (#,#) ipv_sIH [Occ=Once] ipv1_sII ->  -- We create our IORef and bind it to ipv1_sII here
            let {
              sat_sIU [Occ=Once] :: GHC.Types.IO a_axD -- We allocate a closure for our returned IO function here, capturing our IORef
              [LclId] =
                  \r [s1_sIJ] -- Here is the state lambda
                      case readMutVar# [ipv1_sII s1_sIJ] of {
                      (#,#) ipv2_sIL [Occ=Once*] ipv3_sIM [Occ=Once!] ->
                      case ipv3_sIM of {
                        GHC.Maybe.Nothing ->
                            case action_sIE ipv2_sIL of {
                            (#,#) ipv4_sIP [Occ=Once] ipv5_sIQ ->
                            let {
                              sat_sIR [Occ=Once] :: GHC.Maybe.Maybe a_axD
                              [LclId] =
                                  CCCS GHC.Maybe.Just! [ipv5_sIQ];
                            } in
                              case
                                  writeMutVar# [ipv1_sII sat_sIR ipv4_sIP]
                              of
                              s2#_sIS [Occ=Once]
                              {
                              __DEFAULT -> (#,#) [s2#_sIS ipv5_sIQ];
                              };
                            };
                        GHC.Maybe.Just res_sIT [Occ=Once] -> (#,#) [ipv2_sIL res_sIT];
                      };
                      };
            } in  (#,#) [ipv_sIH sat_sIU];
            };
            }
        of
        {
        (#,#) _ [Occ=Dead] ipv1_sIX [Occ=Once] -> ipv1_sIX;
        };

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add role annotations to Behavior and Event
5 participants