compile time quadratic in number of routes, with -O2 #986
I think we've seen a couple of similar issues. A rather efficient workaround is to split up the API and servers into several modules (which most people tend to do naturally anyway, when web apps become that large), as things suddenly become a lot less overwhelming for GHC. But yes, any information would be useful. If you can try other servant & GHC versions, that would be lovely. This is however definitely annoying, and I would personally be delighted to see any improvement there, whether it comes from servant's code putting less pressure on GHC, or from GHC dealing with this type of code more efficiently. The first thing to do though, for anyone interested in investigating this, to figure out what's going on and where all that time is spent, would be to run a profiling-enabled GHC over your module.
I tested with a few other versions.
So it's been quadratic for some time; it seems it got substantially worse either in GHC 8.2 or since servant-0.13. I'll take a look at which next time I work on this, and also at the profiling-enabled GHC. My production, non-benchmark code base does indeed split the routes across modules. I tried to measure some difference from that earlier today, but perhaps I didn't have enough routes to see an effect. It's good to hear that splitting is helpful.
Building with 16, 32, and 64 routes (the timings did not survive in this capture). This reminds me of https://ghc.haskell.org/trac/ghc/ticket/13253; there's also a patch in https://ghc.haskell.org/trac/ghc/ticket/13253#comment:24.
This is still a problem in GHC 8.6.3. This example takes me 60 sec to build with only 20 routes. Running …
In testing, this reduced compile time from 1 min to 37 sec for the sample project. See haskell-servant/servant#986 for details.
I looked a bit more into this. I generated a few modules that look like:

```haskell
{-# LANGUAGE DataKinds, TypeOperators, TypeFamilies #-}

module API_2 where

import Data.Proxy
import Servant

type API = "1" :> Get '[JSON] String :<|> "2" :> Get '[JSON] String

api = Proxy :: Proxy API

server :: Server API
server = return "1" :<|> return "2"

app = serve api server
```

where I'm varying the number of endpoints from one module to another. I have APIs with 1, 2, 3, 4, 5, 10, 20, 30, 40, 50 endpoints. And I run this little script:

```sh
#! /usr/bin/env sh
rm -f API_*
for i in 1 2 3 4 5 10 20 30 40 50; do
    runghc gen_api.hs $i
    time ghc -c --make API_$i.hs -o API_$i.o \
        -freduction-depth=0 -O -ddump-timings -v2 \
        -ddump-simpl -ddump-simpl-stats -ddump-to-file \
        -dsuppress-all \
        +RTS -s -RTS |& tee API_$i.log.txt
done
```
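The `gen_api.hs` generator itself isn't included in the thread; a minimal, base-only reconstruction that emits modules of exactly the shape quoted above could look like this (the file name and exact layout are my guesses):

```haskell
-- Hypothetical reconstruction of gen_api.hs: writes API_<n>.hs with n endpoints,
-- matching the module shape shown in the comment above. Not the original script.
import Data.List (intercalate, isInfixOf)
import System.Environment (getArgs)

genModule :: Int -> String
genModule n = unlines
  [ "{-# LANGUAGE DataKinds, TypeOperators, TypeFamilies #-}"
  , "module API_" ++ show n ++ " where"
  , "import Data.Proxy"
  , "import Servant"
    -- show (show i) renders the route literal with its quotes, e.g. "1"
  , "type API = " ++ intercalate " :<|> "
      [ show (show i) ++ " :> Get '[JSON] String" | i <- [1 .. n] ]
  , "api = Proxy :: Proxy API"
  , "server :: Server API"
  , "server = " ++ intercalate " :<|> "
      [ "return " ++ show (show i) | i <- [1 .. n] ]
  , "app = serve api server"
  ]

main :: IO ()
main = do
  args <- getArgs
  case args of
    [n] -> writeFile ("API_" ++ n ++ ".hs") (genModule (read n))
    _   -> putStr (genModule 2)   -- demo output when run without arguments
```

Invoked as `runghc gen_api.hs 10`, this would write `API_10.hs` next to the script, which is what the benchmark loop above expects.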
I can see some interesting things in the log. For example, if we look at the Core optimisation stats for the 50-endpoints API... We see (see the bottom of this comment) that parsing/renaming/typechecking/desugaring don't take very long, and that after desugaring, the code is not too big yet:
The simplifier runs on that; the result size doesn't change all that much. But then, boom, we specialise:
We float out:
and try to simplify:
and from that point onwards, we're working with thousands and thousands of terms, millions of types and coercions. Many passes take whole seconds and gigabytes of allocations, until...
Core Tidy really deserves its name. Here are the timings & allocs for all the passes:
I'm not going to look at the Core for the bigger modules:

```
$ wc -l API_*.dump-simpl
   944 API_10.dump-simpl
   219 API_1.dump-simpl
  2134 API_20.dump-simpl
   281 API_2.dump-simpl
  3724 API_30.dump-simpl
   349 API_3.dump-simpl
  5714 API_40.dump-simpl
   422 API_4.dump-simpl
  8104 API_50.dump-simpl
   499 API_5.dump-simpl
```

but let's see what we get for API_2:

==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 243, types: 840, coercions: 573, joins: 0/0}
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
$s$fHasServerTYPE:<|>context_$croute2
$s$fHasServerTYPE:<|>context_$croute2 = "2"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
$s$fHasServerTYPE:<|>context_$croute1
$s$fHasServerTYPE:<|>context_$croute1
= unpackCString# $s$fHasServerTYPE:<|>context_$croute2
-- RHS size: {terms: 2, types: 3, coercions: 0, joins: 0/0}
lvl_r9QC
lvl_r9QC = Right $s$fHasServerTYPE:<|>context_$croute1
-- RHS size: {terms: 4, types: 12, coercions: 0, joins: 0/0}
app6
app6 = \ s_a6U0 -> (# s_a6U0, lvl_r9QC #)
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
$s$fHasServerTYPE:<|>context_$croute5
$s$fHasServerTYPE:<|>context_$croute5 = "1"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
$s$fHasServerTYPE:<|>context_$croute4
$s$fHasServerTYPE:<|>context_$croute4
= unpackCString# $s$fHasServerTYPE:<|>context_$croute5
-- RHS size: {terms: 2, types: 3, coercions: 0, joins: 0/0}
lvl1_r9QD
lvl1_r9QD = Right $s$fHasServerTYPE:<|>context_$croute4
-- RHS size: {terms: 4, types: 12, coercions: 0, joins: 0/0}
app7
app7 = \ s_a6U0 -> (# s_a6U0, lvl1_r9QD #)
-- RHS size: {terms: 3, types: 6, coercions: 36, joins: 0/0}
app5
app5 = :<|> (app7 `cast` <Co:18>) (app6 `cast` <Co:18>)
-- RHS size: {terms: 1, types: 0, coercions: 98, joins: 0/0}
server
server = app5 `cast` <Co:98>
-- RHS size: {terms: 2, types: 5, coercions: 98, joins: 0/0}
app_result
app_result = Route (app5 `cast` <Co:98>)
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
$trModule2
$trModule2 = "API_2"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
$trModule1
$trModule1 = TrNameS $trModule2
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
$trModule4
$trModule4 = "main"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
$trModule3
$trModule3 = TrNameS $trModule4
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
$trModule
$trModule = Module $trModule3 $trModule1
-- RHS size: {terms: 1, types: 3, coercions: 0, joins: 0/0}
api
api = Proxy
-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
lvl2_r9QE
lvl2_r9QE = : $fAcceptTYPEJSON4 $fAcceptTYPEJSON2
-- RHS size: {terms: 2, types: 11, coercions: 0, joins: 0/0}
lvl3_r9QF
lvl3_r9QF = \ _ -> lvl2_r9QE
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
$s$fHasServerTYPE:<|>context_$croute3
$s$fHasServerTYPE:<|>context_$croute3 = 200
-- RHS size: {terms: 12, types: 16, coercions: 0, joins: 0/0}
lvl4_r9QG
lvl4_r9QG
= \ s1_a8on ->
case newByteArray# 10# s1_a8on of { (# ipv_a8or, ipv1_a8os #) ->
$wouter
ipv1_a8os 4# $s$fHasServerTYPE:<|>context_$croute1 0# ipv_a8or
}
-- RHS size: {terms: 12, types: 16, coercions: 0, joins: 0/0}
lvl5_r9QH
lvl5_r9QH
= \ s1_a8on ->
case newByteArray# 10# s1_a8on of { (# ipv_a8or, ipv1_a8os #) ->
$wouter
ipv1_a8os 4# $s$fHasServerTYPE:<|>context_$croute4 0# ipv_a8or
}
-- RHS size: {terms: 4, types: 2, coercions: 0, joins: 0/0}
lvl7_r9QJ
lvl7_r9QJ = ++_$s++ [] $fAcceptTYPEJSON4 $fAcceptTYPEJSON2
-- RHS size: {terms: 2, types: 11, coercions: 0, joins: 0/0}
$s$fAllCTRender:a_$callMime
$s$fAllCTRender:a_$callMime = \ _ -> lvl7_r9QJ
-- RHS size: {terms: 4, types: 2, coercions: 15, joins: 0/0}
$sencode
$sencode
= \ eta_a9of ->
toLazyByteString ((string1 eta_a9of) `cast` <Co:15>)
-- RHS size: {terms: 2, types: 4, coercions: 0, joins: 0/0}
lvl8_r9QK
lvl8_r9QK = \ _ -> $sencode
-- RHS size: {terms: 3, types: 5, coercions: 0, joins: 0/0}
$s$fAllCTRender:a_$s$fMimeRenderTYPEJSONa
$s$fAllCTRender:a_$s$fMimeRenderTYPEJSONa
= C:MimeRender $fAcceptTYPEJSON lvl8_r9QK
-- RHS size: {terms: 5, types: 16, coercions: 0, joins: 0/0}
$s$fAllMimeRender:a0_$callMimeRender
$s$fAllMimeRender:a0_$callMimeRender
= \ _ w2_a9nu ->
$w$callMimeRender1
$s$fAllCTRender:a_$s$fMimeRenderTYPEJSONa w2_a9nu
-- RHS size: {terms: 3, types: 9, coercions: 9, joins: 0/0}
$s$fAllCTRender:a_$s$fAllMimeRender:a0
$s$fAllCTRender:a_$s$fAllMimeRender:a0
= C:AllMimeRender
(lvl3_r9QF `cast` <Co:9>) $s$fAllMimeRender:a0_$callMimeRender
-- RHS size: {terms: 21, types: 30, coercions: 0, joins: 0/0}
$wlvl_r9QL
$wlvl_r9QL
= \ ww_s9J5 ww1_s9J6 ww2_s9J7 ww3_s9J8 w_s9J2 ->
case $w$sparseQuality2 ww_s9J5 ww1_s9J6 ww2_s9J7 ww3_s9J8 of {
Nothing -> Nothing;
Just x1_a9nm ->
mapQuality_$smapQuality2
(map
$fAllCTRender:a1
($w$callMimeRender1
$s$fAllCTRender:a_$s$fMimeRenderTYPEJSONa w_s9J2))
x1_a9nm
}
-- RHS size: {terms: 12, types: 19, coercions: 1, joins: 0/0}
lvl9_r9QM
lvl9_r9QM
= \ _ w1_s9J1 w2_s9J2 ->
case w1_s9J1 `cast` <Co:1> of
{ PS ww1_s9J5 ww2_s9J6 ww3_s9J7 ww4_s9J8 ->
$wlvl_r9QL ww1_s9J5 ww2_s9J6 ww3_s9J7 ww4_s9J8 w2_s9J2
}
-- RHS size: {terms: 3, types: 9, coercions: 9, joins: 0/0}
$s$fAllCTRender:a
$s$fAllCTRender:a
= C:AllCTRender
($s$fAllCTRender:a_$callMime `cast` <Co:9>) lvl9_r9QM
-- RHS size: {terms: 6, types: 33, coercions: 11, joins: 0/0}
lvl11_r9QO
lvl11_r9QO
= \ @ env_a8nX ->
$w$croute13
$s$fAllCTRender:a
($fReflectMethodStdMethodGET_$creflectMethod `cast` <Co:4>)
($s$fHasServerTYPE:<|>context_$croute3 `cast` <Co:7>)
Proxy
-- RHS size: {terms: 23, types: 113, coercions: 18, joins: 0/0}
$s$fHasServerTYPE:>context4_$croute
$s$fHasServerTYPE:>context4_$croute
= \ @ env_a8nX w2_a8nY _ w4_a8o0 ->
case w2_a8nY of { Proxy ->
StaticRouter
(case runRW# lvl4_r9QG of { (# ipv_a8ow, ipv1_a8ox #) ->
case ipv1_a8ox of dt_a8oB { Text ipv2_a8Hq ipv3_a8Hr ipv4_a8Hs ->
Bin 1# dt_a8oB (lvl11_r9QO (w4_a8o0 `cast` <Co:18>)) Tip Tip
}
})
[]
}
-- RHS size: {terms: 23, types: 113, coercions: 18, joins: 0/0}
$s$fHasServerTYPE:>context4_$croute1
$s$fHasServerTYPE:>context4_$croute1
= \ @ env_a8nX w2_a8nY _ w4_a8o0 ->
case w2_a8nY of { Proxy ->
StaticRouter
(case runRW# lvl5_r9QH of { (# ipv_a8ow, ipv1_a8ox #) ->
case ipv1_a8ox of dt_a8oB { Text ipv2_a8Hq ipv3_a8Hr ipv4_a8Hs ->
Bin 1# dt_a8oB (lvl11_r9QO (w4_a8o0 `cast` <Co:18>)) Tip Tip
}
})
[]
}
-- RHS size: {terms: 7, types: 6, coercions: 0, joins: 0/0}
app4
app4 = \ _ _ _ _ _ _ -> app_result
-- RHS size: {terms: 10, types: 13, coercions: 260, joins: 0/0}
app3
app3
= Delayed
(emptyDelayed4 `cast` <Co:34>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed2 `cast` <Co:32>)
(emptyDelayed1 `cast` <Co:34>)
app4
-- RHS size: {terms: 6, types: 31, coercions: 0, joins: 0/0}
app2
app2
= $w$croute
$s$fHasServerTYPE:>context4_$croute1
$s$fHasServerTYPE:>context4_$croute
Proxy
$WEmptyContext
app3
-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
app1
app1 = runRouterEnv app2 ()
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
app
app = toApplication app1
------ Local rules for imported ids --------
"SPEC/API_2 $fHasServerTYPE:>context4_$croute @ "2" @ (Verb
'GET 200 '[JSON] String) @ '[]" [2]
forall w1_s8Hz w_s8Hy.
$fHasServerTYPE:>context4_$croute w_s8Hy w1_s8Hz
= $s$fHasServerTYPE:>context4_$croute
"SPEC/API_2 $fHasServerTYPE:>context4_$croute @ "1" @ (Verb
'GET 200 '[JSON] String) @ '[]" [2]
forall w1_s8Hw w_s8Hv.
$fHasServerTYPE:>context4_$croute w_s8Hv w1_s8Hw
= $s$fHasServerTYPE:>context4_$croute1
"SPEC/API_2 $fAllCTRender:a_$callMime @ JSON @ '[]" [2]
forall w1_s9j3 w_s9j2.
$fAllCTRender:a_$callMime w_s9j2 w1_s9j3
= $s$fAllCTRender:a_$callMime
"SPEC/API_2 $fAllCTRender:a @ JSON @ '[] @ [Char]"
forall v2_s90i v1_s90h v_s90g.
$fAllCTRender:a v_s90g v1_s90h v2_s90i
= $s$fAllCTRender:a
"SPEC/API_2 $fAllMimeRender:a0_$callMimeRender @ JSON @ [Char]" [2]
forall w_s9nT.
$fAllMimeRender:a0_$callMimeRender w_s9nT
= $s$fAllMimeRender:a0_$callMimeRender
"SPEC/API_2 $fAllMimeRender:a0 @ JSON @ [Char]"
forall v_s9np.
$fAllMimeRender:a0 v_s9np
= $s$fAllCTRender:a_$s$fAllMimeRender:a0
"SPEC/API_2 encode @ [Char]"
forall $dToJSON_s9om. encode $dToJSON_s9om = $sencode
"SPEC/API_2 $fMimeRenderTYPEJSONa @ [Char]"
forall v_s9oc.
$fMimeRenderTYPEJSONa v_s9oc
      = $s$fAllCTRender:a_$s$fMimeRenderTYPEJSONa

I'm not sure GHC does anything particularly wrong here, but if anyone sees something, let me know. Perhaps the servant libraries could give better hints, so as to make sure users don't end up with gigantic Core. It'd be interesting to see on what kind of Core terms the blowup happens. Whole Core stats: …
```
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5
$ ghc-pkg list servant*
/nix/store/j86yn21gz4zly8w31hjd9vd5gw9yd2h1-ghc-8.6.5-with-packages/lib/ghc-8.6.5/package.conf.d
    servant-0.15
    servant-server-0.15
```
It would be interesting to take a look at this one with the GHC profiling story improved.
These issues also happen with …. I believe the core issue could be related to the seven-year-old "TypeFamilies painfully slow" issue. If I'm reading Trac correctly, a fix related to exponential inlining should be in GHC 9.0.1. I haven't tested this since ….
I don't think the issue is just type families; they're used quite a bit in servant, but not everywhere. Quite likely the problem is similar to what we see with https://gitlab.haskell.org/ghc/ghc/-/issues/5642 and many other tickets, where we have a bunch of values that have different types, and where each sub-expression that combines those values ends up having its own type too. Simply generating the Core to describe all those things ends up taking quite a bit of time and space, and this all has to be processed by every single step of the optimisation pipeline etc. I suspect …
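That "each sub-expression has its own type" effect is easy to see with a toy, self-contained stand-in for `(:<|>)` (this is not the real servant type, which also carries handlers and class instances):

```haskell
{-# LANGUAGE TypeOperators #-}

module Main where

-- Toy stand-in for servant's (:<|>); like the real one, it is right-associative.
data a :<|> b = a :<|> b
infixr 3 :<|>

-- Two routes: one product type.
s2 :: String :<|> String
s2 = "1" :<|> "2"

-- Three routes: a different type for the whole server, and the suffix
-- ("2" :<|> "3") has its own distinct type as well. With n routes, GHC must
-- mention n distinct, ever-larger types in the Core it generates.
s3 :: String :<|> (String :<|> String)
s3 = "1" :<|> ("2" :<|> "3")

main :: IO ()
main = case s3 of
  a :<|> (b :<|> c) -> putStrLn (a ++ b ++ c)   -- prints "123"
```

Every prefix of the route list gets its own ever-growing type, which is consistent with the per-pass type counts exploding while term counts grow more slowly.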
I took your example from above and expanded it to 37 routes (random copy-pasting gave an odd number):

```haskell
{-# LANGUAGE DataKinds, TypeOperators, TypeFamilies #-}

module API_2 where

import Data.Proxy
import Servant

type API = "1" :> Get '[JSON] String :<|> "2" :> Get '[JSON] String

api = Proxy :: Proxy API

server :: Server API
server = return "1" :<|> return "2"

app = serve api server
```

Compiling this with …
Turning off …

Before I knew to try …

Resulting in: …

I think that … is where things grow, but definitely don't blow up. Types blow up in the specialization pass right after: …

Maybe if I do ….

Edit: With …, note that types were at …. I'm guessing that if someone were to see the Core after pre-inlining and after specialization, they might be able to say what optimization causes the types blowup?
With that example, GHC 9 takes things from 10s to 16s, so it's going in the wrong direction... Interestingly, on 9.0 the blowup seems to happen in the third "Simplifier" chunk instead of the inliner. Hopefully GHC 9.2 helps, but servant definitely doesn't build with it right now, and I'm not sure how to make this problem more minimal yet.
I think you are right, @alpmestan; I tried replicating the compile slowdown with mini servant and could not. One thing I noticed the minimal version is missing is content types. Is there a way to isolate the compile time of just the content-type pieces? I'm currently trying to understand how the content-type machinery works with the serve function, but that will take some time for me. I also wonder if GHC is just slow with type-level lists and the operations that seem to be behind a lot of the content-type machinery.
Probably it's not about the content types alone, but about adding them to everything else. However, you can grab the ….
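For anyone wanting to attempt that experiment, a stripped-down, base-only skeleton of such a "mini servant" could look like the following, with content-type machinery later grafted onto the `Get` end. Everything here (`HasRoute`, `describe`, the combinator types) is hypothetical scaffolding for a reproduction, deliberately not servant's real internals:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}

module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy stand-ins for servant's combinators (not the real library types).
data (path :: Symbol) :> api
infixr 4 :>

data a :<|> b
infixr 3 :<|>

data Get

-- A HasServer-like class: here the "server" is just a list of route paths,
-- but the instance structure mirrors how servant recurses over the API type.
class HasRoute api where
  describe :: Proxy api -> [String]

instance HasRoute Get where
  describe _ = [""]

instance (KnownSymbol path, HasRoute api) => HasRoute (path :> api) where
  describe _ =
    map (\rest -> "/" ++ symbolVal (Proxy @path) ++ rest)
        (describe (Proxy @api))

instance (HasRoute a, HasRoute b) => HasRoute (a :<|> b) where
  describe _ = describe (Proxy @a) ++ describe (Proxy @b)

type API = "1" :> Get :<|> "2" :> Get

main :: IO ()
main = mapM_ putStrLn (describe (Proxy :: Proxy API))
```

Replacing `Get` with a `Get '[JSON] a`-style combinator plus a `MimeRender`-like class would be the next step in testing whether the content-type pieces alone reproduce the blowup.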
Just ran the same benchmark as @bergey in the first post on my machine (a laptop from around 2014-2015), and then changed all benchmark implementations to use NamedRoutes. I built everything with …. It seems it fixes the quadratic compilation times:
Here are examples of how the modules look, and the complete output of the bench.

Example of the anonymous routes:

```haskell
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Semigroup
import Data.Text (Text)
import Servant hiding (NotSecure)
import Servant.Server
import Network.Wai.Handler.Warp (defaultSettings, runSettings,
                                 setLogger, setPort)

import qualified Data.Text as T

type MainAPI =
       "static" :> "1" :> Get '[JSON] Text
  :<|> "static" :> "2" :> Get '[JSON] Text
  :<|> "static" :> "3" :> Get '[JSON] Text
  :<|> "static" :> "4" :> Get '[JSON] Text
  :<|> "static" :> "5" :> Get '[JSON] Text
  :<|> "static" :> "6" :> Get '[JSON] Text
  :<|> "static" :> "7" :> Get '[JSON] Text
  :<|> "static" :> "8" :> Get '[JSON] Text

mainServer :: Server MainAPI
mainServer = foo :<|> foo :<|> foo :<|> foo
        :<|> foo :<|> foo :<|> foo :<|> foo

foo = return "foo"

main :: IO ()
main = runSettings defaultSettings $
    serveWithContext (Proxy :: Proxy MainAPI) EmptyContext mainServer
```

Benchmark output of anonymous routes: …
Example of NamedRoutes:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators #-}

module Main where

import Data.Semigroup
import Data.Text (Text)
import GHC.Generics (Generic)
import Servant hiding (NotSecure)
import Servant.API.Generic
import Servant.Server
import Network.Wai.Handler.Warp (defaultSettings, runSettings,
                                 setLogger, setPort)

import qualified Data.Text as T

type MainAPI = NamedRoutes API

data API mode = API
  { static1 :: mode :- "static" :> "1" :> Get '[JSON] Text
  , static2 :: mode :- "static" :> "2" :> Get '[JSON] Text
  , static3 :: mode :- "static" :> "3" :> Get '[JSON] Text
  , static4 :: mode :- "static" :> "4" :> Get '[JSON] Text
  , static5 :: mode :- "static" :> "5" :> Get '[JSON] Text
  , static6 :: mode :- "static" :> "6" :> Get '[JSON] Text
  , static7 :: mode :- "static" :> "7" :> Get '[JSON] Text
  , static8 :: mode :- "static" :> "8" :> Get '[JSON] Text
  } deriving Generic

mainServer :: Server MainAPI
mainServer = API
  { static1 = foo
  , static2 = foo
  , static3 = foo
  , static4 = foo
  , static5 = foo
  , static6 = foo
  , static7 = foo
  , static8 = foo
  }

foo = return "foo"

main :: IO ()
main = runSettings defaultSettings $
    serveWithContext (Proxy :: Proxy MainAPI) EmptyContext mainServer
```

Benchmark output of NamedRoutes: …

If anyone else wants to bench the NamedRoutes: here's my fork.
It is extremely surprising to me that …. I am guessing ….
Is it maybe because the flat API isn't actually flat with the anonymous routes, but nested with every extra `:<|>`?
The …: we piggy-back on the instance of ….
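The piggy-backing presumably goes through the record's `GHC.Generics` representation, as the `deriving Generic` in the NamedRoutes examples suggests. As a base-only illustration of that trick, and emphatically not servant's actual code, here is a class that walks a record's generic representation to collect its field selector names, structurally the same kind of traversal a generic router derivation has to do:

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

module Main where

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Walk a generic representation, collecting record-field names.
class GFields (f :: Type -> Type) where
  gfields :: Proxy f -> [String]

-- Skip over datatype and constructor metadata.
instance GFields f => GFields (M1 D d f) where
  gfields _ = gfields (Proxy :: Proxy f)

instance GFields f => GFields (M1 C c f) where
  gfields _ = gfields (Proxy :: Proxy f)

-- Products: one branch per record field.
instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields _ = gfields (Proxy :: Proxy f) ++ gfields (Proxy :: Proxy g)

-- A single field: read its selector name from the metadata.
instance Selector s => GFields (M1 S s (K1 i a)) where
  gfields _ = [selName (undefined :: M1 S s (K1 i a) p)]

fieldNames :: forall a. (Generic a, GFields (Rep a)) => Proxy a -> [String]
fieldNames _ = gfields (Proxy :: Proxy (Rep a))

-- A record in the style of the NamedRoutes examples above.
data Routes = Routes { static1 :: String, static2 :: String }
  deriving Generic

main :: IO ()
main = print (fieldNames (Proxy :: Proxy Routes))
```

Whether routing through this representation produces friendlier Core than a chain of `:<|>` is exactly what the conflicting benchmark results in this thread are probing.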
Two ideas, then: …

Maybe there's a better way to say "print all fired optimizations" in a way you could diff, though.
For what it's worth, I was not able to reproduce those benchmark results. For me, named routes were slower than the old way of doing things by a fair bit. You can see the entire benchmark script and results here: https://gist.github.com/tfausak/8019733fb5c703994d1665143c60ad0f

Example "old" module:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

module M ( main ) where

import qualified Network.Wai as Wai
import qualified Network.Wai.Handler.Warp as Warp
import qualified Servant

main :: IO ()
main = Warp.run 3000 application

application :: Wai.Application
application = Servant.serve proxy server

proxy :: Servant.Proxy Api
proxy = Servant.Proxy

server :: Servant.Server Api
server
  = pure 1
  Servant.:<|> pure 2

type Api
  = "1" Servant.:> Servant.Get '[Servant.JSON] Int
  Servant.:<|> "2" Servant.:> Servant.Get '[Servant.JSON] Int
```

Example "new" module:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeOperators #-}

module M ( main ) where

import qualified Network.Wai as Wai
import qualified Network.Wai.Handler.Warp as Warp
import qualified Servant
import qualified Servant.API.Generic as Servant
import qualified Servant.Server.Generic as Servant
import qualified GHC.Generics as Generics

main :: IO ()
main = Warp.run 3000 application

application :: Wai.Application
application = Servant.genericServe routes

routes :: Routes Servant.AsServer
routes = Routes
  { _1 = pure 1
  , _2 = pure 2
  }

data Routes routes = Routes
  { _1 :: routes Servant.:- "1" Servant.:> Servant.Get '[Servant.JSON] Int
  , _2 :: routes Servant.:- "2" Servant.:> Servant.Get '[Servant.JSON] Int
  } deriving (Generics.Generic)
```

With 64 routes, the old module compiles in 4.4 seconds compared to 16.2 seconds for the new module. I know that benchmarking is hard, so perhaps I've done something wrong here. Or maybe my server definitions are subtly different from @Vlix's in some way. But I wasn't able to get any module with named routes compiling faster than an equivalent module with plain old `:<|>`.
Also, I notice that compilation times for the ….
@tfausak …

EDIT: Their try gave me an idea, though, brb.

EDIT2: Ok, it's not done benching, but I think I'm already seeing what happened. I didn't build the "old timing" with ….

tl;dr: MY BAD! It's not ….
Ok, rebuilt again with just …. So yeah... if anyone knows the diff between … and …:

Regular (flat) anonymous routes with …
You could run git bisect with a script for 32 routes that fails if it takes more than 20 seconds, to narrow things down if there are a lot of commits.
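Such a predicate is easy to assemble with `timeout`: `git bisect run` treats any exit code from 1 to 127 (except 125, which means "skip") as a bad commit, and `timeout` exits with 124 when the limit is hit. A sketch, where the actual compile command (e.g. `ghc -O2 --make API_32.hs`) is a placeholder and not taken from the thread:

```shell
#!/usr/bin/env sh
# check_build LIMIT CMD... : succeed only if CMD finishes within LIMIT seconds.
# For bisecting you would run, e.g.:  git bisect run ./check.sh
# with check_build wrapping the real 32-route compile.
check_build() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "too slow (> ${limit}s): bad commit"
        return 1
    fi
    return "$status"
}

# Demo, with `sleep` standing in for the compiler:
slow=$(check_build 1 sleep 3)    # hits the 1s limit, so reports a bad commit
echo "$slow"
check_build 5 sleep 0 && echo "fast enough: good commit"
```

Any other build failure (exit codes other than 0 and 124) is passed through unchanged, so commits that don't compile at all also register as bad rather than silently passing.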
The plot thickens :-) Off the top of my head, I can't think of any change in 0.19 that could be responsible for such improvements. I am going to try bisecting on my end.
@Vlix I am not able to reproduce your results either. Compilation times for a module of 64 anonymous routes seem pretty much the same on ….

This is using GHC 8.10.4, though.
I am able to reproduce the performance difference between Servant 0.18.3 and Servant 0.19. I'm using the same setup as before: Stackage LTS 18.28, which uses GHC 8.10.7.
Comparing the output of …:

```diff
46a47
> constraints 0.13.3
114,115c115,116
< servant 0.18.3
< servant-server 0.18.3
---
> servant 0.19
> servant-server 0.19
142a144
> type-equality 1
```
Just tried with GHC 8.10.7 in a slightly modified servant repository — still can't reproduce the huge gains witnessed by some, so obviously I am not able to …. @ParetoOptimalDev It looks like you intended to go down that route; did you by any chance figure it out?
Are you using a Stack LTS? Or are you using cabal? 🤔
This is using …. I am benching with something along those lines: …

where …. My changes are in this commit: 142681c

If you use nix, you may run the tests in the same environment as I, using ….
Bisect can't accurately be done with cabal because ….

I'll try this out later and see if I can reproduce with your code. I am now suspicious of compile-time differences measured through cabal, though. I guess we have to `cabal clean`, compile, make a change in the module, then time that compile.
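That clean/compile/touch/re-time loop can be scripted. A sketch, where the build command and module path are placeholders rather than anything from the thread:

```shell
#!/usr/bin/env sh
# time_rebuild BUILD_CMD MODULE: build once, invalidate MODULE, time the rebuild.
time_rebuild() {
    build_cmd=$1
    module=$2
    $build_cmd > /dev/null 2>&1      # first build, fills all caches
    touch "$module"                  # mark just this module as changed
    start=$(date +%s)
    $build_cmd > /dev/null 2>&1      # the rebuild we actually want to time
    end=$(date +%s)
    echo "rebuild took $((end - start))s"
}

# Real usage would look like:  time_rebuild "cabal build bench-servant" src/API.hs
# Demo with a no-op build command:
touch /tmp/demo_module.hs
msg=$(time_rebuild true /tmp/demo_module.hs)
echo "$msg"
```

Note `date +%s` only has whole-second resolution; for the multi-second differences discussed in this thread that's enough, but sub-second timing would need `time` itself or GNU `date +%s%N`.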
Yeah, I noticed that as well, which is why I am removing the output manually before each run: …

It does seem to effectively do the same thing as ….
Does this cause the recompile to actually happen? I had inconsistent results removing stuff from dist, and assumed there might be another cache layer or something.
It did work, AFAICT.
Ok, …

Which seems to be the addition of …. Anyone else who can verify the same behaviour? (so slow with …) It's also the commit that removed support for GHC ….
@Vlix I can repro the performance difference between ….
Can also repro the difference between the two commits: 6.6s on b7c6, and 70s on 51c8. However, I'm not able to reproduce any speedup from updating the API at work from 0.18 to 0.19.
I have managed to isolate it to the …. I've then run it through …:

I can't quite work out enough to know why that's the reason, though.
Perhaps generating the Core and posting it here and somewhere like haskell-cafe can help us find answers?
Inlining the diff here for convenience:

```diff
-serveWithContext :: ( HasServer api context
+serveWithContext :: forall api context. ( HasServer api context
                     , HasContextEntry (context .++ DefaultErrorFormatters) ErrorFormatters )
   => Proxy api -> Context context -> Server api -> Application
-serveWithContext p context server =
-    toApplication (runRouter format404 (route p context (emptyDelayed (Route server))))
+serveWithContext p context server = toApplication (runRouter format404 (route p context (emptyDelayed router)))
   where
+    (router :: RouteResult (ServerT api Handler)) = Route $ (hoistServerWithContext p (Proxy :: Proxy context) (id :: Handler a -> Handler a) server :: ServerT api Handler)
     format404 = notFoundErrorFormatter . getContextEntry . mkContextWithErrorFormatter $ context
```

The thing that stands out for me here is the explicit quantification and the explicit type signature. I'm unsure how we could reduce this to a more minimal reproduction.
I've not gotten to generating the Core properly and posting it up somewhere, but I did find the following: …

This is where the issue happens: …

And in the slow version, you can see the explosion: …

I also ran …

Fast: …

Slow: …

I know this isn't the most useful thing, but I forgot to save the splices...
Note that many of these types of issues are fixed in GHC 9.2, so it might be fixed there. Edit: not sure if they all made it into 9.2.
Did I misread? I thought it was linked to this issue: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3497
I've been digging a bit to understand where time is spent while building my $WORK codebase, and found this interesting behavior. I'm hoping others interested in investigating the root cause will discuss here. I'm open to changes in application code, servant, or GHC, if we can narrow the cause enough.
The small examples I'm using for benchmarking are at https://github.com/bergey/studious-telegram/tree/trunk/servant
I'm timing with bench, invoked as `bench 'stack clean && stack build'`. I'm running on a GCP VM with nothing else running on it.

With `stack build --fast` (which I believe passes -O0 to GHC), the build time doesn't change dramatically as the number of HTTP routes increases. Without `--fast`: …

The only mitigation I've found so far is testing as much as possible in GHCi or with `--fast`. This may be related to #870; I have not yet tested with anything except stackage-11.14, which has GHC-8.2.2 and servant-0.13.0.1.