Out of memory on a big query #82

Raveline · 2018-12-23T10:45:57Z

I'm trying to compile a query returning 37 fields, using 14 joins, a single where clause, a group by on 33 fields, and an order by on 4 fields. Sadly, I get "unable to commit 745537536 bytes of memory" when GHC is trying to compile the module containing the query. (I cannot post the query for IP reasons, sorry.)

Do you have any idea what I could do to help the compiler with this?

echatav · 2018-12-24T06:13:06Z

Oh no :-( that’s not good. I tried Googling the error message but nothing useful came out. You could try putting the query alone in its own module or somehow giving GHC more memory (swap space?) to work with. Squeal uses type level lists which are quite inefficient when calculating Join and Has and the rest. At runtime all that inefficiency should completely go away but compile time is a different story. If I had an equivalent example I could investigate more thoroughly.

Raveline · 2018-12-24T08:04:05Z

Putting the query alone in its own module doesn't seem to make much of a difference. I still have to check whether splitting it into several functions helps (though it probably shouldn't!).
However, a colleague with more experience in GHC suggested I add a pragma on the file containing the query:

{-# OPTIONS_GHC -fno-specialise -fno-full-laziness #-}

It still consumes 3.5 GB but that's already way more manageable.

adfretlink · 2019-01-25T08:47:06Z

Some more information about this, thanks to the remarkable investigative work done by @haitlahcen.
There are two issues at hand:

One with Stack and its use of dump-hi files (a non-binary version of GHC's hi files), leading to very big files being printed out, with very high memory usage.
One with GHC systematically unfolding types, which takes a lot of memory for the types we use in Squeal.

The current workaround, rather than -fno-specialise and -fno-full-laziness, is to use -fomit-interface-pragmas, but @haitlahcen is doing his best to solve the issues in both Stack and GHC. See his issue here for more information: https://ghc.haskell.org/trac/ghc/ticket/8095#comment:58.

For current Squeal users with problematic compilation time and memory usage, -fomit-interface-pragmas is probably the best current solution.
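
As a sketch, the flag can be scoped to the one module that holds the heavy query by putting it in a file-header pragma. The module contents and the name `bigQuery` below are placeholders, not taken from any real project. The trade-off is that GHC then leaves unfoldings and other optimisation details out of that module's interface file, so importing modules cannot inline across the boundary.

```haskell
{-# OPTIONS_GHC -fomit-interface-pragmas #-}
-- The pragma above affects only this module; other modules keep their
-- usual interface files.
module Main where

-- Placeholder standing in for a large Squeal query (hypothetical).
bigQuery :: String
bigQuery = "placeholder for a query with many joins"

main :: IO ()
main = putStrLn bigQuery
```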

echatav · 2019-01-25T14:53:53Z

Wow! Thanks so much @adfretlink and @haitlahcen ! This is great. Sorry Squeal stresses GHC out so much.

haitlahcen · 2019-01-25T16:37:48Z

Hey! I've opened an issue for stack as well

ilyakooo0 · 2019-10-16T13:12:36Z

Manually unrolling recursive type families should radically improve compile time and memory usage.

Might open a PR today.
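
To illustrate what "manually unrolling" could look like, here is a minimal sketch on a toy lookup family. `Lookup`, `LookupU`, and `Row` are made up for this example and are not Squeal's actual type families; the unrolling factor of four is arbitrary, and whether it helps in a given project would need measuring.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module Main where

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

-- Naive version: one reduction step per list element.
type family Lookup (k :: Symbol) (xs :: [(Symbol, Type)]) :: Type where
  Lookup k ('(k, v) ': _) = v
  Lookup k (_ ': xs) = Lookup k xs

-- Unrolled version: each recursive step skips four elements at once,
-- so a long list needs roughly a quarter of the recursive calls.
type family LookupU (k :: Symbol) (xs :: [(Symbol, Type)]) :: Type where
  LookupU k ('(k, v) ': _) = v
  LookupU k (_ ': '(k, v) ': _) = v
  LookupU k (_ ': _ ': '(k, v) ': _) = v
  LookupU k (_ ': _ ': _ ': '(k, v) ': _) = v
  LookupU k (_ ': _ ': _ ': _ ': xs) = LookupU k xs

-- Hypothetical row type used only to exercise both families.
type Row =
  '[ '("a", Int), '("b", Int), '("c", Int), '("d", Int)
   , '("e", Bool), '("f", Int) ]

-- These definitions type-check only if both families reduce to Bool.
naiveOk :: Proxy (Lookup "e" Row) -> Proxy Bool
naiveOk = id

unrolledOk :: Proxy (LookupU "e" Row) -> Proxy Bool
unrolledOk = id

main :: IO ()
main =
  case (naiveOk Proxy, unrolledOk Proxy) of
    (Proxy, Proxy) -> putStrLn "both lookups reduce to Bool"
```

Both families compute the same result; the unrolled one trades more equations for fewer recursive reductions.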

adfretlink · 2019-11-12T15:09:53Z

Small update on this topic: we've just squashed our migrations, redefining our Schema as if it were the initial one. We had around 30 migrations on top of it. Compilation time for the project went from 40 minutes to 7! So there's at least a lead as to the "main culprit" of compilation cost.

echatav · 2019-11-14T18:18:51Z

Pretty interesting. I wonder what would happen with aggressive use of partial type signatures. If all intermediate schemas are wild-carded _, and only the initial and final schemas are explicitly typed, I wonder if that would help both from a compilation efficiency perspective and a code cleanliness perspective...

adfretlink · 2019-11-15T08:02:14Z

How would we do this?

Something like:

type Base = -- some schema

type AddATable = Create "myTable" ('Table SomeTable) _

type FinalMig = Alter "myTable" ('Table SomeTableV2) AddATable

But how would GHC ultimately be able to work out the order of the migrations?

echatav · 2019-11-22T19:16:55Z

The way I do it in my projects is I have a directory structure like

Schema.hs
Schema/V0.hs
Schema/V1.hs
Schema/V2.hs
..

where each V{n}.hs has a SchemasType called DB (or Schemas) and for n > 0

setup :: Definition V{n-1}.DB DB
teardown :: Definition DB V{n-1}.DB
migration :: Migration Definition V{n-1}.DB DB

and Schema.hs has a

migrations :: AlignedList (Migration Definition) V0.DB V{max}.DB
migrations = V1.migration :>> .. :>> V{max}.migration :>> Done

and re-exports V{max}.DB.
And every other module imports the DB from Schema.hs.
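
As a rough sketch of how the ordering is enforced: in a type-aligned list, the output index of each element must equal the input index of the next, so a chain in the wrong order does not type-check. The code below re-defines a small `AlignedList` locally, modelled on the shape used above; `Migration`, `V0`, `V1`, `V2`, and the migration names are toy stand-ins, not Squeal's real types.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Toy schema versions (hypothetical; real ones would be schema types).
data V0
data V1
data V2

-- Toy stand-in for `Migration Definition`: phantom indices plus a name.
data Migration from to = Migration { name :: String }

-- Local re-definition for illustration: each element's output index
-- must match the next element's input index.
data AlignedList p x0 x1 where
  Done  :: AlignedList p x x
  (:>>) :: p x0 x1 -> AlignedList p x1 x2 -> AlignedList p x0 x2
infixr 7 :>>

v1Migration :: Migration V0 V1
v1Migration = Migration "v1: create users"

v2Migration :: Migration V1 V2
v2Migration = Migration "v2: add email"

-- Swapping the two migrations here would be a type error.
migrations :: AlignedList Migration V0 V2
migrations = v1Migration :>> v2Migration :>> Done

-- Collect the names in the order the list fixes.
names :: AlignedList Migration a b -> [String]
names Done = []
names (m :>> rest) = name m : names rest

main :: IO ()
main = mapM_ putStrLn (names migrations)
```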

Now, we shouldn't need to define any of the intermediate DBs between V0.DB and V{max}.DB because they should all be inferable and nowhere else referenced. I don't know if that would speed up or slow down or have no effect on compilation time, but it would cut down on some redundancy. I haven't settled on best practice for migrations over time yet. I read this review of Beam's migration system which was pretty negative. Some of the critiques might apply to Squeal as well. I'm a little worried that migrations in Squeal are redundant and cause compilation time issues.
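
A minimal sketch of the wildcard idea, using toy types rather than Squeal's API: `Step`, `createTable`, and `andThen` are hypothetical helpers written for this example. Note that GHC accepts `_` wildcards in the type signatures of bindings (with PartialTypeSignatures) but not on the right-hand side of a type synonym, so the wildcards go on the definitions rather than on the schema types.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PartialTypeSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy definition indexed by the table names before and after it runs.
newtype Step (from :: [Symbol]) (to :: [Symbol]) = Step [String]

-- Creating a table pushes its name onto the schema index.
createTable :: KnownSymbol name => Proxy name -> Step tables (name ': tables)
createTable p = Step ["CREATE TABLE " ++ symbolVal p]

-- Sequencing requires the middle schema to line up.
andThen :: Step a b -> Step b c -> Step a c
andThen (Step xs) (Step ys) = Step (xs ++ ys)

-- Intermediate schema left as a wildcard; GHC infers it from the body.
toV1 :: Step '[] _
toV1 = createTable (Proxy :: Proxy "users")

-- Only the initial and final schemas are spelled out.
migrations :: Step '[] '["orders", "users"]
migrations = toV1 `andThen` createTable (Proxy :: Proxy "orders")

statements :: Step a b -> [String]
statements (Step xs) = xs

main :: IO ()
main = mapM_ putStrLn (statements migrations)
```

This only shows that the intermediate index can be inferred; it says nothing about whether compile time improves, which would have to be measured on a real schema.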


gasi · 2021-02-28T01:13:58Z

@adfretlink Thank you for documenting the workaround using {-# OPTIONS_GHC -fomit-interface-pragmas #-}. I had to use this + globally disabling optimizations using stack build --ghc-options='-O0' to have it pass on the CircleCI free tier (4GB of RAM) without running out of memory.

In case anyone needs a repro, here’s a PR on my open source project that exhibits this problem:
zoomhub/zoomhub#158

@echatav Thanks for documenting how you organize your schema migrations. I ended up doing something similar on my own but it’s nice to see it being validated: https://github.com/zoomhub/zoomhub/tree/69f420ee9f2d6b88392cfa2657948e1c2c74db30/src/ZoomHub/Storage/PostgreSQL/Schema

Raveline assigned Raveline and unassigned Raveline Dec 24, 2018

echatav added the question label Jan 4, 2019

gasi added a commit to zoomhub/zoomhub that referenced this issue Feb 28, 2021

GHC: Omit interface pragmas

b9ac054

Attempt to work around OOM. See: morphismtech/squeal#82

gasi added a commit to zoomhub/zoomhub that referenced this issue Feb 28, 2021

GHC: Omit interface pragmas

66e25f8

Attempt to work around OOM. See: morphismtech/squeal#82

gasi added a commit to zoomhub/zoomhub that referenced this issue Feb 28, 2021

GHC: Omit interface pragmas

916cc7a

Attempt to work around OOM. See: morphismtech/squeal#82

gasi mentioned this issue Mar 11, 2021

Community #279

Open
