
Simple query causing massive memory usage (pushing over the half gigabyte level allowed by Heroku) #312

Closed
benjie opened this issue Jan 16, 2017 · 26 comments

@benjie
Member

benjie commented Jan 16, 2017

This is an issue that I'm digging into - I'm tracking my findings here so I can ask for help later if necessary.

The minimal query to trigger this is:

{
  sketchById(id: 11) {
    sketchIntegrationEventRunsBySketchId(orderBy: ID_DESC, first: 100) {
      edges {
        node {
          __id
        }
      }
    }
  }
}
@benjie
Member Author

benjie commented Jan 16, 2017

So I've downloaded the data from my staging environment and am running the dev version of PostGraphQL locally. I have 16GB of RAM whereas Heroku only gives 512MB, so it's not a very fair comparison, and I have to sample memory usage at a very high frequency to see the issue.

In order to do so, I've set up this loop (OSX, so using gdate from coreutils):

while sleep 0.001; do
  echo -n "`gdate +%s%3N` ";   # millisecond-resolution timestamp
  ps -o 'pid=,rss=' -p 45416;  # PID and resident set size (KB) of the node process
done | tee log.txt

This has given me quite a high resolution file exposing the memory usage. I'm going to build a graph from it and upload. To convert the log to a TSV first:

cat log.txt | sed 's/ /'$'\t''/g' > memory.tsv

@benjie
Member Author

benjie commented Jan 16, 2017

[graph: RSS memory usage over time across the six query executions]

I ran the query six times; first with gaps of a few seconds between each one, finally in quick succession. The x-axis is time in milliseconds, y-axis is RSS memory usage in kilobytes.

As can be seen, this simple query is allocating a hell of a lot of RAM.

@benjie
Member Author

benjie commented Jan 16, 2017

I've now added --inspect to the ts-node command line and connected Chrome Canary as a debugger to the node process. I ran the query a few times to "warm up" the system; then used the "Record Allocation Profile" profiling tool for a single query execution to see where the RAM is being allocated:

[screenshot: allocation profile, 2017-01-16 11:07]

@benjie
Member Author

benjie commented Jan 16, 2017

Setting DEBUG="*" I get:


  postgraphql:request GraphQL query request has begun. +7s
  body-parser:json content-type "application/json" +11s
  body-parser:json content-encoding "identity" +2ms
  body-parser:json read body +1ms
  body-parser:json parse body +4ms
  body-parser:json parse json +0ms
  body-parser:urlencoded body already parsed +0ms
  body-parser:text body already parsed +1ms
  postgraphql:request GraphQL query is parsed. +13ms
  postgraphql:request GraphQL query is validated. +10ms
  postgraphql:graphql { sketchById(id: 11) { sketchIntegrationEventRunsBySketchId(orderBy: ID_DESC, first: 100) { edges { node { __id } } } } } +1ms
  postgraphql:postgres begin +1ms
  postgraphql:postgres select row_to_json(__local_0__) as object from "hookhaven_data"."sketch" as __local_0__ where "id" = any($1) limit $2 +8ms
  postgraphql:postgres select to_json(__local_0__) as value from "hookhaven_data"."sketch_integration_event_run" as __local_0__ where "id" is not null and true and true and ("sketch_id" = $1) order by "id" using > limit $2 +6ms
  postgraphql:postgres:explain
Limit  (cost=0.00..2.14 rows=1 width=32) (actual time=0.129..0.129 rows=1 loops=1)
  ->  Seq Scan on sketch __local_0__  (cost=0.00..2.14 rows=1 width=32) (actual time=0.128..0.128 rows=1 loops=1)
        Filter: (id = ANY ('{11}'::integer[]))
        Rows Removed by Filter: 3
Planning time: 0.052 ms
Execution time: 0.142 ms +1ms
  postgraphql:postgres commit +2s
  postgraphql:postgres:explain
Limit  (cost=222.36..222.59 rows=90 width=36) (actual time=419.363..438.374 rows=90 loops=1)
  ->  Sort  (cost=222.36..222.59 rows=90 width=36) (actual time=419.362..438.358 rows=90 loops=1)
        Sort Key: id DESC
        Sort Method: external merge  Disk: 69008kB
        ->  Bitmap Heap Scan on sketch_integration_event_run __local_0__  (cost=4.98..219.44 rows=90 width=36) (actual time=1.892..145.971 rows=90 loops=1)
              Recheck Cond: (sketch_id = 11)
              Filter: (id IS NOT NULL)
              Heap Blocks: exact=80
              ->  Bitmap Index Scan on index_sketch_integration_event_run__fk_sketch_id  (cost=0.00..4.96 rows=90 width=0) (actual time=0.019..0.019 rows=90 loops=1)
                    Index Cond: (sketch_id = 11)
Planning time: 0.088 ms
Execution time: 509.332 ms +219ms
  postgraphql:request GraphQL query has been executed. +1ms
  postgraphql:request GraphQL query request finished. +2ms
0 error(s) in 2487.72ms :: { sketchById(id: 11) { sketchIntegrationEventRunsBySketchId(orderBy: ID_DESC, first: 100) { edges { node { __id } } } } }

[screenshot, 2017-01-16 11:10]

@benjie
Member Author

benjie commented Jan 16, 2017

The important part of that seems to be:

  postgraphql:postgres begin +1ms
  postgraphql:postgres select row_to_json(__local_0__) as object from "hookhaven_data"."sketch" as __local_0__ where "id" = any($1) limit $2 +8ms
  postgraphql:postgres select to_json(__local_0__) as value from "hookhaven_data"."sketch_integration_event_run" as __local_0__ where "id" is not null and true and true and ("sketch_id" = $1) order by "id" using > limit $2 +6ms
  postgraphql:postgres:explain
Limit  (cost=0.00..2.14 rows=1 width=32) (actual time=0.129..0.129 rows=1 loops=1)
  ->  Seq Scan on sketch __local_0__  (cost=0.00..2.14 rows=1 width=32) (actual time=0.128..0.128 rows=1 loops=1)
        Filter: (id = ANY ('{11}'::integer[]))
        Rows Removed by Filter: 3
Planning time: 0.052 ms
Execution time: 0.142 ms +1ms
  postgraphql:postgres commit +2s

The commit (last line) didn't fire until 2 seconds after the explain, suggesting the memory allocation/slow processing happens after Postgres has returned the data, while it is being processed by PostGraphQL.
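To separate the Postgres round-trip from the JS-side processing, a quick standalone check can help. Here's a minimal sketch using the pg module directly (not PostGraphQL's code path; the query and connection details are whatever you're testing against):

// Minimal sketch: time the Postgres round-trip separately from the
// JS-side handling of the returned rows.
const { Client } = require('pg')

async function timeQuery(sql, params) {
  const client = new Client() // connection settings come from PG* env vars
  await client.connect()

  const queryStart = Date.now()
  const result = await client.query(sql, params)
  console.log(`postgres returned ${result.rows.length} rows in ${Date.now() - queryStart}ms`)

  const processStart = Date.now()
  const bytes = JSON.stringify(result.rows).length // stand-in for result processing
  console.log(`processed ${bytes} bytes in ${Date.now() - processStart}ms`)

  await client.end()
}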

@benjie
Member Author

benjie commented Jan 16, 2017

Running a CPU profile over a single query yields:

[screenshot: CPU profile, 2017-01-16 11:19]

Quite a lot of this time is spent in the pg module itself.

@benjie
Member Author

benjie commented Jan 16, 2017

So running the second query by hand shows it returns 72MB of data:

$ echo 'select to_json(__local_0__) as value from "hookhaven_data"."sketch_integration_event_run" as __local_0__ where "id" is not null and true and true and ("sketch_id" = 11) order by "id" using > limit 100' | time psql bots_staging | wc -c
 72609787
psql bots_staging  6.36s user 0.13s system 75% cpu 8.646 total
wc -c  0.29s user 0.04s system 3% cpu 8.645 total

So it seems that the issue here could be fixed along with #265: basically we're requesting data that we don't need, and if that data is large then it can cause massive performance/memory allocation issues.

@calebmer
Collaborator

This issue seems to match what you're hitting: brianc/node-postgres#1103

How many fields are you selecting? This may be fixable by not using * and only getting the fields you need.

@calebmer
Collaborator

You posted the comment before I did 😉

I'm going to look into fixing this. It's high priority so I'll see what I can do.

@benjie
Member Author

benjie commented Jan 16, 2017

I've been looking into #265 for a while; it seems this is a good starting point. Basically we want to remove the createPgProcedureObjectTypeGqlFieldEntry workaround, and instead do something like:

resolve(source, args, {fieldASTs}) {
  // Look up SQL columns (and computed columns!) required by fieldASTs
  // and add them to the select right here
}

which instead of doing:

select row_to_json(${sql.identifier(aliasIdentifier)}) as object

would do something more like:

select json_build_object(
  'col1', ${tbl}.col1,
  'col2', ${tbl}.col2,
  'computedField1', computedField1(${tbl}),
  'computedField2', computedField2(${tbl}, 27, 'Foo')
)

referencing only the requested columns.

I've not yet dug into fieldASTs so I'm not sure how transparent/opaque it is.
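For illustration, a rough sketch of turning a list of requested column names into such a select list (hypothetical helper, not the codebase's actual API; real code would need to escape identifiers properly):

// Hypothetical sketch: build a json_build_object select list naming only
// the requested columns. Identifier escaping is omitted for brevity.
function buildJsonSelectList(tableAlias, columnNames) {
  const pairs = columnNames
    .map(name => `'${name}', ${tableAlias}."${name}"`)
    .join(', ')
  return `json_build_object(${pairs})`
}

// buildJsonSelectList('__local_0__', ['id', 'created_at'])
// => json_build_object('id', __local_0__."id", 'created_at', __local_0__."created_at")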

@benjie
Member Author

benjie commented Jan 16, 2017

@calebmer Why are we doing row_to_json(...) rather than merely returning * directly - were you expecting to add more "meta" columns later, or maybe it's for compatibility with the _hooks that send the data back to postgresql later?

@benjie
Member Author

benjie commented Jan 16, 2017

(Using psql, replacing to_json(__local_0__) with __local_0__.* cuts the query time from ~8 seconds to ~6.8, so it's only about 15% faster - I don't think that's worth making the code changes for.)

@calebmer
Collaborator

We use row_to_json for easier composite type support. row_to_json will turn composite types into nested objects, which is convenient.

The first fix I want to implement is just selecting the columns that are requested when there are no procedures. So for now, at least, a procedure column will cause a de-opt as we revert to the * case. Of course in the future I want to implement the solution we discussed in #265 for procedures, but for now I'm focusing on the more common case, which is at least a little easier to fix. Does the query that ends up breaking your Heroku box use procedure columns?
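As a sketch of that plan (hypothetical names; the real implementation may differ):

// Hypothetical sketch of the planned de-opt: only narrow the select list
// when every requested field maps to a plain column; any procedure
// (computed column) means falling back to row_to_json over the whole row.
function canNarrowSelection(requestedFields, tableColumns) {
  return requestedFields.every(field => tableColumns.includes(field))
}

// canNarrowSelection(['id', 'created_at'], ['id', 'created_at', 'payload']) => true
// canNarrowSelection(['id', 'computedField1'], ['id', 'created_at']) => false (de-opt)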

@benjie
Member Author

benjie commented Jan 16, 2017

I use computed columns heavily, but fortunately not in this one case. (I do have performance issues related to computed columns, but they're not as problematic as this one is.)

@calebmer
Collaborator

Ok. Computed columns are definitely something I really want to make fast as well, given how great they are for the developer experience. Out of curiosity, do you use a lot of connection computed columns? Those will be pretty tricky to optimize and may always require a de-opt.

@benjie
Member Author

benjie commented Jan 16, 2017

I'm going to keep taking notes here as I dig further into understanding the codebase (something I've been meaning to do for a while).


Using the awesome ts-node --inspect, I've added a debugger statement at the top of readPage in PgPaginatorOrderingAttributes, and am now stepping through the backtrace because I can't find the fieldASTs property. Here's how the data looks as it leaves the graphql module and enters postgraphql:

[screenshot: debugger view of the data entering postgraphql, 2017-01-16 12:52]

@benjie
Member Author

benjie commented Jan 16, 2017

Digging deeper into the info object, we can see the fields that are being requested on sketchIntegrationEventRunsBySketchId; ultimately our {edges: {node: {__id}}} path:

[screenshot: info object showing the requested fields, 2017-01-16 12:57]

So this is probably what fieldASTs has become?
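For reference, a small sketch of reading the immediate selections off one of these AST nodes (plain field names only; fragment handling comes up below):

// Sketch: list the immediate field names selected under an AST field node.
// Fragment spreads and inline fragments are handled separately -- see below.
function immediateFieldNames(fieldNode) {
  if (!fieldNode.selectionSet) return []
  return fieldNode.selectionSet.selections
    .filter(selection => selection.kind === 'Field')
    .map(selection => selection.name.value)
}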

@benjie
Member Author

benjie commented Jan 16, 2017

Confirmed: in v0.8.0 fieldASTs was renamed to fieldNodes:

https://github.com/graphql/graphql-js/releases

@calebmer
Collaborator

Awesome 👍

I will note the approach I'm taking here. Because the GraphQL and Postgres parts of the codebase are kept separate (and intentionally so), I'm going to add an optimizations object which the GraphQL adapter can pass through the interface. The optimizations object will contain all of the fields selected in GraphQL. My current designs for an optimizations object depend on the changes in #281, so I'm rushing to finish that first.

Out of curiosity, does fieldNodes resolve fragment fields? Or will we have to resolve fragment fields?

@benjie
Member Author

benjie commented Jan 16, 2017

Looks like resolve takes 4 arguments: source, args, context and info; and it's info.fieldNodes that we need.
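i.e. (as of graphql-js v0.8):

function resolve(source, args, context, info) {
  // info.fieldNodes: the AST node(s) for the field currently being resolved
  // info.fragments: map of fragment name -> fragment definition
  // info.operation: the AST of the whole operation
}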

@benjie
Member Author

benjie commented Jan 16, 2017

Good question, I shall investigate...

@benjie
Member Author

benjie commented Jan 16, 2017

New query, with an inline fragment, a named fragment, and a relation lookup:

{
  sketchById(id: 11) {
    sketchIntegrationEventRunsBySketchId(orderBy: ID_DESC, first: 100) {
      edges {
        node {
          __id
          ... on SketchIntegrationEventRun {
            createdAt
          }
          ... F0
        }
      }
    }
  }
}
fragment F0 on SketchIntegrationEventRun {
  id
  sketchBySketchId {
    ...F1
  }
}
fragment F1 on Sketch {
  updatedAt
}

It seems it does not look up the fragments for you; you'll need to resolve them yourself:

[screenshot: fieldNodes showing an unresolved fragment spread, 2017-01-16 13:11]
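So something like this sketch would be needed, using info.fragments (the map from fragment name to fragment definition that graphql-js supplies on the resolve info):

// Sketch: recursively expand inline fragments and fragment spreads to
// collect the full list of selected field names.
function selectedFieldNames(selectionSet, fragments, names = []) {
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.push(selection.name.value)
    } else if (selection.kind === 'InlineFragment') {
      selectedFieldNames(selection.selectionSet, fragments, names)
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value]
      if (fragment) selectedFieldNames(fragment.selectionSet, fragments, names)
    }
  }
  return names
}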

@benjie
Member Author

benjie commented Jan 16, 2017

Looks like operation is the entire query, whereas fieldNodes is just the current field that's being resolved.

@calebmer
Collaborator

Unfortunately I timed out on this. I'm not sure when I'm going to get the chance to work on it again. In order to solve some of the major issues like this one I'm going to need more community support, either financially or in code contributions.

@benjie
Member Author

benjie commented Jul 9, 2017

I'm guessing this is fixed in v4 so I'm going to tag it as such - if I accidentally close this without confirming, please let me know!

@benjie
Member Author

benjie commented Jul 21, 2018

#nostalgia

@benjie benjie closed this as completed Jul 21, 2018
@benjie benjie added this to the 4.0 milestone Aug 15, 2018