
Node 4.2 large memory spikes and timeouts on Heroku #3370

Closed

AndrewBarba opened this issue Oct 14, 2015 · 43 comments
Labels
memory: Issues and PRs related to memory management or memory footprint.
performance: Issues and PRs related to the performance of Node.js.

Comments

@AndrewBarba

Below is a screenshot of the past 12 hours, which have been a total disaster for us, to say the least. After updating to Node 4.2 (from 0.10) in production, we immediately exceeded all memory quotas and experienced a high volume of timeouts (even with no load and memory under the 1 GB limit).

First, I apologize if this is not the place for this. I am happy to move the discussion somewhere else, and we will help diagnose whatever you guys need. We went through this same parade with Node 0.12 and had to downgrade to 0.10.

Second, and I guess the real question here: is Heroku's 512 MB of RAM simply not enough to run Node 4.x? If that is the case, cool, but the memory constraints definitely need to be made clearer.

Timeline:

  • Tue, Oct. 13th, at 2pm EST we deployed Node 4.2.0 with cluster enabled, running 2 worker processes. Immediately hit the 512 MB memory limit, as seen in the picture below.
  • Tue, Oct. 13th, at 2:15pm EST we removed cluster completely. Hit memory limits 30 minutes later.
  • Tue, Oct. 13th, at 4:00pm EST we saw Node 4.2.1 was released and deployed 4.2.1. Continued to hit memory limits.
  • Tue, Oct. 13th, at 6:00pm EST we doubled memory to 1 GB.

In general you can see the memory is all over the place; maybe that is expected with newer versions of V8...

[Screenshot: Heroku memory usage graph, Oct 14, 2015, 12:08 PM]

Although I don't have a screenshot, you can see in the first part of the graph, running Node 0.10, that memory stays almost perfectly flat at 256 MB of RAM. Under any load, that was consistent.

For reference, here is a load test we did in a dev environment running Node 4.2.1, cluster forked to 4 processes, and about 5k requests per minute. It also immediately hit the higher 1 GB memory limit. We then dropped this down to 2 forked processes with the same result.

[Screenshot: load-test memory usage graph, Oct 14, 2015, 12:26 PM]

@jasnell
Member

jasnell commented Oct 14, 2015

@rvagg @trevnorris @Fishrock123 ... ideas?

@mscdex added the memory label on Oct 14, 2015
@bnoordhuis
Member

What happens when you start with (for example) node --max_old_space_size=256? The GC is lazy (lazier than v0.10) and the default upper limit is about 1.5 GB so you'll need to cap it if your machine has less memory.
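For example, a minimal way to apply that cap on a 512 MB dyno (server.js here is just a placeholder for your app's entry point) would be:

# Cap V8's old generation at 256 MB so the collector kicks in well before the
# dyno's 512 MB limit; server.js is a placeholder entry point.
node --max_old_space_size=256 server.js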

@AndrewBarba
Author

Okay, just tried running with --max_old_space_size=256 using cluster and 4 processes. Hit the limit immediately, so dropping down to 1 process again. Is the 256 MB per process?

2015-10-14T16:46:29.457954+00:00 heroku[web.1]: source=web.1 dyno=heroku.18327029.28d659a4-2e6e-4ee5-9fb9-18dc6d99e1de sample#memory_total=578.31MB sample#memory_rss=445.58MB sample#memory_cache=0.00MB sample#memory_swap=132.73MB sample#memory_pgpgin=338334pages sample#memory_pgpgout=225287pages

@bnoordhuis
Member

Is the 256mb per process?

Yes.

@AndrewBarba
Author

Dropping down to 1 process, --max_old_space_size=256 and --max_old_space_size=128 both returned VERY consistent results. Huge difference. Here is running at 128:
[Screenshot: memory usage graph with --max_old_space_size=128, Oct 14, 2015, 1:01 PM]
And here is running at --max_old_space_size=512:
[Screenshot: memory usage graph with --max_old_space_size=512, Oct 14, 2015, 12:59 PM]

@jasnell
Member

jasnell commented Oct 14, 2015

@bnoordhuis ... looking at this, we definitely want to make sure this is documented better.

@AndrewBarba
Author

Yes, very eye-opening for me at least. Fingers crossed, but I think you guys just saved us from having to revert everything again. Really appreciate the quick response. Going to leave this open through the end of the day while we do a few more tests on our end.

@bnoordhuis
Member

@jasnell I don't disagree but where would you put it? Maybe it's time we add a FAQ.

@mscdex
Contributor

mscdex commented Oct 14, 2015

FWIW, I just added a FAQ to the wiki here, and it is linked from the main wiki page. Perhaps it could also be linked somewhere on nodejs.org and other places?

@friism

friism commented Oct 15, 2015

/cc @hunterloftis

@jbergstroem
Member

Perhaps dig through a few old issues that shared the same type of characteristics, so Google/SEO/etc. can drive people in the right direction?

@hunterloftis

@friism I recommend --max_old_space_size when this occasionally pops up; the biggest downside is that when you specify that flag and then exceed the limit, your app will shut down hard. Heroku allows you to burst to 5x the memory limit, so it's not always best to crash an app instead of letting it temporarily exceed the soft limit.

This is probably documentation we should provide as well in a 'Memory-management with Node on Heroku' article.

@AndrewBarba
Author

@hunterloftis From what we've seen, the second we hit a memory limit, everything pretty much starts timing out. I'd actually rather have a hard kill at that point so we can get a fresh node up that can respond to requests again.

You mention that if we specify --max_old_space_size the node will shut down immediately if we hit that limit, but doesn't that flag relate to V8's heap and when it should start cleaning things up? I don't see how that relates to Heroku's memory limit and when a node is killed (other than by exceeding Heroku's limit, of course). In the tests above where we passed --max_old_space_size=128, the memory on the node was around 220 MB.

@hunterloftis

@AndrewBarba Yeah, it looks like your app is particularly hard-hit by lazy collection, and in that case the lesser evil is shutting down a process.

I'd be interested in hearing from @bnoordhuis on this but, from what I've seen, if your app requires more space than max_old_space_size allows (i.e., you're actually storing > X MB in objects, vs. storing < X MB in objects and needing more frequent sweeps to clear out things you don't store anymore)... then you'll get 'allocation failed' errors (hard shutdowns).

Also keep in mind that 'old space' is just one part of the application's memory footprint, so you can expect the app as a whole to use more than max_old_space_size.
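A quick way to see that gap (a minimal sketch; the flag value is arbitrary) is to compare V8's heap numbers with the process RSS:

# Print V8 heap usage next to the process RSS; RSS is normally well above
# heapTotal because of code, stacks, Buffers and other non-heap memory.
node --max_old_space_size=128 -e "console.log(process.memoryUsage())"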

@trevnorris
Contributor

You might try playing with flags like --gc_interval and/or --always_compact. They will slow down the application, but if there's a memory crunch it's worth giving them a go.

@hunterloftis

For a kitchen sink example with arbitrary values:

node --gc_global --optimize_for_size --max_old_space_size=960 --use_idle_notification --always_compact --max_executable_size=64 --gc_interval=100 --expose_gc server.js

@bnoordhuis
Member

@hunterloftis Yes, --max_old_space_size is a hard limit. --use_idle_notification is currently a no-op, by the way.

@claudiorodriguez
Contributor

Just weighing in to share my experience: I had this happen to me with 0.12.x and 4.0.x. Before I migrated everything to 4.0, Heroku Support recommended --max_old_space_size and it worked like a charm. I still get some instances of the error every now and then, but the affected dyno shuts down and no great harm is done, just a couple dozen timed-out requests once a week or so (out of thousands per minute).
Also, migrating to 4.0.x cut memory usage from an average of about 500 MB to 270 MB per dyno.

@AndrewBarba
Author

@fansworld-claudio That's great to hear. What size dyno are you running, and what did you end up setting max_old_space_size to? Also, have you used this setting with cluster?

@claudiorodriguez
Contributor

@AndrewBarba At first I had the Standard-2X (1 GB) dynos and max_old at 960; then, after a couple of weeks of stability, I scaled down to Standard-1X (512 MB) and max_old at 480. It has been stable for a couple of months. YMMV though; this is a REST API that does mostly I/O with Redis and Mongo.

@mike-zorn

It's kind of a pain to set this for all your Heroku apps (and to remember to vary it based on dyno size), so I made heroku-node, which wraps node and sets max_old_space_size based on the values of $WEB_MEMORY and $WEB_CONCURRENCY, which Heroku sets based on the size of the dyno you've chosen.
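The general idea is roughly the following (just a sketch of the approach, not the package's actual code or formula):

# Sketch of a wrapper that derives a per-process old-space cap from Heroku's
# WEB_MEMORY hint, leaving headroom for non-heap memory.
WEB_MEMORY="${WEB_MEMORY:-512}"
exec node --max_old_space_size=$((WEB_MEMORY / 2)) "$@"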

@AndrewBarba
Author

@ApeChimp I just gave that a go, and this showed up in the console immediately:

2015-10-31T14:51:14.352577+00:00 app[web.1]: <--- Last few GCs --->
2015-10-31T14:51:14.352577+00:00 app[web.1]:
2015-10-31T14:51:14.352579+00:00 app[web.1]:    14856 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 93.5 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352580+00:00 app[web.1]:    14967 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 111.4 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352581+00:00 app[web.1]:    15055 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 87.9 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]:    15178 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 122.7 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]: <--- JS stacktrace --->
2015-10-31T14:51:14.352584+00:00 app[web.1]:
2015-10-31T14:51:14.352585+00:00 app[web.1]: ==== JS stack trace =========================================

It looks like you are dividing WEB_MEMORY by some factor of concurrency, but WEB_MEMORY, as explained by the article you reference, is the recommended memory for each process. As is, running this on the Performance-M dyno would give you:

WEB_CONCURRENCY = 5;
WEB_MEMORY = 512;
WEB_MEMORY / (2 * WEB_CONCURRENCY) = 51.2MB

51.2 MB of RAM is not enough to even start up. I think you can just simplify to WEB_MEMORY / 2 or even WEB_MEMORY / 1.5. For us, 256 seems to be the sweet spot, so we'll stick with / 2.

@mike-zorn

@AndrewBarba, mea culpa. That's fixed as of [email protected].

@piranna
Contributor

piranna commented Nov 19, 2015

Couldn't the garbage collector also be executed when getting an out-of-memory exception? This would help in a more generic way on systems with really low memory constraints; for example, when executing npm on NodeOS it starts killing processes on QEMU with the default memory settings (128 MB), and the same would happen on a Raspberry Pi without swap...

@ChALkeR
Member

ChALkeR commented Nov 19, 2015

@piranna What exact exception do you mean? If you mean allocation failures, catching all of those in some generic way is a terrible idea; many things could go wrong from that. If you are speaking about the system OOM killer, intercepting it requires specific settings at the system level.

@piranna
Contributor

piranna commented Nov 19, 2015

@piranna What exact exception do you mean?

The error I get is an Out of Memory error, thrown at the process level:
[Screenshot: QEMU console output showing the kernel OOM killer]
It happens when calling malloc() and the system doesn't have enough free memory, or no contiguous free chunk big enough for the requested size. I propose that when creating a new JavaScript object, if reserving the memory fails, the C++ layer catches that failure, runs the garbage collector, and tries to reserve it again. Probably this is more of a V8 issue...

@ChALkeR
Member

ChALkeR commented Nov 19, 2015

@piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.

Also, it's not necessarily caused by the node process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).

@piranna
Contributor

piranna commented Nov 19, 2015

@piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.

What a shame, I thought it would be a good feature :-(

Also, it's not necessarily caused by the node process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).

Yeah, I know it kills random processes after that. In fact, in that screenshot the error was caused by slap, but nsh stopped working too :-/

@joanniclaborde

For what it's worth, here's the solution we're trying now (I sure wish I had found this conversation before!!):

if [ ! "$WEB_MEMORY" = "" ]; then
  if [ $WEB_MEMORY -le 512 ]; then
    NODE_FLAGS="--max_semi_space_size=2 --max_old_space_size=256 --max_executable_size=192"
  elif [ $WEB_MEMORY -le 768 ]; then
    NODE_FLAGS="--max_semi_space_size=8 --max_old_space_size=512 --max_executable_size=384"
  elif [ $WEB_MEMORY -le 1024 ]; then
    NODE_FLAGS="--max_semi_space_size=16 --max_old_space_size=1024 --max_executable_size=512"
  fi
fi

node $NODE_FLAGS "$@"

I'm getting the values for those flags from the V8 defaults, and it seems to be working great so far (Heroku / 512 MB).
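If that snippet lives in a small executable wrapper script (the start-node.sh name below is just an example), the Procfile entry stays simple, with the real entry point passed as the argument:

web: ./start-node.sh server.js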

@piranna
Contributor

piranna commented Dec 11, 2015

the default upper limit is about 1.5 GB

Would it be possible to lower it when the sum of physical memory + swap is smaller than this? It doesn't make sense to have such a big limit if it's impossible to reach it... This would help on memory-constrained systems.

@hunterloftis

@piranna see #3370 (comment) and #3370 (comment)

@piranna
Contributor

piranna commented Dec 11, 2015

@piranna see #3370 (comment) and #3370 (comment)

Yes, I've read them; what I'm asking for is for these values to be calculated and set automatically on start, instead of setting them with flags.

@joanniclaborde

Or even a single flag could be a nice option: node --physical_memory=$WEB_MEMORY

@joanniclaborde

I also added custom settings for modulus.io's 396 MB servos:

if [ $1 -le 396 ]; then
  NODE_FLAGS="--max_semi_space_size=1 --max_old_space_size=198 --max_executable_size=148"
elif ...

@piranna
Contributor

piranna commented Dec 11, 2015

Or even a single flag could be a nice option: node --physical_memory=$WEB_MEMORY

Isn't the max_old_space_size flag equivalent to this? Maybe it could be extended to accept auto, to calculate the limit from the current system memory...

@TylerBrock

Because setting max-old-space-size just crashes your application when the limit is reached, it isn't very helpful. I made this: https://github.com/HustleInc/regiment, which will seamlessly create and preemptively replace workers so that you never hit max-old-space-size.

@TylerBrock

Follow up: this is a pure win. I'm running several 2X dynos in production for a very leaky app with Regiment.middleware.MemoryFootprint(750), and I've had no dropped requests, no memory overuse, and my aggregate memory usage hovers around 700-750 MB/dyno.

[Screenshot: memory usage graph holding steady around 700-750 MB per dyno]

You do not want to just set --max-semi-space-size, max-old-space-size, max-executable-size, etc., because when those limits are reached your node process simply crashes and any currently running requests are cut short.

@abienkowski

Thank you for the detailed analysis.

Just spent all day debugging a critical infrastructure performance issue, and finally found that one of the services had been moved to v4.2.8. After rolling back to v0.12, everything returned to normal.

@ChALkeR added the performance label on Feb 16, 2016
@bjfletcher

@TylerBrock you mentioned that:

So because setting max-old-space-size just crashes your application when the limit is reached

I've been experimenting and noticed that if --max-old-space-size is used without --max_semi_space_size also being adjusted then, yes, the application will get killed. If --max_semi_space_size is set to 1, it usually doesn't get killed.
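To make that concrete, the combination looks something like this (the numbers are illustrative, not the experiment's exact values, and server.js is a placeholder entry point):

# Pair a capped old space with a small semi-space (new space); values here are
# illustrative and server.js is a placeholder entry point.
node --max-old-space-size=256 --max_semi_space_size=1 server.js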

If anyone's interested, I've written more about the experiment here. If you have any ideas on how I can improve the experiment & understanding, I'd really appreciate it.

@davibe

davibe commented Jul 12, 2016

I am testing --max-old-space-size=128 and I can confirm that when the application exceeds 128 MB of actually used memory, the process gets killed.

@damianmr

damianmr commented Jul 12, 2016

Hello, the info in this issue was very useful. I made a module based on @joanniclaborde's suggestion.
It is here: https://github.com/damianmr/heroku-node-settings

The good thing is that the node process never reaches its maximum available memory, so it never crashes on Heroku.

We have been using it in a real project with thousands of requests per minute and have had no crashes whatsoever.

I encourage you to try it out and report any problems you find.

@joanniclaborde

Nice!! We've been using those same settings for a few months now, and it's running fine. Good idea to turn that code into a module, @damianmr!

@Lwdthe1

Lwdthe1 commented Aug 12, 2018

🔥 I had a terrible memory leak somewhere: it turned out to be another node module that my system depends on. I took it out and my memory usage went from an average of 8 GB down to 4 GB. That still meant I had to pay for the $500-a-month plan to keep my app from exceeding the memory limit of the lower plans. I looked everywhere and haven't been able to find out why my app is still consuming so much memory.

💯 🙌 Then I came across this today and started using @damianmr's heroku-node-settings. Now I have more time to look into any potential leaks without having to fork over $500/month, since I've downgraded to the 2X plan and can actually run more than 1 dyno! Thanks, everyone, for the tips that led to the solution. Here's a pic that's worth sharing:
[Screenshot: Heroku memory metrics after switching to heroku-node-settings]

Strangely enough, my app's memory usage now shows as roughly constant rather than the exponential rise from before.
[Screenshot: memory usage graph, now roughly constant]
