The Java buildpack in Cloud Foundry calculates the memory settings for a Java process. It has a hard job: it has only one input (the container memory limit), and from that it needs to come up with at least five numbers for the JVM. To do this it uses a standalone memory calculator program. We downloaded the memory calculator and used it to drive some tests on memory usage in a Spring Boot application.
Here are the command line options generated by the default settings with some typical container memory limits:
Container | -Xmx (Heap) | -XX:MaxMetaspaceSize (Metaspace) | -Xss (Stack) |
---|---|---|---|
128m | 54613K | 64M | 568K |
256m | 160M | 64M | 853K |
512m | 382293K | 64M | 995K |
1g | 768M | 104857K | 1M |
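These numbers are consistent with the weights in the example command line below, plus a 64m floor on the metaspace: when the floor kicks in, the displaced metaspace weight is re-balanced across the other buckets. A rough reconstruction of the arithmetic for two of the rows (our inference, not the calculator's documented algorithm):

128m: metaspace = max(10% * 128m, 64m) = 64m
      heap = (128m - 64m) * 75/(75 + 10 + 5) = 53.3m = 54613K
1g:   metaspace = 10% * 1024m = 104857K (floor not needed)
      heap = 75% * 1024m = 768M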
Example command line:
$ java-buildpack-memory-calculator-linux -memorySizes='metaspace:64m..' -memoryWeights=heap:75,metaspace:10,native:10,stack:5 -memoryInitials=heap:100%,metaspace:100% -totMemory=128m
-Xmx54613K -XX:MaxMetaspaceSize=64M -Xss568K -Xms54613K -XX:MetaspaceSize=64M
By default the initial metaspace and heap are identical with the maximum sizes (hence the -memoryInitials in the command line). The default arguments for the java-buildpack-memory-calculator-linux come from a config file in the buildpack.
The "native" value in the memory calculator is a margin for memory used by the process that isn't explicitly accounted for by metaspace or stack. Some of it will go to class loader and JIT caches at runtime and some of it will be untraceable from within the JVM (seems to be correlated with the amount of JAR data on the classpath).
For the impatient, here is a quick summary of the analysis. Java applications can barely run with a 128m container memory limit in Cloud Foundry, and loading JAR files is enough of a burden on memory to kill some apps that are on the boundary. By default Spring Boot apps are much happier with 512m or 1g of container memory even if they don't need it, but you can tweak the command line up to a point.
It would be nice if the default memory calculation changed, but until it does there are two things that might help: 1) shading the jar uses less memory at runtime, but it only really matters for the small containers, and it makes uploads much less efficient; 2) configuring the buildpack with a custom command to allow the app a bigger "native" margin.
The calculation of the stack memory is idiosyncratic and probably should be improved.
The Freemarker sample in Spring Boot is a decent representative of a small, but active Spring Boot application. It does server side rendering with Freemarker, so it's not just serving static content, but it doesn't need a lot of memory. Locally, it is more than happy to run in a heap of 32MB, and you can squeeze it down to 24MB before you start to see garbage collection affecting performance.
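For example, something like this runs it comfortably on a local machine (a sketch; the jar name is the one from our local build, used again in the ulimit experiment below):

$ java -Xmx32M -jar target/spring-boot-test-web-freemarker-1.3.2.BUILD-SNAPSHOT.jar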
We ran that app in Cloud Foundry (PWS) with a range of different parameters, trying to see where it runs and where it crashes. When it crashes it is almost always because the container manager killed it for using too much memory. Garbage collection can contribute to slow startup times if you squeeze the heap, but it is never so slow that the platform kills it because of a timeout. Here's a memory-oriented summary of the experiments (MB unless stated otherwise):
Heap | Metaspace | Stack | Native | Limit | Start(sec) | Key |
---|---|---|---|---|---|---|
23 | 64 | 11 | 72 | 170 | 9 | A |
55 | 64 | 18 | -9 | 128 | - | B |
160 | 64 | 28 | 4 | 256 | - | C |
382 | 64 | 35 | 31 | 512 | 8 | D |
768 | 104 | 35 | 117 | 1024 | 4 | E |
53 | 64 | 11 | 128 | 256 | 3 | M |
The memory settings above are all generated by the java-buildpack-memory-calculator-linux tool. It provides settings for heap, metaspace and stack size, and in the table we convert the last of those to a total stack by assuming there are 35 threads. The "native" value is the balance of the container limit that isn't explicitly claimed by the JVM (so when it is small or negative the app may fail to start, as in experiments "B" and "C").
Experiment "A" is the smallest viable settings we found using the build pack to calculate the memory (but tweaking the inputs) for the freemarker sample launched with the default Spring Boot tooling. In this experiment the "native" memory is explicitly constrained to force the buildpack to allow enough memory for off-heap usage in the JVM. Additionally the thread count is fixed to ensure that the stack size can be explicitly constrained (if you let it take its default values the stack sizes end up rather large). Experiment B has the default settings for a 128m container memory limit, and it is doomed to fail because it needs too much memory for its stack (the native memory margin is -9MB). Experiment C has the default settings for a 256m container limit. It also fails to start because it only has 4m left of "native" memory once all it's threads get going. Experiments D and E are the only ones that run successfully with the default buildpack memory settings (512m and 1g container limit respectively). Experiment D is pretty close to the bone with the native memory margin, and has over allocated stack space.
If you want to verify the memory settings for each experiment, the memory calculator can be run with parameters as follows:
Key | memoryWeights | memorySizes | stackThreads | Notes |
---|---|---|---|---|
A | heap:40,metaspace:10,native:10,stack:20 | metaspace:64..,native:72m.. | 35 | Smallest viable |
B | heap:75,metaspace:10,native:10,stack:5 | metaspace:64.. | | Default for 128m |
C | heap:75,metaspace:10,native:10,stack:5 | metaspace:64.. | | Default for 256m |
D | heap:75,metaspace:10,native:10,stack:5 | metaspace:64.. | | Default for 512m |
E | heap:75,metaspace:10,native:10,stack:5 | metaspace:64.. | | Default for 1024m |
M | heap:40,metaspace:10,native:10,stack:10 | metaspace:64..,native:128m | 35 | Comfortable 256m |
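For instance, experiment "A" corresponds to an invocation along these lines (a sketch: we assume the stackThreads column maps to a -stackThreads flag, we keep the -memoryInitials from the default command, and we use the 170m limit from the summary table):

$ java-buildpack-memory-calculator-linux -memorySizes='metaspace:64m..,native:72m..' -memoryWeights=heap:40,metaspace:10,native:10,stack:20 -memoryInitials=heap:100%,metaspace:100% -stackThreads=35 -totMemory=170m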
The "Stack" values in the summary are calculated by assuming there
will be 35 threads in a running app under load, and that each one uses
an amount of memory specified in the -Xss
argumemt to the JVM. If
the number of threads sounds high, bear in mind that 14 of those
threads are started by the JVM itself and are nothing to do with the
app. The -Xss
values are in turn calculated by the buildpack, and
vary quite widely from about 300K to about 1M. There is no evidence
that this serves any purpose, and indeed the stack size needed by an
app depends more on the libraries and languages it uses than the size
of overall memory consumption, e.g. Groovy uses large stacks. A stack
of 256K would have been perfectly adequate for this application, and
it would have been useful to be able to configure it that way,
independent of the container memory limit.
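In other words, the "Stack" column above is just the assumed thread count multiplied by the -Xss value, e.g. for the 1g container:

stack = 35 threads * 1M (-Xss) = 35M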
Spring Boot uses a generic main class called JarLauncher which deals with nested jars on the classpath, and there is some overhead associated with that, mainly in heap usage, but also a little bit of extra time to process the archives. If heap is scarce it can slow things down dramatically, but if there is plenty of heap available on startup it performs quite well. In Cloud Foundry the archive is exploded, so we might think about using a different main class and see whether that helps speed up the startup at all.
The default command for a Spring Boot app is this:
$PWD/.java-buildpack/open_jdk_jre/bin/java -cp $PWD/.:$PWD/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar -Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh $CALCULATED_MEMORY -Djava.security.egd=file:/dev/./urandom -verbose:gc org.springframework.boot.loader.JarLauncher
where $CALCULATED_MEMORY is the result of the java-buildpack-memory-calculator-linux command with default parameters as listed above (and documented in the buildpack).
If we can use a custom command we need to fix the memory explicitly, and at the same time we can run the main class directly instead of through the indirect (and slightly memory hungry) JarLauncher:
$PWD/.java-buildpack/open_jdk_jre/bin/java -cp $PWD/.:$PWD/lib/*:$PWD/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar -Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh -XX:MaxMetaspaceSize=64M -Xss568K -Xmx54613K -Xms54613K -XX:MetaspaceSize=64M -Djava.security.egd=file:/dev/./urandom -verbose:gc sample.freemarker.SampleWebFreeMarkerApplication
This works with 256m (startup 3s) and fails with a 128m container limit, which is unsurprising given what we know about the native memory margin.
There are some things you can do to squeeze the app into 128m. For a start, you can use initial values for heap and metaspace that are less than the maximum. We also found that some of the more obscure JVM flags help, namely -XX:CompressedClassSpaceSize and -XX:ReservedCodeCacheSize, so this works in 128m even with the default JarLauncher:
-XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M
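Spliced into the default command shown earlier, that looks like this (a sketch; everything except the memory flags is unchanged):

$PWD/.java-buildpack/open_jdk_jre/bin/java -cp $PWD/.:$PWD/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar -Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh -XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M -Djava.security.egd=file:/dev/./urandom -verbose:gc org.springframework.boot.loader.JarLauncher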
Another option, which might avoid the overhead of JarLauncher and doesn't require a custom command, would be to use the PropertiesLauncher (documented in the Spring Boot user guide). It still has to read the JAR files, though, unless you use a customized assembly, so in practice it is unlikely to help much.
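To try it locally, something like this should work (a sketch; per the Spring Boot docs, -Dloader.main is only needed if the manifest doesn't already declare a Start-Class):

$ java -cp target/spring-boot-test-web-freemarker-1.3.2.BUILD-SNAPSHOT.jar -Dloader.main=sample.freemarker.SampleWebFreeMarkerApplication org.springframework.boot.loader.PropertiesLauncher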
Shading is a (technically poor) alternative to the Spring Boot tooling for creating executable jars, merging all the dependencies into a common root directory. It results in very slow uploads to Cloud Foundry because of the way the CLI interacts with the platform, but it doesn't require the JVM to open any JAR files, so it might give better memory performance on startup.
NOTE: Shaded versions of Spring Boot jars are easy to make using Maven if you use the spring-boot-starter-parent. You need to add a <start-class/> property to point to the main class, and swap the maven-shade-plugin for the spring-boot-maven-plugin.
This one started in 128m and then crashed under load:
-XX:MaxMetaspaceSize=40M -Xss256K -Xms16M -Xmx24M -XX:MetaspaceSize=20M
It bounces on startup, and interestingly has already passed a health check when it is killed with "out of memory":
...
2016-01-07T07:51:07.25+0000 [APP/0] OUT 2016-01-07 07:51:07.252 INFO 11 --- [ main] s.f.SampleWebFreeMarkerApplication : Started SampleWebFreeMarkerApplication in 7.234 seconds (JVM running for 7.573)
2016-01-07T07:51:07.28+0000 [HEALTH/0] OUT healthcheck passed
2016-01-07T07:51:07.29+0000 [HEALTH/0] OUT Exit status 0
2016-01-07T07:51:07.29+0000 [CELL/0] OUT Container became healthy
2016-01-07T07:51:09.67+0000 [CELL/0] OUT Exit status 255
2016-01-07T07:51:10.03+0000 [APP/0] OUT Exit status 255
2016-01-07T07:51:10.08+0000 [API/4] OUT App instance exited with guid cfa61cd1-348f-434c-8e96-6de4a5e89b63 payload: {"instance"=>"00b640b3-3777-4d16-5dea-06f8ceddc36e", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"2 error(s) occurred:\n\n* 2 error(s) occurred:\n\n* Exited with status 255 (out of memory)\n* cancelled\n* cancelled", "crash_count"=>1, "crash_timestamp"=>1452153070046693621, "version"=>"20e712ca-d7ca-4be2-9adc-3810458d1fdc"}
...
It tries again (without re-staging) and starts more quickly:
...
2016-01-07T07:51:20.41+0000 [APP/0] OUT 2016-01-07 07:51:20.418 INFO 14 --- [ main] s.f.SampleWebFreeMarkerApplication : Started SampleWebFreeMarkerApplication in 3.242 seconds (JVM running for 3.623)
2016-01-07T07:51:20.56+0000 [HEALTH/0] OUT healthcheck passed
2016-01-07T07:51:20.58+0000 [HEALTH/0] OUT Exit status 0
2016-01-07T07:51:20.58+0000 [CELL/0] OUT Container became healthy
...
ASIDE: One hypothesis we came up with for this behaviour is that the file system is buffering JAR files as they are read, and although that memory is not needed by the process once the classes are loaded, it is not always returned to the OS promptly. In fact whether or not it is returned is essentially random and depends on the total load on the host, and not anything that you can control from within the container.
This one was the same but continued to run under load:
-XX:MaxMetaspaceSize=32M -Xss256K -Xms16M -Xmx24M -XX:MetaspaceSize=20M
The unshaded jar runs with these parameters in a 140m container but not 128m. Furthermore -Dsun.zip.disableMemoryMapping=true didn't help, and neither did MALLOC_ARENA_MAX: 4. It also runs in 128m with the additional -XX parameters already listed above, i.e.
-XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M
A final data point: the shaded jar also runs fine with a heap of 70M (as long as the -Xms is set lower).
We can attempt to simulate the behaviour of the container using ulimit (a bash primitive that works in Linux and OSX). Example:
$ ulimit -m 128000
$ java -XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=32M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M -jar target/spring-boot-test-web-freemarker-1.3.2.BUILD-SNAPSHOT.jar
We had some success with that, but in the end it seems to be more lenient than the container in Cloud Foundry, so not a really good simulation. (Using ulimit -v causes the app to fail immediately, because apparently you can't stop the JVM from requesting virtual memory at that level.)
As another experiment, we added some (unused) dependencies to the vanilla 256m app: spring-boot-starter-data-jpa, spring-cloud-starter-feign, spring-cloud-starter-stream-rabbit, h2. The jar goes up from 15MB to 42MB without any additional threads, although unfortunately quite a lot more classes are loaded. Predictably, it failed to start in PWS, although it did start locally (taking 29 seconds). Here's a summary of the local startup times of the two jars:
Non-heap | Classes | Startup Time (java -jar) | Startup Time (spring-boot:run) |
---|---|---|---|
72M | 9200 | 29s | 6s |
45M | 5800 | 3s | 3s |
The slow startup of this larger jar suggests that JarLauncher could be a target for optimization.
Ratpack uses fewer threads by default (it has additional thread pools that can be called on if needed), and starts up very quickly. The standard Ratpack sample from GitHub is very minimal. It does start up successfully in Cloud Foundry with a 128m container, and even serves HTTP requests under load. This is quite hard to explain given that it has 24 threads (measured locally), so even though it is using less memory than a Spring Boot Tomcat app, it should need more than the 128m available. It doesn't run in 64m. Here are the numbers:
Heap | Metaspace | Stack | Native | Total | Limit | Start(sec) | Key |
---|---|---|---|---|---|---|---|
55 | 64 | 12 | -3 | 131 | 128 | 2 | R |
The JAR in the vanilla Ratpack sample is shaded. A shaded Spring Boot app with similar features using @EnableRatpack starts as well and behaves in a similar way, but a non-shaded version (with or without JarLauncher) fails to start. We conclude that reading JAR files uses memory that the shaded app doesn't, even if you don't use Spring Boot tooling, but the effect is only important in small containers. (The shaded Boot app bounced a bit as it was starting up, consistent with it being right on the limit of the container memory, but eventually settled down to run smoothly.)
What would work better would be to set the stack size based on the workload and/or make it easy to fix with a single environment variable. It might or might not make sense to expand the metaspace with the container size: really it depends more on the number of classes loaded than anything else, which correlates with the size of the archive uploaded, not so much on the memory.
Here are some memory settings that have been verified to actually work in Cloud Foundry with the freemarker sample (except the smallest), assuming 36 threads:
Container | -Xmx (Heap) | -XX:MaxMetaspaceSize (Metaspace) | -Xss (Stack) | Native | Startup(sec) |
---|---|---|---|---|---|
128m | 32M | 55M | 256K | 32 | - |
256m | 140M | 60M | 256K | 47 | 4 |
512m | 384M | 64M | 256K | 55 | 4 |
1g | 800M | 104M | 256K | 111 | 5 |
The smallest of these (128m) has very little chance of running a Spring Boot app with the default tooling, but might give you a fighting chance if you are prepared to tinker with the build (e.g. use shading).
To accommodate larger apps, we have to ask what a larger app would be doing that would need more memory. Most non-heap memory usage can be accounted for by a simple model involving threads and loaded classes.
memory = heap + non-heap
non-heap = threads x stack + classes x 7/1000
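Plugging in the class counts from the local startup table above (and assuming 35 threads with 256K stacks), the model is a reasonable fit to the measured non-heap values:

non-heap = 35 * 256K + 9200 * 7/1000 = 9 + 64 = 73MB (72M measured)
non-heap = 35 * 256K + 5800 * 7/1000 = 9 + 41 = 50MB (45M measured)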
Adding more threads is quite common in enterprise-grade Java apps, and that costs memory, which should be taken away from the heap if we want to keep the total constant. The number of application threads is probably something the developer ideally needs to specify. Another common driver in larger apps is loading more classes, and this is easier to predict based on the size of the archive (usually the two are directly correlated, unless the developer makes a mistake building the archive).
In summary: it would be useful if the knobs available for modifying JVM memory were more aligned with threads and classes, rather than the more abstract inputs we have today. The classes input could be guessed by the buildpack by measuring the size of the archive (assuming it is known).
A rule of thumb would be 400 classes per MB of application. We could also make a rough guess for threads, given that bigger applications (in terms of archive bytes) probably need more threads. Here's a suggestion:
classes = archive(MB) * 400
threads = 15 + archive(MB) * 6 / 10
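For the 15MB freemarker jar, that rule of thumb gives numbers in the right ballpark (we measured 5800 classes locally; the thread estimate is lower than the 35 we assumed under load):

classes = 15 * 400 = 6000
threads = 15 + 15 * 6/10 = 24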
Metaspace isn't the whole of the non-heap memory, but it probably scales with it (and probably the balance is proportional to the archive size). This all leads to a guess of:
Archive | Jar Type | Container | -Xmx (Heap) | -XX:MaxMetaspaceSize (Metaspace) | -Xss (Stack) | Native Buffer |
---|---|---|---|---|---|---|
A | J | L | L - N - B | M | S:256K | B |
Where N = (T + A * 60%)*S + A * 280% is an estimator for the non-heap memory, based on the archive size (A), the number of threads (T) and the stack size (S). The default value of S is 256K, but users might want to bump it if they know they use a non-Java language or a lot of layers of proxies. The metaspace M is roughly M = N - 80% * A, and we also re-assign the 80% * A to other cache settings below.
Finally, B is a native buffer (memory that we empirically see being needed in the container but that is hard to account for in JVM metrics). We generally seem to need less buffer for shaded jars than for nested ones, and we believe the size is related to the files being read by the classloader. Thus a candidate rule for B is:
Jar Type | Native Buffer (B) |
---|---|
Shaded | 22 + 80% * A |
Nested | 22 + 180% * A |
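For example, for the 15MB sample this rule gives the Native Buffer values that appear in the first table below:

B(nested) = 22 + 180% * 15 = 49
B(shaded) = 22 + 80% * 15 = 34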
We do not recommend starting the JVM with initial values equal to the max (e.g. -Xms=-Xmx) because an app often seems to need a bit of extra memory to get started, and every little helps. Initial values of -Xms=16M and -XX:MetaspaceSize=20M seem to work fine, at least for smaller containers (maybe they should scale with the max values, just as in the existing calculation).
Here are some example values calculated using the formula above:
Archive | Jar Type | Threads | Container | -Xmx (Heap) | -XX:MaxMetaspaceSize (Metaspace) | -Xss (Stack) | Native Buffer |
---|---|---|---|---|---|---|---|
15 | Nested | 35 | 128 | 28.25 | 38.75 | 256 | 49 |
15 | Nested | 35 | 256 | 156.25 | 38.75 | 256 | 49 |
15 | Nested | 35 | 512 | 412.25 | 38.75 | 256 | 49 |
15 | Nested | 35 | 1024 | 924.25 | 38.75 | 256 | 49 |
15 | Shaded | 35 | 128 | 43.25 | 38.75 | 256 | 34 |
15 | Shaded | 35 | 256 | 171.25 | 38.75 | 256 | 34 |
15 | Shaded | 35 | 512 | 427.25 | 38.75 | 256 | 34 |
15 | Shaded | 35 | 1024 | 939.25 | 38.75 | 256 | 34 |
42 | Nested | 45 | 128 | -98.45 | 95.25 | 256 | 97.6 |
42 | Nested | 45 | 256 | 29.55 | 95.25 | 256 | 97.6 |
42 | Nested | 45 | 512 | 285.55 | 95.25 | 256 | 97.6 |
42 | Nested | 45 | 1024 | 797.55 | 95.25 | 256 | 97.6 |
42 | Shaded | 45 | 128 | -56.45 | 95.25 | 256 | 55.6 |
42 | Shaded | 45 | 256 | 71.55 | 95.25 | 256 | 55.6 |
42 | Shaded | 45 | 512 | 327.55 | 95.25 | 256 | 55.6 |
42 | Shaded | 45 | 1024 | 839.55 | 95.25 | 256 | 55.6 |
We think that adding -XX:CompressedClassSpaceSize and -XX:ReservedCodeCacheSize can also be quite useful. We haven't studied how they might scale with A, but probably they are proportional (e.g. they could already be represented by the 80% * A composing all or part of the empirical B). Setting -XX:+DisableAttachMechanism will also save a thread or two, so it is occasionally worth a try in a small container.
Note that the number of threads T tends to scale with A (because you add more libraries and they all want more threads), so we can extrapolate the 2 data points we have and suppose that (approximately) T = 24 + 30% * A.
Multiplying everything out we have, in terms of the inputs L (container limit), A (archive size), S (stack size) and J (jar type, 0 for shaded, 1 otherwise):
Name | Value |
---|---|
-Xss | S:256K |
-Xmx | L - 28 - (24 + 90% * A) *S - (J + 360%) * A |
-XX:MetaspaceSize | (24 + 90% * A) * S + 200% * A |
-XX:CompressedClassSpaceSize | 55% * A |
-XX:ReservedCodeCacheSize | 25% * A |
To get back to a model that only has one input (L) we can also make some additional assumptions. If we don't know what the value of A is (even though it's easy to measure) we could assume that it also scales with L (people want more memory for bigger apps). We can also guess S if we don't know better, and say that it should scale linearly from 256K to 1M for containers from 128m to 1g. So we'll go with A = 12% * L and S = 256K + 768K * min(1, max(L-128,0)/896) (e.g. a 512m container gives A = 61 and S = 585K, as in the table). Here's the result:
Archive | Jar Type | Threads | Container | -Xmx (Heap) | -XX:MaxMetaspaceSize (Metaspace) | -Xss (Stack) | Native Buffer |
---|---|---|---|---|---|---|---|
8 | Nested | 26 | 64 | 16 | 24 | 256 | 36 |
15 | Nested | 29 | 128 | 26 | 40 | 256 | 50 |
31 | Nested | 33 | 256 | 74 | 80 | 366 | 77 |
61 | Nested | 42 | 512 | 162 | 168 | 585 | 133 |
64 | Nested | 43 | 1024 | 626 | 210 | 1024 | 137 |
64 | Nested | 43 | 2048 | 1650 | 210 | 1024 | 137 |
8 | Shaded | 26 | 64 | 16 | 24 | 256 | 28 |
15 | Shaded | 29 | 128 | 41 | 40 | 256 | 34 |
31 | Shaded | 33 | 256 | 105 | 80 | 366 | 47 |
61 | Shaded | 42 | 512 | 224 | 168 | 585 | 71 |
64 | Shaded | 43 | 1024 | 690 | 210 | 1024 | 73 |
64 | Shaded | 43 | 2048 | 1714 | 210 | 1024 | 73 |
(We added caps on the estimated values of A from both sides: min 8 and max 64.)