Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set MaxDirectMemorySize, reconsider Xmx default size #4948

Closed
3 tasks
keturn opened this issue Nov 11, 2021 · 23 comments · Fixed by #5025
Closed
3 tasks

set MaxDirectMemorySize, reconsider Xmx default size #4948

keturn opened this issue Nov 11, 2021 · 23 comments · Fixed by #5025
Labels
Category: Performance Requests, Issues and Changes targeting performance Good First Issue Good for learners that are new to Terasology Size: S Small effort likely only affecting a single area and requiring little to no research Type: Improvement Request for or addition/enhancement of a feature

Comments

@keturn
Copy link
Member

keturn commented Nov 11, 2021

I discovered that setting the Java option -XX:MaxDirectMemorySize is effective at limiting the game's seemingly limitless hunger for memory outside the Java heap. I don't know what heuristic it uses by default, but my processes were regularly multiple gigabytes larger than Xmx led me to believe they should be. (As measured by looking at the process in Linux and noting its Resident Set Size or data+stack size as in #4946).

A value of -XX:MaxDirectMemorySize=512M was sufficient to run a single-player game of Metal Renegades with “Moderate” view distance (plus 6 LOD).

Looking at how much of the process's memory is not in the Java heap also made me reconsider what value we use for that by default. We currently give it a whopping three gigabytes in development workspaces:

const val DEFAULT_MAX_HEAP_SIZE = "3G"

—but that single-player MR game seemed content with only a quarter of that (768M).

Places to adjust this:

  • build-logic/exec.kt
    • change DEFAULT_MAX_HEAP_SIZE
    • add an @Option to set MaxDirectMemorySize, provide a default
  • IntelliJ run configs?
  • Launcher. If I recall correctly, it has UI in settings for Xmx. I don't remember if it sets it by default. I think it also allows you to set additional arbitrary java arguments, but I think it would be worth it to provide explicit support for this one so we don't rely on users remembering to add it.

Considerations:

Why did we set DEFAULT_MAX_HEAP_SIZE so high? (I think that change was within the last year or two.)

@keturn keturn added Category: Performance Requests, Issues and Changes targeting performance Good First Issue Good for learners that are new to Terasology Type: Improvement Request for or addition/enhancement of a feature Size: S Small effort likely only affecting a single area and requiring little to no research labels Nov 11, 2021
@pollend
Copy link
Member

pollend commented Nov 11, 2021

MaxDirectMemorySize is for the memory allocated for NIO related to serilization/deserilization and network communication. maybe 1G should be a good safe value?

@DarkWeird
Copy link
Contributor

MaxDirectMemorySize is for the memory allocated for NIO related to serilization/deserilization and network communication. maybe 1G should be a good safe value?

We also use directbuffers for passing parameters to lwjgl exactly. Bullet - unknown

@keturn
Copy link
Member Author

keturn commented Nov 11, 2021

Yeah, one of the things I haven't sorted out yet is which things do or don't use that type of Direct Memory buffer. Our documentation (or error-handling) for this should include instructions for how to tell if one of these limits is set too low.

For example, when I set MaxDirectMemorySize too low, I get this error when trying to create a Metal Renegades game:

java.lang.OutOfMemoryError: Direct buffer memory
	at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
	at org.terasology.engine.world.block.tiles.WorldAtlasImpl.createAtlasMipmaps(WorldAtlasImpl.java:276)
	at org.terasology.engine.world.block.tiles.WorldAtlasImpl.buildAtlas(WorldAtlasImpl.java:206)
	at org.terasology.engine.world.block.tiles.WorldAtlasImpl.<init>(WorldAtlasImpl.java:76)
	at org.terasology.engine.core.modes.loadProcesses.RegisterBlocks.step(RegisterBlocks.java:42)

But if I set Xmx too low, I get things like this:

13:22:47.517 [Chunk-Processing-Reactor] ERROR o.t.e.w.c.p.ChunkProcessingPipeline - ChunkTask at position ( 1.000E+0  1.000E+0 -3.000E+0) and stage [Chunk generate internal lightning] catch error: 
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.terasology.engine.world.chunks.pipeline.PositionFuture.get(PositionFuture.java:49)
	at org.terasology.engine.world.chunks.pipeline.ChunkProcessingPipeline.onStageDone(ChunkProcessingPipeline.java:118)
	at org.terasology.engine.world.chunks.pipeline.ChunkProcessingPipeline.chunkTaskHandler(ChunkProcessingPipeline.java:105)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space

(and, as that happened in another thread, the engine is still trying to chug along but isn't having a very good time of it.)

@keturn
Copy link
Member Author

keturn commented Nov 11, 2021

As for defaults, I hope we can work with a default that's lower than a gigabyte! Given how low-res our textures are, a gigabyte is a huge amount of RAM.

But yes, let's pick a default that seems to work for things as they are currently, and we can talk about optimization separately outside of this ticket.

@pollend
Copy link
Member

pollend commented Nov 13, 2021

oh so that is all buffers declared with DirectByteBuffer. so when you allocate a DirectByteBuffer its a buffer that backs memory in xmx. that is all communication for networking, and any vertex data for meshes and I also believe for texture.

@keturn
Copy link
Member Author

keturn commented Nov 13, 2021

@skaldarnar
Copy link
Member

I'm happy to have a look at any PRs that come up related to this.

As the two open launcher PRs MovingBlocks/TerasologyLauncher#666 and MovingBlocks/TerasologyLauncher#667 touch the way it handles settings we may want to get them out of the way before we add new settings/features.

@jdrueckert
Copy link
Member

Finally got around to read this. First of all:

Why did we set DEFAULT_MAX_HEAP_SIZE so high? (I think that change was within the last year or two.)

Seems like we used to use 1.5G, but changed it to 3G in #4365 with the following reasoning:

Increases default memory when running from Gradle - flying around in settings like Metal Renegades does not appreciate being memory starved. Goes beyond the 32-bit JVM capacity, but we honestly could have probably done that years ago, playing the game on 32-bit is one thing, developing it a whole other thing.


Second, here's a few questions I have:

  1. On one hand you say

    setting the Java option -XX:MaxDirectMemorySize is effective at limiting the game's seemingly limitless hunger for memory outside the Java heap

    which sounds to me as if with this the game only takes as much as it can and then frees things up or some other way manages to not exceed it.
    On the other hand you say

    when I set MaxDirectMemorySize too low, I get this error

    which sounds to me as if instead of using more memory, we just crash 🤔
    Which one is it?

  2. A value of -XX:MaxDirectMemorySize=512M was sufficient to run a single-player game of Metal Renegades with “Moderate” view distance (plus 6 LOD).

    What happens if we use more LOD or a higher view distance? Do we crash? If so, do we need to set this to a value sufficient for the maximum settings we support or is it possible to change this for a running JVM when the settings are changed?

  3. If we currently don't limit it at all and see the game use multiple GB, wouldn't a limit 1G already be a good improvement and a safer one if we're not sure yet how much all the gameplay and settings combinations actually require? Alternatively, should we decide on a value and start a testing series with different gameplays, settings, single-vs-multiplayer scenarios etc first to get a bit of trust behind the value we choose?

  4. How can we enumerate the things that use the direct memory? Would that even help us with estimating the required amount? If vertex data for meshes is stored there, will we choke on setting this too small as soon as we add a more complex 3D model into the game at some point?

@skaldarnar
Copy link
Member

What happens if we use more LOD or a higher view distance? Do we crash? If so, do we need to set this to a value sufficient for the maximum settings we support or is it possible to change this for a running JVM when the settings are changed?

I'd say that the default memory settings should be sufficient to play our (supported) game modes with the default in-game settings.

I would neither optimize for minimal nor maximal settings (either too restrictive, or too generous).

We may want to add some hint in the menu that changing the view distance may require more memory. Depending on the exact behavior this may or may not lead to a (severe) crash. In case we have the chance to handle the out-of-memory error we can probably add a helpful message for the user that they might want to adjust the max memory setting if they are think their machine can handle it (exact wording of this "recommendation" is hard, though).

How can we enumerate the things that use the direct memory? Would that even help us with estimating the required amount? If vertex data for meshes is stored there, will we choke on setting this too small as soon as we add a more complex 3D model into the game at some point?

👍 +1 for this question.

@keturn
Copy link
Member Author

keturn commented Dec 19, 2021

Ah, good questions!

  1. Some of each.

The trouble comes from memory allocations with short lifetimes. Maybe very short, like something needed to render a single frame, or maybe a little less short, like a chunk that is loaded for a while but then should be un-loaded after the game has moved on.

A memory allocator could re-use the space it assigned to old objects in old frames, but reclaiming memory takes work, so if it thinks it still has plenty of fresh memory available I guess it prefers to use that. Giving it an explicit limit forces it to do re-use earlier.

But asking it to re-use old memory is only successful if there actually is discarded memory space available to reclaim. The limit must still be high enough for all the data it needs for that frame, or to load all the chunks necessary for a system's logic, or whatever. If it's not enough to fit all the objects that need to be live at once, then 💥

  1. […] Do we crash?

Yes.

is it possible to change this for a running JVM when the settings are changed?

I don't think this is a setting we can change at runtime. It'll have to be a "change settings and restart" kind of thing.

  1. […] wouldn't a limit 1G already be a good improvement and a safer one?

I think this becomes a "safer for who?" question. Lower values are safer for players on hardware with less RAM, or environments where less RAM is available for Terasology (b/c they're also running a browser and Discord and IntelliJ and gradle and Blender and whatever), game modes that don't have logic that spans many chunks or entities, or have a small play area (like the L&S arena).

High values are safer for games with high render distances and a lot of stuff going on in-game, and in settings where the computer isn't being asked to do so much multi-tasking.

  1. How can we enumerate the things that use the direct memory?

The answer to this begins with "look for all the code that imports java.nio.Buffer classes (including ByteBuffer, FloatBuffer, etc)."

I don't recommend getting in to that too deeply just yet. We'll likely want to do an optimization pass. Maybe we find more memory-efficient ways to do some things, but maybe they shift to a different memory allocation strategy, or they change when we upgrade to lwjgl 3.3 — oh, that was actually released as stable last month! That'll be good information to have, but I don't think it's necessary for the scope of this ticket.

@jdrueckert
Copy link
Member

Okay, thank you for answering the questions and bringing some more light into this, @keturn
So from what I take, the open question is "What should our default value be?"

I think @skaldarnar's comments

the default memory settings should be sufficient to play our (supported) game modes with the default in-game settings.

and

We may want to add some hint in the menu that changing the view distance may require more memory.

make sense here. The default JVM configuration should support the default game configuration. And if we restrict this to a (potentially low) default, issues resulting from this should be handled properly and it should be made clear to the user, what to do to improve the situation.

So to find a suitable default, I guess we want to measure what's sufficient to play all our top 3 gameplays (+ core gameplay, although I think that's pretty much "included" in JS) with default in-game settings, correct?

@keturn How did you measure / test this? Experiential, by specify a value and then see the game doesn't crash or using some kind of tooling?

@keturn
Copy link
Member Author

keturn commented Dec 21, 2021

Yeah, my preliminary testing was not too formal. Start a new MR game, find a city, hang around long enough for it to grow a bit and have things spawn, etc.

“See that the game doesn't crash” was the main assessment tool, but watching Java heap size and the process data size in JMC can give some indication of how much wiggle room there is.

@jdrueckert
Copy link
Member

Okay, then to formalize this at least a tad, I'd like to brainstorm on things to test that give us some trust in that the default is a good one and doesn't blow up in our faces as soon as the first person actually tries to play one of the gameplays for more than a few minutes.

@keturn
Copy link
Member Author

keturn commented Dec 21, 2021

And as a reminder, we are talking about choosing defaults for two values: -Xmx (bringing it down from its current 3G) and -XX:MaxDirectMemorySize (which we currently do not set at all so the jvm picks something based on some heuristic we don't understand).

@jdrueckert
Copy link
Member

Alright then, commencing brainstorming for first testing setup:

  • default in-game settings (delete config.cfg before start)
  • -Xmx 768M
  • -XX:MaxDirectMemorySize 512M
  • CoreGameplay: use different guns to destroy blocks
  • JS: walk around the world and interact with it
  • MR: find city, let it grow
  • LaS: play a bit
  • test this in both singleplayer and multiplayer

Tbh, for JS (other than walking around triggering chunk gen) and especially LaS I'm currently not so sure what memory-intense things to do would be 🤔 Do you have any ideas there @keturn @skaldarnar ?

@keturn
Copy link
Member Author

keturn commented Jan 18, 2022

I think that's fine.

Things we should take note of include:

  • number of concurrent players
  • server uptime
  • how many chunks generated (or some approximation of that, like distance traveled)

I don't think we need to be super rigorous in this pass, but it'll be handy if we're later able to look back at our notes to answer "hey, did we test multiplayer with 2 people or 12?"

@jdrueckert
Copy link
Member

Tbh, for JS (other than walking around triggering chunk gen) and especially LaS I'm currently not so sure what memory-intense things to do would be thinking Do you have any ideas there @keturn @skaldarnar ?

Had a quick offline chat with skal about this and he mentioned

  • testing LaS with a lot of players (20+) - might be a good candidate for our next play test
  • spawning a lot of deer and sheep in JS (50+)

@jdrueckert
Copy link
Member

I started testing today and while the max heap configuration works properly, I'm having doubts about the MaxDirectMemorySize part... Irrespective of whether I set it in my gradle.properties or via jvmArgs("-XX:MaxDirectMemorySize=512M") in exec.kt or both, the Terasology java process' memory reported by my system monitor reports the same 2,1G 🤔

I searched for a way to extend our memory debugging information with the runtime value of MaxDirectMemorySize to make sure I configured it correctly. Unfortunately it does not really seam possible to get this value without some hacky workarounds.
@keturn Do you have any idea how to confirm the setting is actually taken into account?

@keturn
Copy link
Member Author

keturn commented May 7, 2022

I guess I'm not too surprised to hear it requires hacky workarounds to access at runtime, given that it's an implementation-specific flag set with -XX.

It looks like one way to check is to add the -XX:+PrintCommandLineFlags flag. With that I get:

Task :facades:PC:game
-XX:G1ConcRefinementThreads=8 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=261478272 -XX:+ManagementServer -XX:MaxDirectMemorySize=536870912 -XX:MaxHeapSize=805306368 -XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
11:51:52.027 [main] INFO org.terasology.engine.Terasology - homeDir is .

@jdrueckert
Copy link
Member

Sooo, did some singleplayer testing today (see #5025 for the utilized settings).

1. Test: CoreGameplay

  • tested block entity changes and chunk generation
  • traveled 4 gun shots far through underground (~20 chunks)
  • bit of lag while traveling through gunshot tunnel
  • not much lag directly when shooting
  • memory: 2.6G
  • uptime: 8 minutes

2. Test: JoshariasSurvival

  • tested spawning a lot of entities with pathfinding
  • spawned 50 deer, then 50 sheep (not moving due to bug), then another 50 deer
  • both times lag while spawning deer (starting at 30), but not while spawning sheep
  • not much lag after first 50 deer
  • a lot of lag after second 50 deer (0.8 FPS)
  • memory: 2.9G
  • uptime: 12 minutes

3. Test: MetalRenegades

  • traversed ~20 chunks on search for city (max 17 in one direction, circled back, ended up at (-6, 0, -3)
  • memory: 2.5G (uptime: 6 minutes)
  • memory: 2.8G (uptime: 12 minutes), stayed at 2.8G for rest of 30 minutes uptime

@jdrueckert
Copy link
Member

From what I saw, memory consumption never went above 3G according to my resource monitor. Would be interesting to see how it behaves in multiplayer testing on the server.
But considering that there were no crashes, I'd argue that we can try the memory restriction and if we receive error reports, we can still increase it...? @keturn

@keturn
Copy link
Member Author

keturn commented May 29, 2022

It looks like one way to check is to add the -XX:+PrintCommandLineFlags flag.

I'd been wondering if we could use this to get any insight in to what value it uses when we don't set it explicitly.

The answer is: nope. PrintCommandLineFlags does show numbers for some things we weren't explicit about (e.g. ReservedCodeCacheSize), but when there is no explicit MaxDirectMemorySize, it's not printed among the flags at all.

Checking -XX:+PrintFlagsFinal is similar: when we don't set it it's reported as 0 {default}.

(This is with Temurin OpenJDK 11.0.15.)

@jdrueckert
Copy link
Member

Ran the same tests without the changes introduced in #5025 and that's the results:

    1. Test - already at 4.3G after 5 minutes
    1. Test - already at 5.5G after spawning the first 50 deer, towards the 40th sheep, game crashed with an OpenAL error (for details see log below), probably more or less unrelated
    1. Test
    • 4.9G after 5 minutes
    • 5.2G after 12 minutes
    • 5.5G after 20 minutes
    • 6.0G after 30 minutes

OpenAL Error Logs (just fyi)

org.terasology.engine.audio.openAL.OpenALException: OpenAL Error (40963) at Buffer unqueue - Invalid Value
        at org.terasology.engine.audio.openAL.streamingSound.OpenALStreamingSoundSource.update(OpenALStreamingSoundSource.java:67)
        at org.terasology.engine.audio.openAL.BaseSoundPool.lambda$update$1(BaseSoundPool.java:101)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1603)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at org.terasology.engine.audio.openAL.BaseSoundPool.update(BaseSoundPool.java:101)
        at org.terasology.engine.audio.openAL.OpenALManager.update(OpenALManager.java:249)
        at org.terasology.engine.core.subsystem.lwjgl.LwjglAudio.postUpdate(LwjglAudio.java:49)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Performance Requests, Issues and Changes targeting performance Good First Issue Good for learners that are new to Terasology Size: S Small effort likely only affecting a single area and requiring little to no research Type: Improvement Request for or addition/enhancement of a feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants