Active Processor Count uses total CPUs #136

Closed
dmikusa opened this issue Jan 8, 2022 · 32 comments · Fixed by #388
Labels
hacktoberfest (Hacktoberfest eligible) · note:ideal-for-contribution (An issue that a contributor can help us with) · type:enhancement (A general enhancement)

Comments

@dmikusa
Contributor

dmikusa commented Jan 8, 2022

What happened?

The ActiveProcessorCount helper's results can be confusing.

  • What were you attempting to do?

In general, I want to limit the CPU resources using Kubernetes or Docker for my application and I'd like the application to automatically adjust to these changes. The Java buildpack has the ActiveProcessorCount helper to do this.

If you limit the number of CPUs available to a container using Kubernetes limits or docker run --cpus=X the active processor count code will pull the total CPUs, not the limited number of CPUs. What!?

This happens because these methods of limiting CPU resources are based on CPU shares, so you do technically have access to all of the CPUs on the machine. You just have a limited quantity of time on the CPUs relative to all of the other processes running at the time.

This can sometimes cause performance problems, because code that sees 8 CPUs (for example, when sizing thread pools) may assume it has full access to all 8.
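
To make the gap concrete: as far as I know, `docker run --cpus=X` and Kubernetes CPU limits are enforced as a CFS quota per scheduling period rather than by hiding CPUs, which is why the visible CPU count does not change. The following is a minimal Go sketch of how such a quota could be mapped to a processor count. `effectiveCPUs` is a hypothetical name, not the buildpack's code, and rounding up to a whole CPU is an assumed policy.

```go
package main

import "fmt"

// effectiveCPUs maps a CFS quota to a whole number of CPUs (hypothetical helper).
// A quota <= 0 is treated as "no limit". Rounding up is an assumption.
func effectiveCPUs(quotaMicros, periodMicros int64, hostCPUs int) int {
	if quotaMicros <= 0 || periodMicros <= 0 {
		return hostCPUs
	}
	// Ceiling division: 0.75 of a CPU still needs one processor.
	n := int((quotaMicros + periodMicros - 1) / periodMicros)
	if n < 1 {
		n = 1
	}
	if n > hostCPUs {
		n = hostCPUs
	}
	return n
}

func main() {
	// --cpus=0.75 on an 8-CPU host: 75000us of CPU time per 100000us period.
	fmt.Println(effectiveCPUs(75000, 100000, 8))
	// --cpus=2 on a 12-CPU host.
	fmt.Println(effectiveCPUs(200000, 100000, 12))
	// No quota configured: fall back to the host count.
	fmt.Println(effectiveCPUs(-1, 100000, 8))
}
```

With a policy like this, a container limited to 0.75 CPUs would report 1 processor instead of the host's 8.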

  • What did you expect to happen?

It would be nice if there were a way for the processor count to scale down based on CPU limits/shares in a similar way to how it will scale down if you use CPU affinity and bind the process to only a subset of the CPUs.

  • What was the actual behavior? Please provide log output, if possible.

When using CPU limits/shares, ActiveProcessorCount reports the total number of CPUs on the system.

Build Configuration

  • What buildpacks are you using? Please include versions.

paketo-buildpacks/java

See here for more details.

@dmikusa
Contributor Author

dmikusa commented Jan 27, 2022

I'm leaning towards adding an option that would allow you to disable the ActiveProcessorCount helper. For backward compatibility, this would default to using the ActiveProcessorCount helper.

By allowing this to be disabled, we'd let the JVM detect the total number of CPUs. All of the versions of the JVM that we have shipped for a long time now support detection of CPU count within a container.

https://bugs.openjdk.java.net/browse/JDK-8146115

It is done a bit differently though and it does seem to attempt to take into consideration reduced CPU shares.

This would leave three options for users:

  1. The default: let the ActiveProcessorCount run and determine using the present algorithm.
  2. Disable ActiveProcessorCount and let the JVM determine using its algorithm.
  3. Take manual control and set the number of ActiveProcessors as you wish.

@dmikusa
Contributor Author

dmikusa commented Jan 27, 2022

Some historical context on this feature is available here -> #53 (comment)

@paskos

paskos commented Feb 2, 2022

We manually added -XX:ActiveProcessorCount=1 to the Java options at startup.
Now we see both values, -XX:ActiveProcessorCount=32 -XX:ActiveProcessorCount=1, in the JAVA_TOOL_OPTIONS environment variable.

We use spring-boot-admin, and the process reports 1 active CPU with the thread pools sized accordingly, so it looks like -XX:ActiveProcessorCount=1 takes precedence over 32.
We will test on more than one service to see whether the "winner" is consistent or random.

@dmikusa
Contributor Author

dmikusa commented Feb 2, 2022

Based on past experience, when you have multiple arguments with different values like this the JVM will consistently pick the right-most value in the list. I suspect it's processing them in order, so when it processes the argument the second time, it just changes the value from 32 to 1, but I've not looked at the source code so that is a guess.

Anyway, in your case it should consistently pick 1.
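
The "right-most value wins" behavior described above can be modeled with a short Go sketch. This only illustrates the observed outcome; it is not the JVM's actual argument parser, and `lastActiveProcessorCount` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const flagPrefix = "-XX:ActiveProcessorCount="

// lastActiveProcessorCount scans whitespace-separated options left to right
// and keeps the last parseable -XX:ActiveProcessorCount value it sees.
func lastActiveProcessorCount(opts string) (int, bool) {
	value, found := 0, false
	for _, arg := range strings.Fields(opts) {
		if !strings.HasPrefix(arg, flagPrefix) {
			continue
		}
		n, err := strconv.Atoi(strings.TrimPrefix(arg, flagPrefix))
		if err != nil {
			continue // ignore malformed values in this sketch
		}
		value, found = n, true // a later occurrence overwrites an earlier one
	}
	return value, found
}

func main() {
	opts := "-XX:ActiveProcessorCount=32 -XX:ActiveProcessorCount=1"
	n, ok := lastActiveProcessorCount(opts)
	fmt.Println(n, ok)
}
```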

@domainname

Hi @dmikusa, are there any updates on this issue? We are impacted by it. See Azure/Azure-Spring-Apps#36. Looking forward to an option to disable the ActiveProcessorCount helper.

@cmdjulian

Actually, since we are forced to override all of the memory settings that the memory calculator sets, I would appreciate a flag to disable it completely, if that is possible.

@anthonydahanne
Member

hello!
Could you provide an example of an app that suffers from this default?
Like an app running out of memory in seconds / minutes because of this setting?

@dmikusa
Contributor Author

dmikusa commented Apr 30, 2023

I've been strongly against disabling the memory calculator because it provides a vital function. If you're having issues, it's usually just forcing you to address them now versus at 3am when your app is crashing (i.e., it fails fast so you can fix things now).

But as @anthonydahanne said please open an issue with more specifics about your problem. This issue is not regarding the memory calculator, so please open a new issue for memory calculator questions.

I also think that we need to do a review of Java 17 and the upcoming Java 21 and see how they work in a container. Java has been improving its container awareness so perhaps we can disable memory calculator in newer versions if the JVM functions correctly.

Thanks

@showpune

showpune commented Sep 3, 2023

Hi @dmikusa,

The active_processor_count code looks like a bug to me: it calculates the active processor count from the node's CPUs rather than from the CPUs requested for the container. For a small application, a customer may request just 1 CPU but still get an active processor count of 16. Can we just remove the active processor count call from the helper's main until we reach a conclusion on #320?

In fact, since JDK 8 the JDK can determine the right number of CPUs, so we don't need to inject one: https://bugs.openjdk.org/browse/JDK-8140793

@dmikusa
Contributor Author

dmikusa commented Sep 4, 2023

@showpune (or anyone interested) - I'm not opposed to removing this, but I think we need a quick proof of concept first. I'd like to see what the JVM is telling us and compare that to what we get from Golang (i.e., the helper).

There are two cases to compare for each major Java version: 1.) When using container limits for the number of CPUs and 2.) When using container limits for a number of CPU shares. I think it's important we examine the results from both to make sure we're giving users the best experience.

Thanks
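
For the Go half of such a comparison, a minimal sketch could look like the following. It assumes the helper's number comes from `runtime.NumCPU()`, which is what later comments in this thread suggest; I have not checked the helper's source. The JVM-side number would come from `Runtime.getRuntime().availableProcessors()` run in the same container. `goVisibleCPUs` is a hypothetical name.

```go
package main

import (
	"fmt"
	"runtime"
)

// goVisibleCPUs returns the number of logical CPUs usable by this process
// according to the Go runtime. It reflects CPU affinity, not CFS quotas.
func goVisibleCPUs() int {
	return runtime.NumCPU()
}

func main() {
	// Run once with a CPU limit (e.g. --cpus) and once with CPU shares only,
	// then compare against what the JVM reports in the same container.
	fmt.Printf("Go runtime.NumCPU(): %d\n", goVisibleCPUs())
}
```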

@anthonydahanne
Member

I agree with @dmikusa : we can't just remove this functionality out of sheer hope or just for 1 single use case.

We need facts / use cases that it really breaks apps.

@showpune I invite you to show us how the active_processor_count breaks your app. This could be:

  • creating a sample project on Github and having some actions demonstrate a failure, or
  • hacking around an existing java sample that would behave better without it

@showpune

showpune commented Sep 5, 2023

Hi @dmikusa , @anthonydahanne

As an alternative solution, can we use /sys/fs/cgroup/cpu to get the CPU processor count?

@dmikusa
Contributor Author

dmikusa commented Sep 5, 2023

Hi @dmikusa , @anthonydahanne

As an alternative solution, can we use /sys/fs/cgroup/cpu to get the CPU processor count?

It wouldn't surprise me if that is what our Go helper is doing now. I'm open to any option, but the same request applies. We need to see a.) a use case for the change, especially if it'll change the default, and b.) a proof of concept that demonstrates the new solution works better than the present solution.
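
As a rough illustration of the cgroup idea, here is a Go sketch that parses the cgroup v2 `cpu.max` file, which holds a quota (or the word `max`) followed by a period. This is not the buildpack's implementation. The `/sys/fs/cgroup/cpu.max` path assumes cgroup v2 with the container's cgroup mounted at the root; on cgroup v1 the equivalent values live in `cpu.cfs_quota_us` and `cpu.cfs_period_us` under `/sys/fs/cgroup/cpu`. `parseCPUMax` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUMax parses cgroup v2 cpu.max content such as "75000 100000" or
// "max 100000". limited is false when no quota is set.
func parseCPUMax(content string) (quota, period int64, limited bool, err error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, 0, false, fmt.Errorf("unexpected cpu.max content %q", content)
	}
	period, err = strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	if fields[0] == "max" {
		return 0, period, false, nil
	}
	quota, err = strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	return quota, period, true, nil
}

func main() {
	// Assumed path for cgroup v2; may differ on your system.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("cpu.max not readable (cgroup v1 or non-Linux host?):", err)
		return
	}
	quota, period, limited, err := parseCPUMax(string(data))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("limited=%v quota=%d period=%d\n", limited, quota, period)
}
```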

@showpune

Hi,
could you help check Azure/Azure-Spring-Apps#36?
Because the active processor count is set to 8 based on the node instead of the Pod's CPU request, we get the wrong CPU usage metric.

anthonydahanne added the hacktoberfest (Hacktoberfest eligible) label Oct 4, 2023
@Ch4s3r

Ch4s3r commented Oct 20, 2023

What about using runtimepb.NumCPU() instead of runtime.NumCPU()?
As they offer support for cgroups and not only detecting logical cores.

@pauloricardomg

A workaround to this bug is to set the following runtime environment variable:
JAVA_TOOL_OPTIONS="-XX:ActiveProcessorCount=-1"

This will fall back to the default (and correct) Java container support behavior, which is to honor the container CPU limits.

@pax95

pax95 commented Jan 29, 2024

Tried to use JAVA_TOOL_OPTIONS, but it didn't have any effect on the starting pod.
The buildpack used this:
Is there any way to set ActiveProcessorCount?

Build Configuration:
  $BP_JVM_JLINK_ARGS           --no-man-pages --no-header-files --strip-debug --compress=1  configure custom link arguments (--output must be omitted)
  $BP_JVM_JLINK_ENABLED        false            enables running jlink tool to generate custom JRE
  $BP_JVM_TYPE                 JDK              the JVM type - JDK or JRE
  $BP_JVM_VERSION              17               the Java version
Launch Configuration:
  $BPL_DEBUG_ENABLED           false            enables Java remote debugging support
  $BPL_DEBUG_PORT              8000             configure the remote debugging port
  $BPL_DEBUG_SUSPEND           false            configure whether to suspend execution until a debugger has attached
  $BPL_HEAP_DUMP_PATH                           write heap dumps on error to this path
  $BPL_JAVA_NMT_ENABLED        true             enables Java Native Memory Tracking (NMT)
  $BPL_JAVA_NMT_LEVEL          summary          configure level of NMT, summary or detail
  $BPL_JFR_ARGS                                 configure custom Java Flight Recording (JFR) arguments
  $BPL_JFR_ENABLED             false            enables Java Flight Recording (JFR)
  $BPL_JMX_ENABLED             false            enables Java Management Extensions (JMX)
  $BPL_JMX_PORT                5000             configure the JMX port
  $BPL_JVM_HEAD_ROOM           0                the headroom in memory calculation
  $BPL_JVM_LOADED_CLASS_COUNT  35% of classes   the number of loaded classes in memory calculation
  $BPL_JVM_THREAD_COUNT        250              the number of threads in memory calculation
  $JAVA_TOOL_OPTIONS           -XX:ActiveProcessorCount=2

@keskad

keskad commented Mar 7, 2024

In my case, -XX:ActiveProcessorCount is set to a value that slows down application startup by about 2x, e.g. 15s -> 32s.
As a result, applications end up in CrashLoopBackOff because the liveness probes fail.

It would be nice to disable this.

A workaround in Kubernetes is to set entrypoint to point directly to:

/layers/paketo-buildpacks_bellsoft-liberica/jre/bin/java org.springframework.boot.loader.JarLauncher

Which turns off other buildpack features - because the original entrypoint no longer works.

@keskad

keskad commented Mar 7, 2024

-XX:ActiveProcessorCount=-1 works now that I put it at the beginning of JAVA_TOOL_OPTIONS. Previously I had it after a newline (testing locally in docker-compose). The first form below is what I had before; the second is the one that works:

JAVA_TOOL_OPTIONS: |
               -Dspring.config.location=xxx ...
               -XX:ActiveProcessorCount=-1
JAVA_TOOL_OPTIONS: |
              -XX:ActiveProcessorCount=-1 -Dspring.config.location=xxx ...

@dmikusa
Contributor Author

dmikusa commented Mar 7, 2024

@keskad Can you expand on your environment? What value for ActiveProcessorCount is being picked by the buildpack that causes the start-up times to be slower? What are the constraints put on your application by the container orchestrator? Is it limiting your CPU count to 1, and/or are there CPU share limitations put on the app too?

Thanks. Just trying to understand the circumstances behind that statement, so we can test/reproduce.

@keskad

keskad commented Mar 7, 2024

@dmikusa

For those limitations in docker-compose.yaml:

            resources:
                limits:
                    cpus: '0.75'
                    memory: 1024M
                reservations:
                    cpus: '0.10'
                    memory: 256M

and these specs:

  • Intel CPU with 8 threads
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7

With this setup, -XX:ActiveProcessorCount is set to 8.
When I set it to a lower value, performance improves.

@sdeleuze

sdeleuze commented Apr 25, 2024

I can confirm what is mentioned in the previous comment, and the impact can be huge, so a big +1 from me to fix this issue.

I was struggling to understand why the Dockerfile version of https://github.com/sdeleuze/petclinic-efficient-container in the main branch was starting 3x to 4x faster than the Buildpacks equivalent when limits are enabled with Dockerfile with:

      resources:
        limits:
          cpus: '2.0'

It seems to be because the number of hardware CPUs is taken into account to compute -XX:ActiveProcessorCount, not the number available in practice.

You can reproduce with:

sdk use java 21.0.2-librca
git clone https://github.com/sdeleuze/petclinic-efficient-container.git
cd petclinic-efficient-container
git checkout buildpacks-processor-count-issue
./build-container-image.sh
docker-compose up

That gives me on my machine Started PetClinicApplication in 5.885 seconds (process running for 6.078) and I see -XX:ActiveProcessorCount=12 despite 2 configured.

Now, if I override the value to set the one I would expect:

   environment:
      - JAVA_TOOL_OPTIONS="-XX:ActiveProcessorCount=2"

I see on my machine something like Started PetClinicApplication in 1.561 seconds (process running for 1.762)!

My use case is to simulate cheap servers on my powerful laptop so that I can run benchmarks for Spring projects, and I would also like to use that feature in my upcoming talks. As I can see in #136 (comment), there also seems to be an impact in production.

I know close to nothing about Go, but maybe the proposal in #136 (comment) is worth exploring:

What about using runtimepb.NumCPU() instead of runtime.NumCPU()?
As they offer support for cgroups and not only detecting logical cores.

@dmikusa
Contributor Author

dmikusa commented Apr 25, 2024

@sdeleuze Thanks for providing some details. I think what I'd like to do is just have the buildpacks back off and not provide that helper for newer Java versions. The JDK is supposed to be able to detect CPU limits in containers correctly now, so we don't really need to do this in the buildpack.

I've got a PR up that does this for Java 17+. There's a test image up here docker.io/dmikusa/bellsoft-liberica:gh-130. If you have a minute, please give it a try and let me know if you see better start up performance with this image.

You can try that by running pack build -b docker.io/dmikusa/bellsoft-liberica:gh-130 -b paketo-buildpacks/java .... Putting it first in the buildpack order will make it run and override the bellsoft-liberica version inside the standard buildpack.

Thanks!

@sdeleuze

I tried but got errors like:

[INFO]     [creator]     ======== Error: paketo-buildpacks/bellsoft-liberica@10.7.0+dev ========
[INFO]     [creator]     fork/exec /cnb/buildpacks/paketo-buildpacks_bellsoft-liberica/v10.7.0+dev/bin/detect: exec format error
[INFO]     [creator]     ======== Results ========
[INFO]     [creator]     err:  paketo-buildpacks/bellsoft-liberica@10.7.0+dev

I am probably doing something wrong. Any chance you could share the Spring Boot Maven configuration I am expected to use to test it (I am not using pack)?

Also, I am wondering if we could just avoid setting -XX:ActiveProcessorCount for all versions of Java. The reason I propose that is that, as far as I understand https://bugs.openjdk.org/browse/JDK-8146115 and based on what I read in various discussions on SO, -XX:ActiveProcessorCount is only available on versions of Java capable of correctly detecting the CPU limits, and it has also been backported to Java 8 as of builds 8u191 and 8u201 (see the details in the OpenJDK link), so I struggle to understand the use case where setting -XX:ActiveProcessorCount makes sense.

@anthonydahanne
Member

I got the same error with maven and

<buildpacks>
    <buildpack>dmikusa/bellsoft-liberica:gh-130</buildpack>
    <buildpack>paketobuildpacks/syft:latest</buildpack>
    <buildpack>paketobuildpacks/executable-jar:latest</buildpack>
    <buildpack>paketobuildpacks/dist-zip:latest</buildpack>
    <buildpack>anthonydahanne/spring-boot:cds-april-24</buildpack>
</buildpacks>

@dmikusa did you publish your buildpack with a pack exp that supports dual arch?

@dmikusa
Contributor Author

dmikusa commented Apr 25, 2024

@sdeleuze

Also, I am wondering if we could just avoid setting -XX:ActiveProcessorCount for all versions of Java. The reason I propose that is that, as far as I understand https://bugs.openjdk.org/browse/JDK-8146115 and based on what I read in various discussions on SO, -XX:ActiveProcessorCount is only available on versions of Java capable of correctly detecting the CPU limits, and it has also been backported to Java 8 as of builds 8u191 and 8u201 (see the details in the OpenJDK link), so I struggle to understand the use case where setting -XX:ActiveProcessorCount makes sense.

I picked 17 arbitrarily. I wasn't sure exactly what versions of Java had the container-aware CPU sizing improvements. I didn't have time to dig in, so I thought I'd just start here. We can absolutely tweak this to different versions, or just remove it altogether if we're confident that all the fixes in this area were backported to Java 8. Changing Java 8/11 behaviors makes me nervous though, they've been out in the field a while and people expect things not to change there.

Maybe a feature flag is necessary here? We could have Java 8/11 be default enabled, so behavior doesn't change, but users can opt-in, and 17+ it defaults to on.

Anyway, curious to know what everyone thinks about it. Thanks for all the feedback!

@dmikusa
Contributor Author

dmikusa commented Apr 25, 2024

I am probably doing something wrong, any chance you could share the Spring Boot Maven configuration I am expected to use to test it (I am not using pack)?

Try removing the docker image for docker.io/dmikusa/bellsoft-liberica:gh-130, just docker rmi it. Then docker pull docker.io/dmikusa/bellsoft-liberica:gh-130. Then try your build and see if that helps.

The present version of pack is only partially platform aware and, at least for me, it downloads amd64 images. That has caused some weirdness on my MBP M1. If you are also on an M1, this might be your problem too. If you docker pull it first, that will ensure the proper arm64 variant of the image is on your machine. Then when you pack build, use the --pull-policy if-not-present flag (or use never) and it will use the existing image.

Disregard that. That was me screwing things up. I'd built this demo buildpack as multi-arch, so you need to use a multi-arch builder too. That's why it's picking the wrong arch.

Try this command:

pack build -b docker.io/dmikusa/bellsoft-liberica:gh-130 -b gcr.io/paketo-buildpacks/java:beta -B paketobuildpacks/builder-jammy-buildpackless-tiny ...

For SB Build Tools, use what @anthonydahanne posted. Add <builder>paketobuildpacks/builder-jammy-buildpackless-tiny</builder> as a sibling element to the <buildpacks> block. See https://docs.spring.io/spring-boot/docs/3.2.5/maven-plugin/reference/htmlsingle/#build-image.docker-registry for details.

@sdeleuze

OK, that's fine if you prefer to focus on Java 17+, at least initially, and IMO there is no need for a feature flag (it can be customized via Java options).

I will test and provide feedback tomorrow (AFK right now).

@sdeleuze

I was already using <builder>paketobuildpacks/builder-jammy-buildpackless-tiny</builder>, and I tried various combinations, but I was not able to make it work. I am testing on Linux x86, if that matters.

@dmikusa
Contributor Author

dmikusa commented Apr 26, 2024

@sdeleuze Sorry. We had a bug in our build script and it was not creating the binaries for amd64 correctly. I've fixed that and have a new image out, can you try docker.io/dmikusa/bellsoft-liberica:gh-130-3?

@sdeleuze

sdeleuze commented Apr 26, 2024

Np. With this new version, I can confirm that my sample application using Buildpacks, Java 21, and CDS starts 4x faster without any ActiveProcessorCount customization, so all good!

dmikusa added a commit that referenced this issue Apr 27, 2024
@dmikusa
Contributor Author

dmikusa commented Apr 27, 2024

OK, so there are definitely performance implications to the way that this is currently being set by the buildpack. Thanks to everyone for providing details and information on this topic.

We're going to make the following changes: #388

This disables the active processor count helper for Java 17 and newer, which essentially allows the JVM to use its own logic to auto-configure this value. We picked 17 because it is newish and people are still migrating to it (thus it doesn't carry as many established user expectations as Java 8/11), and we believe it has solid logic for detecting container CPU limits and configuring itself correctly. It's possible that Java 8/11 would work as well, but that is riskier to change because we don't want to break existing user apps or expectations. Users may have adapted to the buildpack's current behavior, so even though the change would be correct, it might disrupt their existing apps.

For users of Java 8/11, if you're seeing this issue you can use this workaround:

  1. Set JAVA_TOOL_OPTIONS='-XX:ActiveProcessorCount=-1' to use the JVM's default logic, or set it to the number of CPUs you're limiting your container to.
  2. Restart your container.

This works because when you manually set JAVA_TOOL_OPTIONS, your value is appended to the end of the existing list of parameters, and the last setting is the one picked by the JVM. This means that if you look at the settings used by the JVM, you'll see -XX:ActiveProcessorCount= listed twice, with your value listed second.

Or of course you can upgrade to Java 17+ which is awesome in its own right.
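
The version gate described above ("only contribute the helper for Java versions < 17") could be expressed roughly as in the Go sketch below. This is not the code from #388. `majorVersion` and `contributesHelper` are hypothetical names, and the handling of legacy `1.8`-style version strings is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the Java major version from strings such as
// "17.0.10", "21", or the legacy form "1.8.0_402" (treated as 8).
func majorVersion(v string) (int, error) {
	parts := strings.FieldsFunc(v, func(r rune) bool {
		return r == '.' || r == '+' || r == '-' || r == '_'
	})
	if len(parts) == 0 {
		return 0, fmt.Errorf("empty version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, err
	}
	if major == 1 && len(parts) > 1 {
		return strconv.Atoi(parts[1]) // legacy "1.x" numbering
	}
	return major, nil
}

// contributesHelper reports whether the active-processor-count helper
// would still be contributed: only for Java versions below 17.
func contributesHelper(javaVersion string) (bool, error) {
	major, err := majorVersion(javaVersion)
	if err != nil {
		return false, err
	}
	return major < 17, nil
}

func main() {
	for _, v := range []string{"8", "11.0.22", "17.0.10", "21"} {
		ok, err := contributesHelper(v)
		fmt.Println(v, ok, err)
	}
}
```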

dmikusa added a commit that referenced this issue Apr 29, 2024
Only contribute active-processor-count helper for Java versions < 17