JAVA_TOOL_OPTIONS environment expansion added in 0.112.0 crash loops java applications while running on gke #3463

Closed
ryanohnemus opened this issue Nov 15, 2024 · 4 comments · Fixed by #3510
Labels
auto-instrumentation:java bug Something isn't working

Comments

@ryanohnemus

Component(s)

auto-instrumentation

What happened?

Description

After upgrading the otel-operator to version 0.112.0, Java 8 containers that use auto-instrumentation via the inject-java annotation fail to start with the following error:

Picked up JAVA_TOOL_OPTIONS: $(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar
Unrecognized option: $(JAVA_TOOL_OPTIONS)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I believe this is due to #1814 being patched in the 0.112.0 version.

Steps to Reproduce

Run a Java 8 application with auto-instrumentation enabled after upgrading to otel-operator 0.112.0. The Java container will crash loop with the above error.

Expected Result

Auto-instrumentation to load the javaagent without crashing the java application.

Actual Result

Java8 container crash loops.

Kubernetes Version

1.31.0

Operator version

0.112.0

Collector version

0.112.0

Environment information

Environment

Java Zulu JRE 8.0.432-1

Log output

Picked up JAVA_TOOL_OPTIONS: $(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar
Unrecognized option: $(JAVA_TOOL_OPTIONS)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Additional context

Reverting to 0.107.0 of the otel-operator resolves the issue.

@ryanohnemus ryanohnemus added bug Something isn't working needs triage labels Nov 15, 2024
@pavolloffay
Member

Hi @ryanohnemus, this is not enough information to debug. Could you please share your deployment/pod specification?

@ryanohnemus ryanohnemus changed the title from "JAVA_TOOL_OPTIONS environment expansion added in 0.112.0 crash loops java8 applications, preventing container start" to "JAVA_TOOL_OPTIONS environment expansion added in 0.112.0 crash loops java applications while running on gke" Nov 15, 2024
@ryanohnemus
Author

ryanohnemus commented Nov 15, 2024

hi @pavolloffay,

I ran through a few different versions of this setup and found that this is specifically due to GKE handling env vars differently from normal k8s.

(For the deployment, I was using the image azul/zulu-openjdk:8-jre-latest with a command of ["java", "-version"], auto-instrumentation enabled, and an env var of JAVA_TOOL_OPTIONS=-Xdebug set. Java isn't the root issue in this case; it's actually the difference in env variable handling between the k8s variants.)

The change from #1814 appears to add a second environment variable with the exact same name.
On a kind 1.31 cluster, the created pod ends up with two env vars of the same name:

    - name: JAVA_TOOL_OPTIONS
      value: -Xdebug
    - name: JAVA_TOOL_OPTIONS
      value: $(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar     

However, when running on GKE 1.31, it seems you are not allowed to have multiple env vars with the same name. They get collapsed when the pod is created, leaving a pod that has only the last defined variable:

    - name: JAVA_TOOL_OPTIONS
      value: $(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar     

With no earlier JAVA_TOOL_OPTIONS entry left to resolve against, the $(JAVA_TOOL_OPTIONS) reference reaches the JVM unexpanded, so java fails with the Unrecognized option: $(JAVA_TOOL_OPTIONS) error and exits.
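To make the difference between the two clusters concrete, here is a minimal Go sketch of the behaviour described above. It is a simplified model, not the kubelet's or the operator's actual code: `EnvVar`, `expandRefs`, `resolve`, and `collapseLastWins` are hypothetical names, the `$$(` escape syntax is ignored, and the "last one wins" collapse only mimics what was observed on GKE in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVar is a local stand-in for a name/value entry in a container's env list.
type EnvVar struct {
	Name  string
	Value string
}

// expandRefs replaces each $(NAME) in value with vars[NAME] when NAME is
// already defined. Unresolved references are left as literal text.
func expandRefs(value string, vars map[string]string) string {
	var out strings.Builder
	for i := 0; i < len(value); {
		if strings.HasPrefix(value[i:], "$(") {
			if end := strings.IndexByte(value[i+2:], ')'); end >= 0 {
				name := value[i+2 : i+2+end]
				if v, ok := vars[name]; ok {
					out.WriteString(v)
				} else {
					out.WriteString(value[i : i+2+end+1])
				}
				i += 2 + end + 1
				continue
			}
		}
		out.WriteByte(value[i])
		i++
	}
	return out.String()
}

// resolve walks the env list in order, so an entry can only reference
// variables defined earlier in the list. It returns the final value per name.
func resolve(env []EnvVar) map[string]string {
	vars := map[string]string{}
	for _, e := range env {
		vars[e.Name] = expandRefs(e.Value, vars)
	}
	return vars
}

// collapseLastWins keeps only the last entry for each name (assumed model
// of the deduplication seen on GKE).
func collapseLastWins(env []EnvVar) []EnvVar {
	last := map[string]int{}
	for i, e := range env {
		last[e.Name] = i
	}
	var out []EnvVar
	for i, e := range env {
		if last[e.Name] == i {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	env := []EnvVar{
		{"JAVA_TOOL_OPTIONS", "-Xdebug"},
		{"JAVA_TOOL_OPTIONS", "$(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar"},
	}
	// Both entries present (the kind case): the reference resolves to -Xdebug.
	fmt.Println(resolve(env)["JAVA_TOOL_OPTIONS"])
	// Entries collapsed (the GKE case): the reference stays literal.
	fmt.Println(resolve(collapseLastWins(env))["JAVA_TOOL_OPTIONS"])
}
```

With both entries present, the second value resolves against the first. Once the list is collapsed to a single entry, nothing is defined when the reference is evaluated, so the literal `$(JAVA_TOOL_OPTIONS)` is what the JVM sees.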

Can the original solution be updated to modify the JAVA_TOOL_OPTIONS env var in place, without creating this duplicate-named entry?

@nickandreev
Contributor

nickandreev commented Nov 22, 2024

Hi @pavolloffay @ryanohnemus! Duplicate env vars don't work for us either (AWS EKS v1.28.13). Instead of two JAVA_TOOL_OPTIONS entries, the pod resource contains only one, with this value:

    - name: JAVA_TOOL_OPTIONS
      value: $(JAVA_TOOL_OPTIONS) -javaagent:/otel-auto-instrumentation-java/javaagent.jar 

Which causes the pods to crash 😞

I think some kind of mutating webhook in our cluster is causing this behaviour. I haven't identified which one yet, but I can investigate later.

@Starefossen
Contributor

Starefossen commented Nov 27, 2024

Yes, I'm experiencing the same issue for some of my Java apps in the most recent version of the operator. Duplicate environment variables seem to be stripped out, and only the one with the reference to the original remains. For us this happens when the deployment already has JAVA_TOOL_OPTIONS in its list of environment variables.

You can test this yourself with a deployment/pod with the following env:

      containers:
      - env:
        - name: JAVA_TOOL_OPTIONS
          value: -XX:+UseParallelGC -XX:MaxRAMPercentage=75

For this case I suggest bringing back the original solution by modifying the JAVA_TOOL_OPTIONS env var in place, without creating this duplicate-named entry 🙏🏻
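As a rough illustration of what "modify in place" could mean, here is a minimal Go sketch. It is not the operator's actual code: `EnvVar` is a local stand-in type and `injectJavaAgent` is a hypothetical helper. It also assumes the existing entry carries a literal value; an entry populated from a ConfigMap or Secret reference has no literal value to append to and would need separate handling.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVar is a local stand-in for a name/value entry in a container's env list.
type EnvVar struct {
	Name  string
	Value string
}

const javaToolOptions = "JAVA_TOOL_OPTIONS"

// injectJavaAgent (hypothetical helper) appends agentFlag to the first
// existing JAVA_TOOL_OPTIONS entry, mutating that entry in place. It adds a
// new entry only when none exists, so the list never ends up with two
// entries of the same name because of the injection.
func injectJavaAgent(env []EnvVar, agentFlag string) []EnvVar {
	for i := range env {
		if env[i].Name != javaToolOptions {
			continue
		}
		// Skip the append if the flag is already present, so that running
		// the injection twice does not duplicate the agent.
		if !strings.Contains(env[i].Value, agentFlag) {
			env[i].Value = strings.TrimSpace(env[i].Value + " " + agentFlag)
		}
		return env
	}
	return append(env, EnvVar{Name: javaToolOptions, Value: agentFlag})
}

func main() {
	env := []EnvVar{
		{Name: "JAVA_TOOL_OPTIONS", Value: "-XX:+UseParallelGC -XX:MaxRAMPercentage=75"},
	}
	env = injectJavaAgent(env, "-javaagent:/otel-auto-instrumentation-java/javaagent.jar")
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Because the result contains a single JAVA_TOOL_OPTIONS entry with no `$(...)` reference, it should not matter whether the cluster collapses duplicate names.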
