Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openjdk 1.8.0.252.b09 - rd wait on exit issue #302

Closed
PeterGarlic opened this issue Jun 9, 2020 · 38 comments
Closed

openjdk 1.8.0.252.b09 - rd wait on exit issue #302

PeterGarlic opened this issue Jun 9, 2020 · 38 comments

Comments

@PeterGarlic
Copy link

On CentOS 7.x rundeck cli 1.0.x/1.2.x/1.3.x and rundeck 2.11.5 / 3.2.x (last test with 3.2.8.20200608) works fine with old openjdk packages:

java-1.8.0-openjdk-headless 1.8.0.242.b08-0.el7_7.x86_64
java-1.8.0-openjdk 1.8.0.242.b08-0.el7_7.x86_6

If the openjdk package is updated with latest available on CentOS 7 (1.8.0.252.b09-2) rd command wait for a long time before exit from any command.

Restore the previous version solve the issue

@molleda
Copy link

molleda commented Jun 25, 2020

Any update on that issue? Using the tools with that waits is hard.

Thanks

@gschueler
Copy link
Member

sorry, if this is an issue with that particular version of openjdk, I'm not sure what we can do about it with the rd tool code. is there any bug report you can find openjdk that indicates a workaround?

@PeterGarlic
Copy link
Author

At today no openjdk updates has been released and I din´t find any workaround to address the problem.

On my CentOS7 servers I just blocked any java update by adding to /etc/yum.conf the line:

exclude=java*

I´m not a programmer then I cannot say if the issue come out from openjdk or from rd cli code. I just reported the problem because we hare using rd in many scripts.

Right now we didn´t check if alternative java version will fix the problem because the system requirements are on java 1.8.x

https://docs.rundeck.com/docs/administration/install/system-requirements.html

@golimer
Copy link

golimer commented Jul 15, 2020

time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true RD_PASSWORD=TEST RD_USER=admin rd system info

real 3m1.046s
user 0m1.284s
sys 0m0.056s`

##another example

root@rundeck-ble-us-1:/var/log# time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true >RD_PASSWORD=_____ RD_USER=admin rd users list
WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
1 Users:
admin: name [], last Name [], email []

real 3m1.046s
user 0m1.384s
sys 0m0.080s

rd --version
1.3.0

root@nordlocker-rundeck-us-1:~# time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true RD_PASSWORD=password RD_USER=admin RD_DEBUG=1 rd projects list

WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed

Jul 15, 2020 5:09:53 AM okhttp3.internal.platform.Platform log
INFO: --> GET http://127.0.0.1:4440/ http/1.1
Jul 15, 2020 5:09:53 AM okhttp3.internal.platform.Platform log
INFO: <-- 302 Found http://127.0.0.1:4440/ (4ms, 0-byte body)
Jul 15, 2020 5:09:53 AM okhttp3.internal.platform.Platform log
INFO: --> GET http://127.0.0.1:4440/user/login http/1.1
Jul 15, 2020 5:09:53 AM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://127.0.0.1:4440/user/login (6ms, unknown-length body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: --> POST http://127.0.0.1:4440/j_security_check http/1.1 (40-byte body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: <-- 302 Found http://127.0.0.1:4440/j_security_check (1ms, 0-byte body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: --> GET http://127.0.0.1:4440/ http/1.1
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: <-- 302 Found http://127.0.0.1:4440/ (15ms, 0-byte body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://rundeck.host.net/menu/home h2
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: <-- 302 https://rundeck.host.net/menu/home (40ms, 0-byte body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://rundeck.host.net/user/login h2
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: <-- 200 https://rundeck.host.net/user/login (23ms, unknown-length body)
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: --> GET http://127.0.0.1:4440/api/29/projects http/1.1
Jul 15, 2020 5:09:54 AM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK http://127.0.0.1:4440/api/29/projects (6ms, unknown-length body)

9 Projects:

xxxxxxxx
xxxxxxxxx

real 3m1.143s
user 0m1.556s
sys 0m0.052s

@golimer
Copy link

golimer commented Jul 15, 2020

Update: same lag with others commands like
rd system info
rd users list

tested on openjdk 11.0.6 2020-01-14 SAME

@gschueler
Copy link
Member

gschueler commented Jul 15, 2020

@golimer please post your os information (edit)

@jplassnibatt
Copy link
Contributor

jplassnibatt commented Jul 15, 2020

We've tried your versions here without issues:
updated CentOS:

$ cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)

rundeck version:

$ ls /var/lib/rundeck/bootstrap/
rundeckpro-3.2.8-20200608.war

java:

$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

rd-cli from repo and from zip.

$ rd --version
1.3.0

local rundeck:

$ time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true RD_PASSWORD=admin RD_USER=admin rd users list
# WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
# 1 Users:
admin: name [], last Name [], email []

real	0m1.313s
user	0m1.033s
sys	0m0.080s

remote rundeck:

$ time RD_URL=http://cloud.service.com:4440 RD_INSECURE_SSL=true RD_PASSWORD=admin RD_USER=admin ./rd users list
# WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
# 2 Users:
admin: name [], last Name [], email []
testuser: name [], last Name [], email []

real	0m3.696s
user	0m1.179s
sys	0m0.092s

@golimer
Copy link

golimer commented Jul 16, 2020

After desperate tries... updated everything what is possible except OS :) few months ago was perfect.... i cant find what is wrong....

Debian stretch

cat /etc/debian_version
9.12

rundeck:

rundeck-3.3.0-20200701.war

java:

OpenJDK Runtime Environment (build 11.0.6+10-post-Debian-1bpo91)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Debian-1bpo91, mixed mode, sharing)

rd:

1.3.1

Timing:

WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
1 Users:
admin: name [], last Name [], email []

real 3m1.046s
user 0m1.384s
sys 0m0.080s

Thank you guys for helping!!!!

@jplassnibatt
Copy link
Contributor

debian version:

# cat /etc/debian_version
9.12

rundeck:

# ls /var/lib/rundeck/bootstrap/
rundeck-3.3.0-20200701.war

java:

# java --version
openjdk 11.0.6 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10-post-Debian-1bpo91)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Debian-1bpo91, mixed mode, sharing)

rd-cli:

# rd --version
1.3.1

time

# time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true RD_PASSWORD=admin RD_USER=admin rd users list
# WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
# 1 Users:
admin: name [], last Name [], email []

real	0m0.932s
user	0m1.212s
sys	0m0.096s

@golimer to discard an rd-cli issue, could you try using curl command like below? you will need a token for the user (from the Rundeck's GUI profile of that user)
time curl -H "X-Rundeck-Auth-Token: YOUR.API.TOKEN.HERE" -H "Content-Type: application/xml" http://127.0.0.1:4440/api/35/user/list
Also, to discard a Rundeck installation issue, you think you can stop that instance, launch a fresh war (java -jar rundeck-3.3.0-20200701.war) and retry the rd-cli command?

That could give us some lights on what is not working as expected.

@golimer
Copy link

golimer commented Jul 17, 2020

<user>
  <login>admin</login>
  <firstName />
  <lastName />
  <email />
  <created>2020-07-13 11:13:56.0</created>
  <updated>2020-07-17 05:07:31.0</updated>
  <lastJob>2020-07-17 05:05:00.0</lastJob>
  <tokens>1</tokens>
</user>
real	0m0.037s
user	0m0.004s
sys	0m0.000s
time curl -H "X-Rundeck-Auth-Token: token" -H "Content-Type: application/xml" http://127.0.0.1:4440/api/1/projects
<result success='true' apiversion='35'>
  <projects count='9'>
.......
real	0m0.120s
user	0m0.004s
sys	0m0.004s

@golimer
Copy link

golimer commented Jul 17, 2020

tested and on debian 10.4 same

# WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
# 0 Projects:


real	3m1.872s
user	0m2.381s
sys	0m0.143s

Starting to think about configs......

@golimer
Copy link

golimer commented Jul 17, 2020

case solved .....

rundeck-config.properties

grails.serverURL=https://domain.com 
REPLACE TO 
grails.serverURL=http://localhost:4440

@golimer
Copy link

golimer commented Jul 22, 2020

well changes of rundeck-config.properties

grails.serverURL=https://domain.com 
REPLACE TO 
grails.serverURL=http://localhost:4440

i have new problems: after clicking on job redirecting to http://localhost:4440
so restored to grails.serverURL=https://domain.com
and i have ****** lag again

rundeck-us-1:/etc/rundeck# time RD_URL=http://127.0.0.1:4440 RD_INSECURE_SSL=true RD_PASSWORD=admin RD_USER=admin rd projects list
# WARNING: RD_INSECURE_SSL=true, no hostname or certificate trust verification will be performed
# 1 Projects:
tcc

real	3m1.076s
user	0m1.436s
sys	0m0.068s

@golimer
Copy link

golimer commented Jul 28, 2020

hmm no solution regarding this issue?
grails.serverURL=https://domain.com <<<<<<<<<<< lag 3 min
grails.serverURL=http://localhost:4440 <<<<<<<< works PERFECT (website not working when grails.serverURL is set to localhost)

@gschueler
Copy link
Member

gschueler commented Aug 3, 2020

if you have lag resolving domain.com in your server, maybe your local name resolution is not working to resolve domain.com or something.

You can also try the RD_BYPASS_URL config setting https://rundeck.github.io/rundeck-cli/configuration/#bypass-an-external-url

@cova-fe
Copy link

cova-fe commented Aug 14, 2020

I have the same issues and made some investigation.

  • I'm not able to find anything wrong with dns. Maybe there is something, but it is well concealed
  • I set RD_URL and RD_BYPASS_URL to IP address instead of hostnames, no changes, still 3 minutes lag.
  • The elapsed time for rd cli to exit is always 3+ minutes, like there is a 3 minutes timeout that is triggered after the job is finished
  • The job ends normally and then rd-cli returns after 3 minutes, so the lag is always at the end. Intermediate steps output is printed as it happens on server.
  • Interesting enough, the 3+ minutes issue happens only if the job ends correctly. If the job ends with an error, rd cli returns as soon as the job is ended. I tested it simply by adding a exit 1 at the end of last step. I guess this is an interesting thing to investigatge.
  • tested with openjdk 11.0.8+10-0ubuntu1~18.04.1 and alpine openjdk11-jre in a docker container
    Unfortunately my java knowledge is basically zero, investigating the code will be complex for me :)

Forgot to mention:
Rundeck 3.3.1
rundeck-cli 1.3.1

@gschueler
Copy link
Member

@cova-fe can you run with RD_DEBUG=1 and see where the lag happens among the API calls? i.e. does it happen waiting for a server response? does it happen before an API request? or does it happen after all API responses?

@cova-fe
Copy link

cova-fe commented Aug 17, 2020

@cova-fe can you run with RD_DEBUG=1 and see where the lag happens among the API calls? i.e. does it happen waiting for a server response? does it happen before an API request? or does it happen after all API responses?

So, here you can see the last lines of the log:

Aug 17, 2020 9:09:33 AM okhttp3.internal.platform.Platform log
INFO: <-- 200  https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8125&lastmod=1597648170000&maxlines=500&compacted=true (35ms, unknown-length body)

Aug 17, 2020 9:09:35 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8125&lastmod=1597648170000&maxlines=500&compacted=true h2

Aug 17, 2020 9:09:35 AM okhttp3.internal.platform.Platform log
INFO: <-- 200  https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8125&lastmod=1597648170000&maxlines=500&compacted=true (40ms, unknown-length body)

DEBUG

Aug 17, 2020 9:09:37 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8854&lastmod=1597648175000&maxlines=500&compacted=true h2

Aug 17, 2020 9:09:37 AM okhttp3.internal.platform.Platform log
INFO: <-- 200  https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8854&lastmod=1597648175000&maxlines=500&compacted=true (34ms, unknown-length body)

Please note that DEBUG line is printed by the script in the last step of the workflow, a locally executed script.
For comparison, this is what happens if I add a "exit 1" just after "echo DEBUG" in the above script. In this case, rd returns just after printing the error message and there is no 3 minute wait whatsoever. Also, After "DEBUG" in case the result code is 1 the line "Result: 1" is printed. If the job is executed properly (i.e. exit 0) the line "Result:" is not printed.

Aug 17, 2020 9:18:42 AM okhttp3.internal.platform.Platform log
INFO: <-- 200  https://xxx.yyy.zzz.ttt/api/29/execution/490925/output?offset=8125&lastmod=1597648717000&maxlines=500&compacted=true (37ms, unknown-length body)

Aug 17, 2020 9:18:44 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://xxx.yyy.zzz.ttt/api/29/execution/490925/output?offset=8125&lastmod=1597648717000&maxlines=500&compacted=true h2

Aug 17, 2020 9:18:44 AM okhttp3.internal.platform.Platform log
INFO: <-- 200  https://xxx.yyy.zzz.ttt/api/29/execution/490925/output?offset=8125&lastmod=1597648717000&maxlines=500&compacted=true (41ms, unknown-length body)
DEBUG
Result: 1
Failed: NonZeroResultCode: Result code was 1
Execution failed: 490925 in project myproject: [Workflow result: , step failures: {11=Dispatch failed on 1 nodes: [rundeck-XXXX.YYYYY.ZZ: NonZeroResultCode: Result code was 1 + {dataContext=MultiDataContextImpl(map={ContextView(step:11, node:rundeck-XXXX.YYYYY.ZZ)=BaseDataContext{{exec={exitCode=0}}}, ContextView(node:rundeck-XXXX.YYYYY.ZZ)=BaseDataContext{{exec={exitCode=0}}}}, base=null)} ]}, Node failures: {rundeck-XXXX.YYYYY.ZZ=[NonZeroResultCode: Result code was 1 + {dataContext=MultiDataContextImpl(map={ContextView(step:11, node:rundeck-XXXX.YYYYY.ZZ)=BaseDataContext{{exec={exitCode=0}}}, ContextView(node:rundeck-XXXX.YYYYY.ZZ)=BaseDataContext{{exec={exitCode=0}}}}, base=null)} ]}, status: failed]

@gschueler
Copy link
Member

@cova-fe so the 3 minute delay occurs at which point?

@cova-fe
Copy link

cova-fe commented Aug 18, 2020

@cova-fe so the 3 minute delay occurs at which point?

It happens after the last line
Aug 17, 2020 9:09:37 AM okhttp3.internal.platform.Platform log INFO: <-- 200 https://xxx.yyy.zzz.ttt/api/29/execution/490919/output?offset=8854&lastmod=1597648175000&maxlines=500&compacted=true (34ms, unknown-length body)
...3 minutes start here...

of course only on the first log, the second one is the "exit 1" log and has no delays.

@golimer
Copy link

golimer commented Aug 20, 2020

if you have lag resolving domain.com in your server, maybe your local name resolution is not working to resolve domain.com or something.

You can also try the RD_BYPASS_URL config setting https://rundeck.github.io/rundeck-cli/configuration/#bypass-an-external-url

NICE works like a charm !!! 👍

@mkhpalm
Copy link

mkhpalm commented Aug 24, 2020

Same problem here, after updating OpenJDK8 for security updates rundeck-cli suddenly takes 3 minutes to exit after a job is finished.

@gschueler
Copy link
Member

if anyone experiencing this can take a threaddump during this delay, maybe that would shed some light on it

@cova-fe
Copy link

cova-fe commented Aug 25, 2020

if anyone experiencing this can take a threaddump during this delay, maybe that would shed some light on it

No problem in collecting info, but java is not my field. Any pointer about how to doing that properly?

@gschueler
Copy link
Member

you can send a SIGQUIT signal to the JVM to have it output a threaddump, e.g. kill -quit $PID

@cova-fe
Copy link

cova-fe commented Aug 26, 2020

Thanks, here you can find the result.
A couple of remarks:

  1. I tried also with another different JVM (old icedtea): same result. This output comes from openjdk11 (on gentoo; if needed I can get the same dump on ubuntu but I guess the result will be the same).
  2. interestng enough, some threads references the IP address, while some other the FQDN of the server, even though the RD_BYPASS_URL is set to an IP address.
  3. I'm investigating DNS resolution issues, but so far I've discovered no issues.
Full thread dump OpenJDK 64-Bit Server VM (11.0.8+10 mixed mode):

Threads class SMR info:
_java_thread_list=0x00007f9ad08401c0, length=16, elements={
0x00007f9ad02d9800, 0x00007f9ad02dd800, 0x00007f9ad02f2800, 0x00007f9ad02f4800,
0x00007f9ad02f6800, 0x00007f9ad02f8800, 0x00007f9ad03ad800, 0x00007f9ad03b9800,
0x00007f9ad0860800, 0x00007f9ad0862000, 0x00007f9ad087f800, 0x00007f9a20004000,
0x00007f9a1c002000, 0x00007f9ad088a000, 0x00007f9a48142000, 0x00007f9ad0027800
}

"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.47ms elapsed=60.15s tid=0x00007f9ad02d9800 nid=0x725f waiting on condition  [0x00007f9ab8175000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ref.Reference.waitForReferencePendingList([email protected]/Native Method)
        at java.lang.ref.Reference.processPendingReferences([email protected]/Reference.java:241)
        at java.lang.ref.Reference$ReferenceHandler.run([email protected]/Reference.java:213)

"Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.47ms elapsed=60.15s tid=0x00007f9ad02dd800 nid=0x7260 in Object.wait()  [0x00007f9a8c4e8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x0000000452001180> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155)
        - waiting to re-lock in wait() <0x0000000452001180> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:176)
        at java.lang.ref.Finalizer$FinalizerThread.run([email protected]/Finalizer.java:170)

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.15ms elapsed=60.14s tid=0x00007f9ad02f2800 nid=0x7261 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=552.65ms elapsed=60.14s tid=0x00007f9ad02f4800 nid=0x7262 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #8 daemon prio=9 os_prio=0 cpu=498.66ms elapsed=60.14s tid=0x00007f9ad02f6800 nid=0x7263 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Sweeper thread" #9 daemon prio=9 os_prio=0 cpu=9.65ms elapsed=60.14s tid=0x00007f9ad02f8800 nid=0x7264 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #10 daemon prio=9 os_prio=0 cpu=0.06ms elapsed=60.11s tid=0x00007f9ad03ad800 nid=0x7265 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #11 daemon prio=8 os_prio=0 cpu=0.49ms elapsed=60.11s tid=0x00007f9ad03b9800 nid=0x7267 in Object.wait()  [0x00007f9a6f717000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x0000000452004928> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155)
        - waiting to re-lock in wait() <0x0000000452004928> (a java.lang.ref.ReferenceQueue$Lock)
        at jdk.internal.ref.CleanerImpl.run([email protected]/CleanerImpl.java:148)
        at java.lang.Thread.run([email protected]/Thread.java:834)
        at jdk.internal.misc.InnocuousThread.run([email protected]/InnocuousThread.java:134)

"OkHttp xxx.yyy.zzz.ttt" #13 prio=5 os_prio=0 cpu=16.15ms elapsed=59.09s tid=0x00007f9ad0860800 nid=0x7276 runnable  [0x00007f9a6d791000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0([email protected]/Native Method)
        at java.net.SocketInputStream.socketRead([email protected]/SocketInputStream.java:115)
        at java.net.SocketInputStream.read([email protected]/SocketInputStream.java:168)
        at java.net.SocketInputStream.read([email protected]/SocketInputStream.java:140)
        at sun.security.ssl.SSLSocketInputRecord.read([email protected]/SSLSocketInputRecord.java:467)
        at sun.security.ssl.SSLSocketInputRecord.readHeader([email protected]/SSLSocketInputRecord.java:461)
        at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket([email protected]/SSLSocketInputRecord.java:70)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord([email protected]/SSLSocketImpl.java:1349)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read([email protected]/SSLSocketImpl.java:958)
        at okio.Okio$2.read(Okio.java:140)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
        at okio.RealBufferedSource.request(RealBufferedSource.java:72)
        at okio.RealBufferedSource.require(RealBufferedSource.java:65)
        at okhttp3.internal.http2.Http2Reader.nextFrame(Http2Reader.java:96)
        at okhttp3.internal.http2.Http2Connection$ReaderRunnable.execute(Http2Connection.java:599)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp ConnectionPool" #14 daemon prio=5 os_prio=0 cpu=0.91ms elapsed=59.09s tid=0x00007f9ad0862000 nid=0x7277 in Object.wait()  [0x00007f9a6d691000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x0000000452227068> (a okhttp3.internal.connection.RealConnectionPool)
        at java.lang.Object.wait([email protected]/Object.java:462)
        at okhttp3.internal.connection.RealConnectionPool.lambda$new$0(RealConnectionPool.java:62)
        - waiting to re-lock in wait() <0x0000000452227068> (a okhttp3.internal.connection.RealConnectionPool)
        at okhttp3.internal.connection.RealConnectionPool$$Lambda$45/0x0000000800165c40.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"Okio Watchdog" #15 daemon prio=5 os_prio=0 cpu=1.22ms elapsed=59.05s tid=0x00007f9ad087f800 nid=0x7278 in Object.wait()  [0x00007f9a6d390000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x0000000452375e28> (a java.lang.Class for okio.AsyncTimeout)
        at okio.AsyncTimeout.awaitTimeout(AsyncTimeout.java:348)
        at okio.AsyncTimeout$Watchdog.run(AsyncTimeout.java:313)
        - waiting to re-lock in wait() <0x0000000452375e28> (a java.lang.Class for okio.AsyncTimeout)

"OkHttp xxx.yyy.zzz.ttt Writer" #16 prio=5 os_prio=0 cpu=1.59ms elapsed=58.91s tid=0x00007f9a20004000 nid=0x7279 waiting on condition  [0x00007f9a6d28f000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000004544e0cc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:1170)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:899)
        at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1054)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp Http2Connection" #17 daemon prio=5 os_prio=0 cpu=0.39ms elapsed=58.91s tid=0x00007f9a1c002000 nid=0x727a waiting on condition  [0x00007f9a6e6a6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000004544dd9c0> (a java.util.concurrent.SynchronousQueue$TransferStack)
        at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
        at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp rundeck.my.site" #18 prio=5 os_prio=0 cpu=2.30ms elapsed=58.16s tid=0x00007f9ad088a000 nid=0x727e runnable  [0x00007f9a6ebed000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0([email protected]/Native Method)
        at java.net.SocketInputStream.socketRead([email protected]/SocketInputStream.java:115)
        at java.net.SocketInputStream.read([email protected]/SocketInputStream.java:168)
        at java.net.SocketInputStream.read([email protected]/SocketInputStream.java:140)
        at sun.security.ssl.SSLSocketInputRecord.read([email protected]/SSLSocketInputRecord.java:467)
        at sun.security.ssl.SSLSocketInputRecord.readHeader([email protected]/SSLSocketInputRecord.java:461)
        at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket([email protected]/SSLSocketInputRecord.java:70)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord([email protected]/SSLSocketImpl.java:1349)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read([email protected]/SSLSocketImpl.java:958)
        at okio.Okio$2.read(Okio.java:140)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
        at okio.RealBufferedSource.request(RealBufferedSource.java:72)
        at okio.RealBufferedSource.require(RealBufferedSource.java:65)
        at okhttp3.internal.http2.Http2Reader.nextFrame(Http2Reader.java:96)
        at okhttp3.internal.http2.Http2Connection$ReaderRunnable.execute(Http2Connection.java:599)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"OkHttp rundeck.my.site Writer" #19 prio=5 os_prio=0 cpu=0.61ms elapsed=58.16s tid=0x00007f9a48142000 nid=0x727f waiting on condition  [0x00007f9a6d18e000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x0000000453d39828> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:1170)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:899)
        at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1054)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run([email protected]/Thread.java:834)

"DestroyJavaVM" #20 prio=5 os_prio=0 cpu=1031.16ms elapsed=37.09s tid=0x00007f9ad0027800 nid=0x7258 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=0 cpu=8.82ms elapsed=60.16s tid=0x00007f9ad02d1000 nid=0x725e runnable  

"GC Thread#0" os_prio=0 cpu=13.10ms elapsed=60.17s tid=0x00007f9ad003f800 nid=0x7259 runnable  

"GC Thread#1" os_prio=0 cpu=14.01ms elapsed=59.27s tid=0x00007f9a7c001000 nid=0x726d runnable  

"GC Thread#2" os_prio=0 cpu=15.60ms elapsed=59.27s tid=0x00007f9a7c002000 nid=0x726e runnable  

"GC Thread#3" os_prio=0 cpu=15.38ms elapsed=59.27s tid=0x00007f9a7c003800 nid=0x726f runnable  

"GC Thread#4" os_prio=0 cpu=15.48ms elapsed=59.27s tid=0x00007f9a7c005800 nid=0x7270 runnable  

"GC Thread#5" os_prio=0 cpu=13.79ms elapsed=59.27s tid=0x00007f9a7c007000 nid=0x7271 runnable  

"GC Thread#6" os_prio=0 cpu=11.35ms elapsed=59.27s tid=0x00007f9a7c008800 nid=0x7272 runnable  

"GC Thread#7" os_prio=0 cpu=9.73ms elapsed=59.27s tid=0x00007f9a7c00a000 nid=0x7273 runnable  

"G1 Main Marker" os_prio=0 cpu=0.65ms elapsed=60.17s tid=0x00007f9ad007c000 nid=0x725a runnable  

"G1 Conc#0" os_prio=0 cpu=4.24ms elapsed=60.17s tid=0x00007f9ad007e000 nid=0x725b runnable  

"G1 Conc#1" os_prio=0 cpu=6.63ms elapsed=59.26s tid=0x00007f9a88001000 nid=0x7274 runnable  

"G1 Refine#0" os_prio=0 cpu=0.31ms elapsed=60.17s tid=0x00007f9ad0204800 nid=0x725c runnable  

"G1 Young RemSet Sampling" os_prio=0 cpu=5.12ms elapsed=60.17s tid=0x00007f9ad0206800 nid=0x725d runnable  
"VM Periodic Task Thread" os_prio=0 cpu=21.35ms elapsed=60.11s tid=0x00007f9ad03b0000 nid=0x7266 waiting on condition  

JNI global refs: 16, weak refs: 0

Heap
 garbage-first heap   total 1028096K, used 24729K [0x0000000415c00000, 0x0000000800000000)
  region size 4096K, 8 young (32768K), 2 survivors (8192K)
 Metaspace       used 23828K, capacity 24118K, committed 24368K, reserved 1071104K
  class space    used 2581K, capacity 2678K, committed 2688K, reserved 1048576K

@gschueler
Copy link
Member

is there a previous version of rd where this issue does not happen in the same scenario?

@cova-fe
Copy link

cova-fe commented Aug 27, 2020

is there a previous version of rd where this issue does not happen in the same scenario?

I tried some older versions (1.2.4, 1.1.8, 1.0.29) using -all.jar and the behavior is the same. (also, I tried 1.3.3)

@sticky-note
Copy link

sticky-note commented Sep 1, 2020

Faced the same problem when executing rd tool locally and rundeck web interface accessible through a reverse-proxy.
#302 (comment) <- This solution solved the problem.
RD_URL=http://localhost:4440
RD_BYPASS_URL=https://rundeck.mycompany.com <- url served through reverse-proxy
Thanks @gschueler

@PeterGarlic
Copy link
Author

PeterGarlic commented Sep 23, 2020

Solution verified on centos 7.

rd cli version 1.3.0
CentOS Linux release 7.8.2003 (Core)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
server fqdn direct/reverse lookup on internal dns

The entry on rd.conf :
RD_BYPASS_URL=https://rundeck-1server-fqdn
fix the problem in my environment.

👍
Thanks

p.s.
The fix works also with rd cli version 1.3.3

@cova-fe
Copy link

cova-fe commented Sep 23, 2020

A question to whoever successfully fixed the issue: is the url you added to RD_BYPASS_URL (or you are using directly) has also reverse dns resolution and not only direct?

@PeterGarlic
Copy link
Author

A question to whoever successfully fixed the issue: is the url you added to RD_BYPASS_URL (or you are using directly) has also reverse dns resolution and not only direct?

We have direct/reverse lookup on internal dns for all servers but I'm not sure that this is involved on the issue.

@cova-fe: If you don't have reverse lookup may you try to add them to your dns and check if this fix the problem?

@cova-fe
Copy link

cova-fe commented Sep 23, 2020

Yep, I had this idea; it will take a bit but I want to test it as soon as possible

@mkhpalm
Copy link

mkhpalm commented Sep 25, 2020

FWIW we have a rundeck-cli docker image that builds from our own openjdk images. With the same rundeck-cli package binary on top of different patch releases of openjdk 8 this problem shows up. All that was required to cause the problem was running apt upgrade on ubuntu 16.04 and 18.04. We don't have to change anything, just run updates for openjdk to reproduce. Mucking with RD_BYPASS_URL doesn't have any affect for us. Our rundeck is running in k8s behind a reverse proxy / ingress controller. But it has grails.serverURL and framework.server.url so it responds to the clients appropriately.

To reproduce it should be as simple as getting both versions and exporting JAVA_HOME.

Here are some examples of me randomly trying different JREs to see which ones work and which ones don't. Notice it aligns with the original ticket title regarding 8u252 where we see the same behavior.

JDK 8u242 works:

openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

JDK 8u252 does not work:

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1ubuntu1-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

JDK 8u265 does not work:

openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01)
OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)

JDK 11 does not work:

openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)

@gschueler
Copy link
Member

investigating if this might be the cause: square/retrofit#3144

gschueler added a commit that referenced this issue Sep 26, 2020
Fix #302 jvm exit delayed by lingering threads
@gschueler
Copy link
Member

I released 1.3.4 with this fix, #335 however i could never reproduce the original error, so I wasn't able to verify it. Please let me know if it works. If not please reopen this issue, thanks.

@cova-fe
Copy link

cova-fe commented Sep 26, 2020

I tested it on my setup, the issue seems to be fixed :) (I say seems because I tested it only once, but I'm confident it is fixed for good). Many thanks!!

@mkhpalm
Copy link

mkhpalm commented Oct 9, 2020

Using 1.3.4 fixes it for us too. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants