Functional Sanity JDK10 Linux s390x tests suddenly take 7 hours #2329
Comments
It looks like all cmdlinetest are affected. For example: cmdLineTester_XcheckJNI_0: changed from 8mins to 50mins |
Were any changes made to the machine configuration, either by (re)running the ansible scripts or at the machine provider level? Does rerunning the Jenkins 200 build have the same good perf it had before? |
I think SDK and test are fine. It is the machine configuration issue. Reran the test |
I disabled the PR build until we can fix this. |
Further tested a full sanity.functional build, which used latest SDK: https://ci.eclipse.org/openj9/job/Build-JDK10-linux_390-64_cmprssptrs/247/artifact/OpenJ9-JDK10-linux_390-64_cmprssptrs-201805071103.tar.gz The build took only 1hr39mins to complete. |
Related to Dan's question, is there a log of configuration activity (given the smaller set of people with access to machines this should be easier to accomplish) or an ansible schedule that can shine a light on this? If not, it would be good to institute, putting as much transparency on machine layer changes as possible. |
fyi @jdekonin |
Rebuilt the last "good" levels here |
Definitely not a code change issue. |
I haven't been able to successfully reboot with the old kernel. zLinux doesn't use grub; it uses zipl as a bootloader. I've followed the basic instructions, but the machine just will not reboot with another kernel specified. At least not through the machine reboot cmdln that sudo has access to, which reboots the instance in under 10sec. I think this needs to be rebooted from the openstack host. @mstoodle @AdamBrousseau do either of you recall how this can be done on our zLinux machines? |
@joransiu helped get these machines, maybe he has the requisite abilities? |
I expect this problem is an aspect of the problem being discussed in #1888: slow startup related to Java 9 and later setting -Xmx to 25% of the physical memory on the machine by default, vs Java 8, which uses a default of 512MB. |
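To make the size of that difference concrete, here is a minimal sketch of the two defaults exactly as stated in the comment above. The function name and the 16 GiB machine are hypothetical, and the model deliberately ignores any caps or minimums a real JVM may apply.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def default_max_heap(java_version: int, physical_bytes: int) -> int:
    """Simplified model of the default -Xmx described above (assumption:
    no caps or minimums; not taken from the OpenJ9 source)."""
    if java_version >= 9:
        # Java 9 and later: 25% of physical memory.
        return physical_bytes // 4
    # Java 8: fixed default of 512MB.
    return 512 * MIB


# Hypothetical machine with 16 GiB of physical memory.
physical = 16 * GIB
java8_heap = default_max_heap(8, physical)
java10_heap = default_max_heap(10, physical)
ratio = java10_heap // java8_heap
print(f"Java 8: {java8_heap // MIB} MiB, Java 10: {java10_heap // MIB} MiB, ratio {ratio}x")
```

On such a machine the default heap that must be reserved at startup is several times larger under Java 9 and later, which is why any slowness in memory reservation shows up much more on those levels.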
@pshipton that change for Java 9 has existed for months, so I doubt it is actually the cause here. It may be related if something in the kernel changed which causes the port library to exhibit the same behaviour as the other issues. The new Linux kernel is likely causing a few different problems here, so let's make sure we figure out all of them. |
@jdekonin mentioned creating an internal machine with the same kernel level which doesn't exhibit the same slowness, so it's not necessarily the kernel change which caused the slowdown. Bottom line seems to be that the machines changed and caused the JVM memory allocation to get really slow. While perhaps we could figure out what changed and revert the machines (which is problematic at this time), we should improve the memory allocation to avoid others finding the same issue. |
FWIW, the internal machine we created to test this (where the sdk runs fine) is Ubuntu 16.04.4 kernel version 4.4.0-130-generic |
This problem is not fixed by eclipse-omr/omr#2743 |
One of the problems is fixed by eclipse-omr/omr#2743, however there is still a problem outstanding. The QUICK memory allocation algorithm can fail to find a suitable candidate but then it falls back to a brute force search which also won't find any suitable memory and can be very slow. |
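The fallback behaviour described in the comment above can be illustrated with a small sketch. This is not OMR's code: the function name, the page-granular address model, and the probe counter are all hypothetical, chosen only to show why a brute-force fallback is expensive when no suitable region exists at all.

```python
from typing import Callable, Iterable, Optional, Tuple


def reserve(
    quick_candidates: Iterable[int],
    search_range: range,
    is_suitable: Callable[[int], bool],
) -> Tuple[Optional[int], int]:
    """Return (address, probes). Hypothetical model, not the OMR port library."""
    probes = 0

    # Quick pass: test only a handful of likely addresses.
    for addr in quick_candidates:
        probes += 1
        if is_suitable(addr):
            return addr, probes

    # Fallback: brute-force scan of every address in the range.
    for addr in search_range:
        probes += 1
        if is_suitable(addr):
            return addr, probes

    return None, probes


# Nothing in the range is suitable, so both passes fail.
addr, probes = reserve(
    quick_candidates=[0x1000, 0x2000, 0x3000],
    search_range=range(0, 1_000_000, 0x1000),
    is_suitable=lambda a: False,
)
print(addr, probes)
```

In this model the quick pass costs three probes, while the fallback adds one probe for every page in the range and still returns nothing. If each probe were a real system call, that wasted scan would dominate startup time.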
That looks promising as compiling test material only took 6 mins instead of the recent 1hr plus. Testing appears to be going quickly as well. |
The whole build took about 1.5 hours. |
This is a great result! I'll admit I was skeptical this would address the regression so I'm very pleased to see it resolved. Thanks to everyone for all the work tracking this down! |
Done. I assume this can be closed now. |
Thanks, Adam. |
For the record, eclipse-openj9/openj9-omr#12 merged eclipse-omr/omr#2796 to the v0.9.0-release branch. |
First observed on June 28 in OMR build 574
Test before: https://ci.eclipse.org/openj9/job/Test-Sanity-JDK10-linux_390-64_cmprssptrs/200/
Test After: https://ci.eclipse.org/openj9/job/Test-Sanity-JDK10-linux_390-64_cmprssptrs/201/
Typical build time
Compile test material: 10min
Sanity functional tests: 1.5hrs
After regression
Compile test material: 1hr
Sanity functional tests: 6hrs
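As a quick check of the scale of the regression, the figures above can be turned into slowdown factors. This is only arithmetic on the numbers reported in this issue; the dictionary keys are labels chosen for illustration.

```python
# Durations in minutes, taken from the figures above.
typical = {"compile test material": 10, "sanity functional tests": 90}
regressed = {"compile test material": 60, "sanity functional tests": 360}

# Slowdown factor per step: regressed time divided by typical time.
slowdown = {step: regressed[step] / typical[step] for step in typical}

for step, factor in slowdown.items():
    print(f"{step}: {factor:.1f}x slower")
```

Compilation slowed by a factor of 6 and the functional tests by a factor of 4, so the regression is not confined to one kind of workload.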
Diff between build 573/574
OpenJ9:
693fe84...be52aeb
No OMR diff between builds
PRs merged
Also fetch branches for checks #2296
Process zip files up to 4 GB #2245
Prevent recognizing JIT Helpers not for their ISA #2271
[JDK11] Add Dockerfile(s) for z/p Linux and update xLinux Dockerfile #2188
Add stub methods to bringup jdk-11+19 #2283
Change Git to SCM step for Copyright and Line Endings checks #2154
(Crossing off the PRs that have been ruled out)
Also affects PR builds
https://ci.eclipse.org/openj9/job/PullRequest-Sanity-JDK10-linux_390-64_cmprssptrs-OpenJ9/