Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native-image] Startup time of native compiled with graalvm 20.1-dev is noticably slower than 19.3.0 on macOS #2136

Closed
borkdude opened this issue Feb 8, 2020 · 67 comments
Assignees
Milestone

Comments

@borkdude
Copy link

borkdude commented Feb 8, 2020

Given a simple class like:

public class Foo {

    public static void main(String [] args) {
        System.out.println("Hello");
    }
}
$ javac Foo.java

$ brew install Tenzer/tap/multitime

$ export GRAALVM_HOME=/Users/borkdude/Downloads/graalvm-ce-java8-19.3.0/Contents/Home
$GRAALVM_HOME/bin/native-image Foo --no-server
$ multitime -n100 ./foo

===> multitime results
1: ./foo
            Mean        Std.Dev.    Min         Median      Max
real        0.007       0.001       0.005       0.008       0.009
user        0.002       0.000       0.002       0.002       0.003
sys         0.002       0.000       0.001       0.002       0.004

$ export GRAALVM_HOME=/Users/borkdude/Downloads/graalvm-ce-java8-20.1.0-dev/Contents/Home
$GRAALVM_HOME/bin/native-image Foo --no-server
$ multitime -n100 ./foo

===> multitime results
1: ./foo
            Mean        Std.Dev.    Min         Median      Max
real        0.015       0.001       0.011       0.015       0.016
user        0.007       0.001       0.005       0.007       0.008
sys         0.005       0.000       0.004       0.005       0.006

I noticed this when trying out 20.1-dev jdk8 with babashka, a scripting tool made with GraalVM. On GraalVM 19.3.0 java8 it has a startup time of around 14ms, but when I was trying GraalVM 20.1-dev it became 24ms which is 10ms slower.

$ export GRAALVM_HOME=/Users/borkdude/Downloads/graalvm-ce-java8-19.3.0/Contents/Home
$ script/compile 
$ multitime -n100 ./bb '(System/getProperty "java.vm.version")'

===> multitime results
1: ./bb "(System/getProperty \"java.vm.version\")"
            Mean        Std.Dev.    Min         Median      Max
real        0.014       0.001       0.012       0.014       0.016
user        0.005       0.000       0.004       0.005       0.006
sys         0.006       0.001       0.004       0.006       0.008

$ export GRAALVM_HOME=/Users/borkdude/Downloads/graalvm-ce-java8-20.1.0-dev/Contents/Home
$ script/compile 
$ multitime -n100 ./bb '(System/getProperty "java.vm.version")'

===> multitime results
1: ./bb "(System/getProperty \"java.vm.version\")"
            Mean        Std.Dev.    Min         Median      Max
real        0.024       0.002       0.018       0.024       0.026
user        0.011       0.001       0.008       0.011       0.011
sys         0.010       0.001       0.007       0.010       0.012

To build babashka:

$ git clone https://github.com/borkdude/babashka --recursive

Install lein, a Clojure build tool.

Set GRAALVM_HOME to either 19 or 20 with java8.

Then run script/compile (on linux or macos).

This will produce a bb executable.

@borkdude borkdude changed the title [native-image] Startup time of native compiled with graalvm 20.1-dev is noticably slower than 19.3.0 [native-image] Startup time of native compiled with graalvm 20.1-dev is noticably slower than 19.3.0 on macOS Feb 9, 2020
@borkdude
Copy link
Author

borkdude commented Feb 9, 2020

This does not seem to be the case on linux, so it seems to be mac specific:

On linux:

19:

1: ./bb -e "(+ 1 2 3)"
            Mean        Std.Dev.    Min         Median      Max
real        0.015       0.010       0.005       0.010       0.035       
user        0.008       0.006       0.000       0.006       0.023       
sys         0.008       0.007       0.000       0.005       0.031      

20:

1: ./bb -e "(+ 1 2 3)"
            Mean        Std.Dev.    Min         Median      Max
real        0.014       0.009       0.006       0.010       0.037       
user        0.006       0.005       0.000       0.005       0.024       
sys         0.008       0.006       0.000       0.006       0.027  

This data was provided to me by @sogaiu.

@olpaw olpaw self-assigned this Feb 18, 2020
@olpaw
Copy link
Member

olpaw commented Feb 18, 2020

@borkdude thanks for your bug report. I will take a look.

@borkdude
Copy link
Author

borkdude commented Feb 18, 2020

@olpaw I re-tested with the newly released 20.0.0 java8 version, but the problem is still there. If it helps, this is an uberjar of babashka:

https://github.com/borkdude/babashka/releases/download/v0.0.70/babashka-0.0.70-standalone.jar

It is provided which each new release:

https://github.com/borkdude/babashka/releases

This is the part of the compilation script using that jar:

if [ -z "$BABASHKA_XMX" ]; then
    export BABASHKA_XMX="-J-Xmx3g"
fi

$BABASHKA_VERSION=0.0.70

$GRAALVM_HOME/bin/native-image \
  -jar target/babashka-$BABASHKA_VERSION-standalone.jar \
  -H:Name=bb \
  -H:+ReportExceptionStackTraces \
  -J-Dclojure.spec.skip-macros=true \
  -J-Dclojure.compiler.direct-linking=true \
  "-H:IncludeResources=BABASHKA_VERSION" \
  "-H:IncludeResources=SCI_VERSION" \
  -H:ReflectionConfigurationFiles=reflection.json \
  --initialize-at-run-time=java.lang.Math\$RandomNumberGeneratorHolder \
  --initialize-at-build-time \
  -H:Log=registerResource: \
  -H:EnableURLProtocols=http,https \
  --enable-all-security-services \
  -H:+JNI \
  --verbose \
  --no-fallback \
  --no-server \
  "$BABASHKA_XMX"

@borkdude
Copy link
Author

borkdude commented Feb 18, 2020

@olpaw Some additional data related to general performance, not only startup time.

With 20.0.0-java8 on macOS:

$ time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000
./bb    1.98s  user 0.10s system 99% cpu 2.089 total

With 19.3.1-java8 on macOS:

$ time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000
./bb    1.59s  user 0.08s system 99% cpu 1.676 total

That's a 25% loss in performance for v20.

@borkdude
Copy link
Author

borkdude commented Feb 19, 2020

On linux the performance on version 20-java8 is slightly worse (6%) but not as noticeable as with macOS:

19.3.1:
$ multitime -n10 -s1 -q ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
===> multitime results
1: -q ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
            Mean        Std.Dev.    Min         Median      Max
real        1.785       0.037       1.737       1.780       1.873       
user        1.709       0.039       1.653       1.716       1.797       
sys         0.076       0.015       0.040       0.080       0.092    
12:44 AM

20.0.0:
$ multitime -n10 -s1 -q ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
===> multitime results
1: -q ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
            Mean        Std.Dev.    Min         Median      Max
real        1.897       0.055       1.837       1.872       1.988       
user        1.819       0.044       1.762       1.800       1.889       
sys         0.078       0.015       0.052       0.078       0.100  

This data was provided by @sogaiu who uses linux.

@olpaw
Copy link
Member

olpaw commented Feb 19, 2020

With 20.0.0-java8 on macOS

Thanks for checking with latest releases!

@olpaw
Copy link
Member

olpaw commented Feb 19, 2020

Atm, this is my prime subject:

> otool -L hellojava_19.3.1 
hellojava_19.3.1:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.200.5)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1535.12.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
> otool -L hellojava_latest 
hellojava_latest:
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 921.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.200.5)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1535.12.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)

We are now additionally dynamically linking to CoreServices and this likely adds to the startup time.

@olpaw
Copy link
Member

olpaw commented Feb 19, 2020

Yep, that's it. If I comment out

I'm back to the startup characteristics of 19.3.1.

While we cannot fully eliminate this dynamic dependency we could make the dependency conditional on actual requirement instead of unconditional linking. @eginez please take this one from here and implement the conditional linking so that we only have to pay for the increased startup time if we actually need it.

@borkdude
Copy link
Author

@olpaw That's good news! Is this also the cause of the general performance regression of about 25% on macOS? #2136 (comment)

@eginez
Copy link
Contributor

eginez commented Feb 19, 2020

That makes sense, we had it conditionally linked before if I recall correctly. I'll issue a fix for this , thanks @olpaw and @borkdude

@olpaw
Copy link
Member

olpaw commented Feb 19, 2020

@borkdude most likely yes. I only tested with a trivial hello world that does nothing except startup. Your examples with babashka are actually doing a bit of work so pure startup is mixed with actual work that would likely water down the effect of increased startup overhead.

@borkdude
Copy link
Author

borkdude commented Feb 19, 2020

@olpaw That's true, but the "general performance" I was referring to later was not about startup (although this issue reported that as the first thing and it's already if that can be fixed). A loop with a 1000000 iterations takes 25% longer on macOS 20-java8 (1.98s) than macOS 19.3.1-java8 (1.59s).

@olpaw
Copy link
Member

olpaw commented Feb 19, 2020

@borkdude tomorrow I will take a look at your bb data by building it locally with the recipe you provide in #2136 (comment)

@olpaw
Copy link
Member

olpaw commented Feb 20, 2020

@borkdude I cannot reproduce your +25% against master:

paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_old "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.556s
user	0m1.403s
sys	0m0.144s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_old "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.616s
user	0m1.429s
sys	0m0.180s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_old "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.562s
user	0m1.387s
sys	0m0.170s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.614s
user	0m1.411s
sys	0m0.200s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.583s
user	0m1.421s
sys	0m0.150s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.573s
user	0m1.416s
sys	0m0.148s

where bb_old is built with graalvm-ce-java8-19.3.1 and bb with master

babashka-0.0.70> strings bb | grep -p '^com.oracle.svm.core.VM.*='
com.oracle.svm.core.VM.Target.CCompiler=clang|apple|x86_64|10.0.1
com.oracle.svm.core.VM.Target.StaticLibraries=libnet.a|libzip.a|libjava.a|libnio.a|libsunec.a|libffi.a|liblibchelper.a|libstrictmath.a|libjvm.a
com.oracle.svm.core.VM.Target.Platform=org.graalvm.nativeimage.Platform$DARWIN_AMD64
com.oracle.svm.core.VM.Target.LibC=com.oracle.svm.core.c.libc.GLibc
com.oracle.svm.core.VM=GraalVM 20.1.0-dev Java 8 LIBGRAAL
com.oracle.svm.core.VM.Target.Libraries=-framework CoreServices|stdc++|pthread|-framework CoreFoundation|dl|z

@borkdude
Copy link
Author

@olpaw That sounds promising. But were you able to reproduce it against the released version 20 java8 macOS? I don't know the relation between the master branch and that release.

@olpaw
Copy link
Member

olpaw commented Feb 20, 2020

You are right 20.0 binaries are a tad slower

paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_10.0 "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.610s
user	0m1.456s
sys	0m0.151s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_10.0 "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.625s
user	0m1.448s
sys	0m0.175s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_10.0 "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.619s
user	0m1.434s
sys	0m0.182s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_master "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.576s
user	0m1.419s
sys	0m0.151s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_master "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.576s
user	0m1.418s
sys	0m0.156s
paul@pauls-iMac:~/OLabs/bb/babashka-0.0.70> time ./bb_master "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000

real	0m1.577s
user	0m1.400s
sys	0m0.172s

Whatever it was I cannot reproduce on master thus it will be fixed in 20.1.

@borkdude
Copy link
Author

Thank you! I'll keep an eye on the dev releases here https://github.com/graalvm/graalvm-ce-dev-builds/releases.

@olpaw
Copy link
Member

olpaw commented Feb 20, 2020

Yup. Wait for the next GraalVM CE 20.1.0-dev release and let me know if you can confirm my test results from my local graalvm build from master.

@borkdude
Copy link
Author

@olpaw I just realized that the recipe I gave above with the .jar file also requires a reflection.json. I forgot to give you that. In case that may influence the performance, I'll paste it here anyway. I assume you compiled with an empty reflection.json just to make the build work? Maybe also try this one, if it's not too much to ask.
reflection.json.zip.

@borkdude
Copy link
Author

borkdude commented Feb 20, 2020

@olpaw I'm not able to reproduce the "general performance" problem anymore. The reflection.json doesn't seem to have an impact.

Compiled with "empty" reflection.json: []

$ time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000
./bb    1.53s  user 0.08s system 99% cpu 1.614 total

Compiled with full reflection.json:

$ ./bb '(System/getProperty "java.vm.version")'
"GraalVM 20.0.0 CE"

$ time ./bb "(loop [val 0 cnt 1000000] (if (pos? cnt) (recur (inc val) (dec cnt)) val))"
1000000
./bb    1.58s  user 0.09s system 99% cpu 1.677 total

So please ignore my previous message. I don't know what caused the issue at the time of my initial reproduction of the second issue, I might have been making a mistake somewhere. Sorry to waste your time on the second issue. I'm glad everything is sorted out now. Thanks a lot!

@olpaw
Copy link
Member

olpaw commented Feb 24, 2020

@olpaw I just realized that the recipe I gave above with the .jar file also requires a reflection.json. I forgot to give you that.

No worries, I ran a full build with lein (thus I had reflection.json in place).

BTW, if you put your reflection configuration instead into a file named reflect-config.json somewhere below META-INF/native-image it will automatically get picked up and also become part of the jar itself. Thus the result of your image building will not depend on external files anymore. See https://github.com/graalvm/graalvm-demos/tree/master/native-image-configure-examples/configure-at-runtime-example/src/main/resources/META-INF/native-image/org.graalvm.nativeimage/configure-at-runtime-example for an example.

@borkdude
Copy link
Author

borkdude commented Mar 6, 2020

@olpaw Alternatively there could be an option --report-unsupported-elements-at-build-time that can be used together with --report-unsupported-elements-at-runtime. If the flag is provided, GraalVM could emit the unsupported classes to a file or stdout, while still not crashing (the name of the flag probably needs some thought).

@borkdude
Copy link
Author

borkdude commented Apr 23, 2020

@olpaw Despite the fix with substitutions in #2136 (comment) I'm now back to where we started with the dev build https://github.com/graalvm/graalvm-ce-dev-builds/releases/tag/20.2.0-dev-20200423_0149.

When compiling with the new dev-build I'm seeing a startup of around 28ms instead of 13ms and that a lot more dynamic libraries are loaded compared to compiling with 19.3.1:

19.3.1:

$ otool -L /usr/local/bin/bb
/usr/local/bin/bb:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 307.5.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1349.8.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)

Dev build:

$ otool -L bb
bb:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1575.17.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1575.17.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

Can I prevent this somehow, if I'm not interested in these libraries, to still get good startup time on macOS?

Another issue I found is that the binary with the new dev build is around 6MB larger. For what reason?

@olpaw
Copy link
Member

olpaw commented Apr 23, 2020

We figured out that /usr/lib/libobjc.A.dylib is needed on OSX. @eginez knows more about that.

@borkdude
Copy link
Author

@eginez I'd like to learn more about the reason why this was needed. So far I haven't seen any problems with babashka on 19.3.1 on macOS where the library wasn't loaded. Is it maybe possible to opt out of loading certain dynamic libraries at the cost of errors (on the person who chooses to opt out) to restore startup time in macOS? Adding more dynamic libraries adds tens of milliseconds of startup time, is the end of this in sight?

@eginez
Copy link
Contributor

eginez commented Apr 23, 2020

To be honest I don't recall any previous change bringing in libobjc directly. However we did decide to increate the size of the image in order to properly support time zones. 6Mb though does not sound correct and we would have take a look at why libobjc is getting linked in

@eginez
Copy link
Contributor

eginez commented Apr 24, 2020

Ok I took some time today to analyze some of the variation. I built bb with

Below the results

➜  babashka git:(master) ✗ for i in `ls bb-*`
do
multitime -n100 -s0 ./$i -e 'nil'
echo -Size-
bsz=`stat -f "%z" bb-19.3.1`
sz=`stat -f "%z" $i`
diff=$(echo $sz - $bsz|bc)
printf '%s\t\t%s\t%s\n' $i $sz $diff
otool -L $i
done
===> multitime results
1: ./bb-19.3.1 -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.010       0.005       0.008       0.010       0.061
user        0.004       0.000       0.003       0.004       0.004
sys         0.004       0.001       0.003       0.004       0.009
-Size-
bb-19.3.1		48145656	0
bb-19.3.1:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1673.126.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
===> multitime results
1: ./bb-20.2.0 -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.014       0.000       0.013       0.013       0.015
user        0.006       0.000       0.006       0.006       0.006
sys         0.005       0.000       0.005       0.005       0.006
-Size-
bb-20.2.0		54356548	6210892
bb-20.2.0:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1673.126.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1673.126.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
===> multitime results
1: ./bb-master -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.012       0.008       0.011       0.011       0.097
user        0.005       0.000       0.005       0.005       0.006
sys         0.005       0.001       0.004       0.004       0.013
-Size-
bb-master		54356564	6210908
bb-master:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1673.126.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1673.126.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
===> multitime results
1: ./bb-reverts -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.010       0.000       0.009       0.010       0.013
user        0.004       0.000       0.004       0.004       0.005
sys         0.004       0.000       0.004       0.004       0.005
-Size-
bb-reverts		54160544	6014888
bb-reverts:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1673.126.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)

Some observations

  • I did not witness a 28ms run-time for this particular work load in any of the builds.
  • When the Foundation and the objc runtime libraries were removed the runtime did increase(barely).
  • The size of the binary has increased as you noticed however at this time is still unclear what is the root cause given that reverting the suspicious commits did not have a impact on that

@eginez
Copy link
Contributor

eginez commented Apr 24, 2020

@loicottet can probably expand on the need for 5c5417b. To investigate the size @borkdude I'd like to run a git bisect to see if I can find a probable cause. On your earlier comment(#2136 (comment)) it seems like you successfully created a build that was considered good using an snapshot build of 20.1 would you be able to provide the git hash of such build? (found in $GRAALVM_HOME/Contents/Home/release)

@eginez
Copy link
Contributor

eginez commented Apr 24, 2020

Ok I actually ran git bisect with the range starting with the commit from 19.3.1 and ending in master and I found what seems to be the issue causing the increase:
9d5ce43 Load classes and resources for 'javax.xml.*'.
I confirmed that is actually the problem by disabling this feature https://github.com/oracle/graal/blob/0ee434c01b4fac9403d8c988f13041af8394467c/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/xml/JavaxXmlClassAndResourcesLoaderFeature.java and on that build the size

➜  babashka git:(master) ✗ stat -f "%z" bb
46696340

@arodionov I know you have been working on this feature, is there something we can do minimize the impact on the image size

@olpaw
Copy link
Member

olpaw commented Apr 24, 2020

@eginez thanks a lot for investigating!

@olpaw
Copy link
Member

olpaw commented Apr 24, 2020

I don't recall any previous change bringing in libobjc directly.

Ahh sorry I remember now ... this was driven by the need to generate OSX binaries that do not link against the internal CoreFoundation library. @loicottet implemented that. Its needed for OSX so that the binaries that we produce can be uploaded to AppStore. See #2266

@borkdude
Copy link
Author

Thanks @eginez @olpaw !

Regarding binary size

If this is related to XML, then it makes sense. Babashka includes xml functionality, but to do so, I've only added these classes to the reflection config:

com.sun.xml.internal.stream.XMLInputFactoryImpl
{:methods [{:name "<init>" :parameterTypes []}]}
com.sun.xml.internal.stream.XMLOutputFactoryImpl
{:methods [{:name "<init>" :parameterTypes []}]}

How does the XML feature work? Does it automatically detect XML usage and can I now remove these classes from the reflection config for version 20.1 or 20.2?

Regarding startup time

This is what I'm seeing with the latest dev release (https://github.com/graalvm/graalvm-ce-dev-builds/releases/tag/20.2.0-dev-20200423_0149) on my system, a Macbook Pro 2019 i9, 32GB ram, macOS mojave 10.14.6:

$ multitime -n100 -s0 ./bb -e 'nil'
===> multitime results
1: ./bb -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.023       0.002       0.019       0.023       0.031
user        0.010       0.001       0.008       0.010       0.012
sys         0.010       0.002       0.007       0.010       0.017
$ otool -L ./bb
./bb:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1575.17.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1575.17.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

With 19.3.1:

$ multitime -n100 -s0 tmp/bb-19.3.1 -e 'nil'
===> multitime results
1: tmp/bb-19.3.1 -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.011       0.005       0.009       0.010       0.061
user        0.004       0.000       0.003       0.004       0.005
sys         0.004       0.001       0.003       0.004       0.010
$ otool -L tmp/bb-19.3.1
tmp/bb-19.3.1:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1575.17.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)

This is a 12ms startup regression on my system. I'm not planning to deploy my binaries to the AppStore and I would like to opt out of this startup regression if that is possible.

@arodionov
Copy link

@borkdude

How does the XML feature work?

During the analysis, it detects if there are methods in a call-tree that instantiate XML-parsers via reflection, and registers those parsers. The feature code.

Does it automatically detect XML usage and can I now remove these classes from the reflection config for version 20.1 or 20.2?

Yes, you can remove reflection config for XML.
For 20.1 it will be used the previous version of this feature (that brings all XML-parser and is not optimal). For 20.2, I think, it will be the current implementation.

@arodionov
Copy link

arodionov commented Apr 24, 2020

@eginez

I know you have been working on this feature, is there something we can do minimize the impact on the image size

This benchmark uses the last (already optimized) version of the feature. We can try to check maybe some libraries or frameworks also triggers XML-parsers instantiation and substitute those calls. Should have a call-tree for this

@arodionov
Copy link

arodionov commented Apr 24, 2020

Have checked, during image build, the following parsers are instantiated via methods calls (it means that those methods are present somewhere in a babashka code or libraries):

  • javax.xml.stream.FactoryFinder.find(java.lang.Class, java.lang.String) -> StAXParserClasses
  • javax.xml.transform.FactoryFinder.find(java.lang.Class, java.lang.String) -> TransformerClassesAndResources
  • javax.xml.parsers.DocumentBuilderFactory.newInstance() -> DOMParserClasses
  • javax.xml.parsers.SAXParserFactory.newInstance() -> SAXParserClasses

@borkdude
Copy link
Author

@arodionov Thank you for explaining

@arodionov
Copy link

@borkdude
Copy link
Author

@arodionov I only recently added support for postgres to babashka, only as of today actually. Try with commit 2fef8a793e22237a53059737cab91f336aef699b, that was the commit before psql support was added.

I'm also adding clojure.data.xml to it, so you can do things like this:

$ echo '"<foo></foo>"' | bb '(-> *input* xml/parse-str xml/emit-str)'
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo/>"

so I'm not surprised that XML-related classes are detected, I was only surprised by the difference in binary size between 19.3.1 and 20.1-dev/20.2-dev.

@arodionov
Copy link

arodionov commented Apr 24, 2020

@borkdude yes, you are right, for the provided commit the following methods bring all XML via:

  • clojure.data.xml.jvm.parse$make_input_factory.invokeStatic(java.lang.Object)
  • clojure.data.xml.jvm.emit$write_document.invokeStatic(java.lang.Object, java.lang.Object, java.lang.Object)
  • java.util.Properties.storeToXML(java.io.OutputStream, java.lang.String, java.lang.String) -> XmlSupport.save
  • java.util.Properties.loadFromXML(java.io.InputStream) -> XmlSupport.load

Most of the XML classes are brought by java.util.Properties methods. For Java 11 storeToXML and loadFromXML are implemented in a different way (without reflection usage and triggering almost all parsers).

@arodionov
Copy link

@borkdude in your reflection.json config I'd like to advise to specify methods for reflection more selectively (it can reduce the size of the image).
For example, for
{ "name" : "java.util.Properties", "allPublicMethods" : true, ... }
you can exclude storeToXML and loadFromXML to be analyzed and included into the image (with all XML-related stuff).

@borkdude
Copy link
Author

@arodionov Thanks for the advice.
I have tried this now with the 19.3.1 version of babashka:

$ ./bb '(.storeToXML (System/getProperties) System/out "foo")'
clojure.lang.ExceptionInfo: Provider com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl not found [at line 1, column 1]

As you can see this fails. But with 20.2-dev this works:

$ tmp/bb-20.2-dev '(.storeToXML (System/getProperties) System/out "foo")'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>foo</comment>
<entry key="java.class.path"/>
<entry key="user.name">borkdude</entry>
<entry key="java.ext.dirs"/>
<entry key="java.vendor">Oracle Corporation</entry>

So this explains the extra binary size: there is actually more functionality available now.
Since this is a Clojure interpreter, I don't know ahead of time which classes and methods people are going to use, but I'll consider removing java.util.Properties.storeToXML and java.util.Properties.loadFromXML since these methods are not likely to be used in Clojure programs.

@borkdude
Copy link
Author

@sogaiu did some measurements and omitting the java.util.Property classes saves about 5MB of binary size, so this explains that part.

@eginez @olpaw I think the only unresolved bit in this issue is startup time and dynamically linking. I'll just past that part of the comment here again.

This is what I'm seeing with the latest dev release (https://github.com/graalvm/graalvm-ce-dev-builds/releases/tag/20.2.0-dev-20200423_0149) on my system, a Macbook Pro 2019 i9, 32GB ram, macOS mojave 10.14.6:

$ multitime -n100 -s0 ./bb -e 'nil'
===> multitime results
1: ./bb -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.023       0.002       0.019       0.023       0.031
user        0.010       0.001       0.008       0.010       0.012
sys         0.010       0.002       0.007       0.010       0.017
$ otool -L ./bb
./bb:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1575.17.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1575.17.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

With 19.3.1:

$ multitime -n100 -s0 tmp/bb-19.3.1 -e 'nil'
===> multitime results
1: tmp/bb-19.3.1 -e nil
            Mean        Std.Dev.    Min         Median      Max
real        0.011       0.005       0.009       0.010       0.061
user        0.004       0.000       0.003       0.004       0.005
sys         0.004       0.001       0.003       0.004       0.010
$ otool -L tmp/bb-19.3.1
tmp/bb-19.3.1:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.250.1)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1575.17.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)

This is a 12ms startup regression on my system. I'm not planning to deploy my binaries to the AppStore and I would like to opt out of this startup regression if that is possible.

@olpaw
Copy link
Member

olpaw commented Apr 27, 2020

This is a 12ms startup regression on my system. I'm not planning to deploy my binaries to the AppStore and I would like to opt out of this startup regression if that is possible.

If Apple does not want users to link to CoreFoundation directly we would like to respect that.

It would be interesting to see why your measurement results do not seem to match the results from @loicottet as shown on #2266. Maybe a different version of OSX or XCode is responsible for that mismatch.

@borkdude
Copy link
Author

borkdude commented Apr 27, 2020

@olpaw

@olpaw
Copy link
Member

olpaw commented Apr 27, 2020

Was this a request made by Apple? I read #2266 as a way of getting things into the AppStore, not as something that should be avoided?

I don't think so. See #1568 (comment) from @johanvos

Relying on linking against non-public symbols sounds like a bad idea to me ... independent from being able to upload native-images to the AppStore. @christianwimmer how should we handle this case?

@borkdude
Copy link
Author

borkdude commented Apr 27, 2020

@olpaw To be clear, I'm only interested in getting the good startup time on macOS back like it was in 19.3.1. How that is done is an implementation detail to me, but maybe there is not much that can be done? Is the result of loading the public API that 2 more libraries compared to 19.3.1 are being loaded, i.e. the public API loading 2 other libraries (including the private one that was used before)?

@olpaw
Copy link
Member

olpaw commented Apr 27, 2020

Is the result of loading the public API that 2 more libraries compared to 19.3.1 are being loaded, i.e. the public API loading 2 other libraries (including the private one that was used before)?

https://github.com/oracle/graal/pull/2266/files#diff-bfc44cccccc87507300243b7d38779bd
I assume us now having -ObjC on the linker command line plays out like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants