Performance issue tracking #116

Closed
eiennohito opened this issue May 11, 2016 · 11 comments
Comments

@eiennohito
Contributor

serializedSize takes a lot of time in profiles: 3-4 times more than the actual serialization of the message (writeTo).

However, before doing anything we need to write a benchmark for this.

@thesamet
Contributor

Thanks for filing this. At this point, we haven't done much to optimize the performance of ScalaPB but it's been on my mind. It would be really nice to have an example benchmark we can work with.

@eiennohito
Contributor Author

I will try to write benchmarks and experiment with making serializedSize a simple (non-volatile) var with a sentinel value, like java.lang.String does with its hashCode.

But not right now.
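The idea above can be sketched as follows. This is a hypothetical illustration, not the actual ScalaPB generated code; `Inner` and the size computation are placeholders. Like `String.hashCode`, the racy unsynchronized write is safe because the computation is deterministic, so the worst case is a few threads recomputing the same value.

```scala
// Illustrative sketch: cache serializedSize in a plain (non-volatile) field
// with -1 as the "not yet computed" sentinel, the way java.lang.String
// caches hashCode. Inner and the size formula are placeholders.
final case class Inner(payload: String) {
  @transient private var __serializedSize: Int = -1 // -1 = not computed yet

  def serializedSize: Int = {
    if (__serializedSize == -1) {
      // placeholder size calculation: tag + length prefix + UTF-8 bytes
      __serializedSize = 2 + payload.getBytes("UTF-8").length
    }
    __serializedSize // subsequent calls return the cached value
  }
}
```

Compared with a `lazy val`, this trades the guarantee of computing at most once for an accessor that is a plain field read after the first call, with no volatile read or monitor.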

@thesamet
Contributor

Can you share some details on the message that caused the slow serializedSize time? Deeply nested? Big arrays? Maps? Oneofs? Anything else?

@eiennohito
Contributor Author

eiennohito commented May 14, 2016

I don't know yet because I have not done exact benchmarking.
I believe that lazy val is one of the main culprits.
But I had 5-6 levels of nesting, so in that case, yes.

@eiennohito
Contributor Author

eiennohito commented May 17, 2016

Okay, I've done some benchmarking.

Project: https://github.com/eiennohito/scalapb-jmh-bench
Benchmarking Code, Protobuf

For the test data I use three-level nesting: Obj1[ 10x Obj2 [ 50x Obj3 [ string ] ] ]
There are three setups:

  • copyOnly (baseline): doing a deep copy (because we need to reinit serializedSize of all objects)
  • copyAndSize: baseline + calculation of serializedSize
  • copyAndSerialize: copyAndSize + creating CodedOutputStream with preallocated array + serialization of size + serialization of the message itself
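The three setups can be sketched as a JMH benchmark class. This is a hedged skeleton, not the actual benchmark code from the linked project: `Obj1`/`Obj2`/`Obj3` and `deepCopy` are illustrative stand-ins for the generated ScalaPB messages and the copy helper.

```scala
import java.util.concurrent.TimeUnit
import com.google.protobuf.CodedOutputStream
import org.openjdk.jmh.annotations._

// Skeleton of the three benchmark setups; Obj1/Obj2/Obj3 and deepCopy
// stand in for the generated ScalaPB messages and the copy helper.
@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
class Bench2 {
  var msg: Obj1 = _

  @Setup
  def init(): Unit = {
    val leaves = Seq.fill(50)(Obj3(payload = "some string")) // 50x Obj3
    msg = Obj1(children = Seq.fill(10)(Obj2(children = leaves))) // 10x Obj2
  }

  // baseline: deep copy to reset the serializedSize cache of every object
  @Benchmark
  def copyOnly: Obj1 = deepCopy(msg)

  // baseline + size calculation
  @Benchmark
  def copyAndSize: Int = deepCopy(msg).serializedSize

  // copyAndSize + size prefix + message body into a preallocated array
  @Benchmark
  def copyAndSerialize: Array[Byte] = {
    val m = deepCopy(msg)
    val size = m.serializedSize
    val buf = new Array[Byte](size + 5)
    val cos = CodedOutputStream.newInstance(buf)
    cos.writeUInt32NoTag(size) // serialize the size
    m.writeTo(cos)             // serialize the message itself
    buf
  }
}
```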

I ran the tests on JDK 1.8.0_92, Xeon E5-2667 v3, CentOS 6.7.

Benchmark                Mode  Cnt   Score   Error  Units
Bench2.copyAndSerialize  avgt   10  31.266 ± 0.180  us/op
Bench2.copyAndSize       avgt   10  23.249 ± 0.174  us/op
Bench2.copyOnly          avgt   10   4.002 ± 0.006  us/op

Basically there is +20 µs/op for calculating the size and +10 µs/op for serializing the object itself. I will use this project for further analysis.

@eiennohito
Contributor Author

eiennohito commented May 18, 2016

Benchmarking serializedSize

Methodology:

  • Using JMH
  • Use protobuf objects nested three levels deep, with a single string field at the bottom
  • The two outer objects have repeated fields
  • Nestings are 10x and 50x; in total there are 500 objects at the 3rd level
  • Run parameters: -i 50 -wi 20 -f1 -r 500ms
  • A large warmup is needed to let the GC adapt to the load
  • Hardware: Xeon E5-2680 (Sandy Bridge)
  • Software: Linux CentOS 6.7, JDK 1.8.0_92
  • Scalac: "-Ybackend:GenBCode", "-Yopt:l:classpath", "-target:jvm-1.8", "-Ydelambdafy:method"
    Does not change the picture, but is a bit faster than without delambdafying

Benchmarks:

  • copyOnly: baseline - deep copy of object, to reset serializedSize
  • copyAndSize: baseline + calculate serializedSize
  • copyAndSerialize: copyAndSize + actual serialization to preallocated and fixed array

Conditions:

  • control: version 0.5.26 generated code
  • l3: changed the lazy val of the innermost object to a private var
  • l2: changed the lazy val of the second-level nesting to a private var
  • l2: changed foreach on the second-level nesting to an iterator with a while loop

Discussion

I ran this test on slightly older hardware.
Sanity check: copyOnly is almost the same everywhere, which is good.

Here is a full histogram for git checkout bench01.
https://gist.github.com/eiennohito/db4f25d0c6f1e9514f01a4ca5dd9d4a6
There are some outliers because of GC, but I believe we can trust the averages.

lazy val is slow

By replacing lazy val with a private var at the innermost level,
we get a 4x improvement in µs/op.
The second level gets some improvement as well, but it is not as visible.
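For context on why the lazy val is costly: a rough sketch of what scalac 2.x emits for `lazy val serializedSize: Int = computeSize()`. This is simplified (the real code packs initialization flags into a bitmap field), but it shows the volatile read plus potential monitor entry that every access pays for, which the plain var approach avoids.

```scala
// Simplified model of scalac 2.x lazy val desugaring:
// double-checked locking over a volatile flag.
final class Msg(computeSize: () => Int) {
  @volatile private var bitmap0: Boolean = false // "initialized" flag
  private var serializedSize0: Int = 0           // backing field

  def serializedSize: Int = {
    if (!bitmap0) this.synchronized { // monitor entry until initialized
      if (!bitmap0) {
        serializedSize0 = computeSize()
        bitmap0 = true
      }
    }
    serializedSize0 // every call also pays the volatile read of bitmap0
  }
}
```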

foreach => iterator gives a marginal improvement

Removing closures at the second level improves the running time of size+serialize by ~10%,
but the average running time of size-only increases by ~5%.
Maybe it's because of JVM inlining of foreach with the passed function;
I did not dig deep into this.
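The foreach => iterator change being compared is roughly the following. The function names and the string-based size calculation are illustrative, not the generated code; the point is replacing a closure-taking `foreach` with an explicit `while` loop over an iterator.

```scala
// Before: foreach takes a Function1, which the JIT must inline through
// (or allocate/megamorphically dispatch) on every call.
def sizeWithForeach(items: Seq[String]): Int = {
  var size = 0
  items.foreach { s => size += s.length }
  size
}

// After: explicit iterator with a while loop; no closure involved.
def sizeWithIterator(items: Seq[String]): Int = {
  var size = 0
  val it = items.iterator
  while (it.hasNext) {
    size += it.next().length
  }
  size
}
```

Both compute the same result; the difference is purely in how the loop body is dispatched.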

Results

control: git checkout bench01~3

[info] Benchmark                Mode  Cnt    Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  176.404 ± 0.222  us/op
[info] Bench3.copyAndSize       avgt   50  165.415 ± 0.578  us/op
[info] Bench3.copyOnly          avgt   50    6.250 ± 0.051  us/op

l3: lazy val => private var git checkout bench01~2

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  42.959 ± 0.053  us/op
[info] Bench3.copyAndSize       avgt   50  26.402 ± 0.023  us/op
[info] Bench3.copyOnly          avgt   50   5.808 ± 0.049  us/op

l2: lazy val => private var git checkout bench01~1

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  39.879 ± 0.070  us/op
[info] Bench3.copyAndSize       avgt   50  20.001 ± 0.060  us/op
[info] Bench3.copyOnly          avgt   50   6.073 ± 0.041  us/op

l2: foreach => iterator git checkout bench01

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  33.763 ± 0.052  us/op
[info] Bench3.copyAndSize       avgt   50  21.106 ± 0.060  us/op
[info] Bench3.copyOnly          avgt   50   5.715 ± 0.051  us/op

@eiennohito
Contributor Author

If you don't mind, I will replace the current generation of serializedSize with something along the lines of eiennohito/scalapb-jmh-bench@e17979e

eiennohito added a commit to eiennohito/ScalaPB that referenced this issue May 18, 2016
 lazy vals are replaced by transient backing fields with -1 as a sentinel value
 and lazy initialization in accessor methods

 The best low-hanging fruit for scalapb#116 right now.

 Motivation for the fix is described in scalapb#116
@thesamet thesamet reopened this Aug 29, 2016
@eiennohito
Contributor Author

I think we should either close this issue or rename it to something like "Performance issues: Tracking"

@thesamet thesamet changed the title serializedSize is slow Performance issue tracking Aug 29, 2016
@RamakrishnaHande

Are there benchmarking results with the most recent library version? I have a big protobuf file, 174 MB in size, that takes 4 seconds to deserialize, but Python takes 2 millis for the same.

@thesamet
Contributor

thesamet commented Aug 9, 2024

We haven't run the benchmarks for a while, but there weren't changes that would warrant a regression from 0.10.x. Of course, everything is possible. Can you create a minimal example project with a protobuf that takes 4 seconds to deserialize in Scala while it's 2 millis in Python (or something that demonstrates an inefficiency)?

@RamakrishnaHande

Hello @thesamet, the Python code for the same takes 2 seconds, not 2 ms; sorry for the wrong information.
I will come up with a minimal reproducible example that demonstrates the same in both Scala and Python in the next few days.
