Performance issue tracking #116

Closed
eiennohito opened this issue May 11, 2016 · 11 comments
Comments

@eiennohito
Contributor

serializedSize takes a lot of time in profiles: 3-4 times more than the actual serialization of the message (writeTo).

However, before doing anything we need to write a benchmark for this.

@thesamet
Contributor

Thanks for filing this. At this point, we haven't done much to optimize the performance of ScalaPB but it's been on my mind. It would be really nice to have an example benchmark we can work with.

@eiennohito
Contributor Author

I will try to write benchmarks and experiment with making serializedSize a simple (non-volatile) var with a sentinel value, like java.lang.String does with its hashCode.

But not right now.
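The idea above can be sketched as follows. This is a hypothetical illustration, not the actual ScalaPB generated code; `Inner` and the size computation are placeholders. Like `String.hashCode`, the racy unsynchronized write is safe because the computation is deterministic, so the worst case is a few threads recomputing the same value.

```scala
// Illustrative sketch: cache serializedSize in a plain (non-volatile) field
// with -1 as the "not yet computed" sentinel, the way java.lang.String
// caches hashCode. Inner and the size formula are placeholders.
final case class Inner(payload: String) {
  @transient private var __serializedSize: Int = -1 // -1 = not computed yet

  def serializedSize: Int = {
    if (__serializedSize == -1) {
      // placeholder size calculation: tag + length prefix + UTF-8 bytes
      __serializedSize = 2 + payload.getBytes("UTF-8").length
    }
    __serializedSize // subsequent calls return the cached value
  }
}
```

Compared with a `lazy val`, this trades the guarantee of computing at most once for an accessor that is a plain field read after the first call, with no volatile read or monitor.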

@thesamet
Contributor

Can you share some details on the message that caused the slow serializedSize time? Deeply nested? Big arrays? Maps? Oneofs? Anything else?

@eiennohito
Contributor Author

eiennohito commented May 14, 2016

I don't know yet because I have not done exact benchmarking.
I believe that lazy val is one of the main culprits.
But I had 5-6 levels of nesting, so in that case, yes.

@eiennohito
Contributor Author

eiennohito commented May 17, 2016

Okay, I've done some benchmarking.

Project: https://github.com/eiennohito/scalapb-jmh-bench
Benchmarking Code, Protobuf

For the test data I use three-level nesting: Obj1[ 10x Obj2 [ 50x Obj3 [ string ] ] ]
There are three setups:

  • copyOnly (baseline): doing a deep copy (because we need to reinit serializedSize of all objects)
  • copyAndSize: baseline + calculation of serializedSize
  • copyAndSerialize: copyAndSize + creating CodedOutputStream with preallocated array + serialization of size + serialization of the message itself
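The three setups can be sketched as a JMH benchmark class. This is a hedged skeleton, not the actual benchmark code from the linked project: `Obj1`/`Obj2`/`Obj3` and `deepCopy` are illustrative stand-ins for the generated ScalaPB messages and the copy helper.

```scala
import java.util.concurrent.TimeUnit
import com.google.protobuf.CodedOutputStream
import org.openjdk.jmh.annotations._

// Skeleton of the three benchmark setups; Obj1/Obj2/Obj3 and deepCopy
// stand in for the generated ScalaPB messages and the copy helper.
@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
class Bench2 {
  var msg: Obj1 = _

  @Setup
  def init(): Unit = {
    val leaves = Seq.fill(50)(Obj3(payload = "some string")) // 50x Obj3
    msg = Obj1(children = Seq.fill(10)(Obj2(children = leaves))) // 10x Obj2
  }

  // baseline: deep copy to reset the serializedSize cache of every object
  @Benchmark
  def copyOnly: Obj1 = deepCopy(msg)

  // baseline + size calculation
  @Benchmark
  def copyAndSize: Int = deepCopy(msg).serializedSize

  // copyAndSize + size prefix + message body into a preallocated array
  @Benchmark
  def copyAndSerialize: Array[Byte] = {
    val m = deepCopy(msg)
    val size = m.serializedSize
    val buf = new Array[Byte](size + 5)
    val cos = CodedOutputStream.newInstance(buf)
    cos.writeUInt32NoTag(size) // serialize the size
    m.writeTo(cos)             // serialize the message itself
    buf
  }
}
```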

I ran the tests on JDK 1.8.0_92, Xeon E5-2667 v3, CentOS 6.7.

Benchmark                Mode  Cnt   Score   Error  Units
Bench2.copyAndSerialize  avgt   10  31.266 ± 0.180  us/op
Bench2.copyAndSize       avgt   10  23.249 ± 0.174  us/op
Bench2.copyOnly          avgt   10   4.002 ± 0.006  us/op

Basically there is +20 µs/op for calculating the size and +10 µs/op for serializing the object itself. I will use this project for further analysis.

@eiennohito
Contributor Author

eiennohito commented May 18, 2016

Benchmarking serializedSize

Methodology:

  • Using JMH
  • Use protobuf objects nested three levels deep, with a single string field at the bottom
  • The two outer objects have repeated fields
  • Nestings are 10x and 50x; in total there are 500 objects at the 3rd level
  • Run parameters: -i 50 -wi 20 -f1 -r 500ms
  • A large warmup is needed to let the GC adapt to the load
  • Hardware: Xeon E5-2680 (Sandy Bridge)
  • Software: Linux CentOS 6.7, JDK 1.8.0_92
  • Scalac: "-Ybackend:GenBCode", "-Yopt:l:classpath", "-target:jvm-1.8", "-Ydelambdafy:method"
    Does not change the picture, but is a bit faster than without delambdafying

Benchmarks:

  • copyOnly: baseline - deep copy of object, to reset serializedSize
  • copyAndSize: baseline + calculate serializedSize
  • copyAndSerialize: copyAndSize + actual serialization to preallocated and fixed array

Conditions:

  • control: version 0.5.26 generated code
  • l3: changed the lazy val of the innermost object to a private var
  • l2: changed the lazy val of the second-level nesting to a private var
  • l2: changed foreach on the second-level nesting to an iterator with a while loop

Discussion

I ran this test on slightly older hardware.
Sanity check: copyOnly is almost the same everywhere, which is good.

Here is a full histogram for git checkout bench01.
https://gist.github.com/eiennohito/db4f25d0c6f1e9514f01a4ca5dd9d4a6
There are some outliers because of GC, but I believe we can trust the averages.

lazy val is slow

By replacing lazy val with a private var at the innermost level,
we get a 4x improvement in µs/op.
The second level gets some improvement as well, but it is not as visible.
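For context on why the lazy val is costly: a rough sketch of what scalac 2.x emits for `lazy val serializedSize: Int = computeSize()`. This is simplified (the real code packs initialization flags into a bitmap field), but it shows the volatile read plus potential monitor entry that every access pays for, which the plain var approach avoids.

```scala
// Simplified model of scalac 2.x lazy val desugaring:
// double-checked locking over a volatile flag.
final class Msg(computeSize: () => Int) {
  @volatile private var bitmap0: Boolean = false // "initialized" flag
  private var serializedSize0: Int = 0           // backing field

  def serializedSize: Int = {
    if (!bitmap0) this.synchronized { // monitor entry until initialized
      if (!bitmap0) {
        serializedSize0 = computeSize()
        bitmap0 = true
      }
    }
    serializedSize0 // every call also pays the volatile read of bitmap0
  }
}
```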

foreach => iterator gives a marginal improvement

Removing closures at the second level improves the running time of size+serialize by ~10%,
but the average running time of size-only increases by ~5%.
Maybe it's because of JVM inlining of foreach with the passed function;
I did not dig deep into this.
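The foreach => iterator change being compared is roughly the following. The function names and the string-based size calculation are illustrative, not the generated code; the point is replacing a closure-taking `foreach` with an explicit `while` loop over an iterator.

```scala
// Before: foreach takes a Function1, which the JIT must inline through
// (or allocate/megamorphically dispatch) on every call.
def sizeWithForeach(items: Seq[String]): Int = {
  var size = 0
  items.foreach { s => size += s.length }
  size
}

// After: explicit iterator with a while loop; no closure involved.
def sizeWithIterator(items: Seq[String]): Int = {
  var size = 0
  val it = items.iterator
  while (it.hasNext) {
    size += it.next().length
  }
  size
}
```

Both compute the same result; the difference is purely in how the loop body is dispatched.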

Results

control: git checkout bench01~3

[info] Benchmark                Mode  Cnt    Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  176.404 ± 0.222  us/op
[info] Bench3.copyAndSize       avgt   50  165.415 ± 0.578  us/op
[info] Bench3.copyOnly          avgt   50    6.250 ± 0.051  us/op

l3: lazy val => private var git checkout bench01~2

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  42.959 ± 0.053  us/op
[info] Bench3.copyAndSize       avgt   50  26.402 ± 0.023  us/op
[info] Bench3.copyOnly          avgt   50   5.808 ± 0.049  us/op

l2: lazy val => private var git checkout bench01~1

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  39.879 ± 0.070  us/op
[info] Bench3.copyAndSize       avgt   50  20.001 ± 0.060  us/op
[info] Bench3.copyOnly          avgt   50   6.073 ± 0.041  us/op

l2: foreach => iterator git checkout bench01

[info] Benchmark                Mode  Cnt   Score   Error  Units
[info] Bench3.copyAndSerialize  avgt   50  33.763 ± 0.052  us/op
[info] Bench3.copyAndSize       avgt   50  21.106 ± 0.060  us/op
[info] Bench3.copyOnly          avgt   50   5.715 ± 0.051  us/op

@eiennohito
Contributor Author

If you don't mind, I will replace the current generation of serializedSize with something along the lines of eiennohito/scalapb-jmh-bench@e17979e

eiennohito added a commit to eiennohito/ScalaPB that referenced this issue May 18, 2016
 lazy vals are replaced by transient backing fields with -1 as a sentinel value
 and lazy initialization in accessor methods

 The best low-hanging fruit for scalapb#116 right now.

 Motivation for the fix is described in scalapb#116
@thesamet thesamet reopened this Aug 29, 2016
@eiennohito
Contributor Author

I think we should either close this issue or rename it to something like "Performance issues: Tracking"

@thesamet thesamet changed the title serializedSize is slow Performance issue tracking Aug 29, 2016
@RamakrishnaHande

Are there benchmarking results with the most recent library version? I have a big protobuf file, 174 MB in size, that takes 4 seconds to deserialize, but Python takes 2 millis for the same.

@thesamet
Contributor

thesamet commented Aug 9, 2024

We haven't run the benchmarks for a while, but there weren't changes that would warrant a regression from 0.10.x. Of course, everything is possible. Can you create a minimal example project with a protobuf that takes 4 seconds to deserialize in Scala while it's 2 millis in Python (or something that demonstrates an inefficiency)?

@RamakrishnaHande

Hello @thesamet, the Python code for the same takes 2 seconds, not 2 ms; sorry for the wrong information.
I will come up with a minimal reproducible example that demonstrates the same in both Scala and Python in the next few days.
