Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize compilation phase for codec registration checker #135

Open
PawelLipski opened this issue Nov 22, 2021 · 10 comments · Fixed by #155
Open

Optimize compilation phase for codec registration checker #135

PawelLipski opened this issue Nov 22, 2021 · 10 comments · Fixed by #155
Labels
nice to have Not very strictly needed for ASH to be usable performance

Comments

@PawelLipski
Copy link
Collaborator

Time measurement of compilation phases on Hydra (commercial) project, with -Ystatistics flag:

[info]   parser                      : 1 spans, ()473.728ms (0.2%)
[info]   namer                       : 1 spans, ()118.276ms (0.0%)
[info]   packageobjects              : 1 spans, ()1.863ms (0.0%)
[info]   typer                       : 1 spans, ()124027.494ms (52.1%)
[info]   semanticdb-typer            : 1 spans, ()10044.861ms (4.2%)
[info]   dump-persistence-schema     : 1 spans, ()844.721ms (0.4%)
[info]   superaccessors              : 1 spans, ()353.506ms (0.1%)
[info]   extmethods                  : 1 spans, ()74.814ms (0.0%)
[info]   pickler                     : 1 spans, ()150.427ms (0.1%)
[info]   xsbt-api                    : 1 spans, ()1645.657ms (0.7%)
[info]   xsbt-dependency             : 1 spans, ()1767.425ms (0.7%)
[info]   refchecks                   : 1 spans, ()7264.879ms (3.1%)
[info]   codec-registration-class-sweep: 1 spans, ()129.464ms (0.1%)
[info]   patmat                      : 1 spans, ()11113.825ms (4.7%)
[info]   serializability-checker     : 1 spans, ()662.502ms (0.3%)
[info]   codec-registration-serializer-check: 1 spans, ()38826.128ms (16.3%)
[info]   uncurry                     : 1 spans, ()7006.531ms (2.9%)
[info]   fields                      : 1 spans, ()5867.108ms (2.5%)
[info]   tailcalls                   : 1 spans, ()352.656ms (0.1%)
[info]   specialize                  : 1 spans, ()954.033ms (0.4%)
[info]   explicitouter               : 1 spans, ()1027.995ms (0.4%)
[info]   erasure                     : 1 spans, ()10823.126ms (4.6%)
[info]   posterasure                 : 1 spans, ()684.086ms (0.3%)
[info]   lambdalift                  : 1 spans, ()1444.621ms (0.6%)
[info]   constructors                : 1 spans, ()496.269ms (0.2%)
[info]   flatten                     : 1 spans, ()535.716ms (0.2%)
[info]   mixin                       : 1 spans, ()675.139ms (0.3%)
[info]   cleanup                     : 1 spans, ()589.357ms (0.2%)
[info]   delambdafy                  : 1 spans, ()1370.026ms (0.6%)
[info]   jvm                         : 1 spans, ()8177.186ms (3.4%)
[info]   semanticdb-jvm              : 1 spans, ()13.902ms (0.0%)
[info]   xsbt-analyzer               : 1 spans, ()168.936ms (0.1%)

or just including the phases added by ash:

[info]   dump-persistence-schema     : 1 spans, ()844.721ms (0.4%)
[info]   codec-registration-class-sweep: 1 spans, ()129.464ms (0.1%)
[info]   serializability-checker     : 1 spans, ()662.502ms (0.3%)
[info]   codec-registration-serializer-check: 1 spans, ()38826.128ms (16.3%)

To be researched what specifically in org.virtuslab.ash.SerializerCheckCompilerPluginComponent#newPhase makes it so slow...
probably b/c it checks for every combination of types discovered in the @Serializer-annotated class and the types detected in the preceding codec-registration-class-sweep phase

@MarconZet
Copy link
Collaborator

This problem will likely be solved by replacing some Lists with Sets.

@PawelLipski PawelLipski linked a pull request Feb 14, 2022 that will close this issue
@LukaszKontowski
Copy link
Contributor

LukaszKontowski commented May 23, 2022

Results from my compilation statistics for our phases (slightly different from PawelLipski's snippet):

[info] dump-persistence-schema : 1 spans, ()2651.299ms (2.4%)
[info] codec-registration-class-sweep: 1 spans, ()46.157ms (0.0%)
[info] serializability-checker : 1 spans, ()255.628ms (0.2%)
[info] codec-registration-serializer-check: 1 spans, ()12359.217ms (11.4%)

And flamegraph generated by https://github.com/jvm-profiling-tools/async-profiler profiler on Hydra - in attachment. Potentially correlated fact is that:

  1. for dump-persistence-schema phase's (2,4% compilation time)newPhase - there were 160 samples ddetected by profiler
  2. for codec-registration-serializer-check (11,4% compilation time) phase's newPhase - there were 955 such samples

So, it might be possible, that this long time is not caused by some lack of optimization inside SerializerCheckCompilerPluginComponent#newPhase (tried a lot of things to optimize it more, but failed to shorten it's duration). The problem might be caused by the fact, that this SerializerCheckCompilerPluginComponent#newPhase method gets invoked round 6 * more times than similar method from dump-persistence-schema phase. (this last finding is not true)

@PawelLipski
Copy link
Collaborator Author

The problem might be caused by the fact, that this SerializerCheckCompilerPluginComponent#newPhase method gets invoked round 6 * more times than similar method from dump-persistence-schema phase.

Wow, that's a non-trivial discovery indeed 🤔

@LukaszKontowski
Copy link
Contributor

Sorry, in my last comment I was wrong - SerializerCheckCompilerPluginComponent#newPhase gets invoked only one time and creates one instance of StdPhase type, but - underlying StdPhase#apply method gets invoked hundreds of times - e.g. 541 times (from my last test).
For now, I do not know yet, why it gets invoked so many times.

@PawelLipski
Copy link
Collaborator Author

Reopening as @LukaszKontowski's research is still ongoing

@LukaszKontowski
Copy link
Contributor

With #155 managed to go down to 3,8% for the codec-registration-serializer-check phase. Now working on further improvement, which probably is possible by providing more optimization in the private def processSerializerClass

@LukaszKontowski
Copy link
Contributor

Small improvement for this phase in another PR: #158 (down from 3,8% to 3,5%)

@LukaszKontowski
Copy link
Contributor

LukaszKontowski commented May 31, 2022

The heavy part of code that makes this phase slow is within https://github.com/VirtusLab/akka-serialization-helper/blob/main/codec-registration-checker-compiler-plugin/src/main/scala/org/virtuslab/ash/SerializerCheckCompilerPluginComponent.scala.

In particular:

val foundTypes = {
          try {
            serializerImplDef
              .collect {
                case x: Tree if x.tpe != null => x.tpe
              }
              .distinct
              .filter(_.toString.matches(filterRegex))

this _.toString from last line appears on lots of samples generated by the profiler (see attached snippet).

image

However, it might be possible to further improve performance for this block code - if we could somehow narrow down the number of Types that we get from Trees here:

.collect {
  case x: Tree if x.tpe != null => x.tpe
}

(sample collection returned by this collect with distinct - in attached tmp2.txt file - one type per line).

tmp2.txt

Because for now this collect returns all Types found within one AST (including things like package names, types from scala standard library etc.) - sometimes the number of types (after distinct) is more than 20000. And we need only a small subset from this collection - types pointed by user in his custom code that match regex. I tried to filter these types further before calling toString - by some methods from https://www.scala-lang.org/api/2.13.3/scala-reflect/scala/reflect/api/Trees$Tree.html and https://www.scala-lang.org/api/2.13.3/scala-reflect/scala/reflect/api/Types$Type.html in order to get only types that are "real candidates" for checking regex (for example - to remove package types and some other not needed types before calling toString) but failed to achieve anything.

I also tried to narrow down Trees that we are checking by changing this line of code:

case x: Tree if x.tpe != null => x.tpe

into something like this:

case x @ (_: Some_Subtype_1 | _: Some_Subtype_2) if x.tpe != null => x.tpe

with known subclasses of Tree:

image

but unfortunately failed to achieve better speed (and in most cases - it also did not work).

However, I think that it still might be possible to further improve this and we can leave this issue alive - probably I failed for now because of small knowledge of the scala reflection internals.

@LukaszKontowski
Copy link
Contributor

Helper comment for future attempts - in order to enable -Ystatistics flag on Hydra properly:

  1. in ./build.sbt inside definition of
lazy val `phoenix-backend`

add following line to .settings

Compile / scalacOptions += "-Ystatistics"
  1. in ./project/Build.scala comment this line:
"-Ybackend-parallelism",

inside lazy val common settingss

  1. install async-profiler as described on https://www.lightbend.com/blog/profiling-jvm-applications

  2. run sbt on checked project (Hydra) - then in another terminal window run jps command to get this sbt's process id.

  3. run async-profiler with actual process id so that it will produce flamegraph properly (note: output file must be .html not .svg - svg is deprecated)

@PawelLipski PawelLipski removed the good first issue Good for newcomers label Jun 10, 2022
@LukaszKontowski
Copy link
Contributor

Possibly working improvement - to be tested / checked / discussed etc. - for now it's just an idea that worked on one project and improved performance of the phase by 25%

Replace

serializerImplDef
  .collect {
    case tree: Tree if tree.tpe != null => tree.tpe
  }

with

serializerImplDef
  .collect {
    case tree @ (_: TypTree | _: Select | _: GenericApply) if tree.tpe != null => tree.tpe
  }

@PawelLipski PawelLipski added the nice to have Not very strictly needed for ASH to be usable label Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nice to have Not very strictly needed for ASH to be usable performance
Projects
None yet
4 participants