Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store IR caches in suitable format to read them "on demand" #8207

Merged
merged 102 commits into from
Nov 19, 2023

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Nov 1, 2023

Pull Request Description

Major problem of #6100 is that we load 28MB of .ir caches at startup before doing anything useful. Rather than trying to speed up loading of all such data, we want to avoid loading of such a huge amount of data completely.

This PR introduces Persistance interface that can be registered for a particular Class (with or without its subclasses) and serialize and deserialize its instances via Persistance.Input and Persistance.Output. As serialization of most of the Scala case classes used by IR is very similar, there is an @Persistable annotation and associated annotation processor that can generate most of the boiler plate code automatically based on case class constructor signature.

Essential difference with respect to regular serde frameworks is the ability to load just a reference to an object without materializing it in the memory immediately. That allows creation of lazy sequences and their use in Method.Explicit that drastically cut out the amount of objects that has to be deserialized when loading .ir files.

Important Notes

There are many ways to improve this approach (see Future Work section), however measurements held on 177c44e indicate that there already is a speedup when executing simple:

from Standard.Base import IO

main = IO.println "Ahoj"

as regular 6cf75da version takes 4.3s while the version in this PR is able to finish in 3.6s. This clearly demonstrates the lazy deserialization is a viable approach which can deliver on the promise of restoring caches effectively.

Similar approach was suggested for Tree serde, but in the case of IR it makes even bigger sense. Most of the IR tree is not needed at early stage of the execution.

The advantaged of those manually written or annotation processor generated Persistance subclasses is their native image friendliness. There is no reflection and they should work with native image automatically.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

Future Work

The Persistance format shall benefit from huge cache files mmapped into memory. Rather than having a lot of small .ir files for each module in standard library, let's have just a single file per each library. Btw. there already is such a file for BindingsMaps per each library - with the new format we can include whole IR in there.

Optimize for GC - so far this PR only loads objects lazily on demand, but once they are loaded, they are never freed. That could very likely be improved. Once the IR tree of a method is processed, it could be GCed. Lowering the memory requirements during execution.

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Nov 1, 2023
@JaroslavTulach JaroslavTulach requested a review from kazcw November 1, 2023 17:36
@JaroslavTulach JaroslavTulach requested a review from 4e6 as a code owner November 1, 2023 17:36
@JaroslavTulach JaroslavTulach self-assigned this Nov 1, 2023
@JaroslavTulach JaroslavTulach marked this pull request as draft November 1, 2023 17:36
@JaroslavTulach JaroslavTulach changed the title Read IR caches "lazily" Store IR caches in suitable format to read them "on demand" Nov 2, 2023
@JaroslavTulach JaroslavTulach force-pushed the wip/jtulach/Persistance_6100 branch from 7d9e271 to 400b151 Compare November 2, 2023 09:19

import static org.enso.persist.PerUtils.raise;

/** Central persistance class. Use static {@link
Copy link
Member Author

@JaroslavTulach JaroslavTulach Nov 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use:

enso$ find lib/java/persistance/src/main/java/ | grep java$ | xargs /graalvm-21/bin/javadoc -d target/javadoc/ --snippet-path lib/java/persistance/src/test/java/

to generate target/javadoc/index.html and view this documentation in proper form.

@JaroslavTulach JaroslavTulach added CI: Ready to merge This PR is eligible for automatic merge and removed CI: Ready to merge This PR is eligible for automatic merge labels Nov 18, 2023
.settings(
version := "0.1",
frgaalJavaCompilerSetting,
Compile / javacOptions := ((Compile / javacOptions).value),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be unnecessary

.settings(
version := "0.1",
frgaalJavaCompilerSetting,
Compile / javacOptions := ((Compile / javacOptions).value ++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do javacOptons ++= Seq() the same way it is done for libraryDependencies

build.sbt Show resolved Hide resolved
@@ -49,13 +49,34 @@ object Method {
*/
sealed case class Explicit(
override val methodReference: Name.MethodReference,
override val body: Expression,
val bodySeq: Seq[Expression],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving it like that is error-prone. Someone will forget that access to the body triggers the serialization logic and it will hurt the performance. It should be reflected in the type signature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to write a test to verify that the deserialization is lazy. I may do it in a subsequent PR.

@JaroslavTulach JaroslavTulach merged commit ba19813 into develop Nov 19, 2023
31 of 35 checks passed
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/Persistance_6100 branch November 19, 2023 15:38
@JaroslavTulach JaroslavTulach mentioned this pull request May 28, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants