Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmarks for the explicit-state model checker TLC #207

Open
lemmy opened this issue Nov 7, 2019 · 15 comments
Open

Add benchmarks for the explicit-state model checker TLC #207

lemmy opened this issue Nov 7, 2019 · 15 comments
Labels
benchmark-candidate Potential benchmark to be added

Comments

@lemmy
Copy link

lemmy commented Nov 7, 2019

The model checker (https://github.com/tlaplus/tlaplus) is optimized for throughput and has a single hot path. I can come up with benchmarks for a) scalability and b) throughput. One is a spec where evaluation of the next-state relation is cheap, the other has a costly next-state relation. Adjusting specs to the available hardware is just a matter or choosing a few parameters. Benchmarks can be distributed as two separate, self-contained jars.

@farquet
Copy link
Collaborator

farquet commented Nov 8, 2019

Thanks @lemmy for the suggestion !
A TLA+ benchmark would definitely fit very well in the spirit of the Renaissance suite. Would those benchmarks need to be written or do they already belong to https://github.com/tlaplus/Examples ?
I also see that everything is MIT licensed, so that's perfect.

One constraint we have for a new benchmark is to always deterministically compute the same thing. So we should avoid random aspects and side effects. So basically, two subsequent calls to run the workload should perform the exact same thing such that they stay comparable.
Another constraint is for the workload to take a few seconds to run on a standard machine. We're not too strict about this, but it would be good if it's in the same ballpark than the other benchmarks (so [0.5, 5] sec per iteration would be fine). But I guess we can always tweak some parameters to reach that goal.

If you are willing to open a PR, adding a new benchmark (or set of benchmarks) to renaissance is as simple as creating a subdirectory in the benchmarks directory, describe your maven dependencies in the sbt config and write the actual workload in a runIteration method of a class that extends Benchmark. This is detailed in the contribution guide.
Alternatively, you could point us to a demo workload and we can do the integration ourselves.

@farquet farquet added the benchmark-candidate Potential benchmark to be added label Nov 8, 2019
@axel22
Copy link
Member

axel22 commented Nov 8, 2019

Thanks for the suggestion - this looks like an interesting workload.
To add to what Francois said, there is a benchmark subfolder benchmarks/dummy that is a minimal example of how to add a new benchmark group - if you would like to go ahead and add a TLC benchmark, that's a good template to start from.

@lemmy
Copy link
Author

lemmy commented Nov 15, 2019

Is there an example that shows how to add a benchmark that comes as a self-contained jar file?

@farquet
Copy link
Collaborator

farquet commented Nov 15, 2019

All jar dependencies we have come from maven central and they are defined in the build.sbt of each project. Here is an example.

Wouldn't it be possible to have the required dependencies there and the thin benchmark layer as sources here ?

@lemmy
Copy link
Author

lemmy commented Nov 15, 2019

That could be done but the jar would be tailored to the benchmark. Not sure we want to "pollute" maven central with such a jar. By the way, the jar can be pack200ed to a few hundred kbs.

@farquet
Copy link
Collaborator

farquet commented Nov 15, 2019

I see. And how about the second option ?
Having the benchmark definition here has advantages like easier source code exploration, being able to adapt the workload size, etc. Could that be possible ?

If it's a matter of the work that has to be done to turn the sample application into a benchmark, we could do that part.

@lemmy
Copy link
Author

lemmy commented Nov 15, 2019

It is probably best to ditch the idea of a self-contained jar and instead use the vanilla jar from maven central and check the specs (input) into this repository. Is the benchmark layer going to fork a VM?

@axel22
Copy link
Member

axel22 commented Nov 15, 2019

Is the benchmark layer going to fork a VM?

By default, the harness does not fork a new VM.
The optional JMH-compliant build, however, uses JMH as the harness, which does fork a new VM.

@farquet
Copy link
Collaborator

farquet commented Nov 15, 2019

It is probably best to ditch the idea of a self-contained jar and instead use the vanilla jar from maven central and check the specs (input) into this repository.

That would be great indeed. It matches how the other benchmarks are built. For instance, we have several benchmarks that include any kind of input in a resources folder like here.

Alternatively, if the input is small it can also be inlined into the benchmark definition.

@lemmy
Copy link
Author

lemmy commented Nov 15, 2019

Two questions:

  • How do you make sbt consume maven snapshot dependencies?
  • What the meaning of optional JMH-compliant build? TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

@farquet
Copy link
Collaborator

farquet commented Nov 15, 2019

How do you make sbt consume maven snapshot dependencies?

You can use the -SNAPSHOT suffix just like in a pom file. So you can add something like this to your build.sbt :

libraryDependencies += "org.lib" %% "lib" % "1.0-SNAPSHOT" % "test" changing()

changing() forces the re-download in case you push an updated version.

It would be good to keep that for dev only though to avoid potentially changing dependencies in the future.

What the meaning of optional JMH-compliant build?

We have users who prefer using JMH buils for convenience because they have tools to run such jars. However, the renaissance harness is the default way of running benchmarks because its goal is to augment it with tools to analyze the workloads automatically.

TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

This is not a problem if your two workloads are distinct benchmarks because renaissance benchmarks are meant to be executed individually anyway.
Your benchmarks can be part of the same sbt project though. Create a subdirectory named something like model-checker in the benchmarks directory (it will be discovered automatically) and create two individual classes that extend Benchmark, one for each model checker.

@axel22
Copy link
Member

axel22 commented Nov 15, 2019

TLC fails if we check two different specs with a single VM instance (VM has to be fresh).

In addition to what Francois said: you should be aware that the Renaissance harness will call the run method of the benchmark (i.e. the entry point into your workload) multiple times. The reason for this is that the harness first tries a couple of warmup runs, and then does several runs with measurements.
This means that if you should make sure that there is no global state on which the run depends on, or if there is such a global state, that you reset that state in the setup and the teardown methods.

@lemmy
Copy link
Author

lemmy commented Nov 16, 2019

@axel22 If that's a strict requirement, I don't see a way to move forward. I don't have the bandwidth to change TLC to support warmup runs (I once hacked something based on OSGi). Initially, I assumed renaissance would accept a macro-benchmark that runs for a few minutes and thus needs no warmups. TLC fits this type of benchmark very well.

@axel22
Copy link
Member

axel22 commented Nov 16, 2019

@lemmy It's still possible to have single runs, but it would require setting the default number of repetitions to 1 in the benchmark definition. In this case, there would be no warmup.
I think it is still useful to contribute such a benchmark - we could keep in the staging area, as that would allow others to run it and analyze it, and maybe even invest into adding multiple-runs-per-process ability to TLC later.

@lemmy
Copy link
Author

lemmy commented Nov 25, 2019

https://github.com/tlaplus/tlaplus/tree/master/general/performance/Grid5k would be the spec that easily simulates two workloads: scalability and computation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark-candidate Potential benchmark to be added
Projects
None yet
Development

No branches or pull requests

4 participants
@lemmy @axel22 @farquet and others