Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closing Z3 context consumes a lot of time #236

Closed
hernanponcedeleon opened this issue Jul 15, 2021 · 22 comments
Closed

Closing Z3 context consumes a lot of time #236

hernanponcedeleon opened this issue Jul 15, 2021 · 22 comments
Labels

Comments

@hernanponcedeleon
Copy link

I'm trying to replace the use of z3 API with JavaSMT in the dartagnan tool. Most of the integration is looking good so far but we just noticed some performance problems when using Solvers.Z3 as the underlying solver in MacOS.

Calling SolverContext.close() is extremely slow for some benchmarks, e.g. for a formula where the solving time is around 9 secs, closing the context takes more than 20 secs.

This is surprisingly since the concrete implementation of close() in Z3SolverContext seems to call Native.delContext(context) which is the same the Java bindings for z3 does.

Any idea what the reason might be? Any further information I could provide to find the root of this?

@kfriedberger
Copy link
Member

The Z3SolverContext does a little bit more than just closing the context:

  • cleanup all AST object (can be expensive, but only runs if phantom references are used, which are disabled by default).
  • write/flush and close the logfile of Z3 (do you use a log? This might need some time, if a lot of data is written to slow storage).
  • cleanup parameters (should be quite fast, we only set one parameter for Z3).
  • delete the context (should also be fast, but depends on internal operations of Z3).

Can you use a Java profiler and locate the exact section where the time is spent?
Maybe even a simple command like jstack provides enough info here, when executed in the moment of closing, maybe even several times in the given 20s interval.

@ThomasHaas
Copy link
Contributor

I'm also working on Dartagnan and I noticed that the call to Native.delContext(context) is the one that takes time. I just used the IntelliJ Debugger to walk through the code step-by-step and each call performed in close() is done almost instantly except the Native.delContext(context) which can take upwards of 2 minutes to complete.

@hernanponcedeleon
Copy link
Author

In my case is also Native.delContext(context) the one consuming the time.
About the log ... I'm using a default configuration

Configuration config = Configuration.defaultConfiguration();
SolverContext ctx = SolverContextFactory.createSolverContext(
    config, 
    BasicLogManager.create(config), 
    ShutdownManager.create().getNotifier(), 
    Solvers.Z3); 

but I assume Native.closeLog() is the method that takes care of writing/flushing the log and this method terminates instantly.

@hernanponcedeleon
Copy link
Author

Small updated ... I manage to make the solvers work also in Linux.
Closing the context for z3 also consumes a lot of time here (for certain benchmarks). The same happens for mathsat4. However closing the context for cvc4 is quite fast.

@kfriedberger
Copy link
Member

The symptoms are very vague and hard to reproduce without further information.
Could you describe your use-case:

  • How many symbols? A few or several thousands?
  • How many formula objects? Are the formulas build from basic symbols or parsed from SMTLIB2?
  • How many distinct solver stacks (= prover environments)?
  • How many queries are asserted? Many smaller ones (incremental solving) or just a few computationally hard ones?

@ThomasHaas
Copy link
Contributor

One of our benchmarks has around 20 seconds solving time and takes at least 4 minutes to close. That example has:

  • around 43000 basic symbols (that is the content of Z3FormulaCreator.symbolsToDeclarations)
  • the formulas are built from basic symbols (no SMTLIB2)
  • the formulas are some QF_LIA but mostly propositional
  • we have 1 prover environment
  • There are 2 queries in general, but in this case we perform only a single query under a prover.push(), because the instance is satisfiable. The second is only performed for unsatisfiable instances.
  • Asserting the initial formula takes around 20 seconds time, solving it is 15-20 seconds.

This exactly mimics the way we used Z3 before. We replaced all calls to Z3 with the respective calls to JavaSMT.

As far as I can tell, this problem seems to appear in all our medium-sized and larger benchmarks.

@hernanponcedeleon
Copy link
Author

Would it help if we generate the SMT2 file out of the encoding?

@kfriedberger
Copy link
Member

Maybe it might be helpful.
The problem might only be caused when using the binary API instead of SMTLIB2.

@hernanponcedeleon
Copy link
Author

Btw ... When googling if people have experienced this directly when using Z3, I found this which mentions @PhilippWendler ... Maybe he has some idea

@kfriedberger
Copy link
Member

I will also work on tests for #237. Within a few minutes, I found performance issues in Princess and SMTInterpol. :-)

@PhilippWendler
Copy link
Member

Btw ... When googling if people have experienced this directly when using Z3, I found this which mentions @PhilippWendler ... Maybe he has some idea

@cheshire was JavaSMT maintainer back then. The reason why he mentioned me in this comment was that two days before I wrote the following to him via mail:

since recently, I noticed that when using Z3 CPAchecker very often does not shutdown nicely as it should (in at most a few seconds).

Instead, it hangs or does a lot of work in Z3SolverBasedProver.close, which calls Z3SolverBasedProver.pop.

I suspect this might be because of the recent changes you asked the Z3 people to make?

However, @cheshire replied that at the time I wrote this, the Z3 version in JavaSMT did not even include the two commits from Z3Prover/z3#679 yet. (On 2016-07-12, 08c4f48 upgraded to z3-4.4.1-1558-gf96cfea and only on 2016-07-22 f3f8d56 upgraded to z3-4.4.1-1624-g82d0310. The commits in question are z3-4.4.1-1569-gb080e3a21 and z3-4.4.1-1570-g4720d578a.)
Furthermore, he indicated that Z3Prover/z3#679 was about closing the context and my problems have been with closing the prover.

So my problems back then might not have been related to Z3Prover/z3#679, but of course it could be that the current problems are.

Regarding the discussion Z3Prover/z3#679 I don't know more than that.

There have been quite a few other changes related to closing and cleanup back then (I know Z3Prover/z3#245, Z3Prover/z3#648, Z3Prover/z3#661), but these are more on the Java side and should not influence Native.delContext, I guess.

Some ideas:

  • Does Z3 still have this replay log? Then try dumping such a query and replay it, and check whether the replay also includes the costly close. If yes, this would be easy to report to the Z3 developers.
  • If not, one could try to recreate the problem with a SMT2 file. But this is less likely to problem the behavior, I guess.
  • One can create a C program that calls the Z3 API in the same way as your current Java program. One could at least semi-automate this by inserting some print statements next to all API calls or so. If the problem is reproducible then, this would also be easy to report.

@hernanponcedeleon
Copy link
Author

Maybe it might be helpful.
The problem might only be caused when using the binary API instead of SMTLIB2.

Here is the smt2 file of one of the problematic cases using Solvers.Z3:

  • solving time: ~2 secs
  • closing context time: ~20 secs

At the java level, the code looks like

prover.addConstraint(A);
prover.isUnsatWithAssumptions(B);
prover.addConstraint(C);
prover.isUnsat();

As @ThomasHaas mentioned, the first check in this example is SAT. The smt2 file corresponds to A /\ B /\ C.

encoding.smt2.zip

@hernanponcedeleon
Copy link
Author

  • Does Z3 still have this replay log? Then try dumping such a query and replay it, and check whether the replay also includes the costly close. If yes, this would be easy to report to the Z3 developers.

I did not get it ... are you talking about the Z3 API? what do you mean by "dumping such a query"?

In our version of the code where we use Z3 API instead of JavaSMT (the code is identical modulo method renaming, e.g. solver.add() to prover.addConstrain() etc) the call to ctx.close() was almost immediate.

@PhilippWendler
Copy link
Member

  • Does Z3 still have this replay log? Then try dumping such a query and replay it, and check whether the replay also includes the costly close. If yes, this would be easy to report to the Z3 developers.

I did not get it ... are you talking about the Z3 API? what do you mean by "dumping such a query"?

I am talking about the replay log of Z3, and suggest to create such a log for a query with a costly close. With JavaSMT the option solver.z3.log can be used to create such a log (if nothing has changed since I last used it). Then one can use a Z3 binary on the command line to replay the query and check if it also takes so long. If yes, then one can report this to the developers, they should be able to reproduce it and explain what takes so long.

In our version of the code where we use Z3 API instead of JavaSMT (the code is identical modulo method renaming, e.g. solver.add() to prover.addConstrain() etc) the call to ctx.close() was almost immediate.

Then it seems trying to recreate the problem like I suggested as a C problem is unlikely to succeed.

Idea: use ltrace to dump all C API calls to Z3 from your program, once using the direct Z3 API and once using JavaSMT, and compare this. I guess there has to be a difference in the API usage earlier that leads to the later difference.

@hernanponcedeleon
Copy link
Author

I am talking about the replay log of Z3, and suggest to create such a log for a query with a costly close. With JavaSMT the option solver.z3.log can be used to create such a log (if nothing has changed since I last used it). Then one can use a Z3 binary on the command line to replay the query and check if it also takes so long. If yes, then one can report this to the developers, they should be able to reproduce it and explain what takes so long.

If I understood correctly, this would create a file that I can feed as an input to the Z3 binary and hopefully reproduce the issue.

Is there a way a way to set this options directly at the java level (we are not yet using configuration file)?
Maybe something like the code below?

Configuration config = Configuration.builder().setOption(name, value).build();

@PhilippWendler
Copy link
Member

Exactly.

@hernanponcedeleon
Copy link
Author

Are the entries as follow?

  • name: log
  • value: true

Do I need somehow to specify the log file name or where to create it?

@PhilippWendler
Copy link
Member

The name would be "solver.z3.log" and value would be desired file name of the output file (e.g., "z3.log", can also be an absolute path etc.).

Btw.: All configuration options of JavaSMT are documented in a text file named org/sosy_lab/java_smt/ConfigurationOptions.txt inside JavaSMT's JAR. A current snapshot is always at https://sosy-lab.github.io/java-smt/ConfigurationOptions.txt (linked under "Quick links" in the readme).

@hernanponcedeleon
Copy link
Author

Wow ... by taking a look to ConfigurationOptions.txt I noticed the solver.z3.usePhantomReferences (phantom references were mentioned in Z3Prover/z3#245).

I set this option to true in the configuration and this completely solved the performance problem.
Is there any further consequence of using this option?

@PhilippWendler
Copy link
Member

Interesting.

What this option does is to couple Java's garbage collection and Z3's reference counting: After a Z3Formula instance gets cleaned up by the garbage collector, JavaSMT will call Native.decRef() such that Z3 can also clean it up if there are no other references. Without phantom references, this does not happen and Z3 never gets a chance to clean up formula instances before the context is closed.

For your case, the main effect is probably that our Z3SolverContext.close() calls decRef() for all created formula instances before calling delContext() if phantom references are enabled. It seems that this manual calling of decRef() and the associated cleanup is more efficient than the cleanup that Z3 does in delContext() if the formula instances still exist. So this indicates an efficiency bug in delContext() and should be worth reporting to Z3's developers (the method should be able to just do the same as we do externally or potentially be even more efficient, not slower).

Apart from this, the effects of enabling phantom references are:

  • Slightly increased cost of creating Z3Formula instances and for Java's GC
  • Slightly increased memory usage by JavaSMT for the additional bookkeeping
  • Whenever you perform a Z3 operation through JavaSMT, it will check if any Z3Formula instances have been cleaned up by GC since the last call and if yes, it will clean them up (this could in theory lead to some cheap method taking a lot of time if GC freed many instances just before).
  • Memory leak (ever increasing memory consumption of Z3) is avoided if there is a context that exists for a long time with many formulas being created, used, and then never used anymore with all references gone to them.

So if you have an application that fits the usage pattern of the last item, it should be worth it. If not, e.g., because you just have short-lived contexts or because you keep references to basically all formulas anyway forever, then probably not (but you could benchmark it). This is why it is not enabled by default.

@PhilippWendler
Copy link
Member

Note that it seems to me that since Z3Prover/z3#648, Z3's own Java API always uses phantom references and also iterates over all created objects and cleans them up manually before calling delContext() (in com.microsoft.z3.Context.close()). And even before that, it seems that it kept a list of all created references for cleanup. So this would explain why JavaSMT with phantom references and Z3's own API are similarly fast.

However, I would still say that delContext() should be improved for the case where not all references have been cleaned up. After all, that would also benefit all other users of Z3 in languages without GC, if they don't meticulously clean up all references manually before delContext().

@PhilippWendler PhilippWendler changed the title Closing context consumes a lot of time Closing Z3 context consumes a lot of time Jul 16, 2021
@hernanponcedeleon
Copy link
Author

Solved.
Thanks a lot for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants