-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High runtime of _rationalLUSolver.clear()
in SoPlexBase<R>::_changeLhsReal()
#19
Comments
Hi Florian, thanks for the issue and happy to help analyzing this further with you. You are right that your use case is something we do not have performance tests for, but that does not mean it is not interesting for us. Generally, I can imagine that overhead was introduced (unintentionally) while restructuring SoPlex in order to support rational solving. But I hope we can fix it. Let's start with the easiest: In real solving mode, _rationalLUSolver.Clear() should not need to be called. Can you tell me the path (or paths) from where this is called currently? Your parameter set sounds already correct, but can you still post an exact list of what you change from default? |
Hi Ambros, thanks for looking into this. I only changed two unrelated parameters: soplex.setIntParam(SoPlex::VERBOSITY, SoPlex::VERBOSITY_ERROR);
soplex.setIntParam(SoPlex::SIMPLIFIER, SoPlex::SIMPLIFIER_OFF); The path to
In the profile, the second call is inlined. Here is the relevant part of the profile: Looking for |
Sorry that I took a while to get back to this. Instead of adding many if's to soplex.hpp I decided to safeguard SLUFactorRational::clear() itself. Can you try #20 and see if the overhead disappears? |
PS: Will be away from keyboard for two weeks now, but happy to continue afterwards. |
I'm also on holidays right now but will be happy to look into this further next month. |
Hi Ambros, I'm back now and started looking into this again. I couldn't use #20 directly because our surrounding code also needs 522b8a1 which seems to be merged in For the remaining overhead, it is less clear to me what causes it. I'm attaching screenshots of two profiles, one created with 3.1.1 and one with the fix from #20. They are run on the same instance which behaves deterministically. In both cases, the method to solve is called 8640 times but the legacy code used 6.1B samples, where the new code used 10.8B. Can you see something from the traces that they do differently? It is somewhat harder to compare them because v3 is so old and the structure of the code changed since then. 3.1.16.0.x (master + fix from #20) |
Some observations:
Otherwise, I think we would need to start looking into log files as well to confirm that there are more pivots and where they could come from. |
I enabled the log output with both versions and uploaded the results here. In this setting, there is an LP with a bunch of "permanent" constraints that is solved over and over (roughly 10000 times over all). From one solve to the next, the bounds on the permanent constraints are changed, and some "temporary" constraints are added. All temporary constraints are removed again after the solve and different ones are added for the next solve. To line up the logs better, I added some output before and after each solve ( I'm not sure how to interpret the logs but one thing that stuck out to me was that in the v3.1.1 version, the column For my own reference later: the instance used for the profiles above and the logs was |
More important than the "facts" column is the "iters" column. It looks like for most LPs, the old version did not have to pivot at all and took zero iterations; in the new version you see many more iterations for each LP. Apparently the old version warmstarts every LP solve from the previous basis, while the new version loses the basis and solves from scratch. SoPlex would generally discards the basis if constraints with status basic are removed, since the dimension changes. So I am not sure how the old version manages to warmstart. Are these cases where no constraints are added or removed? In any case what you would need to find out is where the basis gets invalidated, maybe you can trace what is happening in spxchangebasis.cpp ( |
The way I understood things, SoPlex doesn't directly implement primal/dual simplex but the the algorithm depends on whether the representation is in column or row form (https://soplex.zib.de/doc/html/FAQ.php#primaldualsimplex). Could this have anything to do with it? Maybe in switching from OSI to directly talking to SoPlex, we accidentally changed the representation? Would the entering algorithm on the column representation do the same as the leaving algorithm on the row representation? |
Yes, this could be the case. Probably OSI had column rep fixed. You can test this with
Yes, that is correct. |
I tried running the code with the 4 combinations of algorithm and representation. While the logs are different, all combinations had a larger number of iterations compared to the 3.1.1 logs from above: column_dual.log I then tried writing out the models from the first and second solve. The old 3.1.1. code had no iterations on the second solve which I guess means that it warm-starts and directly finishes. The 6.0.x code used ~70 iterations. (I had to add initial_60x.lp.txt The models look the same to me but it is somewhat hard to say because:
Going from the first to the second LP, the changes are only in the constraint rhs:
The way the code is set up though, all rhs are set again (all rhs other then the ones above just get set to their old values). |
Warm start or even hot start (keeping the factorization) should be perfectly possible. Can you investigate, whether the method |
I did find a call to it on the second solve ( It turns out I overlooked something and there are actually additional changes to the model that I didn't see before. Some constraints get removed and others added. The set of constraints removed and added happen to be identical between these two solves which is why I didn't spot them when looking at the diff of the models. I would normally expect some of these constraints to change and will look into why they are the same here, but in the end this affects both the new and old version in the same way. Knowing that constraints change, I can understand that there is no warm start; but now I'm surprised that there is a warm start with the old version. |
I am just guessing here: It could be that the Osi interface that you used before stores the basis and loads it back into SoPlex before the next solve. This should work if the dimensions have not changed and the basis matrix is still regular. Could that be the explanation? You could mimic this rather easily by getBasis/setBasis. |
I didn't actively get or set the basis in the version using OSI but maybe this is something that OSI does internally? The profile I posted earlier shows some time in a function To try it in the new code, I added this code around the optimization call: int basis_num_cols = basis_cols.size();
int basis_num_rows = basis_rows.size();
if (get_num_variables() == basis_num_cols &&
get_num_constraints() == basis_num_rows) {
soplex.setBasis(basis_rows.data(), basis_cols.data());
} else {
basis_cols.resize(get_num_variables());
basis_rows.resize(get_num_constraints());
}
soplex.optimize();
soplex.getBasis(basis_rows.data(), basis_cols.data()); This is supposed to always store the basis after a solve and then reload it before a solve if the dimensions of the problem are still the same. Does this look OK to you? With this, the instance I tested sped up a lot and is now slightly faster than the OSI version. From the log it looks like the new code now also warm-starts where possible (0 iterations where the old code had 0 iterations) but it still does more iterations than the old code (higher number of iterations in cases with non-zero iterations). It seems to do those iterations faster, though, as the overall time is lower. I'll run this on a larger benchmark set to confirm but it looks like this fixes the issue for me. I saw #20 is already merged into the branch |
Wow, nice. There will be a major release earl 2024 from the master branch. bugfix will soon be merged into master, then master has both changes. |
Great to hear about the release. The experiments I ran unfortunately showed very mixed results. Sometimes the time for storing the basis just does not pay off as it did in the instance we looked at here and over all we still do not reach the performance we had with OSI and SoPlex 3. But since this moved far away from the initial issue, we can just leave it at that. It doesn't seem to be caused by a single change. Also, some results suggest that I introduced a bug by restoring the basis. It looks like some LP's now report a different optimal value but I didn't trace this down yet. Is it really always possible to re-use the basis as long as the dimensions of the problem stay the same, no matter what else changes? |
I am happy to keep investigating. If you take the best setting you have currently and show profiling results, we can make new conjectures as to what may cause slower performance.
No, if the constraint coefficients change then there are cases where the basis becomes singular. In that case, SoPlex should either terminate with singular as status (in which case no solution is available) or restart from scratch as if no basis was given. Not sure what happens without a log. So if you receive a different optimal value this would indeed be a bug. |
What is the status here? |
Hi all,
we run SoPlex on relatively small LPs where we solve the same constraints with different bounds several times per second. I realize that such small LPs are a bit of a niche use case but one thing we noticed in a profile was that updating the bounds took a surprisingly long time (~14% of the total time). Most of time was spent in
_rationalLUSolver.clear()
. This sounds related toSYNCMODE
which we had set toSYNCMODE_ONLYREAL
(as well asSOLVEMODE
andCHECKMODE
).We also saw a large overhead in storing the solution (
SoPlexBase::_storeSolutionReal
, 8% of total time) and the solve itself took much longer than with our baseline of SoPlex 3 (we are in the process of upgrading from v3 to v6, so the baseline is v3 and the version where we experience the slowdown is the latest version on the main branch here).Can either of these overheads be avoided in some way? I'm attaching the profile in case you are interested:
callgrind.out.303286.log
The text was updated successfully, but these errors were encountered: