Improve Performance on PD Instances #559

Merged
merged 12 commits into from
Jan 31, 2022

krypt-n
Contributor

@krypt-n krypt-n commented Aug 31, 2021

Issue

(Do I need one for performance improvements?)

Tasks

  • Cleanup
  • Update CHANGELOG.md (remove if irrelevant)
  • review

Description

This is work I did back in February, which led me to discover #462, now rebased onto master and formatted. During profiling I noticed that PDShift and LocalSearch both contain code to look for a cheap insertion of a pickup and a delivery into a route.
This PR first de-duplicates the code and then improves the search somewhat by pruning based on known costs, leading to improved performance on PD instances.

Additionally, 0e3aa24 reduces the allocation of Amount objects, again, improving performance somewhat.

As far as I am aware, this PR does not in any way change the solutions computed by vroom, I'd consider it a bug if it does. Initial benchmarking on the li_lim_100 PD instances looks promising:

master at 58f7411

,Gaps,Computing times
Min,-20.04,250
First decile,-0.0,411
Lower quartile,0.0,534
Median,0.0,726
Upper quartile,1.77,849
Ninth decile,4.18,1039
Max,9.86,1278

this PR

,Gaps,Computing times
Min,-20.04,235
First decile,-0.0,368
Lower quartile,0.0,439
Median,0.0,577
Upper quartile,1.77,680
Ninth decile,4.18,797
Max,9.86,1052

@krypt-n
Contributor Author

krypt-n commented Aug 31, 2021

A bit more thorough benchmark on 176 instances (li_lim_100, li_lim_200, li_lim_400):

master:

,Gaps,Computing times
Min,-39.68,245
First decile,-16.1,589
Lower quartile,-9.09,870
Median,0.0,3102
Upper quartile,1.21,12602
Ninth decile,3.49,17916
Max,14.61,31036

this PR:

,Gaps,Computing times
Min,-39.68,234
First decile,-16.1,501
Lower quartile,-9.09,710
Median,0.0,2551
Upper quartile,1.21,10462
Ninth decile,3.49,14437
Max,14.61,26141

seems to be almost 20% faster

@jcoupey
Collaborator

jcoupey commented Aug 31, 2021

Thanks for polishing this work and submitting a PR! I'll look into it soon. Looks like between this and #558 we have some great ongoing improvements on computing times. \o/

@jcoupey
Collaborator

jcoupey commented Sep 2, 2021

Did not go through the commits yet, but I can confirm the steady reduction around -21.5% on average for all PDPTW benchmarks (li_lim_*) across all exploration levels. This is great!

@jcoupey jcoupey added this to the v1.11.0 milestone Sep 2, 2021
@jcoupey
Collaborator

jcoupey commented Sep 2, 2021

I've been running a few non-regression tests on that one and found something breaking with the following instance.

pd-perf-problem.txt

Running vroom at 58f7411 (parent commit for the PR on master) provides an expected solution but the current tip of the PR breaks an assert:

$ vroom -i pd-perf-problem.txt 
vroom: structures/vroom/tw_route.cpp:177: void vroom::TWRoute::fwd_update_earliest_from(const vroom::Input&, vroom::Index): Assertion `current_earliest <= latest[i]' failed.
Aborted (core dumped)

This kind of problem is not trivial to debug, it's a consistency check that breaks upon updating earliest times after applying some route change. Usually it's the sign that 1. the applied move is not valid or 2. an inconsistency has been previously introduced in earliest/latest dates by some other change. I'd go for 1. here since the changes do not touch the TW logic but only "client" code.

@krypt-n
Contributor Author

krypt-n commented Sep 2, 2021

Cool, the instance is pretty small. I'll look into it

@jcoupey
Collaborator

jcoupey commented Sep 3, 2021

For what it's worth, here is a patch I find useful to log current state of TWRoute objects. It applies to current master but may require adjustments on this PR.

Also, to avoid searching for a needle in a haystack, you can narrow down the error with a single heuristic parameter applied: vroom -i pd-perf-problem.txt -e "0,FURTHEST,0.3" -x 0.

@krypt-n
Contributor Author

krypt-n commented Sep 3, 2021

I think I fixed the problem; it was clearly a mistake in this PR. compute_best_insert would halve any cost, thus turning an Insertion with numeric_limits::max() cost (used to signal that no insertion is possible) into one with numeric_limits::max()/2 cost. This means that try_job_additions would always have mistakenly found a "valid" Insertion.
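
To make the failure mode concrete, here is a minimal sketch of how halving a sentinel defeats a "cost < max" validity check. The names (NO_INSERTION, normalize_buggy, normalize_guarded, looks_valid) and the use of double are assumptions for illustration only, not vroom's actual code.

```cpp
#include <limits>

// Hypothetical stand-in for the "no insertion possible" sentinel.
constexpr double NO_INSERTION = std::numeric_limits<double>::max();

// Buggy normalization: halves every cost, including the sentinel.
constexpr double normalize_buggy(double cost) {
  return cost / 2.0;
}

// Guarded normalization: the sentinel is passed through untouched.
constexpr double normalize_guarded(double cost) {
  return (cost == NO_INSERTION) ? NO_INSERTION : cost / 2.0;
}

// Caller-side check in the spirit of "best_cost < max".
constexpr bool looks_valid(double cost) {
  return cost < NO_INSERTION;
}
```

With these definitions, looks_valid(normalize_buggy(NO_INSERTION)) is true, so a caller would treat an impossible insertion as usable, while the guarded variant keeps it flagged as invalid.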

I'm a bit scared that this didn't come up in any benchmark instance

@jcoupey
Collaborator

jcoupey commented Sep 6, 2021

Trying to get the big picture for those changes, @krypt-n just let me know if the following summary fits.

On P&D insertion

Before:

  1. PDShift had a check for early stop based on pickup insertion cost only and a known threshold, the same was not implemented for the similar code in compute_best_insertion_pd, called in try_job_additions.
  2. The version in compute_best_insertion_pd had a check to skip deliveries when their sole insertion is not valid.

Now grouping both implementations and adjusting has the consequence that:

  • shortcuts from 1. are now available everywhere, including within try_job_additions;
  • skip described in 2. additionally applies from PDShift and foremost is extended to skip the whole inner loops whenever it is not valid to include any of the deliveries on their own.
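
A rough sketch of a combined search using both shortcuts is given here. It is a simplified illustration, not the actual ls::compute_best_insertion_pd: it assumes per-rank pickup and delivery insertion costs are already known, additive and non-negative, it ignores the special case where pickup and delivery end up adjacent, and every name in it is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical result type; cost == NO_INSERTION means "nothing found".
struct PdInsertion {
  long cost;
  std::size_t pickup_rank;
  std::size_t delivery_rank;
};

constexpr long NO_INSERTION = std::numeric_limits<long>::max();

// Find the cheapest (pickup, delivery) rank pair with pickup <= delivery
// whose total cost is strictly below `threshold`.
template <std::size_t N>
constexpr PdInsertion best_pd_insertion(const std::array<long, N>& pickup_cost,
                                        const std::array<long, N>& delivery_cost,
                                        const std::array<bool, N>& delivery_ok,
                                        long threshold) {
  PdInsertion best{NO_INSERTION, 0, 0};

  // Shortcut 2: skip all loops if no delivery rank is valid on its own.
  bool any_delivery_ok = false;
  for (std::size_t d = 0; d < N; ++d) {
    any_delivery_ok = any_delivery_ok || delivery_ok[d];
  }
  if (!any_delivery_ok) {
    return best;
  }

  long bound = threshold; // best known cost so far
  for (std::size_t p = 0; p < N; ++p) {
    // Shortcut 1: with non-negative delivery costs, a pickup that already
    // reaches the bound cannot lead to a better pair.
    if (pickup_cost[p] >= bound) {
      continue;
    }
    for (std::size_t d = p; d < N; ++d) {
      if (!delivery_ok[d]) {
        continue;
      }
      const long total = pickup_cost[p] + delivery_cost[d];
      if (total < bound) {
        bound = total;
        best = PdInsertion{total, p, d};
      }
    }
  }
  return best;
}
```

Passing NO_INSERTION as the threshold gives the "no known bound" behavior, whereas a caller that already holds a candidate move can pass its cost so the pickup-only check prunes earlier.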

On amount-related allocations

Now for 0e3aa24, my understanding is that switching from maintaining modified_delivery all along to recomputing the whole value from the range upon testing does not theoretically reduce the number of amount allocations. In the worst case, this would even be done on each check so would be more costly. But since we only check this for potentially better solutions, the calls to is_valid_addition_for_capacity_inclusion are sparse enough that it's cheaper in the end.

Collaborator

@jcoupey jcoupey left a comment

This looks great, both for the speedup related to added early stops and for the fact that this reduces code duplication. For the latter, I see two ways to go even further, by using the new ls::compute_best_insertion_pd:

  1. From cvrp::PDShift::compute_gain too.
  2. From the heuristics.

Item 1 seems pretty straightforward: I think it would work out-of-the-box if cvrp::PDShift::compute_gain were to hold the new vrptw::PDShift::compute_gain implementation (except for the removal check part) and vrptw::PDShift::compute_gain were to call its parent counterpart. Item 2 may be more touchy so we can always schedule that for another PR.

Collaborator

@jcoupey jcoupey left a comment

Looks good as is, thanks for the last changes. What's your take on the previous comment? If you don't plan to do anything on the cvrp side (item 1), then I could probably handle it before merging.

@krypt-n
Contributor Author

krypt-n commented Sep 10, 2021

Yep, your summary seems correct to me. I replaced the cvrp version in b88b11e

@krypt-n
Contributor Author

krypt-n commented Sep 10, 2021

At first glance, there are two copies of this pickup/delivery insertion search in heuristics.cpp, with some minor differences to the one in this PR, so I would leave that as is for now.

@jcoupey
Collaborator

jcoupey commented Sep 13, 2021

I completed the usual Li&Lim checks by runs on real-life instances and also noticed a speedup there (in the 10% to 15% ballpark) so I'm looking forward to merging!

this PR does not in any way change the solutions computed by vroom, I'd consider it a bug if it does

I do have a few instances where the solution is actually different with this PR. It's not always better or worse so it really looks like some heuristic choice is simply made differently at some point. I did not dig more into the problem but managed to narrow it down with a small-ish instance.

Results I'm getting with this file using the parent commit 58f7411 and the tip of this PR:

$ vroom_58f7411 -x 4 -i pd-different-solutions.txt -e "1,HIGHER_AMOUNT,2.1" | jq .summary
{
  "cost": 69312,
  "unassigned": 26,
  "service": 33000,
  "duration": 69312,
  "waiting_time": 0,
  "priority": 260,
  "violations": [],
  "computing_times": {
    "loading": 0,
    "solving": 9
  }
}
$ vroom_b88b11e -x 4 -i pd-different-solutions.txt -e "1,HIGHER_AMOUNT,2.1" | jq .summary
{
  "cost": 68332,
  "unassigned": 24,
  "service": 34560,
  "duration": 68332,
  "waiting_time": 0,
  "priority": 260,
  "violations": [],
  "computing_times": {
    "loading": 0,
    "solving": 6
  }
}

@krypt-n
Contributor Author

krypt-n commented Sep 13, 2021

I'll try to figure out the reason for that difference

@jcoupey jcoupey removed this from the v1.11.0 milestone Sep 30, 2021
@jcoupey
Collaborator

jcoupey commented Oct 5, 2021

I did some debugging and found that this is a case of same-cost-but-different-choice in try_job_additions.

Applying this patch on top of b88b11e

diff --git a/src/algorithms/local_search/local_search.cpp b/src/algorithms/local_search/local_search.cpp
index 61d3a78..8871588 100644
--- a/src/algorithms/local_search/local_search.cpp
+++ b/src/algorithms/local_search/local_search.cpp
@@ -7,6 +7,8 @@ All rights reserved (see LICENSE).
 
 */
 
+#include <iostream>
+
 #include "algorithms/local_search/local_search.h"
 #include "algorithms/local_search/insertion_search.h"
 #include "problems/vrptw/operators/cross_exchange.h"
@@ -247,6 +249,15 @@ void LocalSearch<Route,
     job_added = (best_cost < std::numeric_limits<double>::max());
 
     if (job_added) {
+      bool log =
+        (best_route == 3 and best_job_rank == 30 and
+         best_insertion.cost == 169 and best_insertion.delivery_rank == 4);
+
+      if (log) {
+        std::cout << "best_insertion.pickup_rank = "
+                  << best_insertion.pickup_rank << std::endl;
+      }
+
       _sol_state.unassigned.erase(best_job_rank);
       const auto& best_job = _input.jobs[best_job_rank];

yields:

$ vroom -x 4 -i pd-different-solutions.txt -e "1,HIGHER_AMOUNT,2.1" -o /dev/null
best_insertion.pickup_rank = 3

Now logging the same at 58f7411 results in:

$ vroom -x 4 -i pd-different-solutions.txt -e "1,HIGHER_AMOUNT,2.1" -o /dev/null
best_insertion.pickup_rank = 0

The rest of the execution path (operators applied, jobs added) looks the same prior to this choice and then diverges, since the solutions differ after this insertion.

Looks like all evaluations are matching but the order in which ranks are evaluated is changed somehow so a different option with same cost is picked.

@jcoupey
Collaborator

jcoupey commented Jan 20, 2022

Back on this PR, I just noticed that my previous comment was misleading since I reported the same output: best_insertion.pickup_rank = 3 for both commands. The value is actually 3 at b88b11e but 0 at 58f7411 (I edited the above message).

I think I found the reason for this difference by logging the actual costs evaluated within the different versions of compute_best_insertion_pd:

  1. at 58f7411, inserting job 30 in route 3 with respective pickup and delivery ranks 0 and 4 costs 169 and is chosen as best option (see situation from the above log patch);
  2. at b88b11e, when calling compute_best_insertion_pd for job 30 and route 3 at that same point in solving, we actually have two updates of the best insertion option: inserting pickup/delivery at ranks 0/4 costs 339, then inserting pickup/delivery at ranks 3/4 costs 338.

The thing is that normalizing the P&D insertion cost by dividing it by two now happens after the call to compute_best_insertion_pd rather than in its inner loop. This explains the difference in logged costs, and also why the version in this PR is able to pick the 3/4 P&D insertion, which is cheaper by 1. The code at 58f7411 is not able to pick the 3/4 insertion over the 0/4 one since the normalized costs are both equal to 169.

Wrapping this up:

  • behavior changes should only be noticed in cases where insertion option costs differ by 1 (and provided the ordering triggers the above situation);
  • this PR is doing it right by first picking the best insertion cost, then normalizing the cost.
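
The effect can be reproduced with the two costs quoted above (339 for ranks 0/4, 338 for ranks 3/4). The sketch below is illustrative only: the Candidate type and both function names are hypothetical, and it assumes integer costs, truncating division and a strict "<" comparison when updating the best option.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical candidate: ranks plus a cost field.
struct Candidate {
  int pickup_rank;
  int delivery_rank;
  long cost;
};

// Raw (non-normalized) costs from the discussion, in evaluation order.
constexpr std::array<Candidate, 2> candidates{{{0, 4, 339}, {3, 4, 338}}};

// Normalize inside the loop, then compare: 339 / 2 and 338 / 2 are both
// 169, so the second candidate never beats the first.
template <std::size_t N>
constexpr Candidate pick_normalize_first(const std::array<Candidate, N>& c) {
  Candidate best{-1, -1, std::numeric_limits<long>::max()};
  for (std::size_t i = 0; i < N; ++i) {
    const long normalized = c[i].cost / 2;
    if (normalized < best.cost) {
      best = Candidate{c[i].pickup_rank, c[i].delivery_rank, normalized};
    }
  }
  return best;
}

// Compare raw costs, then normalize the winner once: 338 < 339 wins.
// Assumes at least one candidate is provided.
template <std::size_t N>
constexpr Candidate pick_then_normalize(const std::array<Candidate, N>& c) {
  Candidate best{-1, -1, std::numeric_limits<long>::max()};
  for (std::size_t i = 0; i < N; ++i) {
    if (c[i].cost < best.cost) {
      best = c[i];
    }
  }
  best.cost /= 2;
  return best;
}
```

Both functions report a normalized cost of 169 on this input, but pick_normalize_first keeps pickup rank 0 while pick_then_normalize selects pickup rank 3, which matches the two logged values.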

I did not want to introduce a change of behavior without fully understanding the reason, but now I think we're good! @krypt-n do you think you could resolve conflicts with current master and add a changelog entry?

@krypt-n
Contributor Author

krypt-n commented Jan 26, 2022

Hi, I'll look into updating this PR in the next couple of days. I believe I already resolved some merge conflicts locally a while ago

@krypt-n krypt-n force-pushed the enhancement/pd-perf branch from b88b11e to 98234b4 Compare January 30, 2022 11:07
@krypt-n krypt-n requested a review from jcoupey January 30, 2022 11:22
@krypt-n
Contributor Author

krypt-n commented Jan 30, 2022

Okay, I rebased the changes, added a changelog entry (pushed this to the master branch accidentally, apologies for that), and confirmed that this is still a 20% improvement compared to the current master branch with a few benchmarks.

Ready to merge from my point of view!

@jcoupey jcoupey merged commit 9e0840c into VROOM-Project:master Jan 31, 2022