Add SG TSP #1360

hlinsen · 2021-01-27T18:50:10Z

This PR implements an approximate solution to the Traveling Salesperson Problem (TSP).
The algorithm is exposed under `traversal` through a Python API that takes 2D positions as input and returns a route.
This PR relies on RAFT KNN: rapidsai/raft#126
Solves: #1185
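
To make the input/output contract concrete, here is a minimal pure-Python sketch of what "2D positions in, route out" means. It is not the cuGraph API: the name `tour_cost` is hypothetical, and it assumes the cost of a route is the Euclidean length of the closed tour.

```python
import math


def tour_cost(x_pos, y_pos, route):
    """Euclidean length of the closed tour that visits vertices in `route` order.

    Hypothetical helper for illustration only, not part of cuGraph.
    """
    total = 0.0
    n = len(route)
    for i in range(n):
        a = route[i]
        b = route[(i + 1) % n]  # wrap around to close the tour
        total += math.hypot(x_pos[a] - x_pos[b], y_pos[a] - y_pos[b])
    return total


# Four cities on the corners of a unit square (made-up data).
x = [0.0, 1.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 1.0]

perimeter_cost = tour_cost(x, y, [0, 1, 2, 3])  # walks the perimeter
crossing_cost = tour_cost(x, y, [0, 2, 1, 3])   # uses both diagonals
print(perimeter_cost, crossing_cost)
```

A TSP solver searches for the vertex ordering that minimizes this quantity; here the perimeter ordering is shorter than the one that crosses the square.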

rlratzel

I didn't review the TSP details as much, but did notice a few things that'll need to be addressed.

cpp/CMakeLists.txt

cpp/tests/CMakeLists.txt

datasets/get_test_data.sh

python/cugraph/layout/force_atlas2_wrapper.pyx

python/cugraph/tests/test_traveling_salesman.py

python/cugraph/tests/utils.py

python/cugraph/traversal/traveling_salesman.py

cpp/include/algorithms.hpp

BradReesWork · 2021-02-03T15:42:47Z

cpp/include/algorithms.hpp

+ * @param[in] verbose 												Logs configuration and iterative improvement.
+ *
+ */
+float traveling_salesman(raft::handle_t &handle,


why a float return type versus void?

Currently to get my outputs I return the minimum cost and the path is passed as a pointer.

This isn't documented

How do you pick what gets returned as an argument and what doesn't? It seems we are mixing both at the moment. Consider using a consistent output mechanism.

Another possible option for routes: New APIs are returning rmm::uvector which is often easier to deal with than raw pointers.

BradReesWork · 2021-02-03T19:09:25Z

rerun tests

rlratzel

Looks good, just a few more items I noticed.

python/cugraph/tests/utils.py

cpp/CMakeLists.txt

cpp/src/traversal/tsp.cu

aschaffer · 2021-02-03T21:04:08Z

cpp/src/traversal/tsp_solver.hpp

+  best_route = nullptr;
+}
+
+__global__ __launch_bounds__(2048, 2) void two_opt_search(int *mylock,


This is quite a large hand-written kernel. Two issues:
(1) Couldn't this be designed using Thrust/CUB primitives?
(2) Is it possible to split this kernel into smaller `__device__` functions, so that reading and maintaining this code later becomes easier?

cpp/src/traversal/tsp_utils.hpp

codecov-io · 2021-02-07T05:47:52Z

Codecov Report

Merging #1360 (81caaf8) into branch-0.18 (2fb0725) will increase coverage by 0.37%.
The diff coverage is 62.28%.

@@               Coverage Diff               @@
##           branch-0.18    #1360      +/-   ##
===============================================
+ Coverage        60.38%   60.75%   +0.37%     
===============================================
  Files               67       70       +3     
  Lines             3029     3134     +105     
===============================================
+ Hits              1829     1904      +75     
- Misses            1200     1230      +30

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cugraph/centrality/__init__.py | `100.00% <ø> (ø)` | |
| python/cugraph/dask/structure/renumber.py | `0.00% <0.00%> (ø)` | |
| python/cugraph/link_analysis/pagerank.py | `100.00% <ø> (ø)` | |
| python/cugraph/traversal/__init__.py | `100.00% <ø> (ø)` | |
| python/cugraph/comms/comms.py | `34.52% <25.00%> (ø)` | |
| python/cugraph/dask/common/input_utils.py | `23.07% <28.57%> (+1.14%)` | ⬆️ |
| python/cugraph/dask/common/mg_utils.py | `37.50% <38.09%> (-2.50%)` | ⬇️ |
| python/cugraph/community/spectral_clustering.py | `72.54% <38.46%> (-11.67%)` | ⬇️ |
| python/cugraph/structure/number_map.py | `59.20% <50.00%> (+3.24%)` | ⬆️ |
| python/cugraph/structure/graph.py | `66.99% <76.47%> (+0.19%)` | ⬆️ |

... and 18 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 466e29a...81caaf8.

hlinsen · 2021-02-08T17:02:26Z

rerun tests

afender

Cool code. Just reviewed C++ part for now.

In addition to the comments, I'd like to see some performance data and a basic profile (i.e., what's the perf bottleneck).

Kernels are nice, but I would have preferred to see more Graph Prims, Thrust, and RAFT in this PR. The API and docs need polishing. I also added some comments regarding integration and testing. For instance, I see an API that doesn't take graphs, but the tests pass graphs in a custom format that really looks like regular mtx/csv, so I think this should be reconciled.

afender · 2021-02-08T16:26:27Z

cpp/include/algorithms.hpp

+ * handle, the multi GPU version will be selected.
+ * @param[in] vtx_ptr                         Device array containing the vertex identifiers used
+ * to initialize the route.
+ * @param[out] route                          Device array containing the returned route.


No graph_t API?

afender · 2021-02-08T16:26:56Z

cpp/include/algorithms.hpp

+ * @param[in] verbose                         Logs configuration and iterative improvement.
+ *
+ */
+float traveling_salesman(raft::handle_t &handle,


consider const raft::handle_t &handle instead

Created a PR to make the brute_force_knn handle const for 0.19

afender · 2021-02-08T16:33:46Z

cpp/include/algorithms.hpp

+ * @param[in] verbose 												Logs configuration and iterative improvement.
+ *
+ */
+float traveling_salesman(raft::handle_t &handle,


This isn't documented

How do you pick what gets returned as an argument and what doesn't? It seems we are mixing both at the moment. Consider using a consistent output mechanism.

Another possible option for routes: New APIs are returning rmm::uvector which is often easier to deal with than raw pointers.

cpp/src/traversal/tsp.cu

afender · 2021-02-08T16:46:55Z

cpp/src/traversal/tsp.cu

+    h_y_pos.data(), soln + nodes_ + 1, sizeof(float) * (nodes_ + 1), cudaMemcpyDeviceToHost));
+  cudaDeviceSynchronize();
+
+  for (int i = 0; i < nodes_; ++i) {


So nodes_ is expected to always be smaller than 2B in the future or is this something we should template like in the experimental APIs?

Right now the problem size we target is up to 2k nodes

cpp/src/traversal/tsp_utils.hpp

cpp/tests/traversal/tsp_test.cu

afender · 2021-02-08T17:01:10Z

cpp/tests/traversal/tsp_test.cu

+    return tokens;
+  }
+
+  int load_tsp(const char* fname, Route* input)


Why not use csv files and I/O?

We want to compare against TSPLIB which solves TSP as a city problem, so the format corresponds to 2D positions as input.

I see how it would be relevant in the context of a demo/benchmark. I'm not sure that's a strong enough motivation in the context of cugraph testing, though. It seems to be going in the opposite direction from what @rlratzel is trying to do to make testing more generic. I am OK with not blocking the PR for this, but we should probably have a FIXME/TODO for test refactoring work (if that's OK with Joseph/Rick).

A FIXME as Alex suggested isn't a bad idea, at the very least to capture that TSP deviates from the standard testing I/O pattern and we may want to revisit adopting a more universal approach (instead of or in addition to comparing using TSPLIB's input format).

Added the fixme in this commit: f85cb0d

afender · 2021-02-08T17:04:32Z

datasets/tsplib/a280.tsp

+DIMENSION: 280
+EDGE_WEIGHT_TYPE : EUC_2D
+NODE_COORD_SECTION
+1 288 149


It seems like all these files are really graphs/sparse matrices rather than general Euclidean space positions.
Are we supporting only positive integers within the same range?
If yes, this should be mentioned in the API doc.
If not, other test data should be added.

We support float data; the tsp225 file contains only floats. There are no constraints on the range of the city coordinates.

Gotcha. Should we test negative coordinates? Parallel edges?

Added file with negative weights
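
For readers unfamiliar with the TSPLIB layout quoted above, the following is a rough Python sketch of reading the `NODE_COORD_SECTION` of an `EUC_2D` instance. It is not the `load_tsp` helper from the C++ test; the function name `parse_tsplib` is hypothetical, the second coordinate row in the sample is made up to exercise float and negative values, and the trailing `EOF` marker is assumed from the TSPLIB convention.

```python
def parse_tsplib(text):
    """Return (ids, xs, ys) read from the NODE_COORD_SECTION of a TSPLIB file.

    Illustrative sketch only: header keys such as DIMENSION are skipped,
    and every coordinate is parsed as a float.
    """
    ids, xs, ys = [], [], []
    in_coords = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line == "EOF":
            continue
        if line == "NODE_COORD_SECTION":
            in_coords = True  # every following line is "<id> <x> <y>"
            continue
        if in_coords:
            node_id, x, y = line.split()[:3]
            ids.append(int(node_id))
            xs.append(float(x))
            ys.append(float(y))
    return ids, xs, ys


sample = """DIMENSION: 280
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 288 149
2 -3.5 7.25
EOF
"""

ids, xs, ys = parse_tsplib(sample)
print(ids, xs, ys)
```

Parsing every coordinate as a float keeps integer-only files and float files such as tsp225 on the same code path, and negative values need no special handling.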

afender · 2021-02-08T17:07:58Z

cpp/include/algorithms.hpp

+ * @param[in] vtx_ptr                         Device array containing the vertex identifiers used
+ * to initialize the route.
+ * @param[out] route                          Device array containing the returned route.
+ * @param[in] x_pos                           Device array containing starting x-axis positions.


What's the argument ordering logic?

I was following a priority ordering of the parameters for the Python API. The C++ layer follows the Python API, except that the route is an output parameter. I will update the return type to a tuple of a float and an rmm::uvector instead.

hlinsen

> Cool code. Just reviewed C++ part for now.
>
> In addition to the comments, I'd like to see some performance data and a basic profile (i.e., what's the perf bottleneck).
>
> Kernels are nice, but I would have preferred to see more Graph Prims, Thrust, and RAFT in this PR. The API and docs need polishing. I also added some comments regarding integration and testing. For instance, I see an API that doesn't take graphs, but the tests pass graphs in a custom format that really looks like regular mtx/csv, so I think this should be reconciled.

Thanks for the review. I'll work on adding timers. The current TSP implementation assumes the graph is fully connected when looking for the best path, so a graph representation would cost O(V^2) memory. We also consider that recomputing the distances is faster than storing them in shared memory, and it allows us to run on larger problems (more than 2k cities). One of the next steps is indeed to reconcile with the rest of the API and take a graph as input, but this would mean refactoring the kernel logic to look only at specific edges and adding an efficient lookup table (cuco should have that) to get the weights. These optimizations are on the roadmap for upcoming releases.
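
As a rough illustration of the trade-off described here, the sketch below runs a sequential 2-opt local search in Python that recomputes each distance from the 2D positions instead of storing an O(V^2) distance matrix. It is a CPU toy, not the `two_opt_search` CUDA kernel from this PR; all function names and the sample data are made up for the example.

```python
import math


def dist(x, y, a, b):
    # Recomputed on demand: only the O(V) position arrays are kept in memory.
    return math.hypot(x[a] - x[b], y[a] - y[b])


def tour_len(x, y, route):
    n = len(route)
    return sum(dist(x, y, route[i], route[(i + 1) % n]) for i in range(n))


def two_opt(x, y, route):
    """Apply improving 2-opt moves until no move shortens the closed tour."""
    route = list(route)
    n = len(route)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share vertex route[0]
                a, b = route[i], route[i + 1]
                c, d = route[j], route[(j + 1) % n]
                # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
                delta = (dist(x, y, a, c) + dist(x, y, b, d)
                         - dist(x, y, a, b) - dist(x, y, c, d))
                if delta < -1e-12:
                    route[i + 1:j + 1] = route[i + 1:j + 1][::-1]
                    improved = True
    return route


# Unit square with an initial route that crosses itself.
x = [0.0, 1.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 1.0]
start = [0, 2, 1, 3]
best = two_opt(x, y, start)
print(best, tour_len(x, y, best))
```

Each candidate move needs only four distance evaluations, which is why recomputation can be cheaper than a lookup when the full distance matrix does not fit in fast memory.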

cpp/CMakeLists.txt

python/cugraph/tests/test_traveling_salesman.py

python/cugraph/traversal/traveling_salesman.py

hlinsen · 2021-02-10T03:41:30Z

rerun tests

rlratzel

Looks good, just a few minor change requests.

build.sh

rlratzel · 2021-02-10T03:59:18Z

cpp/tests/traversal/tsp_test.cu

+    return tokens;
+  }
+
+  int load_tsp(const char* fname, Route* input)


A FIXME as Alex suggested isn't a bad idea, at the very least to capture that TSP deviates from the standard testing I/O pattern and we may want to revisit adopting a more universal approach (instead of or in addition to comparing using TSPLIB's input format).

python/cugraph/tests/test_traveling_salesperson.py

python/cugraph/traversal/traveling_salesperson.py

conda/recipes/libcugraph/meta.yaml

BradReesWork · 2021-02-10T14:33:31Z

rerun test

BradReesWork · 2021-02-10T15:54:29Z

rerun tests

BradReesWork · 2021-02-11T14:17:34Z

@gpucibot merge

hlinsen requested review from a team as code owners January 27, 2021 18:50

BradReesWork added this to the 0.18 milestone Jan 29, 2021

BradReesWork requested review from afender and aschaffer January 29, 2021 16:27

rlratzel requested changes Feb 1, 2021

hlinsen force-pushed the tsp branch from 385b9a8 to b7f4639 Compare February 3, 2021 11:07

hlinsen changed the title ~~[WIP][skip-ci] Add SG TSP~~ Add SG TSP Feb 3, 2021

BradReesWork added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Feb 3, 2021

ajschmidt8 approved these changes Feb 3, 2021

BradReesWork approved these changes Feb 3, 2021

rlratzel requested changes Feb 3, 2021

aschaffer requested changes Feb 3, 2021

hlinsen force-pushed the tsp branch from 4aaefb9 to b1927fa Compare February 5, 2021 21:24

afender requested changes Feb 8, 2021

hlinsen commented Feb 8, 2021

afender requested changes Feb 8, 2021

hlinsen force-pushed the tsp branch from 905aa9c to 458f3d4 Compare February 10, 2021 03:43

rlratzel requested changes Feb 10, 2021

hlinsen added 22 commits February 10, 2021 19:01

Move tsplib files

b979ece

Fix flake8

a1c1c32

Update path

bfec70c

Update test file

744be01

Clean code

783cc49

Add negative weights

1cd4799

Update stats

888eb18

Clean

97c9f90

Use set for max in cpp test

6517ac8

Use resize instead of reserve

e460fbf

Add fixme in cpp test

641a1b3

Update exceptions

b550550

Use gpubenchmark

b8aaebc

Update dependencies

2642c2d

Fix flake8

6690050

Fix ci to run with faiss

169336e

Fix indentation

a3710f1

Change test ci script

d6ba45e

Adapt the cuml ci patch to cugraph

b314242

Update test script

e50268d

Update test script

44ef7d1

Remove test binaries patch

d152e83

hlinsen force-pushed the tsp branch from 81caaf8 to d152e83 Compare February 11, 2021 01:17

Add utilities for TSP test

173266f

rapids-bot bot merged commit 2743020 into rapidsai:branch-0.18 Feb 11, 2021

This was referenced Feb 11, 2021

[REVIEW] Updating cmakelists to match frozen raft hash #1389

Closed

[FEA] Travelling Salesman Problem (TSP) and/or Hamiltonian path #1185

Closed

hlinsen deleted the tsp branch April 15, 2021 12:11
