This project implements multiple methods for shortest distance computation in directed weighted graphs that leverage preprocessing of the graph to provide fast shortest distance queries. The implemented methods are Contraction Hierarchies, Transit Node Routing (based on Contraction Hierarchies), Transit Node Routing with Arc Flags (extension of Transit Node Routing) and Distance Matrix computation. Besides that, the API also contains standard algrithms for providing shortest distance queries without preprocessing the graph: Dijkstra and A*. Below, you can see a comparison of the speed and memory requirements of the implemented methods on two real-world graphs.
The project is split into two major components.
One component is the preprocessor (an executable called shortestPathsPreprocessor
) which takes an arbitrary graph in a supported format (described later) and prepares the graph, creating structures for fast distance computation. Additionally, the preprocessor allows running a set of queries and benchmark the time required to answer them.
This way, the user can easily evaluate whether the performance is sufficient for their use case.
The second component is the library (a shared library called libshortestPaths.so
in Linux or shortestPaths.dll
in Windows), which can load the structures prepared by the preprocessor to answer shortest distance queries.
The library can be used in a C++
application, but it can also be integrated into a Java
application.
A simple example application written in Java
which uses the library can be found in the javatests
subdirectory.
This application also serves as a testing tool to check whether the project is working correctly on a given machine.
The ./prebuilt
directory contains precompiled shortestPaths
library and
shortestPathsPreprocessor
executable for Linux and Windows for the x86-64 CPU
architecture.
- boost-config
- boost-graph
- boost-numeric-conversion
- boost-program-options
- p-ranav-csv2
- indicators
- hdf5[cpp]
- proj at least 9.3
mkdir build && cd build
cmake -DCMAKE_TOOLCHAIN_FILE="<vcpkg dir>/scripts/buildsystems/vcpkg.cmake" -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target shortestPathsPreprocessor --config Release
There are also additional build targets:
shortestPaths
: the librarydoc
: for generating Doxygen documentation (requires Doxygen to be installed)func_test_runner
: for running functional tests
To use the project, first build the desired data structures using the preprocessor and then use the library to answer queries.
To test the usage, you can use the test data located in the data/functest
directory.
The preprocessor has a simple command line interface. Let us now assume that you have a directed weighted graph in one of the supported formats (see Input/Output File Formats). The command line interface requires you to use specific arguments. The first argument determines whether you want to preprocess a graph file or benchmark a preprocessed graph using some method. The arguments following the first one are specific to each of the usages.
General usage:
./shortestPathsPreprocessor -m <method name> -f <input format> -i <input path> -o <output path> [--precision-loss <precision loss>] <method specific arguments>
where:
<input format>
is one ofxengraph
,dimacs
,adj
,csv
<input path>
is path to the input file (including file extension) or folder (for CSV input format)<output path>
is path to the output file (excluding file extension - the appropriate extension based will be added automatically)<precision loss>
(optional) is a positive integer denoting how much weight precision to lose. Each loaded weight will be divided by this value before rounding. (default: 1)
To preprocess a graph using Contraction Hierarchies, call the preprocessor with the the method argument set to ch
:
Example Usage:
./shortestPathsPreprocessor -m ch -f xengraph -i my_graph.xeng -o my_graph
To preprocess a graph for Transit Node Routing, call the preprocessor with the method argument set to tnr
.
Method specific arguments:
--tnodes-cnt
is a positive integer that determines the size of the transit nodes (less than or equal to the number of nodes in the graph)int-size
(optional) is integer size to be used in the distance matrix during preprocessing (can be set to 16 or 32, default: native); effective only ifpreprocessing-mode
is set todm
Example Usage:
./shortestPathsPreprocessor -m tnr -f xengraph -i my_graph.xeng -o my_graph --preprocessing-mode dm --tnodes-cnt 1000
The --preprocessing-mode
argument determines which of the three available preprocessing modes will be used.
The fast
mode will precompute the structures in the smallest time out of the three, but the performance of the
resulting data structure will be significantly lower.
The modes slow
and dm
will produce exactly the same result, but the dm
mode uses distance matrix to
speed up the precomputation and therefore, it is faster than the slow
mode, but it requires a lot of memory.
This argument determines the size of the transit nodes set. It must be an integer between 1 and the number of nodes in the graph. Less transit nodes usually mean lower memory requirements, but also worse query times. By choosing the appropriate size of the transit node set, you can find a great balance between memory requirements and performance.
To preprocess a graph for Transit Node Routing with Arc Flags, call the preprocessor with the method argument set to tnraf
.
Method specific arguments:
--preprocessing-mode
is one ofslow
,dm
(for more info, see Preprocessing Mode)--tnodes-cnt
is a positive integer that determines the size of the transit nodes (less than or equal to the numbr of nodes in the graph)--int-size
(optional) is integer size to be used in the distance matrix during preprocessing (can be set to 16 or 32, default: native); effective only if--preprocessing-mode
is set todm
Example Usage:
./shortestPathsPreprocessor -m tnraf -f xengraph -i my_graph.xeng -o my_graph --preprocessing-mode dm --tnodes-cnt 1000
To generate a distance matrix, call the preprocessor with the method argument set to dm
.
Method specific arguments:
--preprocessing-mode
is one ofslow
,fast
--output-format
is one ofxdm
,csv
,hdf
--int-size
(optional) is integer size to be used in the distance matrix during preprocessing (can be set to 16 or 32, default: native). Note that this is not the output size (in case of a binary output format), the output integer size is set automatically based on the maximum distance in the graph.
Example Usage:
./shortestPathsPreprocessor -m dm -f csv -i my_graph.csv -o my_graph --preprocessing-mode fast --output-format csv
The fast
mode provides a significant computational speed advantage over the slow
mode, at an expense of much larger memory usage.
The library can be used to load the data structures precomputed by the preprocessor and then answer queries quickly using those structures.
Compile the library by:
cmake --build . --target shortestPaths --config Release
The library provides a query manager for each of the three methods (Contraction Hierarchies, Transit Node Routing and Transit Node Routing with Arc Flags), these query managers serve as an interface that can be used from other applications.
There is no query manager for distance matrix as there is no computation involved, the distance matrix can be used as it is.
The query managers are called CHDistanceQueryManagerAPI
, TNRDistanceQueryManagerAPI
and TNRAFDistanceQueryManagerAPI
.
Each manager provides three functions.
The first function is called initializeCH
(or initializeTNR
or initializeTNRAF
respectively).
This function can load the data structure prepared by the preprocessor along with the mapping file and initialize the query manager for queries.
The second function is distanceQuery
and it's purpose is to answer queries.
It expects the original IDs of the start and the goal nodes as arguments, then returns the shortest distance from start to goal.
The third function is called clearStructures
.
It function deallocates all the memory required for the data structure.
This function should be always explicitly called from your application when you will not need to use the query manager anymore.
A simple Java
application that uses the library to answer queries can be found in the javatests
subdirectory.
This application was used to test that the API is functional and everything works as intended.
You can however use this application as an example of how to use the library from Java
.
You can consult the readme for this simple application here.
Create a JAVA_HOME
system property with the absolute path to the JDK, e.g., C:\Program Files\Java\jdk-15.0.1
For more information about how to integrate this project with other programming languages, please consult the readme in the src/API
subdirectory.
The project contains Functional tests that tests the correctness of the whole application. Target: func_test_runner
.
The project has a benchmark
target that can be used to benchmark the performance of the implemented methods. Genneral usage:
./benchmark -m <method> --input-structure <input_data_structure> --query-set <query_set> [--mapping-file <mapping_file>] [-o <output_path>]
where:
<method>
is one ofdijkstra
,astar
,ch
,tnr
,tnraf
,dm
- the method being benchmarked<input_data_structure>
is path to the data structure preprocessed using the preprocessor for the selectedmethod
. For dijkstra and Astar, use the CSV format (path to folder that containsnodes.csv
andedges.csv
input_data_structure
argument.<query_set>
is path to the query set (file format described in the File Formats section below)<mapping_file>
(optional) is path to the mapping file (file format described in the File Formats section below), which will be used to transform node IDs from the query set to the corresponding node IDs used by the query algorithms<output_path>
(optional) is path to the output file for the computed distances
For benchmarking, you will need a set of queries which can either use IDs in the range from 0 to n-1 (where n is the number of the nodes in the graph) or you can use arbitrary integer node IDs. In the second case, you also need to provide a mapping file (see next section). The formats of the query set file and the mapping file are described in the Input/Output File Formats section below.
During benchmarking, all of your queries are answered using a chosen method. Afterwards, the total time need to answer all the queries in seconds is printed alongside the average time needed to answer one query in milliseconds. Additionally, you can specify an output file, where the computed distances will be stored. Those distances can then be used for verification of the correctness of the more complex methods.
Having the PROJ utility installed is required for A* benchmarking. Path to PROJ directory needs to
be provided in an environment variable PROJ_DATA
, eg. like this:
PROJ_DATA=/usr/share/proj ./shortestPathsPreprocessor benchmark -m astar [options]
You can also copy the proj.db
file into the working directory instead.
If you provide the output_path
argument, the application will output a plain text file that will contain the name of the query set file used for the benchmark on the first line, and then on each following line, the result of each
query.
You can use these files to validate that various methods return the same results.
In this part, we will describe all the input formats that can be used with the preprocessor application. The first three formats are input formats for directed weighted graphs that can be used as input for the creation of data structures for one of the methods or for the benchmarking of the dijkstra algorithm. The other two formats are related only to benchmarking.
If you need to get information about output file formats, please look here for a description of all file formats used by the library.
Probably the simplest format for directed weighted graphs. Graphs are represented as plain text files in this format. The graph file looks as follows:
- It begins with a line
XGI n e
whereXGI
is a fixed string which serves as a magic constant.n
ande
are positive integers denoting the number of nodes and edges, respectively. - After that, exactly
e
lines follow in the formats t w f
, where each line represents one edge. Here:s
is the source node of the edge andt
is the target node of the edge. Both these values must be in the range from 0 ton
-1 (nodes are indexed from 0).w
is the weight of the edge, in our case a positive integer andf
is a flag denoting whether the edge is a one way edge. It must be either 1 or 0. Iff
is equal to 1, the edge is a one way edge and only an edge froms
tot
is added to the graph. Iff
is 0, we treat the edge as a bidirectional edge and therefore two edges are added into the graph, an edge froms
tot
and an edge fromt
tos
, both with the weightw
.
The expected suffix for XenGraph files is .xeng
although it is not enforced.
This input format was used during the 9th DIMACS Implementation Challenge on shortest paths. The graph file is a plain text file and it looks as follows:
- Lines beginning with the character
c
can occur anywhere in the file. Those lines are comment lines and are skipped during loading. - First line of the file not starting with the character
c
must be in the formatp sp <nodes> <edges>
, where:p sp
is a fixed string which serves as a magic constant,<nodes>
is a positive integer denoting the number of nodes, and<edges>
is another positive integer denoting the number of edges.
- After that, there must be exactly
e
lines of the formata s t w
, where each line represents one edge. Here:a
is a fixed character constant indicating that the line describes one edge (as opposed toc
representing a comment line),s
is the source node of the edge, andt
is the target node of the edge. In this format, nodes are indexed from 1, so boths
andt
must be in the range from 1 ton
(Internally, nodes are indexed from 0, so during loading, each node ID is automatically decremented). In this format, each edge is considered to be a directional edge, so if we want a bidirectional edge betweens
andt
, we must provide two lines, one for an edge froms
tot
and one for an edge fromt
tos
, both with the same weight.
The expected suffix for DIMACS graph files is .gr
although it is not enforced.
One of the three input formats that can be used to describe the input graph for the preprocessing.
A CSV file that represents an adjacency matrix of a graph.
- The file is expected to not have a header.
- The file must have its amount of rows equal to its amount of columns.
- Each value on row
i
and columnj
represents the weight of a directed edge from nodei
to nodej
. - Weights are expected to be valid positive integers, or
nan
for where there is no edge joining the nodes.
A pair of files named nodes.csv
and edges.csv
located in the same directory.
- Path to the directory containing these files must be specified in the
input_path
argument. nodes.csv
represents a list of alln
nodes identified by numbers from 0 ton
-1 (id
column) and optionally (for A* benchmarking) columnsx
(longitude) andy
(latitude).edges.csv
is a list of edges. The required columns areu
(source edge identifier),v
(target edge identifier) andcost
(edge weight, non-negative integer or floating point number).- In this format, each edge is considered to be a directional edge, so if we want a bidirectional edge between
u
andv
, we must provide two lines, one for an edge fromu
tov
and one for an edge fromv
tou
, both with the same weight. - Values must be separated by a Tab character (
\t
).
For easier benchmarking, the user can provide a set of queries which will be used for the benchmark in a very simple plain text format:
- The file begins with a line that contains only a single integer
cnt
denoting the number of queries. - The first line is followed by exactly
cnt
lines in the formats g
each denoting one query.s
andg
are both positive integers. When benchmarking without mapping,s
andg
must be in the range from 0 ton
-1 wheren
is the number of nodes in the graph. When benchmarking with mapping,s
andg
can be arbitrary positive integers as long as they fit in the long long unsigned int datatype (usually 64 bit).
If your application internally for some reason uses some node IDs that are not in the range from 0 to n
-1
(where n
is the number of nodes in the graph), you can use a mapping file that will allow the application to
transform node IDs from your application to IDs in the range from 0 to n
-1.
Using those mapping files, you can let the C++ application do all the mapping work, so will not need to change the
IDs in your application.
The mapping file is a plain text file in the following format:
- The file begins with a line
XID n
whereXID
is a fixed string which serves as a magic constant andn
is the amount of nodes in the graph. - The first line is followed by exactly
n
lines each only containing one integeri
. Thej
-th line represents the original ID of the(j-2)
-th node. So the second line contains the original ID of the node with the ID 0 in our application. The third line contains the original ID of the node with the ID 1, and so on up to the linen+1
contains the original ID for the node with the IDn-1
in our application.
Integration of this library into an application written in languages other than C++
and Java
is also possible,
but it will require you to generate new 'glue code' for the desired language using a tool called SWIG
.
You will also need to change the CMakeLists.txt
file in order to compile the library for such usage.
The steps required for the integration with a different language are briefly described here.
If you want to use the library with SiMoD, you can consult the Amod readme for step-by-step instructions on how to accomplish this.
Most of the code in this project was written as a part of the master's thesis of Michal Cvach.
The code for the distance matrix computation was taken from the APSP-in-parallel project