A large number of real-world optimization problems can be formulated as Mixed Integer Linear Programs (MILP). MILP solvers expose numerous configuration parameters to control their internal algorithms. Solutions, and their associated costs or runtimes, are significantly affected by the choice of the configuration parameters, even when problem instances have the same number of decision variables and constraints. On one hand, using the default solver configuration leads to suboptimal solutions. On the other hand, searching and evaluating a large number of configurations for every problem instance is time-consuming and, in some cases, infeasible. In this study, we aim to predict configuration parameters for unseen problem instances that yield lower-cost solutions without the time overhead of searching-and-evaluating configurations at solving time. Toward that goal, we first investigate the cost correlation of MILP problem instances that come from the same distribution when solved using different configurations. We show that instances that have similar costs using one solver configuration also have similar costs using another solver configuration in the same runtime environment. After that, we present a methodology based on Deep Metric Learning to learn MILP similarities that correlate with their final solutions' costs. At inference time, given a new problem instance, it is first projected into the learned metric space using the trained model, and configuration parameters are instantly predicted using previously-explored configurations from the nearest neighbor instance in the learned embedding space. Empirical results on real-world problem benchmarks show that our method predicts configuration parameters that improve solutions' costs by up to 38% compared to existing approaches.
We use MongoDB as a data store. It supports the MILP sampling process as well as the deployment of learned models for inference. While model training does not require MongoDB, it is a central component for efficient storage and for parallel execution of the offline configuration space exploration using SMAC.
As a first step, download and run MongoDB. Alternatively, you may run it using the official Docker image.
After successfully running a DB server, perform the following steps:
- Create a database called `milptunedb`.
- Create a collection for each dataset, named after the dataset. You should have 3 collections named `1_item_placement`, `2_load_balancing`, and `3_anonymous`. In addition, create a collection named `milptune_metadata`.
You may use MongoDB Compass to manage the data more easily from a GUI.
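If you prefer to script this setup instead, below is a minimal sketch using `pymongo` (not part of the repo's documented steps); it assumes a local server on the default port with no authentication:

```python
from pymongo import MongoClient

# Assumes a local MongoDB server with no authentication; adjust the URI as needed.
client = MongoClient("mongodb://localhost:27017")
client.admin.command("ping")  # fail fast if the server is not reachable

db = client["milptunedb"]

# MongoDB creates databases/collections lazily; creating them explicitly
# makes the setup visible in Compass right away.
for name in ["1_item_placement", "2_load_balancing", "3_anonymous", "milptune_metadata"]:
    if name not in db.list_collection_names():
        db.create_collection(name)

print(db.list_collection_names())
```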
- Download this repo to your home directory.
- Download the dataset and trained models from the following link.
- Extract the downloaded data using `tar -xzvf milptune-data.tar.gz`.
- Copy the trained models to the repository using `cp -r milptune-data/models MILPTune`. This adds the trained models and learned embeddings to the `models` folder.
- Import the parameters configuration data (inside `milptune-data/dataset`) into the database -- each dataset into its corresponding collection (see the sketch after this list). This step imports all the configurations explored during the offline exploration in this work. The database is queried with a MILP instance embedding to get its stored parameters configurations and their associated costs.
- Inside the `MILPTune` directory, create a `.env` file from `.env.example` and enter the database server connection credentials.
- Set up the environment according to the section below.
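The repository does not prescribe a specific import tool; the sketch below shows one possible way using `pymongo`, assuming each file under `milptune-data/dataset` is a JSON array of configuration documents named after its target collection (adjust the parsing to the actual file format and to your connection credentials):

```python
import json
from pathlib import Path

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # adjust to your .env credentials
db = client["milptunedb"]

# Hypothetical layout: one file per dataset, e.g. milptune-data/dataset/1_item_placement.json
for dataset_file in Path("milptune-data/dataset").glob("*.json"):
    collection = db[dataset_file.stem]  # collection name matches the dataset name
    with dataset_file.open() as f:
        docs = json.load(f)             # assumes a JSON array of documents
    if docs:
        collection.insert_many(docs)
    print(f"Imported {len(docs)} documents into {dataset_file.stem}")
```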
Environment Setup

There are two options to set up the environment:

(1) Using conda: run `conda env create -n milptune -f conda.yaml` to install dependencies in a new conda environment. Activate the environment using `conda activate milptune`. After that, run `python setup.py install` from inside the `MILPTune` directory to install the package.

or

(2) Using Docker: run `docker build -t milptune .` to build an image with all dependencies encapsulated. Run the docker image using `docker run -it -v <local-data-dir>:/data milptune`. This also mounts the downloaded data directory to `/data` inside the container to access cached data.
To predict a parameters configuration for a given problem instance, simply run:
```python
from milptune.configurator.knn import get_configuration_parameters
from milptune.scip.solver import solve_milp

configs, distances = get_configuration_parameters(
    instance_file=<instance-file.mps.gz>,
    dataset_name=<dataset-name>,
    n_neighbors=1, n_configs=1)
solution, cost, time = solve_milp(
    params=configs[0]['params'],
    instance=<instance-file.mps.gz>)
```
where `<instance-file>` is the MILP instance file in `.mps` format, and `<dataset-name>` is the dataset to predict from (i.e. `1_item_placement`, `2_load_balancing`, or `3_anonymous`). The parameters `n_neighbors` and `n_configs` instruct MILPTune to predict a single parameters configuration from the nearest neighbor.
If you are running from inside docker, the `<instance-file>` would be `/data/<instance-file-path>`.
Also, make sure that the MongoDB server is reachable from inside the container.
The function `solve_milp` calls the SCIP solver with the predicted parameters configuration and reports back the solution, cost (i.e. primal bound), and time.
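For example, you can retrieve several candidate configurations and keep the lowest-cost one. The sketch below only reuses the two functions shown above; the loop, the example file name, and the selection logic are illustrative and not part of the package:

```python
from milptune.configurator.knn import get_configuration_parameters
from milptune.scip.solver import solve_milp

instance = "data/valid/instance_1.mps.gz"  # hypothetical instance path

# Ask for candidate configurations from the 3 nearest neighbors.
configs, distances = get_configuration_parameters(
    instance_file=instance,
    dataset_name="1_item_placement",
    n_neighbors=3, n_configs=5)

# Solve with each candidate and keep the lowest-cost (primal bound) result.
best_cost, best_solution, best_time = float("inf"), None, None
for config in configs:
    solution, cost, time = solve_milp(params=config['params'], instance=instance)
    if cost < best_cost:
        best_cost, best_solution, best_time = cost, solution, time

print(f"best cost {best_cost} found in {best_time}s")
```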
There is also a helper script inside the `evaluation` folder:

```
python evaluation/configure.py <instance-file> <dataset-name>
```
Data preprocessing is the step of converting all dataset instances into their bipartite graph representation and saving them in the database for the training pipeline.
This can be done using:
```python
from milptune.train.preprocess import index_bipartite

index_bipartite("<path-to-train-instances>", "<dataset-name>")
```
This preprocesses all instances in the `train` directory of a dataset and saves them to its corresponding collection in the database.
We provide the training script for each dataset in the `models` directory.
The scripts load data from the database and cache it locally in the first epoch, then use the cached tensors for subsequent epochs.
After training, the `milptune_metadata` collection needs to be populated with the necessary metadata for the trained model.
Below is the schema of the document:
```
{
    "<dataset-name>": {
        "model": "~/MILPTune/models/<dataset-name>/model.pt",
        "dims": {
            "embedding_dim": <embedding-dimension>,
            "n_gnn_layers": <number-of-gnn-layers>,
            "gnn_hidden_dim": <hidden-dimension-size>
        }
    }
}
```
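For example, the document can be inserted with `pymongo` as sketched below; the dataset name and dimension values are placeholders that must be replaced with the settings actually used to train your model:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # use the credentials from your .env

# Placeholder values: replace the dataset name and dims with your model's settings.
metadata = {
    "1_item_placement": {
        "model": "~/MILPTune/models/1_item_placement/model.pt",
        "dims": {
            "embedding_dim": 64,
            "n_gnn_layers": 2,
            "gnn_hidden_dim": 64,
        },
    }
}
client["milptunedb"]["milptune_metadata"].insert_one(metadata)
```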
After that, embeddings of the training instances are indexed back into the database for the nearest neighbor search process.
This can be achieved by:
```python
from milptune.train.index import index_embeddings

index_embeddings("<dataset-name>")
```
Evaluation and comparison against the default configuration and against using SMAC can be found under `evaluation/evaluate.py`.
You can run it on a single instance using:
python evaluate.py <instance-path> <dataset-name> <output-dir>
We provide `evaluate.sh` to run the evaluation on all instances in a given directory using:
```
./evaluate.sh /path/to/dataset/valid/ <dataset-name> <output-dir>
```
```bibtex
@article{hosny2023automatic,
  title={Automatic MILP solver configuration by learning problem similarities},
  author={Hosny, Abdelrahman and Reda, Sherief},
  journal={Annals of Operations Research},
  pages={1--28},
  year={2023},
  publisher={Springer}
}
```
BSD 3-Clause License. See the `LICENSE` file. Licenses of some used dependencies are included in the `milptune` folder.