This repository accompanies the publication "Improving Vehicle Re-Identification using CNN Latent Spaces: Metrics Comparison and Track-to-track Extension" (https://arxiv.org/abs/1910.09458). This paper is a postprint of the paper submitted to and accepted by IET Computer Vision (https://digital-library.theiet.org/content/journals/iet-cvi).
We define a track of a vehicle as the set of images of that vehicle recorded by a given camera.
For a given image, we extract its latent representation (LR) by projecting it into the latent space of a neural network (in our experiments, the second-to-last layer of a CNN).
We then construct the matrix whose columns are the LRs of the track's images; this matrix is the LR of the track.
Given a distance metric, a query track, and a set of test tracks, the track-to-track (T2T) ranking process consists in ranking every test track to construct an ordered set, such that the first element is the nearest track to the query according to the distance metric (i.e. the first match) and the last element is the farthest.
The image-to-track (I2T) ranking corresponds to the T2T ranking procedure, but with a query track composed of only one image and its corresponding LR (only the distance metric d used differs).
In the I2T ranking process, the distance is computed between a query composed of one image and a test track. Three distance metrics are considered:
- MED : Minimal Euclidean Distance
- MCD : Minimal Cosine Distance
- RSCR : Residual of the Sparse Coding Reconstruction
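A minimal NumPy sketch of these three I2T distances (the function names are ours; for RSCR the sparse code is obtained here with scikit-learn's Lasso, and the regularisation weight `alpha` is an illustrative value, not the one used in the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso

def med(q, T):
    """Minimal Euclidean Distance between an image LR q of shape (d,)
    and a track LR T of shape (d, n): smallest column-wise distance."""
    return np.min(np.linalg.norm(T - q[:, None], axis=0))

def mcd(q, T):
    """Minimal Cosine Distance between q and the columns of T."""
    cos = (T.T @ q) / (np.linalg.norm(T, axis=0) * np.linalg.norm(q))
    return np.min(1.0 - cos)

def rscr(q, T, alpha=0.01):
    """Residual of the Sparse Coding Reconstruction: reconstruct q as a
    sparse linear combination of the track's LRs, return the residual norm.
    alpha is a hypothetical regularisation weight for illustration."""
    code = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(T, q).coef_
    return np.linalg.norm(q - T @ code)
```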
In the track-to-track (T2T) ranking process, the distance is computed between a query track and a test track.
- If the distance metric is based on MED or MCD, an aggregation function is used to aggregate the set of I2T distances between each image of the query track and the test track:
- min : minimum of the distances
- mean : average of the distances
- med : median of the distances
- mean50 : average of the 50% smallest distances
- med50 : median of the 50% smallest distances
Note : in the paper's formulas, ‖·‖_F denotes the Frobenius norm
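These aggregation functions can be sketched as follows (the `aggregate` helper and its keyword names are ours, not the package's actual API):

```python
import numpy as np

def aggregate(dists, how="mean50"):
    """Aggregate the set of I2T distances between a query track and a
    test track into a single T2T distance."""
    d = np.sort(np.asarray(dists, dtype=float))
    half = d[: max(1, len(d) // 2)]   # the 50% smallest distances
    return {
        "min": d[0],
        "mean": d.mean(),
        "med": np.median(d),
        "mean50": half.mean(),
        "med50": np.median(half),
    }[how]
```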
The Python package vehicle_reid
contains code to:
- Extract the latent representation of vehicle images using the second-to-last layer of our CNN fine-tuned on the task of vehicle recognition, as proposed in our paper. The CNN considered here is based on the DenseNet201 architecture (https://arxiv.org/abs/1608.06993), which has been fine-tuned using the VeRI dataset (https://github.com/VehicleReId/VeRidataset). The corresponding weights are given in
"data/cnn_weights/VeRI_densenet_ft50.pth"
- Compute the vehicle re-identification ranking between tracks of vehicles using the various distance metrics studied in the paper.
- Compute the performance metrics rank-1, rank-5, and mAP.
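A sketch of how these metrics can be computed from binary relevance lists (one ordered list per query, True marking a test track with the same identity; the helpers below are illustrative, not the module's actual API):

```python
import numpy as np

def rank_k(rankings, k):
    """Fraction of queries with at least one correct match in the top k."""
    return float(np.mean([any(r[:k]) for r in rankings]))

def mean_average_precision(rankings):
    """mAP over the queries: mean of the average precision of each list."""
    aps = []
    for r in rankings:
        hits, precisions = 0, []
        for i, relevant in enumerate(r, start=1):
            if relevant:
                hits += 1
                precisions.append(hits / i)   # precision at each hit
        aps.append(np.mean(precisions) if precisions else 0.0)
    return float(np.mean(aps))
```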
The package vehicle_reid
is composed of 3 modules:
latent_representation.py
- Extracts the latent representation (LR) of each track of vehicles -> returns a JSON file containing the LR of each track
ranking.py
- Computes the ranking for each query track -> returns a JSON file containing the ranking of each query track
performance.py
- Computes the performance metrics, namely rank-1, rank-5, and mAP (see the paper for details)
- numpy==1.19.2
- torchvision==0.7.0
- torch==1.6.0
- scikit_learn==0.23.2
The directory data
contains data to test the module vehicle_reid
. Note that to reproduce the VeRI experiments presented in the paper, you will need the VeRI dataset, which can be obtained by simple request to its authors here : https://github.com/JDAI-CV/VeRidataset
- data/cnn_weights/VeRI_densenet_ft50.pth : pre-trained weights for the DenseNet201 architecture. The model has been trained to classify the vehicles of the VeRI training set; only its latent space (the second-to-last layer) is used to extract features
- data/image_sample : some VeRI tracks of vehicles (split into query and test).
python3 run_example.py