[Official PyTorch implementation]
A repository for ad deduplication using a pretrained CLIP model, given an ad's title and thumbnail.
Author: Giannis Pitoskas
E-mail: [email protected]
We recommend creating a virtual environment and installing the project's dependencies:
pip install -r requirements.txt
To train a model:
python src/main.py
To run inference with a trained model:
python src/main.py --inference --load_model_id [model_id]
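Arguments of src/main.py and their default values (--inference and --no_cuda are switches that take no value):
--batch_size 1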
--n_epochs 5
--lr 5e-4
--weight_decay 0.2
--beta1 0.9
--beta2 0.98
--adam_epsilon 1e-6
--inference
--no_cuda
--seed 1
--load_model_id None
--fbeta 0.75
--num_workers 1
--evaluation_metric f1_score
--inference_similarity_threshold 0.905
--n_pairs_train 10000
--n_pairs_val 2500
--n_pairs_test 2500
--positive_percentage_train 0.5
--positive_percentage_val 0.5
--positive_percentage_test 0.5
--pretrained_model_name openai/clip-vit-base-patch32
--margin 1.0
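To show how these options fit together, here is a minimal sketch of an argparse parser mirroring the list above. It is a hypothetical reconstruction, not the actual parser in src/main.py: the types are inferred from the listed values, and treating --inference and --no_cuda as store_true switches is an assumption.

```python
import argparse

# Hypothetical reconstruction of the options listed above; the real parser in
# src/main.py may differ. Types are inferred from the listed default values.
_OPTIONS = [
    ("--batch_size", int, 1),
    ("--n_epochs", int, 5),
    ("--lr", float, 5e-4),
    ("--weight_decay", float, 0.2),
    ("--beta1", float, 0.9),
    ("--beta2", float, 0.98),
    ("--adam_epsilon", float, 1e-6),
    ("--seed", int, 1),
    ("--load_model_id", str, None),
    ("--fbeta", float, 0.75),
    ("--num_workers", int, 1),
    ("--evaluation_metric", str, "f1_score"),
    ("--inference_similarity_threshold", float, 0.905),
    ("--n_pairs_train", int, 10000),
    ("--n_pairs_val", int, 2500),
    ("--n_pairs_test", int, 2500),
    ("--positive_percentage_train", float, 0.5),
    ("--positive_percentage_val", float, 0.5),
    ("--positive_percentage_test", float, 0.5),
    ("--pretrained_model_name", str, "openai/clip-vit-base-patch32"),
    ("--margin", float, 1.0),
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CLIP ad deduplication (sketch)")
    for flag, type_, default in _OPTIONS:
        parser.add_argument(flag, type=type_, default=default)
    # Assumed on/off switches: present means True, absent means False.
    parser.add_argument("--inference", action="store_true")
    parser.add_argument("--no_cuda", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, parsing ["--n_epochs", "10", "--lr", "5e-3", "--seed", "42"] overrides only those three values and leaves every other option at its default.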
Example commands:
python src/main.py --n_epochs 10 --lr 5e-3 --seed 42
python src/main.py --inference --load_model_id [model_id] --batch_size 64 --evaluation_metric f1_score
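The --fbeta and --evaluation_metric options point to an F-beta-based evaluation. The sketch below uses the standard definition F_β = (1 + β²)·P·R / (β²·P + R), where β < 1 (such as the default 0.75) weights precision more than recall and β = 1 gives the F1 score. It assumes a pair is predicted as a duplicate when its similarity is at least --inference_similarity_threshold; that decision rule and the helper names are our assumptions, not the project's code.

```python
from typing import Iterable, Tuple


def confusion_at_threshold(
    similarities: Iterable[float], labels: Iterable[int], threshold: float
) -> Tuple[int, int, int]:
    """Count (tp, fp, fn), assuming similarity >= threshold means 'duplicate'."""
    tp = fp = fn = 0
    for sim, label in zip(similarities, labels):
        predicted = sim >= threshold
        if predicted and label:
            tp += 1
        elif predicted:  # predicted duplicate, but labelled as different ads
            fp += 1
        elif label:  # missed duplicate
            fn += 1
    return tp, fp, fn


def fbeta_score(tp: int, fp: int, fn: int, beta: float = 0.75) -> float:
    """Standard F-beta; beta < 1 favours precision, beta = 1 is the F1 score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    sims = [0.95, 0.91, 0.50, 0.92]  # toy similarities, not project data
    labels = [1, 0, 1, 1]            # 1 = duplicate pair
    tp, fp, fn = confusion_at_threshold(sims, labels, threshold=0.905)
    print(tp, fp, fn, fbeta_score(tp, fp, fn, beta=0.75))
```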
The dataset is expected at the following paths:
- for the text data: data/dataset/data.txt
- for the image data: data/dataset/images
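A small pre-flight check like the one below can verify this layout before training. It only tests that the two paths exist relative to the project root; the helper name is hypothetical, and it makes no assumption about the format of data.txt or of the image files.

```python
from __future__ import annotations

from pathlib import Path

# Paths taken from the layout above, relative to the project root.
TEXT_FILE = Path("data/dataset/data.txt")
IMAGE_DIR = Path("data/dataset/images")


def missing_dataset_paths(root: str | Path = ".") -> list[str]:
    """Return the expected dataset paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    if not (root / TEXT_FILE).is_file():
        missing.append(TEXT_FILE.as_posix())
    if not (root / IMAGE_DIR).is_dir():
        missing.append(IMAGE_DIR.as_posix())
    return missing


if __name__ == "__main__":
    problems = missing_dataset_paths()
    print("dataset layout OK" if not problems else f"missing: {problems}")
```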
You can download our pre-trained model checkpoint from the following link:
Download Model Checkpoint (.pt file)
The checkpoint filename has the format checkpoint_{ID}.pt, where ID corresponds to the load_model_id argument of the main script.
Place the downloaded checkpoint file in the following directory structure within your project:
experiments/
├── Model_{ID}/
│ └── checkpoint_{ID}.pt
Replace {ID} in both Model_{ID} and checkpoint_{ID}.pt with the respective model identifier (load_model_id). Make sure the checkpoint file is located in the matching experiments/Model_{ID} directory.
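Moving the downloaded file into place can be scripted. The sketch below builds the experiments/Model_{ID}/checkpoint_{ID}.pt path described above and moves the file there. Both helper names are hypothetical, and the file-name check reflects our reading of the naming convention rather than anything the main script enforces.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def expected_checkpoint_path(model_id: str, root: str | Path = "experiments") -> Path:
    """experiments/Model_{ID}/checkpoint_{ID}.pt, following the layout above."""
    return Path(root) / f"Model_{model_id}" / f"checkpoint_{model_id}.pt"


def place_checkpoint(downloaded: str | Path, model_id: str,
                     root: str | Path = "experiments") -> Path:
    """Move a downloaded checkpoint into the expected folder (hypothetical helper)."""
    downloaded = Path(downloaded)
    target = expected_checkpoint_path(model_id, root)
    # Assumed sanity check: the downloaded file name should already encode the same ID.
    if downloaded.name != target.name:
        raise ValueError(f"expected a file named {target.name}, got {downloaded.name}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloaded), str(target))
    return target


if __name__ == "__main__":
    import sys
    # Usage: python place_checkpoint.py path/to/checkpoint_{ID}.pt {ID}
    print(place_checkpoint(sys.argv[1], sys.argv[2]))
```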
After downloading the model checkpoint, you can:
- use it for further training/fine-tuning:
python src/main.py --load_model_id [model_id]
- use it for inference:
python src/main.py --inference --load_model_id [model_id]
The following Python script lets you check whether two individual ads are similar (i.e. likely duplicates):
python src/predict_ad_pair.py --text_filepath1 [text_filepath1] --text_filepath2 [text_filepath2] --image_filepath1 [image_filepath1] --image_filepath2 [image_filepath2]
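How predict_ad_pair.py reaches its decision is not spelled out here. A plausible minimal sketch, assuming each ad is represented by a CLIP text embedding (title) and a CLIP image embedding (thumbnail) that are fused by averaging, and that two ads count as duplicates when their cosine similarity is at least --inference_similarity_threshold (0.905 by default), looks like this. The fusion step, the decision rule, and all helper names are assumptions; in practice the vectors could come from CLIPModel.get_text_features and CLIPModel.get_image_features in Hugging Face transformers.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def ad_embedding(text_emb: Sequence[float], image_emb: Sequence[float]) -> List[float]:
    """Hypothetical fusion: mean of the L2-normalised text and image embeddings."""
    text, image = l2_normalize(text_emb), l2_normalize(image_emb)
    return l2_normalize([(t + i) / 2 for t, i in zip(text, image)])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def is_duplicate(emb1: Sequence[float], emb2: Sequence[float],
                 threshold: float = 0.905) -> bool:
    """Assumed decision rule: duplicates when cosine similarity >= threshold."""
    return cosine_similarity(emb1, emb2) >= threshold


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real CLIP features.
    ad1 = ad_embedding([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])
    ad2 = ad_embedding([0.88, 0.12, 0.01], [0.79, 0.21, 0.12])
    print(cosine_similarity(ad1, ad2), is_duplicate(ad1, ad2))
```

Averaging normalised embeddings gives the title and the thumbnail equal weight; a learned fusion or a weighted sum would be equally valid alternatives under these assumptions.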