jpitoskas/Multimodal-Ad-Similarity

Ad Similarity

[Official PyTorch implementation]

A repo with code for Ad Deduplication using pretrained CLIP given an ad title and thumbnail.

Author: Giannis Pitoskas
E-mail: [email protected]

Instructions:

We recommend creating a virtual environment and installing the project's dependencies inside it.

Install Dependencies:

pip install -r requirements.txt

Train with Default Arguments

python src/main.py

Inference with Default Arguments

python src/main.py --inference --load_model_id [model_id]

Arguments (default values shown)

--batch_size 1
--n_epochs 5
--lr 5e-4
--weight_decay 0.2
--beta1 0.9
--beta2 0.98
--adam_epsilon 1e-6
--inference
--no_cuda
--seed 1
--load_model_id None
--fbeta 0.75
--num_workers 1
--evaluation_metric f1_score
--inference_similarity_threshold 0.905
--n_pairs_train 10000
--n_pairs_val 2500
--n_pairs_test 2500
--positive_percentage_train 0.5
--positive_percentage_val 0.5
--positive_percentage_test 0.5
--pretrained_model_name openai/clip-vit-base-patch32
--margin 1.0
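
For readers who want to see how flags like these are typically wired up, here is a minimal sketch using Python's standard argparse module. It mirrors only a subset of the flags and defaults listed above; the function name build_parser, the argument types, and the parser layout are assumptions for illustration and may not match what src/main.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the flags listed above."""
    parser = argparse.ArgumentParser(description="Ad similarity (illustrative sketch)")
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--n_epochs", type=int, default=5)
    parser.add_argument("--lr", type=float, default=5e-4)
    parser.add_argument("--seed", type=int, default=1)
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--inference", action="store_true")
    parser.add_argument("--no_cuda", action="store_true")
    # No checkpoint is loaded unless an ID is given; the str type is an assumption.
    parser.add_argument("--load_model_id", type=str, default=None)
    parser.add_argument("--evaluation_metric", type=str, default="f1_score")
    parser.add_argument("--inference_similarity_threshold", type=float, default=0.905)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--n_epochs", "10", "--seed", "42"])
print(vars(args))
```

Flags that are not passed keep their defaults, so only the values that differ from the list above need to appear on the command line.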

Example Custom Run for Training

python src/main.py --n_epochs 10 --lr 5e-3 --seed 42

Example Custom Run for Inference

python src/main.py --inference --load_model_id [model_id] --batch_size 64 --inference_similarity_threshold 0.905 --evaluation_metric f1_score

Dataset:

  • for the text data: data/dataset/data.txt
  • for the image data: data/dataset/images

Download our Pre-trained Model Checkpoint:

You can download our pre-trained model checkpoint from the following link:

Download Model Checkpoint (.pt file)

Checkpoint Naming Convention

The checkpoint file is named checkpoint_{ID}.pt, where ID corresponds to the load_model_id argument of the main script.

Directory Structure

Place the downloaded checkpoint file in the following directory structure within your project:

experiments/
    ├── Model_{ID}/
    │   └── checkpoint_{ID}.pt

Replace {ID} in both Model_{ID} and checkpoint_{ID}.pt with the model identifier you pass as load_model_id. Make sure the checkpoint file sits inside the matching Model_{ID} directory.
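
The layout above can be set up by hand, but a small helper makes the naming convention explicit. The sketch below is not part of the repository: checkpoint_path and install_checkpoint are hypothetical names, and the only assumption taken from this README is the experiments/Model_{ID}/checkpoint_{ID}.pt pattern.

```python
import shutil
from pathlib import Path


def checkpoint_path(model_id, root="experiments"):
    """Return the expected checkpoint location for a given model ID."""
    return Path(root) / f"Model_{model_id}" / f"checkpoint_{model_id}.pt"


def install_checkpoint(downloaded_file, model_id, root="experiments"):
    """Move a downloaded checkpoint into the expected directory (hypothetical helper)."""
    target = checkpoint_path(model_id, root)
    # Create experiments/Model_{ID}/ if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloaded_file), str(target))
    return target
```

For example, install_checkpoint("downloaded.pt", 1) would move the file to experiments/Model_1/checkpoint_1.pt, which could then be loaded with --load_model_id 1.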

Usage Instructions

After downloading the model checkpoint, you can:

  • use it for further training/fine-tuning:

    python src/main.py --load_model_id [model_id]
  • use it for inference:

    python src/main.py --inference --load_model_id [model_id]

Single Ad Pair Prediction

Use this Python script to check whether two ads are similar:

python src/predict_ad_pair.py --text_filepath1 [text_filepath1] --text_filepath2 [text_filepath2] --image_filepath1 [image_filepath1] --image_filepath2 [image_filepath2]
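
To give a feel for what a thresholded similarity decision looks like, here is a minimal sketch in plain Python. It assumes each ad has already been turned into an embedding vector and that similarity is measured with cosine similarity against the default --inference_similarity_threshold of 0.905. Both are assumptions: this README does not say how src/predict_ad_pair.py combines text and image features or which similarity measure it uses, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_duplicate(emb1, emb2, threshold=0.905):
    """Flag two ads as similar when their embeddings clear the threshold (assumed rule)."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Raising the threshold makes the check stricter (fewer pairs flagged as duplicates), while lowering it flags more pairs at the cost of more false positives.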
