
ViT-LR

A latent replay model using Vision Transformer (ViT) as backbone

Usage

Requires the AR1 conda environment found here and the Core50 database found here.

Edit the Core50 root folder path in the source code when creating the database object.
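For reference, a minimal sketch of what that edit looks like; the class name and constructor arguments are assumptions modeled on the standard Core50 data loader, so check data_loader.py for the actual interface:

    # hypothetical usage, see data_loader.py for the real class and arguments
    from data_loader import CORE50
    dataset = CORE50(root='/path/to/core50', scenario='nicv2_391')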

python3 test.py

Prints the model structure and parameters.

python3 vitlr.py

Trains the model on the Core50 dataset (nicv2_391 scenario). A TensorBoard logs folder and a state dict will be generated.
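To monitor training, you can point TensorBoard at the generated folder; the directory name below is an assumption, so use whatever path vitlr.py actually creates:

    tensorboard --logdir ./logs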

python3 vitlr_stats.py

Prints the average training speed for a single image batch.

Jupyter notebooks:

  1. Confusion matrix (confusion_matrix.ipynb): prints the confusion matrix of a trained model; requires the state dict generated by vitlr.py.
  2. Attention rollout (attention_rollout.ipynb): prints a visual representation of the attention rollout; mostly experimental, requires a state dict.

How to edit the model

You can change the number of transformer blocks and the depth of the replay layer by editing models/vit.py.

  • Line 53/54: set the depth of the latent replay layer (number of frozen blocks + 1). For example, to train after the sixth transformer block (freezing all the preceding ones), change it to the following (see the combined sketch after this list):
            elif i <= 7:
                lat_list.append(layer)
  • Line 55/56: set the total number of transformer blocks the model will use (number of blocks + 1). For example, for a model with 8 transformer blocks, change it to:
            elif i > 9 and i < 14:
                continue
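Put together, the layer-selection logic in models/vit.py looks roughly like the sketch below. Only the two elif conditions come from the instructions above; the loop, the first branch, and the variable names are assumptions for illustration, not the file's exact code:

    # hypothetical sketch of the layer-selection loop in models/vit.py
    lat_list, end_list = [], []
    for i, layer in enumerate(model_layers):  # model_layers: assumed list of ViT sub-layers
        if i == 0:                   # assumed: keep the embedding with the frozen part
            lat_list.append(layer)
        elif i <= 7:                 # freeze everything up to the replay point
            lat_list.append(layer)   # (here: replay after the sixth block)
        elif i > 9 and i < 14:       # drop blocks beyond the chosen total (here: 8)
            continue
        else:
            end_list.append(layer)   # layers trained during replay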

The current model uses a base 12-block Vision Transformer, so the maximum number of transformer blocks allowed is 12.

It is possible to use more than 12 blocks by extending the base model: in models/model.py, change the model_name from B_32_imagenet1k to L_32_imagenet1k, which uses 24 self-attention blocks by default instead of 12. If you do so, you must also update the if statements in models/vit.py accordingly (15 -> 27, 14 -> 26).
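For orientation, the backbone swap in models/model.py amounts to a one-line change along these lines (where exactly model_name is set in the file may differ):

    model_name = 'L_32_imagenet1k'   # was 'B_32_imagenet1k'; 24 blocks instead of 12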

After modifying the model, you must specify the last frozen parameter via the freeze_below_layer argument in the params.cfg file.

You can check that the model is correct by running the test.py script, which also prints the list of parameters; use it to find the correct layer name to insert in the cfg file.
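If you prefer to inspect the parameter names directly, the standard PyTorch idiom below does the same job (model stands for the ViT-LR model object, however it is constructed in test.py):

    # list every parameter name and shape to locate the freeze point
    for name, param in model.named_parameters():
        print(name, tuple(param.shape))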

It is possible to change hyperparameters and image batch size by modifying the params.cfg file.
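As a shape reference, here is a hypothetical params.cfg excerpt; only the freeze_below_layer argument is documented above, while the section name, the other key, and the placeholder values are assumptions:

    [params]
    # only freeze_below_layer is documented above; the rest is illustrative
    freeze_below_layer = <layer name taken from the test.py listing>
    batch_size = <image batch size>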

Repository files description

  • README.md: This file
  • attention_rollout.ipynb: Jupyter notebook illustrating how to do attention rollout on a trained model
  • confusion_matrix.ipynb: Jupyter notebook illustrating how to print the confusion matrix of a trained model
  • data_loader.py: Core50 data loader
  • params.cfg: configuration file
  • test.py: script printing the current model architecture and layer list
  • testload.py: test script to load a state dict (ignore it)
  • utils.py: multiple functions used by the model, including accuracy evaluation
  • vit_rollout.py: functions used to do attention rollout
  • vitcumultrain.py: (old) script used to test cumulative training (ignore it)
  • vitlr.py: main training script
  • vitlr_stats.py: training script that stops early and evaluates the average training speed on single image batches
  • models/configs.py: ViT configuration file, contains all the pre-trained ViT models that can be used as base for the model
  • models/model.py: ViT base model
  • models/test.py: another test script (ignore this)
  • models/transformer.py: the transformer block, including the self-attention algorithm inside
  • models/utils.py: util functions used by the transformer
  • models/vit.py: personalized wrapper for the ViT model used by ViT-LR
