Release FocalNet-DINO Baseline #155

Merged · 7 commits · Dec 5, 2022
71 changes: 71 additions & 0 deletions docs/source/tutorials/Download_Pretrained_Weights.md
@@ -361,3 +361,74 @@ train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
</details>

Please refer to [DINO](https://github.com/IDEA-Research/detrex/tree/main/projects/dino) project for more details about the usage of vit backbone.

## FocalNet
The download links below are borrowed from the [official implementation](https://github.com/microsoft/FocalNet#imagenet-22k-pretrained) of FocalNet.

<table class="docutils"><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Model</th>
<th valign="bottom">Depth</th>
<th valign="bottom">Dim</th>
<th valign="bottom">Kernels</th>
<th valign="bottom">#Params. (M)</th>
<th valign="bottom">Download</th>
<tr><td align="left"> FocalNet-L </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">192</td>
<td align="center">[5, 7, 9]</td>
<td align="center"> 207 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384.pth">download</a> </td>
<tr><td align="left"> FocalNet-L </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">192</td>
<td align="center">[3, 5, 7, 9]</td>
<td align="center"> 207 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384_fl4.pth">download</a> </td>
<tr><td align="left"> FocalNet-XL </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">256</td>
<td align="center">[5, 7, 9]</td>
<td align="center"> 366 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384.pth">download</a> </td>
<tr><td align="left"> FocalNet-XL </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">256</td>
<td align="center">[3, 5, 7, 9]</td>
<td align="center"> 366 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384_fl4.pth">download</a> </td>
<tr><td align="left"> FocalNet-H </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">352</td>
<td align="center">[5, 7, 9]</td>
<td align="center"> 687 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224.pth">download</a> </td>
<tr><td align="left"> FocalNet-H </td>
<td align="center">[2, 2, 18, 2]</td>
<td align="center">352</td>
<td align="center">[3, 5, 7, 9]</td>
<td align="center"> 687 </td>
<td align="center"> <a href="https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224_fl4.pth">download</a> </td>
</tr>
</tbody></table>
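For scripted downloads, the table above can be transcribed into a lookup keyed by model variant and focal-level count. This is plain bookkeeping (the dict name and helper are our own, not part of detrex):

```python
# Checkpoint URLs transcribed from the table above, keyed by
# (model name, number of focal levels). Adjust if the hosting moves.
FOCALNET_CHECKPOINTS = {
    ("FocalNet-L", 3): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384.pth",
    ("FocalNet-L", 4): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384_fl4.pth",
    ("FocalNet-XL", 3): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384.pth",
    ("FocalNet-XL", 4): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384_fl4.pth",
    ("FocalNet-H", 3): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224.pth",
    ("FocalNet-H", 4): "https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224_fl4.pth",
}

def checkpoint_url(model: str, focal_levels: int) -> str:
    """Look up the pretrained checkpoint URL for a given variant."""
    return FOCALNET_CHECKPOINTS[(model, focal_levels)]
```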

<details open>
<summary> <b> Using FocalNet Backbone in Config </b> </summary>

```python
# Assumed imports; the exact module path may differ across detrex versions
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet

# focalnet-large-4scale baseline
model.backbone = L(FocalNet)(
embed_dim=192,
depths=(2, 2, 18, 2),
focal_levels=(3, 3, 3, 3),
focal_windows=(5, 5, 5, 5),
use_conv_embed=True,
use_postln=True,
use_postln_in_modulation=False,
use_layerscale=True,
normalize_modulator=False,
out_indices=(1, 2, 3),
)
```
</details>
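The `L(FocalNet)(...)` syntax above is detectron2's lazy config: `L` records the target class and its kwargs without instantiating anything, so fields can still be overridden (as the fl4/5-scale configs in this PR do) before the model is built. A minimal stand-in sketch, for illustration only — not detectron2's actual implementation:

```python
class LazyCall:
    """Minimal stand-in for detectron2's LazyCall: records a target
    callable; calling it captures kwargs without instantiating."""
    def __init__(self, target):
        self.target = target

    def __call__(self, **kwargs):
        return LazyConfig(self.target, kwargs)


class LazyConfig:
    """Holds a deferred call; kwargs stay editable until build()."""
    def __init__(self, target, kwargs):
        self.target = target
        self.kwargs = dict(kwargs)

    def build(self):
        return self.target(**self.kwargs)


class DummyBackbone:
    """Dummy placeholder for a real backbone class."""
    def __init__(self, embed_dim, depths):
        self.embed_dim = embed_dim
        self.depths = depths


cfg = LazyCall(DummyBackbone)(embed_dim=192, depths=(2, 2, 18, 2))
cfg.kwargs["embed_dim"] = 256   # configs can be overridden before build time
backbone = cfg.build()          # instantiation happens only here
```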
16 changes: 16 additions & 0 deletions docs/source/tutorials/Model_Zoo.md
@@ -249,6 +249,22 @@ Here we provide our pretrained baselines with **detrex**. And more pretrained w
<td align="center">100</td>
<td align="center">58.5</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_swin_large_384_5scale_36ep.pth"> model </a></td>
<tr><td align="left"> <a href="https://github.com/IDEA-Research/detrex/blob/main/projects/dino/configs/dino_focalnet_large_lrf_384_4scale_12ep.py"> DINO-FocalNet-Large-4scale </a> </td>
<td align="center">FocalNet-Large-3Level</td>
<td align="center">IN22k</td>
<td align="center">12</td>
<td align="center">100</td>
<td align="center">57.5</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_focal_large_lrf_384_4scale_12ep.pth"> model </a></td>
</tr>
<tr><td align="left"> <a href="https://github.com/IDEA-Research/detrex/blob/main/projects/dino/configs/dino_focalnet_large_lrf_384_4scale_12ep.py"> DINO-FocalNet-Large-4scale </a> </td>
<td align="center">FocalNet-Large-4Level</td>
<td align="center">IN22k</td>
<td align="center">12</td>
<td align="center">100</td>
<td align="center">58.0</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_focal_large_lrf_384_fl4_4scale_12ep.pth"> model </a></td>
</tr>
</tr>
<tr><td align="left"> <a href="https://github.com/IDEA-Research/detrex/blob/main/projects/dino/configs/dino_vitdet_base_4scale_12ep.py"> DINO-ViTDet-Base-4scale </a> </td>
<td align="center">ViT</td>
36 changes: 33 additions & 3 deletions projects/dino/README.md
@@ -133,7 +133,7 @@ Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heun
<td align="center">56.9</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.1.1/dino_swin_large_4scale_12ep.pth">model</a></td>
</tr>
<tr><td align="left"><a href="configs/dino_swin_large_384_5scale_12ep.py">DINO-Swin-L-384-4scale</a></td>
<tr><td align="left"><a href="configs/dino_swin_large_384_5scale_12ep.py">DINO-Swin-L-384-5scale</a></td>
<td align="center">Swin-Large-384</td>
<td align="center">IN22k to IN1k</td>
<td align="center">12</td>
@@ -162,7 +162,36 @@ Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heun
</tr>
</tbody></table>

**Pretrained DINO with Pure ViT Backbone**
**Pretrained DINO with FocalNet Backbone**
<table><tbody>
<th valign="bottom">Name</th>
<th valign="bottom">Backbone</th>
<th valign="bottom">Pretrain</th>
<th valign="bottom">Epochs</th>
<th valign="bottom">Denoising Queries</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">download</th>
<tr><td align="left"><a href="configs/dino_focalnet_large_lrf_384_4scale_12ep.py">DINO-Focal-Large-4scale</a></td>
<td align="center">FocalNet-384-LRF-3Level</td>
<td align="center">IN22k</td>
<td align="center">12</td>
<td align="center">100</td>
<td align="center">57.5</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_focal_large_lrf_384_4scale_12ep.pth">model</a></td>
</tr>
<tr><td align="left"><a href="configs/dino_focalnet_large_lrf_384_4scale_12ep.py">DINO-Focal-Large-4scale</a></td>
<td align="center">FocalNet-384-LRF-4Level</td>
<td align="center">IN22k</td>
<td align="center">12</td>
<td align="center">100</td>
<td align="center">58.0</td>
<td align="center"> <a href="https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_focal_large_lrf_384_fl4_4scale_12ep.pth">model</a></td>
</tr>
</tbody></table>

**Pretrained DINO with ViT Backbone**
<table><tbody>
<th valign="bottom">Name</th>
<th valign="bottom">Backbone</th>
@@ -211,7 +240,8 @@ Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heun

**Note**:
- `Swin-X-384` means the backbone's pretraining resolution is `384 x 384`, and `IN22k to IN1k` means the model is pretrained on `ImageNet-22k` and finetuned on `ImageNet-1k`.
- ViT backbone using MAE pretraining weights following [ViTDet](https://github.com/facebookresearch/detectron2/tree/main/projects/ViTDet) which can be downloaded in [MAE](https://github.com/facebookresearch/mae).
- The ViT backbone uses MAE pretraining weights following [ViTDet](https://github.com/facebookresearch/detectron2/tree/main/projects/ViTDet), which can be downloaded from [MAE](https://github.com/facebookresearch/mae). Note that training ViTDet-DINO is unstable without a warmup lr scheduler.
- `Focal-LRF-3Level` means the backbone uses `Large-Receptive-Field (LRF)` kernels with the `Focal-Level` set to `3`; please refer to [FocalNet](https://github.com/microsoft/FocalNet) for more details about the backbone settings.

**Notable facts and caveats**: The position embedding of DINO in detrex differs from the original repo. We set the temperature and offset in `PositionEmbeddingSine` to `10000` and `-0.5`, which may make the model converge a little faster in the early stage and yield slightly better results (about 0.1 mAP) under the 12-epoch setting.
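To make the temperature and offset concrete, here is a simplified pure-Python sketch of a 1-D DETR-style sine position embedding. It follows the common DETR formulation (normalize positions, shift by `offset`, scale to `[0, 2π]`, divide by `temperature^(2i/d)` per channel pair) and is not detrex's actual implementation:

```python
import math

def sine_position_embedding(num_positions, num_pos_feats=128,
                            temperature=10000.0, offset=-0.5,
                            scale=2 * math.pi):
    """Simplified 1-D sine position embedding in the DETR style.

    Positions are 1-based (mimicking the cumsum over a mask), shifted
    by `offset`, normalized, scaled to [0, scale], then each channel
    pair uses frequency 1 / temperature**(2i / num_pos_feats).
    """
    eps = 1e-6
    embeds = []
    for pos in range(1, num_positions + 1):
        x = (pos + offset) / (num_positions + eps) * scale
        feats = []
        for i in range(num_pos_feats):
            dim_t = temperature ** (2 * (i // 2) / num_pos_feats)
            # even channels -> sin, odd channels -> cos
            feats.append(math.sin(x / dim_t) if i % 2 == 0
                         else math.cos(x / dim_t))
        embeds.append(feats)
    return embeds

emb = sine_position_embedding(10)   # 10 positions x 128 features
```

With `offset=-0.5`, the first position maps to 0.5 / 10 rather than 1 / 10, centering each position in its cell; the temperature controls how quickly the frequencies decay across channels.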

30 changes: 30 additions & 0 deletions projects/dino/configs/dino_focalnet_large_lrf_384_fl4_5scale_12ep.py
@@ -0,0 +1,30 @@
from .dino_focalnet_large_lrf_384_4scale_12ep import (
train,
dataloader,
optimizer,
lr_multiplier,
model,
)

from detectron2.layers import ShapeSpec


# modify training config
train.init_checkpoint = "/path/to/focalnet_large_lrf_384_fl4.pth"
train.output_dir = "./output/dino_focalnet_large_fl4_5scale_12ep"

# convert to 4 focal-level
model.backbone.focal_levels = (4, 4, 4, 4)
model.backbone.focal_windows = (3, 3, 3, 3)

# convert to 5 scale output features
model.backbone.out_indices = (0, 1, 2, 3)
model.neck.input_shapes = {
"p0": ShapeSpec(channels=192),
"p1": ShapeSpec(channels=384),
"p2": ShapeSpec(channels=768),
"p3": ShapeSpec(channels=1536),
}
model.neck.in_features = ["p0", "p1", "p2", "p3"]
model.neck.num_outs = 5
model.transformer.num_feature_levels = 5
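The `input_shapes` above follow from the backbone width: FocalNet, like Swin, doubles the channel count after each stage, so with `embed_dim=192` the four stages emit 192/384/768/1536 channels, matching `p0`–`p3`. A quick sanity-check sketch (the helper is ours, not a detrex API):

```python
def stage_channels(embed_dim, num_stages=4):
    """Channel width per stage for a Swin/FocalNet-style backbone
    that doubles the width after every stage."""
    return [embed_dim * 2 ** i for i in range(num_stages)]

# FocalNet-Large: embed_dim=192 -> the p0..p3 shapes used in the config
channels = {f"p{i}": c for i, c in enumerate(stage_channels(192))}
```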
16 changes: 16 additions & 0 deletions projects/dino/configs/dino_focalnet_large_lrf_384_fl4_5scale_36ep.py
@@ -0,0 +1,16 @@
from detrex.config import get_config

from .dino_focalnet_large_lrf_384_fl4_5scale_12ep import (
train,
dataloader,
optimizer,
model,
)

# using 36ep scheduler
lr_multiplier = get_config("common/coco_schedule.py").lr_multiplier_36ep

# modify training config
train.max_iter = 270000
train.init_checkpoint = "/path/to/focalnet_large_lrf_384_fl4.pth"
train.output_dir = "./output/dino_focalnet_large_fl4_5scale_36ep"
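The `max_iter = 270000` follows detrex's standard COCO schedule, assuming the default total batch size of 16 and the ~118k images of COCO `train2017`: 12 epochs is roughly 90k iterations, so 36 epochs is 270k. A sketch of that arithmetic (the helper and its rounding are our own, hypothetical):

```python
def coco_max_iter(epochs, images=118287, batch_size=16, round_to=10000):
    """Approximate iteration count for an epoch-based COCO schedule,
    rounded to the granularity the common presets use."""
    exact = epochs * images / batch_size          # e.g. 36 ep -> ~266146
    return round(exact / round_to) * round_to     # -> 270000
```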