[Feature] Add PoseFormer backbone #1215
base: dev-0.26
Conversation
* update mmcv installation CI and doc * fix lint
* add deprecation message for deploy tools * change import warnings positions * do yapf * do isort
* hrformer * modify cfg * update url and readme for hrformer. * add readme for hrformer paper * modify readme * fix publish year Co-authored-by: ly015 <[email protected]>
* clean inference code * LoadImageFromFile supports given img
…b#1161) co-authored-by: ly015 <[email protected]>
…en-mmlab#1214) * switch to open-mmlab/pre-commit-hooks * deprecate .dev_scripts/github/update_copyright.py
* add windows-ci * reduce input size to save memory * fix codecov * reduce input size to save memory * skip voxelpose unittest for Windows * remove win+cuda test
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##           dev-0.26    #1215      +/-   ##
============================================
+ Coverage     83.44%   84.13%   +0.69%
============================================
  Files           205      217      +12
  Lines         16625    17813    +1188
  Branches       2976     3158     +182
============================================
+ Hits          13872    14987    +1115
- Misses         2001     2030      +29
- Partials        752      796      +44
from .base_backbone import BaseBackbone

class MultiheadAttention(BaseModule):
mmcv also has a MultiheadAttention module. Can we use that one?
This MultiheadAttention module is adapted from mmcls. Compared with the MultiheadAttention in mmcv, its counterpart in mmcls is more similar to the implementation in the official PoseFormer repository.
I will give the mmcv Transformer blocks a try. However, if the outputs and model weights of the re-implemented PoseFormer based on the mmcv Transformer blocks cannot match the official PoseFormer, keeping the mmcls Transformer modules might be a good choice.
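For reference, a minimal sketch of how that equivalence check could look, assuming a hypothetical re-implementation and a reference model that both take the same (batch, frames, joints, 2) keypoint input and already hold converted copies of the official weights; the names are illustrative and not part of this PR:

```python
import torch


def outputs_match(reimpl, reference, atol=1e-5):
    """Check whether a re-implemented PoseFormer reproduces a reference model.

    Both models are assumed to already carry the same (converted) weights.
    """
    reimpl.eval()
    reference.eval()
    # Dummy 2D keypoint sequence: batch=2, 81 frames, 17 joints, (x, y).
    x = torch.randn(2, 81, 17, 2)
    with torch.no_grad():
        out_a = reimpl(x)
        out_b = reference(x)
    return torch.allclose(out_a, out_b, atol=atol)
```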
num_joints=17,
in_chans=2,
embed_dim_ratio=32,
depth=4,
Is the depth for the spatial and temporal transformers always the same? If not, do we need to distinguish them?
In the official implementation of PoseFormer, the spatial and temporal transformers share the same depth. However, I think distinguishing the two depths is clearly better for readability, and I will follow this comment.
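A minimal sketch of what separate depth arguments could look like, using plain PyTorch encoder layers as stand-ins for the actual PoseFormer blocks; all names and defaults here are illustrative assumptions, not the PR's final API:

```python
import torch.nn as nn


class PoseFormerSkeleton(nn.Module):
    """Illustrative skeleton only: distinct spatial and temporal depths."""

    def __init__(self, spatial_depth=4, temporal_depth=4,
                 spatial_embed_dim=32, temporal_embed_dim=544, num_heads=8):
        super().__init__()
        # Spatial transformer: attends over the joints within one frame.
        self.spatial_blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(spatial_embed_dim, num_heads, batch_first=True)
            for _ in range(spatial_depth)
        ])
        # Temporal transformer: attends over the frames of the sequence.
        self.temporal_blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(temporal_embed_dim, num_heads, batch_first=True)
            for _ in range(temporal_depth)
        ])
```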
if norm_cfg is None:
    norm_cfg = dict(type='LN')
# Temporal embed_dim is num_joints * spatial embedding dim ratio
embed_dim = embed_dim_ratio * num_joints
Does embed_dim mean the embedding dimension for each frame? How about using embed_dim_per_frame?
Now I use spatial_embed_dim and temporal_embed_dim instead of embed_dim_ratio and embed_dim.
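To make the relationship concrete, a small hedged example of the renamed dimensions (the values follow the defaults shown in this diff, and the flattening step reflects how PoseFormer forms per-frame tokens; variable names are for illustration only):

```python
import torch

num_joints = 17
spatial_embed_dim = 32                                # per-joint embedding size
temporal_embed_dim = spatial_embed_dim * num_joints   # 17 * 32 = 544 per frame

# After the spatial transformer: (batch, frames, joints, spatial_embed_dim).
x = torch.randn(2, 81, num_joints, spatial_embed_dim)
# Flatten each frame's joint embeddings into one frame token for the temporal stage.
frame_tokens = x.flatten(2)
assert frame_tokens.shape == (2, 81, temporal_embed_dim)
```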
init_cfg=init_cfg) for i in range(depth)
])

self.blocks = nn.ModuleList([
Does self.blocks mean the temporal transformer blocks? How about using self.temporal_blocks for clarity?
self.temporal_blocks is a better choice.
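A hedged fragment of how the renamed attribute might read in the forward pass; the positional embedding, final norm, and defaults here are assumptions added for illustration, not code from this PR:

```python
import torch
import torch.nn as nn


class TemporalStage(nn.Module):
    """Toy temporal stage showing the renamed self.temporal_blocks attribute."""

    def __init__(self, num_frames=81, embed_dim=544, depth=4, num_heads=8):
        super().__init__()
        self.temporal_pos_embed = nn.Parameter(torch.zeros(1, num_frames, embed_dim))
        self.temporal_blocks = nn.ModuleList([  # previously self.blocks
            nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
            for _ in range(depth)
        ])
        self.temporal_norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (batch, num_frames, embed_dim) frame tokens from the spatial stage.
        x = x + self.temporal_pos_embed
        for block in self.temporal_blocks:
            x = block(x)
        return self.temporal_norm(x)
```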
return x

class TransformerEncoderLayer(BaseModule):
mmcv has a similar layer, the BaseTransformerLayer module under mmcv/cnn/bricks/transformer.py, which also includes MultiheadAttention. Can we use that one?
This TransformerEncoderLayer module is adapted from mmcls. Compared with the BaseTransformerLayer in mmcv, its counterpart in mmcls is more similar to the implementation in the official PoseFormer repository.
I will give the mmcv Transformer blocks a try. However, if the outputs and model weights of the re-implemented PoseFormer based on the mmcv Transformer blocks cannot match the official PoseFormer, keeping the mmcls Transformer modules might be a good choice.
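For context on what "more similar to the official implementation" means in practice, here is a hedged pure-PyTorch sketch of the pre-norm encoder layer structure that the official PoseFormer (and the mmcls-style layer) follow; it is only an outline, not the code in this PR, and it omits DropPath and the attention/projection dropouts:

```python
import torch.nn as nn


class PreNormEncoderLayer(nn.Module):
    """norm -> attention -> residual, then norm -> MLP -> residual."""

    def __init__(self, embed_dim=544, num_heads=8, mlp_ratio=2.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        hidden = int(embed_dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, x):
        # Self-attention block with a residual connection (pre-norm).
        y = self.norm1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        # Feed-forward block with a residual connection (pre-norm).
        x = x + self.mlp(self.norm2(x))
        return x
```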
Drop path rate / weight_decay
Change samples_per_gpu to 128
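A hedged example of what that data config change might look like, assuming the usual MMPose config layout; every value other than samples_per_gpu is a placeholder:

```python
# Hypothetical excerpt from a PoseFormer config file.
data = dict(
    samples_per_gpu=128,  # raised to 128 per this comment
    workers_per_gpu=2,
)
```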
Sorry, I left the Shanghai AI Lab a long time ago. I'm busy working on other projects, so I don't have much time for this. If someone could help me finish this feature, I will always be willing to help.
Motivation
Modification
Add backbone, head and config of the PoseFormer (ICCV 2021) into the repository.
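For illustration, a hypothetical sketch of how the new backbone might be referenced from a model config once registered; the registry name and arguments are assumptions, not the PR's final interface:

```python
# Hypothetical model config excerpt (registry names assumed for illustration).
model = dict(
    type='PoseLifter',
    backbone=dict(
        type='PoseFormer',        # assumed registry name for the new backbone
        num_frames=81,
        num_joints=17,
        spatial_embed_dim=32,
        spatial_depth=4,
        temporal_depth=4,
    ),
)
```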
BC-breaking (Optional)
Use cases (Optional)
Checklist
Before PR:
After PR: