Description
Currently, we are struggling to train transformer models for inputs larger than a 128^3 volume. With the volumetric grid representation of shape, the achievable resolution may not be sufficient. For example, for large bone shapes such as the hip and ribs, we have had to use a voxel resolution of ~2.5 mm, which is very coarse.
We might be able to output a higher-resolution volume if the transformers were lighter in terms of memory usage.
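For intuition, here is a back-of-envelope estimate of the self-attention matrix alone under straightforward voxel patchification; the patch size and fp32 dtype are illustrative assumptions, not our actual configuration:

```python
# Rough size of one attention matrix (per head, per layer) for a cubic volume.
# Patch size and bytes-per-element are assumed values for illustration only.
def attn_matrix_gib(volume: int, patch: int = 4, bytes_per_el: int = 4):
    tokens = (volume // patch) ** 3              # voxel patches -> tokens
    gib = tokens ** 2 * bytes_per_el / 2 ** 30   # dense token-by-token matrix
    return tokens, gib

print(attn_matrix_gib(128))  # (32768, 4.0)     -> 4 GiB at 128^3
print(attn_matrix_gib(256))  # (262144, 256.0)  -> 256 GiB at 256^3
```

The quadratic growth in token count is what makes anything much beyond 128^3 impractical with full attention.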
Proposal
Use a 2D encoder and a 3D decoder. How do we realize this with transformers? A case in point: how do we port the ideas from TransVert and 3DReconNet into pure transformers? How do we concatenate features from the two parallel X-ray branches and decode a 3D output?
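One way this could look, as a minimal (untested) PyTorch sketch: each X-ray view gets its own patch-embedding transformer encoder, the two token sequences are concatenated, and a grid of learned 3D queries cross-attends to them before a small transposed-conv decoder produces the volume. Every module name, size, and the query-based fusion step are assumptions for illustration, not a port of TransVert or 3DReconNet:

```python
import torch
import torch.nn as nn

class XrayEncoder2D(nn.Module):
    """Patchify one X-ray view and encode it with a small transformer."""
    def __init__(self, img=128, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.patchify = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.randn(1, (img // patch) ** 2, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                                  # x: (B, 1, H, W)
        tok = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tok + self.pos)

class BiplanarTo3D(nn.Module):
    """Concatenate tokens from both views; learned 3D queries cross-attend
    to them; a transposed-conv decoder upsamples the coarse query grid."""
    def __init__(self, dim=256, grid=8):
        super().__init__()
        self.enc_ap = XrayEncoder2D(dim=dim)
        self.enc_lat = XrayEncoder2D(dim=dim)
        self.grid = grid
        self.queries = nn.Parameter(torch.randn(1, grid ** 3, dim) * 0.02)
        self.fuse = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.decode = nn.Sequential(                       # 8^3 -> 64^3
            nn.ConvTranspose3d(dim, 64, 4, 2, 1), nn.GELU(),
            nn.ConvTranspose3d(64, 32, 4, 2, 1), nn.GELU(),
            nn.ConvTranspose3d(32, 1, 4, 2, 1),
        )

    def forward(self, ap, lat):
        # feature concatenation across the two parallel X-ray branches
        mem = torch.cat([self.enc_ap(ap), self.enc_lat(lat)], dim=1)
        q, _ = self.fuse(self.queries.expand(ap.size(0), -1, -1), mem, mem)
        vol = q.transpose(1, 2).reshape(-1, q.size(-1), *([self.grid] * 3))
        return self.decode(vol)

vol = BiplanarTo3D()(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128))
print(vol.shape)  # torch.Size([2, 1, 64, 64, 64])
```

Cross-attention from a coarse 3D query grid is only one fusion option; simply concatenating the two branches' tokens and reshaping them into a volume is the most literal reading of the question above.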
Implement/port Conv+transformer hybrids.
These might be more efficient in terms of memory usage, since attention then runs over a heavily downsampled feature grid rather than over raw voxels.
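As a minimal sketch of that idea (sizes and names are illustrative, and this is not CoTr's actual architecture): a strided 3D conv stem shrinks a 128^3 input to an 8^3 token grid before any attention runs, so the transformer only ever sees 512 tokens:

```python
import torch
import torch.nn as nn

class ConvTransformer3D(nn.Module):
    """Conv stem -> global attention on a tiny token grid -> conv head."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.stem = nn.Sequential(                       # 128^3 -> 8^3
            nn.Conv3d(1, 32, 4, 2, 1), nn.GELU(),        # 64^3
            nn.Conv3d(32, 64, 4, 2, 1), nn.GELU(),       # 32^3
            nn.Conv3d(64, 128, 4, 2, 1), nn.GELU(),      # 16^3
            nn.Conv3d(128, dim, 4, 2, 1),                # 8^3
        )
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.body = nn.TransformerEncoder(layer, depth)  # attention over 512 tokens
        self.head = nn.Sequential(                       # 8^3 -> 128^3
            nn.ConvTranspose3d(dim, 128, 4, 2, 1), nn.GELU(),
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.GELU(),
            nn.ConvTranspose3d(64, 32, 4, 2, 1), nn.GELU(),
            nn.ConvTranspose3d(32, 1, 4, 2, 1),
        )

    def forward(self, x):                                # (B, 1, 128, 128, 128)
        f = self.stem(x)                                 # (B, dim, 8, 8, 8)
        b, c, d, h, w = f.shape
        tok = self.body(f.flatten(2).transpose(1, 2))    # (B, 512, dim)
        return self.head(tok.transpose(1, 2).reshape(b, c, d, h, w))

out = ConvTransformer3D()(torch.randn(1, 1, 128, 128, 128))
print(out.shape)  # torch.Size([1, 1, 128, 128, 128])
```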
Try:
Implement/port various lightweight transformers and other promising ideas.
3D Medical Axial transformer, CoTr, and AFTer-UNet seem promising; a generic axial-attention sketch follows below.
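For the axial direction in particular, the memory win comes from attending along one axis at a time: full attention over a D×H×W grid costs O((DHW)^2), while axial attention costs O(DHW·(D+H+W)). A rough, generic sketch in the spirit of the axial-transformer family (not a faithful port of any of the papers above; all names and sizes are assumptions):

```python
import torch
import torch.nn as nn

class AxialAttention3D(nn.Module):
    """Run self-attention along the D, H, and W axes in turn, treating the
    other two axes as batch dimensions, with residual + LayerNorm per axis."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)])
        self.norm = nn.ModuleList([nn.LayerNorm(dim) for _ in range(3)])

    @staticmethod
    def _along(x, axis):
        # Move `axis` to the sequence slot; fold the other two axes into batch.
        perm = [0] + [a for a in (1, 2, 3) if a != axis] + [axis, 4]
        y = x.permute(perm)
        shape = y.shape                       # (B, other, other, axis_len, C)
        return y.reshape(-1, shape[-2], shape[-1]), shape, perm

    def forward(self, x):                     # x: (B, D, H, W, C)
        for axis, (attn, norm) in enumerate(zip(self.attn, self.norm), start=1):
            seq, shape, perm = self._along(x, axis)
            out, _ = attn(seq, seq, seq)      # attention over one axis only
            out = norm(out + seq).reshape(shape)
            inv = [perm.index(i) for i in range(5)]   # undo the permutation
            x = out.permute(inv)
        return x

x = torch.randn(1, 16, 16, 16, 64)            # (B, D, H, W, C)
print(AxialAttention3D()(x).shape)            # torch.Size([1, 16, 16, 16, 64])
```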