[RFC] API For Common Layers In Torchvision #4333

oke-aditya · 2021-08-29T19:52:43Z

Picks up from discussion in #4293 (comment)

🚀 Feature

API for Commonly used Layers in building models.

Motivation

A huge code duplication is involved in building very basic blocks for large neural networks. Some of these blocks can be standardized and re-used. Also these could be offered to end user as an API so that downstream libraries can build models easily.

E.g. for duplication are SqueezeExcitation, ConvBNRelu, etc.

Pitch

Create an API called torchvision.nn or torchvision.layers
Our implementations need to be generic but not locked to certain activation functions or channels, etc.
These can be simply classes based on nn.Module.

An example

class ConvNormAct(nn.Module):
    def __init__(
        self,
        in_planes: int,
        out_planes: int,
        kernel_size: int = 3,
        stride: int = 1,
        groups: int = 1,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
        activation_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None: 

        super().__init__()
        #  Thsi could be even a nn.Sequential Block instead.
        padding = (kernel_size - 1) // 2 * dilation
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation, groups, bias=False)
        self.norm_layer = norm_layer
        self.activation = activation_layer

    def forward(x: Tensor):
         x = self.conv1(x)
         if self.norm_layer is not None:
              x = self.norm_layer(x)
         x = self.activation(x)
        return x

User can use this as

from torchvision import layers
l1 = layers.ConvNormAct(3, 10, norm_layer=nn.BatchNorm2d(), activation_layer=nn.Relu(inplace=True))
dummy_input = torch.randn(3, 224, 224)
out = l1(dummy_input)

Also layers can be mixed into new custom models.

E.g.

class SimpleModel(nn.Module):
    def __init__():
        self.l1 = layers.ConvNormAct(3, 10, activation_layer=nn.Relu(inplace=True))
        self.l2  = layers.ConvNormAct(10, 3, activation_layer=nn.Relu(inplace=True))

    def forward(x: Tensor):
        x = self.l1(x)
        x = self.l2(x)
        return x

Points to Consider

We have torchvision.ops then why layers?

Ops are transforms that do manipulations with pre-processing and post processing of structures such as Boxes, Masks, Anchors, etc. These are not used in "model building" but are optional steps for specific models.
Also these are

E.g. NMS, IoU, RoI, etc.

One doesn't need ops for every model.
E.g. You don't need to do RoI align, for DeTR. Or you don't computer IoU for segmentation masks.

With separate API can be clear distinction in what are layer for models and operators for tasks such as detection, segmentation.

Should `torchvision.nn` contain losses?

This is tricky, and for now I see no clear winner.
PyTorch does not differentiate the API for losses or layers.
E.g. we do nn.Conv2d which builds a convolutional layer. Also we do nn.CrossEntropy or nn.MSE which builds a loss function.

I'm not sure whether layers should be torchvision.layers or torchvision.nn (if implemented of course)

Users don't need to worry about colliding namespaces. They can do.

from torch import nn
from torchvision import nn as tnn

Note that nn seems to be the convention adopted by torchtext.

Other points to consider.

Portability: -
Currently most of the torchvision models are easily copy pastable. E.g. We can easily copy paste mobilenetv2.py file and edit it on the go to customize models.
By Adding such API we can reduce the internal code duplication but these files would no longer be single standalone files for models.
Layer Customization : -
Layer Customization has far too many options to consider.
E.g. there are several implementations possible for BasicBlock of ResNet or some slight modifications of inverted residual layer.
One can't create an implementation that will suit all the needs for everyone. If one tries to, then the API would be significantly complicated.
TorchScript: -
We shouldn't be hampering torchscript compatibility of any model while implementing above API.

Additional context

Some candidates for layers

ConvNormAct
LinearDropoutAct
SqueezeExcitation
StochasticDepth
BasicBlock
MLP Simple Multi Layer Perceptron, often duplicated in downstream library codebases E.g. Detr
FrozenBatchNorm2d Also used in DETR.
Not sure why it is under ops, it doesn't it there

Also Quantizable versions of these !

QuantizableConvNormAct
QuantizableLinearDropoutAct

Quantizable versions will allow users to directly fuse up the models.

Additionally I will recommend not hurrying this feature, we could create torchvision.experimental.nn and start working out things.(or probably I can try in a fork)
Linking plans #3911 #4187

P.S. I'm a junior developer and all my thoughts are probably step too far. So please forgive me if I'm wrong.

cc @datumbox @pmeier @NicolasHug @fmassa

The text was updated successfully, but these errors were encountered:

pmeier · 2021-08-30T06:36:12Z

I'm a junior developer and all my thoughts are probably step too far. So please forgive me if I'm wrong.

Don't worry @oke-aditya , that is what issues are for. You've put in quite some work to compile this, so even if we totally disagree (not saying that we are) you don't need to be sorry at all. Thanks a lot for your work!

I'm all for reducing duplications. For my own projects I have written ConvBNReLU quite a few times, so having this in a central location is a good thing to have. I can't say much about the other blocks that you have proposed, but even if downstream libraries don't need them, I consider reducing duplications in our own code a win.

I'm not sure whether layers should be torchvision.layers or torchvision.nn

For the naming issue I would go for torchvision.nn to follow torch.nn and torchtext.nn. In our implementations, I would promote import torchvision.nn as tvnn as canonical import to avoid possible conflicts with the import of the other domain libraries.

Should torchvision.nn contain losses?

I don't see why not. Not sure if we want to push it as feature, i.e. making it a priority to add something there, but if someone wants to contribute a loss I think we can accept this. So I would only put the general model blocks in there in the first place and see what the community wants.

Portability:-

Since torchvision is a proper library (can be installed and imported from), I wouldn't worry about the needs of someone copy-pasting our code. Plus, in the MobileNetV2 example you gave, we import internal stuff

vision/torchvision/models/mobilenetv2.py

Line 4 in 5ef75d7

from .._internally_replaced_utils import load_state_dict_from_url

so copy-pasting the code is already not possible.

Layer Customization : -

Yes, there is always a trade-off between customizability and ease-of-use. Given that we implement quite a few important models, IMO we can use this as "limit" for customizability. That means, all our stuff should be covered without some hacky workarounds or special casing. Everything beyond that probably warrants a custom implementation of the user. We can always make it more general in the future if there is some need.

Additionally I will recommend not hurrying this feature, we could create torchvision.experimental.nn and start working out things.(or probably I can try in a fork)

The standard way is to "mark" (in the documentation as well as in the release notes) a module in a prototype state. The aim of this is twofold:

We actively encourage feedback on these modules to better fit the user need. Of course we put in some thoughts in the design before we go public, but like you said: there are so many users with different needs, that it is impossible to get it right in the first try.
For prototype (and somewhat weaker also for beta) functionality, we are not bound to BC. If we not completely botch our initial thoughts we probably won't change the whole UI after the first round of iterations, but smaller UI changes are ok.

kazhang · 2021-09-20T17:49:44Z

Thanks @oke-aditya for bringing up this topic!
Just sharing my 2 cents. I think there are two kinds of APIs: developer APIs and public APIs. Developers APIs are those helper function/utils that we use within the codebase to improve dev efficiency and avoid code duplication. Public APIs are the products we offer to the community and commit on backward compatibility.

For example, I would treat layers like ConvNormAct as developer API since it's just an shortcut for composing three operators together. Can we put this kind of API under torchvision.ops.misc? and only used within torchvision?

For layers like SqueezeExcitation that are back by papers, referenced in many model architectures. We could make them public to the community. Putting them in torchvision.nn or torchvision.ops both sound good to me.

oke-aditya · 2021-09-20T18:04:12Z

Hi @kazhang I agree to your thoughts. We should be exposing only those layers that are referenced in model architectures.

It would be nice to reduce internal code duplication by keeping them under nn.misc.py or perhaps naming these classes with _ or keeping in _.py file in layers / nn folder.

I would prefer nn as to separate out model building logic from model pre / post processing / helper logic.

pmeier added module: models needs discussion new feature labels Aug 30, 2021

datumbox mentioned this issue Sep 4, 2021

[RFC] TorchVision with Batteries included - Phase 1 #3911

Closed

16 tasks

datumbox mentioned this issue Sep 15, 2021

Add RegNet Architecture in TorchVision #4403

Merged

kazhang mentioned this issue Sep 20, 2021

Allow custom activation in SqueezeExcitation of EfficientNet #4448

Merged

This was referenced Sep 29, 2021

Moving common layers to ops #4504

Merged

Replace MobileNetV3's SqueezeExcitation with EfficientNet's one #4487

Merged

datumbox closed this as completed in #4504 Sep 30, 2021

datumbox reopened this Sep 30, 2021

oke-aditya mentioned this issue Nov 1, 2021

Expose misc ops at package level #4812

Merged

datumbox mentioned this issue Feb 13, 2022

[RFC] New Ops in TorchVision #5414

Open

10 tasks

This was referenced May 19, 2022

Adding multi-layer perceptron in ops #6053

Merged

[RFC] Batteries Included - Phase 2 #5410

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] API For Common Layers In Torchvision #4333

[RFC] API For Common Layers In Torchvision #4333

oke-aditya commented Aug 29, 2021 •

edited

Loading

pmeier commented Aug 30, 2021

kazhang commented Sep 20, 2021

oke-aditya commented Sep 20, 2021

[RFC] API For Common Layers In Torchvision #4333

[RFC] API For Common Layers In Torchvision #4333

Comments

oke-aditya commented Aug 29, 2021 • edited Loading

🚀 Feature

Motivation

Pitch

Points to Consider

We have torchvision.ops then why layers?

Should torchvision.nn contain losses?

Other points to consider.

Additional context

pmeier commented Aug 30, 2021

kazhang commented Sep 20, 2021

oke-aditya commented Sep 20, 2021

oke-aditya commented Aug 29, 2021 •

edited

Loading

Should `torchvision.nn` contain losses?