Add RAFT model for optical flow #5022
Conversation
💊 CI failures summary (as of commit f077d7c): 💚 Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI.
Here are my initial comments @fmassa @datumbox @haooooooqi !
```python
from ._raft.raft import RAFT, raft, raft_small
```
We should expose the building blocks as well, stuff like `ResidualBlock`, `FeatureEncoder`, etc. Should we expose these in a `models.optical_flow.raft` namespace?

I remember @datumbox mentioning a few issues when we have both a module and a function with the same name. Ideally, I'd like to keep the `raft()` name for the function builder, but I'm happy to get your thoughts on this.
Blocking: See @pmeier's #4867 (comment) around this. We might need to rename this to `raft_large`, similar to `mobilenet_v3_large` and `mobilenet_v3_small`.

Question: Can you talk about the choice to create a second private submodule `_raft`? Why is that?

FYI:

> We should expose the building blocks as well, stuff like ResidualBlock, FeatureEncoder, etc.

If you check existing models, such as resnet and faster_rcnn, you will see we don't expose these on the `__all__` or anywhere else. We had discussions about what this means but we didn't have a consensus (public API vs developer API discussion). I would be in favour of not exposing these publicly, to be consistent with everywhere else. We should discuss and resolve this once we have time in the roadmap.
Thanks for the feedback

> Can you talk about the choice to create a second private submodule _raft? Why is that?

I wrote it like this out of habit, but I had no intention of keeping it this way.

Instead of renaming `raft()` to `raft_large()`, would it be OK instead to rename the `.py` file to something like `raft_core.py`? Or `raft_implem.py`? We would have `raft()`, `raft_small()` and `RAFT` available from `torchvision.models.optical_flow`, and then we would be able to access the other building blocks like `ResidualBlock` in `torchvision.models.optical_flow.raft_core` -- I would make sure to exclude them from `__all__`.

Would that be OK?
Though calling it `raft_large` would make it consistent with other small/large models, I can see that in the paper they don't actually call it large. They do use Raft-s or "small" for the small version, so that's canonical at least. So I understand why you want to avoid naming it like that.

Renaming the file should be OK, but @pmeier should confirm. Though I'm not sure what the actual name should be. We typically name the files after their algorithm and we let the model builder have extra info on the variant.

@fmassa thoughts on this?
OK, I'll use `raft.py` and `raft_large()` for now to move forward. We can revisit later if needed, thanks for the input.
Not commenting on the name, because I don't have an opinion on that. We just need to make sure that we don't have an attribute with the same name as a module in the same namespace, otherwise we can no longer access the module. Thus, having `raft.py` and `def raft()` is problematic, but `raft.py` and `raft_large()` is fine.
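For illustration, a minimal sketch of the shadowing issue being described (the package layout is the one planned here, the rest is hypothetical):

```python
# torchvision/models/optical_flow/__init__.py -- hypothetical sketch
# The package has a submodule `raft` (from raft.py), and we also re-export a
# function named `raft` from it:
from .raft import RAFT, raft, raft_small

# From a user's perspective:
#
#   from torchvision.models import optical_flow
#   optical_flow.raft                 # the *function*, which now shadows the raft module
#   optical_flow.raft.ResidualBlock   # AttributeError: 'function' object has no attribute ...
#
# Renaming the builder (raft_large) or renaming the file avoids the clash, so the
# module stays reachable as torchvision.models.optical_flow.raft.
```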
```python
class ResidualBlock(nn.Module):
    # This is pretty similar to resnet.BasicBlock except for one call to relu, and the bias terms
```
We can probably merge this with the resnet implementation (same for BottleneckBlock below)... but this might make the API awkward and bloated. For now I'd say it's simpler and safer to keep these implementations separate
FYI: Agreed. One more reason why we shouldn't expose this on the `__all__`.
Let's keep this here as is
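For context on the similarity mentioned above, a rough, hypothetical sketch of such a residual block, assuming the usual conv-norm-relu pattern with biased convolutions and an extra ReLU after the skip connection (not the exact code in this PR):

```python
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    """Illustrative residual block: like resnet.BasicBlock, but with biased convs
    and a ReLU applied after each conv and after the skip connection."""

    def __init__(self, in_channels, out_channels, norm_layer=nn.BatchNorm2d, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=True)
        self.norm1 = norm_layer(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=True)
        self.norm2 = norm_layer(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=True),
                norm_layer(out_channels),
            )

    def forward(self, x):
        y = self.relu(self.norm1(self.conv1(x)))
        y = self.relu(self.norm2(self.conv2(y)))
        if self.downsample is not None:
            x = self.downsample(x)
        # Extra ReLU after the residual addition -- one of the differences from BasicBlock.
        return self.relu(x + y)
```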
```python
self.convflow1 = ConvNormActivation(2, flow_layers[0], norm_layer=None, kernel_size=7)
self.convflow2 = ConvNormActivation(flow_layers[0], flow_layers[1], norm_layer=None, kernel_size=3)
```
Does it make sense to use a `ConvNormActivation` while passing `norm_layer=None`? No strong opinion from me.
Nit: It's up to you. The class was recently updated to support it, but I checked and so far all the uses of `ConvNormActivation` currently pass a non-None value. So you would be the first to use this idiom. It saves you a few lines of code, but not too much.
I'm ok with this.
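For reference, a small sketch of what this idiom roughly expands to when `norm_layer=None` (assuming the default ReLU activation; torchvision's exact padding/bias handling may differ slightly):

```python
import torch.nn as nn
from torchvision.ops.misc import ConvNormActivation

convflow1 = ConvNormActivation(2, 32, norm_layer=None, kernel_size=7)

# Roughly equivalent hand-written version: without a norm layer the block
# degenerates to a biased conv followed by the activation.
roughly_equivalent = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=7, padding=3, bias=True),
    nn.ReLU(inplace=True),
)
```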
```python
flow_head_hidden_size,
# Mask predictor
use_mask_predictor,
**kwargs,
```
kwargs are here to override the `RAFT` class parameters, e.g. to override the entire `feature_encoder`, or the whole `update_block`.
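As an illustration of that, under the assumption that the builders forward these kwargs directly to the `RAFT` class as described above (the encoder below is a hypothetical stand-in, and its expected output channels are an assumption):

```python
import torch.nn as nn
from torchvision.models.optical_flow import raft_large

class MyFeatureEncoder(nn.Module):
    """Toy 1/8-resolution encoder standing in for the default FeatureEncoder."""

    def __init__(self, out_channels=256):
        super().__init__()
        self.net = nn.Conv2d(3, out_channels, kernel_size=8, stride=8)

    def forward(self, x):
        return self.net(x)

# The builder forwards the extra kwarg to the RAFT class, replacing the default block.
model = raft_large(feature_encoder=MyFeatureEncoder())
```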
```python
class FeatureEncoder(nn.Module):
    def __init__(self, *, block=ResidualBlock, layers=(64, 64, 96, 128, 256), norm_layer=nn.BatchNorm2d):
```
I named this `layers` to be sort of consistent with the `ResNet` class. I'm happy to consider other names though.
```python
class RecurrentBlock(nn.Module):
    def __init__(self, *, input_size, hidden_size, kernel_size=((1, 5), (5, 1)), padding=((0, 2), (2, 0))):
```
Where possible I tried to write defaults that will correspond to the normal RAFT model (kernel_size, padding). I refrained from doing that for the input and output shapes though. Thoughts?
FYI: I don't know how "standard" these values are. If you think multiple `input_size`s and `hidden_size`s will keep using the same kernels, then that's OK. From other models I think that's reasonable.
```python
class MaskPredictor(nn.Module):
    def __init__(self, *, in_channels, hidden_size, multiplier=0.25):
```
I'm keeping the 0.25 default for consistency with the original code, but I'm tempted to set it to 1 to encourage users not to use it
FYI: Given that TorchVision is supposed to follow the papers very closely, using the 0.25 aligns with this principle. It will also allow you to port weights more easily. Personally I'm fine with how you implement and document it. I would also recommend writing a blogpost about your implementation (similar to 1, 2), as this will allow you to discuss such details more thoroughly and help new joiners understand the implementation.
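For context, a rough sketch of how such a multiplier is typically applied, based on the original RAFT code's 0.25 scaling of the upsampling-mask logits (layer sizes and names below are illustrative, not the exact ones in this PR):

```python
import torch.nn as nn

class MaskPredictorSketch(nn.Module):
    """Predicts the convex-upsampling mask; the multiplier scales the logits."""

    def __init__(self, *, in_channels, hidden_size, multiplier=0.25):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, hidden_size, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # 9 candidate weights for each of the 8x8 upsampled positions = 576 channels
            nn.Conv2d(hidden_size, 8 * 8 * 9, kernel_size=1),
        )
        self.multiplier = multiplier

    def forward(self, hidden_state):
        # The original code scales the logits by 0.25, reportedly to stabilize gradients.
        return self.multiplier * self.conv(hidden_state)
```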
```python
# As in the original paper, the actual output of the context encoder is split in 2 parts:
# - one part is used to initialize the hidden state of the recurrent units of the update block
# - the rest is the "actual" context.
```
This is a bit that I did not see mentioned in the paper. I wish we could separate the initialization of the hidden state from the context encoder, but this would likely prevent us from exactly reproducing the original implementation.
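For reference, a minimal sketch of the split being described, following the original RAFT code (the 128/128 channel split is the large model's configuration):

```python
import torch
import torch.nn.functional as F

def split_context_output(context_out, hidden_state_size=128, context_size=128):
    """Split the context encoder output: the first chunk (after tanh) initializes the
    recurrent hidden state, the rest (after relu) is the actual context."""
    hidden_state, context = torch.split(context_out, [hidden_state_size, context_size], dim=1)
    return torch.tanh(hidden_state), F.relu(context)
```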
```python
batch_size, _, h, w = image1.shape
torch._assert((h, w) == image2.shape[-2:], "input images should have the same shape")
torch._assert((h % 8 == 0) and (w % 8 == 0), "input image H and W should be divisible by 8")
```
This "8" downscaling factor is hard-coded in different places. We should be able to generalize it in future versions.
FYI: other models have similar requirements. An example is MobileNets, which use the `_make_divisible()` method located here. You could create a similar helper method and generalize, but that's a NIT.
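As a side note for callers hitting these assertions, a minimal sketch (not part of this PR) of padding a frame so that H and W become multiples of 8, and cropping the result back afterwards:

```python
import torch.nn.functional as F

def pad_to_multiple_of_8(img):
    """Zero-pad the bottom/right of an NCHW batch so H and W are divisible by 8."""
    h, w = img.shape[-2:]
    pad_h = (-h) % 8
    pad_w = (-w) % 8
    return F.pad(img, (0, pad_w, 0, pad_h)), (h, w)  # F.pad order: (left, right, top, bottom)

# Usage sketch (assuming the model returns a list of flow predictions):
#   padded1, (h, w) = pad_to_multiple_of_8(image1)
#   padded2, _ = pad_to_multiple_of_8(image2)
#   flow = model(padded1, padded2)[-1][..., :h, :w]
```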
@NicolasHug Great work.
I'm reviewing code practices and API, not ML validity, as this is something you checked already with @haooooooqi.
I clearly marked my comments as FYI (which are just for discussion), Question (where I just want more info), Nit (which are non-blocking and could optionally be addressed in follow-up PRs) and Blocking (which I think need to be done here).
My Blocking comments are minimal, as you can see above, and can be addressed easily. After addressing them, ping me to approve, but I would wait for the review of @fmassa prior to merging.
Thanks a lot @NicolasHug !
Approving to unblock, but check @datumbox blocking comments as well before merging.
```python
def grid_sample(img, absolute_grid, *args, **kwargs):
    """Same as torch's grid_sample, with absolute pixel coordinates instead of normalized coordinates."""
```
Food for thought: would it be a significant limitation if we were to return the flows in normalized coordinates, and perform all computations in normalized coordinates? Or is it standard that flow images are in absolute coordinates? One of the benefits of keeping it in relative (normalized) coordinates is that you don't need to multiply by the scaling factor when upsampling an image.
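For context, a minimal sketch of what such a wrapper typically does, converting absolute pixel coordinates to the [-1, 1] range expected by `torch.nn.functional.grid_sample` (a common pattern in optical-flow code; not necessarily the exact code in this PR):

```python
import torch
import torch.nn.functional as F

def grid_sample_absolute(img, absolute_grid, **kwargs):
    """Sample img at (x, y) locations given in absolute pixel coordinates.

    absolute_grid has shape (N, H_out, W_out, 2) with (x, y) in pixel units;
    F.grid_sample expects those coordinates normalized to [-1, 1].
    """
    h, w = img.shape[-2:]
    xgrid, ygrid = absolute_grid.split([1, 1], dim=-1)
    xgrid = 2 * xgrid / (w - 1) - 1  # map [0, w - 1] -> [-1, 1]
    ygrid = 2 * ygrid / (h - 1) - 1  # map [0, h - 1] -> [-1, 1]
    normalized_grid = torch.cat([xgrid, ygrid], dim=-1)
    kwargs.setdefault("align_corners", True)  # matches the (w - 1) / (h - 1) normalization above
    return F.grid_sample(img, normalized_grid, **kwargs)
```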
```python
def raft_large(*, pretrained=False, progress=True, **kwargs):
    # ...
    if pretrained:
        raise NotImplementedError("Pretrained weights aren't available yet")
```
Perhaps using `NotImplementedError` is indeed the right exception here, but I would recommend for now to throw:

```diff
- raise NotImplementedError("Pretrained weights aren't available yet")
+ raise ValueError(f"No checkpoint is available for model.")
```

This is because some tests check for this specific exception (vision/test/test_prototype_models.py, lines 45 to 46 in 3d8723d):

```python
if "No checkpoint is available" in msg:
    pytest.skip(msg)
```
```
@@ -818,5 +818,31 @@ def test_detection_model_trainable_backbone_layers(model_fn, disable_weight_load
    assert n_trainable_params == _model_tests_values[model_name]["n_trn_params_per_layer"]


@needs_cuda
@pytest.mark.parametrize("model_builder", (models.optical_flow.raft_large, models.optical_flow.raft_small))
@pytest.mark.parametrize("scripted", (False, True))
```
I'm parametrizing over this because testing with `_check_jit_scriptable` unfortunately fails on very few entries, e.g.:

```
Mismatched elements: 153 / 11520 (1.3%)
Greatest absolute difference: 0.0002608299255371094 at index (0, 0, 79, 45) (up to 0.0001 allowed)
Greatest relative difference: 0.021354377198448304 at index (0, 1, 53, 68) (up to 0.0001 allowed)
```

I could add `tol` parameters to the check, but I feel like the current test is fine as it is.
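For illustration only, roughly the shape such a parametrized test can take (a hypothetical sketch, not the exact test added in this PR; `needs_cuda` and `models` come from the test module, as in the diff above):

```python
import pytest
import torch

@needs_cuda
@pytest.mark.parametrize("model_builder", (models.optical_flow.raft_large, models.optical_flow.raft_small))
@pytest.mark.parametrize("scripted", (False, True))
def test_raft(model_builder, scripted):
    torch.manual_seed(0)
    model = model_builder().eval().to("cuda")
    if scripted:
        model = torch.jit.script(model)  # exercise the TorchScript path as well

    # Two small random "frames"; H and W must be divisible by 8.
    img1 = torch.rand(1, 3, 80, 72, device="cuda")
    img2 = torch.rand(1, 3, 80, 72, device="cuda")

    with torch.no_grad():
        flow_predictions = model(img1, img2)

    # One flow estimate per update iteration; the last one is the final prediction.
    assert flow_predictions[-1].shape == (1, 2, 80, 72)
```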
Test failures are unrelated, merging. Thanks a lot for the reviews!!
Reviewed By: NicolasHug Differential Revision: D32950937 fbshipit-source-id: 7e024dad4c3d55bc832beadfd1b3ffe867f238f3
Towards #4644
This PR adds the RAFT model, along with its basic building blocks (feature encoder, correlation block, update block, etc.) and two model builder functions: `raft_large()` and `raft_small()`.

The architecture (not the code!) is exactly the same as the original implementation from https://github.com/princeton-vl/RAFT, which will allow us to support the original paper's weights if we want to. This architecture differs slightly from what is described in the paper. I have annotated the paper with these differences, hoping this can help the review:

RAFT - Recurrent All-Pairs Field Transforms for Optical Flow (1).pdf

Here is a summary:
API
The `RAFT` class accepts `torch.nn.Module` instances as input and offers a low-level API. The model builder functions `raft_large()` and `raft_small()` are higher-level and do not require any parameters. They can however take as input the same parameters as the `RAFT` class, so as to override their defaults. E.g.:
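A hedged illustration of that intent (assuming, as stated, that extra kwargs are forwarded to the `RAFT` class; `MyUpdateBlock` is hypothetical):

```python
from torchvision.models.optical_flow import raft_large, raft_small

# Default models, no arguments required:
large = raft_large()
small = raft_small()

# Any RAFT parameter can be overridden through the builder, e.g. a custom update block:
# custom = raft_large(update_block=MyUpdateBlock(...))
```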
The building blocks like `FeatureEncoder`, `ResidualBlock`, `UpdateBlock`, etc. are (sort of publicly) available in `torchvision.models.optical_flow.raft`, but are not exposed in `__all__`.

Still left TODO, here or in follow-up PRs:

- `raft()` and `raft_small()`
In follow-up PRs I will also submit the training reference and associated transforms.

I will write review comments below to highlight important bits, or things where I'm not too sure what the best way is.
cc @datumbox