[RFC] Abstractions for segmentation / detection transforms #1406
Hello,
I am not sure that combining transforms and the objects being transformed is a good idea. I understand the necessity of applying the same transform with different interpolation methods, but while this may be easy to predict in generic cases, it may not work in corner cases. For example, suppose we assume bilinear interpolation for an image, and then someone comes along who prefers cubic interpolation. How does that work? Do we add yet another parameter for it? But that is a transform parameter living in a Tensor-like object... I am not sure that you saw my last (heavily edited) comment in #1315; I think it solves the issue at hand.
Here is a more detailed version of it, adapted to meet all the above requirements. It only needs to update the existing transforms by adding a `master` argument to the constructor, a `get_params` method, and parameter syncing in `__call__`.
The example below uses a dictionary of parameters for the constructor to be generic, but it could be implemented with the normal "list" of parameters, which can then be put in an internal dict anyway. This maintains backward compatibility. We could have:

```python
affine_augment = tv.transforms.RandomAffine({'degrees': 90, 'translate': (-50, 50), 'scale': (0.5, 2)})
affine_augment_label = tv.transforms.RandomAffine({'interpolation': 'nearest'}, master=affine_augment)
image = affine_augment(image)
label = affine_augment_label(label)
```

And inside the transform we could have something like this:

```python
def __init__(self, params, master=None):
    self.master = master
    self.params = {}
    self.async_params = []  # parameters that do not follow the master transform
    if master is not None:
        master_params = master.get_params()
        for param_name in master_params.keys():
            self.params[param_name] = master_params[param_name]
    for param_name in params.keys():
        self.params[param_name] = params[param_name]
        self.async_params.append(param_name)
def __call__(self, input, params=None):
    if self.master is not None:
        master_params = self.master.get_params()
        for param_name in master_params.keys():
            if param_name not in self.async_params:
                self.params[param_name] = master_params[param_name]
        self.update_params()  # update self.async_params
    temp_params = dict(self.params)  # copy, so provided params don't overwrite ours
    if params is not None:
        for param_name in params.keys():
            temp_params[param_name] = params[param_name]
    return self.apply_params(input, temp_params)
def get_params(self):
    return self.params
```

This also allows customized transforms in the training loop:

```python
affine_augment = tv.transforms.RandomAffine({'degrees': 90, 'translate': (-50, 50), 'scale': (0.5, 2)})
# update params and apply the transform
image = affine_augment(image)
# get the current params
params = affine_augment.get_params()
params['interpolation'] = 'nearest'
# do not use the updated params, apply the provided ones instead
label = affine_augment(label, params)
```
PS: if you absolutely want to have one transform taking several inputs, we could have:

```python
class BundleTransform(object):
    def __init__(self, *transforms):
        self.transforms = transforms

    def __call__(self, *inputs):
        if len(inputs) == len(self.transforms):
            return [self.transforms[i](inputs[i]) for i in range(len(self.transforms))]
        raise ValueError('wrong number of inputs')

affine_augment = tv.transforms.RandomAffine({'degrees': 90, 'translate': (-50, 50), 'scale': (0.5, 2)})
affine_augment_label = tv.transforms.RandomAffine({'interpolation': 'nearest'}, master=affine_augment)
affine_augment_bundle = BundleTransform(affine_augment, affine_augment_label)
image, label = affine_augment_bundle(image, label)
```
Previously the transform objects were responsible for actually performing the transformation; you propose to move this logic to the objects that hold the data. In your pipeline, this means that the transformation logic would be attached to the data objects. In the end, I think, there is no other way to solve this in a general way: after all, it's the user who knows what their data means and what it means to "transform" it. I think that makes sense - the last code snippet in your post made me smile, this is exactly what I need! What worries me, however, is the resulting complexity on the back end of this approach. Two questions here.

Honestly, the concern stems from this whole PS: how would a user implement their own transform under this scheme, and how would they add support for their own data type?

If the object-oriented transformation API stays the same, you can just construct your transform object with cubic interpolation. As I understand the above, nothing changes about that. Your master-follower scheme would allow constructing two separate transforms, one per interpolation mode.

There seem to be two ways to go: either a small patch to the existing transforms, or a total makeover as proposed here. Of the two most reasonable proposals for the former, param-based (#1315) and master-follower (the comment above), the only difference is basically who passes the params to whom: is it the user doing so explicitly, or the follower pulling them from its master?

To sum up, I don't dislike this proposal - I'm just cautious about it. The question that's important to me is: what will be simpler for an average PyTorch user: writing my own transform under the current scheme, or under this new one?
Thanks @SebastienEske and @Noiredd for your feedback! I agree with @Noiredd's points about the master-worker approach proposed by @SebastienEske (and the param-based approach), in particular the concern about who passes the params to whom.

To answer @Noiredd's questions:
Great question, and I see two different cases here. If you live out-of-tree (i.e., not modifying torchvision), I would imagine that the user would implement a new transform by handling the different input types himself:

```python
class MyNewTransform(object):
    def __call__(self, x):
        if isinstance(x, CombinedObject):
            out = []
            for y in x:
                out.append(self(y))
            return out
        elif isinstance(x, BoxList):
            pass  # do something here
        elif isinstance(x, SegmentationMask):
            pass  # something else
        else:
            pass  # finally, the default case
```

This doesn't require monkey-patching the torchvision objects to support your new transform, and the user controls what they want to support in a very explicit way. If your code lives in torchvision, you could add the relevant method to each class that you care about so that the new transform is transparently handled.
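For instance, here is a minimal sketch of that in-tree route, assuming plain wrapper classes; the class layouts and the `hflip` methods below are illustrative assumptions, not existing torchvision API:

```python
import random

class SegmentationMask:
    def __init__(self, data):
        self.data = data  # (H, W) tensor of class ids

    def hflip(self):
        # flip the mask along its width axis
        return SegmentationMask(self.data.flip(-1))

class BoxList:
    def __init__(self, boxes, image_width):
        self.boxes = boxes            # (N, 4) tensor: x1, y1, x2, y2
        self.image_width = image_width

    def hflip(self):
        # mirror the x coordinates; y stays unchanged
        flipped = self.boxes.clone()
        flipped[:, 0] = self.image_width - self.boxes[:, 2]
        flipped[:, 2] = self.image_width - self.boxes[:, 0]
        return BoxList(flipped, self.image_width)

class RandomHorizontalFlipAnything:
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, x):
        if random.random() < self.p:
            return x.hflip()  # dispatch to whatever type x is
        return x
```

The transform stays type-agnostic; each data class owns the knowledge of how to flip itself.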
I think this should be fairly simple. A cousin approach to the one I proposed is something I have done before: a plain class exposing the methods the transforms need:

```python
class Mesh(object):
    def resize(self, *args, **kwargs):
        pass

    def transpose(self, *args, **kwargs):
        pass

    # ... and so on for the other operations
```

Now, why don't we just do this and try not to mess with `torch.Tensor` at all? torchvision would provide those classes.
Sorry for the late reply, but at this point I don't have much to add. Also, I think my understanding of the bigger picture is not clear enough to give reasonable suggestions. However, if you need help implementing (even just for experimenting) anything, feel free to CC me again.
Hello, I am OK with whatever approach you choose to use, but considering the amount of change, it's probably good to be aware of the limitations of the one you are proposing. Allow me to play the devil's advocate one more time ;-):

- That is a lot more complicated (and bug prone) than just passing a parameter to an existing function.
- So if someone wants to contribute a new transform, he/she will need to implement it for all the classes? Even with a single shared implementation, every class needs its own hook. I also realized that part of the transform would now be implemented in the content class.
- Users already do this with transforms, and transforms are not attached to the data objects. Attaching the transforms to the data objects gives less flexibility and will inevitably be less generic.

@Noiredd mentioning bad/good OOP got me thinking about it ;-) To me, having transforms manage their dependencies on their own is a lightweight approach. The only reason I can think of for having a supervisor would be distributed training. But even then, all the inputs would probably be loaded on the same GPU, so it is not needed; and if it were needed, I would expect it to be because of memory constraints.

I am far from being an expert on this, but if I understand correctly, wouldn't putting the transforms in the data container violate the single responsibility principle? The new classes would be responsible for both storing/representing the data and transforming it... The need to modify every class to introduce a new transform feels like breaking the open-closed principle (the closed part) as well.

PS: I'll try to think of something nicer for the master/slave approach by tomorrow.
Well... I may have a better solution than the master/slave approach. I am only surprised that it was not proposed before; maybe it was and there is something wrong with it. I guess you'll tell me. Why not allow a transform to take lists of parameters and inputs?

```python
# nbtrans is the number of transforms/inputs that the user wants to define
def __init__(self, param1, param2, ..., nbtrans=None):
    # get the value of nbtrans
    self.nbtrans = nbtrans
    # check the other inputs and infer the nbtrans value from them if needed
    if isinstance(param1, list):
        if self.nbtrans is None:
            self.nbtrans = len(param1)
        elif self.nbtrans != len(param1):
            raise ValueError('inconsistent number of parameters')
    if isinstance(param2, list):
        if self.nbtrans is None:
            self.nbtrans = len(param2)
        elif self.nbtrans != len(param2):
            raise ValueError('inconsistent number of parameters')
    ...
    if self.nbtrans is None:
        self.nbtrans = 1
    # create the proper set of parameters
    if isinstance(param1, list):
        self.param1 = param1
    else:
        self.param1 = [param1 for i in range(self.nbtrans)]
    if isinstance(param2, list):
        self.param2 = param2
    else:
        self.param2 = [param2 for i in range(self.nbtrans)]
    ...

def __call__(self, input):
    if isinstance(input, list):
        if len(input) == self.nbtrans:
            # update internal state if needed
            ...
            # compute the transformation for each input/set of parameters
            return [apply_trans(input[i], i) for i in range(len(input))]
        elif self.nbtrans == 1:
            # update internal state if needed
            ...
            # apply the single set of parameters to every input
            return [apply_trans(input[i], 0) for i in range(len(input))]
        else:
            raise ValueError('wrong number of inputs')
    else:
        # should we throw an exception if self.nbtrans > 1?
        # update internal state if needed
        ...
        return apply_trans(input, 0)
```

If a parameter was originally a list, we can simply test whether it has become a list of lists. And we don't need to rewrite `Compose`:

```python
boxes = torch.rand(10, 4)
masks = torch.rand(1, 100, 200)
image = torch.rand(3, 100, 200)
transforms = Compose([
    RandomResizeCrop(224, interpolation=['coords', 'nearest', 'bilinear']),
    RandomFlip(input_type=['coords', 'image', 'image']),
    ColorAugment(input_type=['coords', 'image', 'image']),
])
boxes, masks, image = transforms([boxes, masks, image])
```

Handling the coordinates is a bit tricky, but the `input_type` entries tell each transform what kind of input it receives. And we don't touch the tensors; the modifications remain in the transforms, which I think is really a key point to have. I believe this satisfies all seven criteria above, plus the one from @Noiredd about several transforms communicating together without supervision. Now there is only one transform. The possible values for `input_type` (and their behaviors) would of course need to be specified. What do you think?
Doing some extra thinking: there are at least two ways to resize coordinates: return the exact float values, or return the rounded integer values. The first one is akin to bilinear/bicubic interpolation and the second one to nearest neighbor. So maybe we also want the interpolation parameter to apply to coordinates as well.
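As a quick sketch of the difference (a hypothetical `resize_boxes` helper, not torchvision API), scaling box coordinates while either keeping exact floats or rounding:

```python
import torch

def resize_boxes(boxes, scale, rounded=False):
    """Scale (N, 4) box coordinates by `scale`.

    rounded=False keeps exact float coordinates (akin to bilinear),
    rounded=True snaps them to integers (akin to nearest neighbor).
    """
    out = boxes.float() * scale
    return out.round().long() if rounded else out

boxes = torch.tensor([[10, 20, 33, 47]])
print(resize_boxes(boxes, 0.5))                # [[5.0, 10.0, 16.5, 23.5]]
print(resize_boxes(boxes, 0.5, rounded=True))  # [[5, 10, 16, 24]]
```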
Hello, it's been a while, and since this has not moved on, I feel like maybe I voiced my concerns a bit too much. If that's the case, I'm sorry about it. In any case, my point was to make sure that all concerns and alternatives were presented. The above is all I've got, so @fmassa, whatever you decide to go with will be good for me.
No worries at all, thanks for sharing your thoughts! I've been slow to respond on this one because I've been busy working on other parts of torchvision. I'm going to get back to segmentation / detection transforms most probably in January.
One issue I stumbled on long ago when reading PASCAL segmentation multiclass masks: pytorch/pytorch#5436
Hi everyone. I needed this feature a while ago, so I thought to suggest my solution.

Transform structure: as an example, take RandomHorizontalFlip. See the changes I added and the resulting functionality:

```python
class RandomHorizontalFlip(torch.nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p
        self.track = None  # added to save the current state

    def forward(self, img, enforce=None):  # added enforce input
        threshold = enforce if enforce is not None else torch.rand(1)  # reuse the enforced state if given
        if threshold < self.p:
            self.track = 0  # save the current state (flipped)
            return F.hflip(img)
        else:
            self.track = 1  # save the current state (not flipped)
            return img
```

Functionality, example code:

```python
flip_t = RandomHorizontalFlip(0.5)
img = flip_t(img)
```

Usage:

Uncorrelated transforms:

```python
img1 = flip_t(img1)
img2 = flip_t(img2)
```

Same transform on 2 images:

```python
img1 = flip_t(img1)
img2 = flip_t(img2, enforce=flip_t.track)
```

Same transform on N images:

```python
img_list = [img1, img2, ..., imgN]
img_list[0] = flip_t(img_list[0])
for i in range(1, N):
    img_list[i] = flip_t(img_list[i], enforce=flip_t.track)
```

Different hyperparameters: assume we want to apply RandomRotation to img and mask with different interpolations.

```python
img_t = RandomRotation(degrees, resample=PIL.Image.BILINEAR)
mask_t = RandomRotation(degrees, resample=PIL.Image.NEAREST)
img = img_t(img)
mask = mask_t(mask, enforce=img_t.track)
```

Custom transformation: assume a string label that should flip together with the image.

```python
def label_custom_flip(label, enforce=None):
    if enforce == 1:
        return label  # no flip
    new_label = 'right' if label == 'left' else 'left'  # flip the str label
    return new_label

img1 = flip_t(img1)  # same as before
label = label_custom_flip(label, enforce=flip_t.track)
```

Compose: we can also adapt the Compose interface in the same way.

```python
my_transforms = Compose([...])
img1 = my_transforms(img1)
img2 = my_transforms(img2, enforce=my_transforms.tracks)
```

Where `tracks` collects the saved state of each transform in the composition (see the Compose implementation in my next comment).

I hope it is a good solution and you like my suggestion.
Continuing my comment above (#1406 (comment)):

```python
class Compose:
    def __init__(self, transforms):
        self.transforms = transforms
        self.tracks = [None] * len(self.transforms)  # added

    def __call__(self, img, enforce=None):
        if enforce is None:  # added if
            for i, t in enumerate(self.transforms):
                img = t(img)
                self.tracks[i] = t.track if hasattr(t, 'track') else None  # record each state
            return img
        else:  # I added this section:
            for i, t in enumerate(self.transforms):
                img = t(img) if enforce[i] is None else t(img, enforce[i])
                self.tracks[i] = t.track if hasattr(t, 'track') else None
            return img
```

Please review this implementation as well.

Conclusion: basically, I aim to change the naive transforms implementation into a more powerful and robust form.
With respect to the proposal above, and to elaborate a bit more on what this 'function overloading' using `__torch_function__` would look like:

```python
import torch

HANDLED_FUNCTIONS_IMAGE = {}
HANDLED_FUNCTIONS_SEG = {}

def implements_segmentation(torch_function):
    "Register an implementation of a torch function for a Tensor-like object."
    def decorator(func):
        HANDLED_FUNCTIONS_SEG[torch_function] = func
        return func
    return decorator

def implements_image(torch_function):
    "Register an implementation of a torch function for a Tensor-like object."
    def decorator(func):
        HANDLED_FUNCTIONS_IMAGE[torch_function] = func
        return func
    return decorator

class SegmentationMask():
    def __init__(self, data):
        self.data = data

    def __torch_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS_SEG:
            return NotImplemented
        if not all(issubclass(t, self.__class__) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS_SEG[func](self.data, *args[1:], **kwargs)

class Image():
    def __init__(self, data):
        self.data = data

    def __torch_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS_IMAGE:
            return NotImplemented
        if not all(issubclass(t, self.__class__) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS_IMAGE[func](self.data, *args[1:], **kwargs)

@implements_image(torch.nn.functional.interpolate)
def interpolate_image(*args, **kwargs):
    # images always use linear interpolation
    if 'mode' in kwargs:
        del kwargs['mode']
    return torch.nn.functional.interpolate(*args, **kwargs, mode='linear')

@implements_segmentation(torch.nn.functional.interpolate)
def interpolate_segmentation(*args, **kwargs):
    # segmentation masks always use nearest interpolation
    if 'mode' in kwargs:
        del kwargs['mode']
    return torch.nn.functional.interpolate(*args, **kwargs, mode='nearest')

s = SegmentationMask(
    torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]).float().view(1, 3, 3)
)
print('segmentation: ', torch.nn.functional.interpolate(s, 2))

s = Image(
    torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]).float().view(1, 3, 3)
)
print('image: ', torch.nn.functional.interpolate(s, 2))
```

with results along the following lines.
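Roughly, it should print something like this (exact formatting varies by PyTorch version; the linear values assume the default align_corners=False):

```
segmentation:  tensor([[[1., 2.],
         [4., 5.],
         [7., 8.]]])
image:  tensor([[[1.2500, 2.7500],
         [4.2500, 5.7500],
         [7.2500, 8.7500]]])
```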
There is an alternative approach covered here: pmeier/torchvision-datasets-rework#1
This is a proposal. I'm not sure yet it's the best way to achieve this, so I'm putting this up for discussion.
tl;dr

Have specific tensor subclasses for `BoxList` / `SegmentationMask` / `CombinedObjects`, etc., which inherit from `torch.Tensor` and override methods to properly dispatch to the relevant implementations. Depends on `__torch_function__` from pytorch/pytorch#22402 (implemented in pytorch/pytorch#27064).

Background
For more than 2 years now, users have asked for ways of performing transformations on multiple inputs at the same time, for example for semantic segmentation or object detection (#9).
The recommended solution is to use the functional transforms in this case (#230, #1169), but for simple cases this is a bit verbose.
Requirements

Ideally, we would want the following to be possible:

1. apply the same transformation to an arbitrary number of inputs at once
2. keep a `Compose`-style interface for simple cases
3. allow different behavior per input type (e.g., different interpolation for images and segmentation masks)

Proposed solution
We define new classes for each type of object, which should all inherit from `torch.Tensor`, and implement / override a few specific methods. It might depend on `__torch_function__` from pytorch/pytorch#22402 (implemented in pytorch/pytorch#27064).

Work with `Compose`-style

We propose to define a `CombinedObjects` (better names welcome), which is a collection of arbitrary objects (potentially named, but that's not a requirement). Calling any of the methods on it should dispatch to the corresponding methods of its constituents. A basic sketch is below; I'll mention a few more points about it afterwards:
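A minimal sketch of the idea, assuming each constituent implements a shared set of transform methods (`resize` is shown here as a placeholder):

```python
class CombinedObjects(object):
    def __init__(self, **objects):
        # arbitrary, optionally named, collection of objects
        self.objects = objects

    def __iter__(self):
        return iter(self.objects.values())

    def resize(self, size):
        # dispatch to the corresponding method of every constituent
        return CombinedObjects(
            **{name: obj.resize(size) for name, obj in self.objects.items()}
        )

    # hflip, rotate, crop, ... would follow the same pattern
```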
In this way, if the underlying objects follow the same protocol (i.e., implement the required methods), this allows combining an arbitrary number of objects with a `Compose`-style API (see the usage sketch below), which satisfies points 1 and 2 above, and part of point 3 (except for the different transformation hyperparameters for image / segmentation mask, which I'll cover next).
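For instance, a hedged usage sketch, where the `Resize` / `RandomHorizontalFlip` transforms are assumed to call the shared methods above:

```python
transforms = Compose([
    Resize(300),
    RandomHorizontalFlip(0.5),
])

sample = CombinedObjects(
    image=Image(image_data),
    mask=SegmentationMask(mask_data),
)
sample = transforms(sample)  # every transform is applied to all constituents
```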
Different behavior for mask / boxes / images
In the same vein as the `CombinedObject` approach from the previous section, we would have subclasses of `torch.Tensor` for `BoxList` / `SegmentationMask` / etc., which would override the behavior of specific functions so that they work as expected.

For example (and using code snippets from pytorch/pytorch#25629), we can define a class for segmentation masks where rotation / interpolation / grid sample always behave with nearest interpolation:
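A minimal sketch of what that could look like, assuming the `__torch_function__` override protocol (the dispatch details here are an assumption, not final API):

```python
import torch

class SegmentationMask(torch.Tensor):
    # functions whose interpolation mode we pin to nearest
    PINNED = {torch.nn.functional.interpolate, torch.nn.functional.grid_sample}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = dict(kwargs or {})
        if func in cls.PINNED:
            kwargs['mode'] = 'nearest'  # masks must never blend class ids
        return super().__torch_function__(func, types, args, kwargs)
```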
and we can also give custom implementations for bounding boxes:
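Again a sketch under the same assumptions; for boxes, a resize does not interpolate pixels at all, it rescales the coordinates:

```python
class BoxList(torch.Tensor):
    # an (N, 4) tensor of x1, y1, x2, y2 coordinates

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = dict(kwargs or {})
        if func is torch.nn.functional.interpolate:
            # "interpolating" a box list means scaling its coordinates
            boxes, scale = args[0], kwargs.get('scale_factor', 1.0)
            return (boxes * scale).as_subclass(BoxList)
        return super().__torch_function__(func, types, args, kwargs)
```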
This would cover the remainder of point 3. Because they are subclasses of `torch.Tensor`, they behave as a `Tensor` except in the particular cases where we override the behavior. This would be used as follows:
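A hedged usage sketch, continuing the assumed classes above:

```python
mask = torch.randint(0, 21, (1, 1, 100, 200)).float().as_subclass(SegmentationMask)
image = torch.rand(1, 3, 100, 200)

# the same call behaves differently per type:
out_mask = torch.nn.functional.interpolate(mask, scale_factor=2)  # forced to nearest
out_image = torch.nn.functional.interpolate(image, scale_factor=2, mode='bilinear')
```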
Be simple and modular
This is up for discussion. The fact that we are implementing subclasses that do not behave exactly like tensors can be confusing and misleading. But it does seem to simplify a number of things, and makes it possible for users to leverage the same abstractions in torchvision for their own custom types, without having to modify anything in torchvision, which is nice.
Related discussions
Some other proposals have been discussed in #230, #1169, and many other places.
cc @Noiredd @SebastienEske @pmeier for discussion