Random transforms for both input and target? #9
Hmm, good point. For example:
|
Ewww, that's a fragile ugly hack. I also don't see how it will help, since the target isn't an image. Can't you just extend the CocoDataset? Generate the random parameters and then apply the same logical op to both image and target:

class TransformedCocoDataset(CocoDataset):
    def __getitem__(self, i):
        input, target = super(TransformedCocoDataset, self).__getitem__(i)
        hflip = random.random() < 0.5
        if hflip:
            input = input.transpose(Image.FLIP_LEFT_RIGHT)  # horizontally flip the input (a PIL image)
            # flip target rectangle
            # etc.
        return input, target

Perhaps we should provide similar operations that work on bounding boxes? |
Oh -- I misread the issue. I guess the target is an image for segmentation. It still seems to me like you want to generate the random parameters once and apply the operation twice. For things that aren't trivial (like RandomSizedCrop), we may want to refactor the part that generates the random parameters out of the image op. We probably don't have to do anything for trivial ops like horizontal flip. |
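For context, a minimal sketch of that refactoring (class and method names are illustrative, not torchvision code): the random parameters are drawn once, outside the op, and then applied to both images.

import random

class SharedRandomCrop(object):
    """Crop whose random parameters are drawn once and reused for several images."""
    def __init__(self, size):
        self.size = size  # (th, tw)

    @staticmethod
    def get_params(img, size):
        # img is a PIL image; returns the crop box as (x1, y1, tw, th)
        w, h = img.size
        th, tw = size
        x1 = random.randint(0, w - tw)
        y1 = random.randint(0, h - th)
        return x1, y1, tw, th

    @staticmethod
    def apply(img, params):
        x1, y1, tw, th = params
        return img.crop((x1, y1, x1 + tw, y1 + th))

# inside __getitem__: draw once, apply twice
# params = SharedRandomCrop.get_params(input, (256, 256))
# input, target = SharedRandomCrop.apply(input, params), SharedRandomCrop.apply(target, params)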
I think the solution proposed by @colesbury about sub-classing the dataset is the most general one. Also, the current way of passing As such, are you ok if we merge |
The way I see it in @colesbury's code, we will have the same problem when trying to compose different transform functions, because the random parameters are created within the call function. We won't be able to customize transform functions, and will have to create a sub-dataset per set of transform functions we want to try. What about special transformations for both inputs and targets? This may create some duplicate functions like RandomCrop for image-based targets (maybe add the target as an optional argument?), but I don't see how else we would apply properly coherent transformations to input and target. We could also give a seed as an argument in addition to img in getitem, but we are not guaranteed I feel like there should be 3 types of transform: |
I actually just ran into this problem myself. Another potential solution is to allow transforms to perform the same operation over a list of images. Thus, CocoDataset would look like:

if self.transform is not None:
    img, label = self.transform(img, label)

and the transform itself may look like:

def __call__(self, *images):
    # perform some check to make sure images are all the same size
    if self.padding > 0:
        images = [ImageOps.expand(im, border=self.padding, fill=0) for im in images]
    w, h = images[0].size
    th, tw = self.size
    if w == tw and h == th:
        return images
    x1 = random.randint(0, w - tw)
    y1 = random.randint(0, h - th)
    return [img.crop((x1, y1, x1 + tw, y1 + th)) for img in images]

It may be possible to further abstract this out and create a container class for an image that automatically applies the same operations across the collection. That way the operations could be agnostic to what they are actually operating on. |
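As a complete, runnable version of that idea, here is a hedged sketch (the class name, imports, and padding default are illustrative additions, not the commenter's code):

import random
from PIL import ImageOps

class MultiImageRandomCrop(object):
    """Applies one randomly drawn crop to every image it is called with."""
    def __init__(self, size, padding=0):
        self.size = size      # (th, tw)
        self.padding = padding

    def __call__(self, *images):
        if self.padding > 0:
            images = [ImageOps.expand(im, border=self.padding, fill=0) for im in images]
        w, h = images[0].size
        th, tw = self.size
        if w == tw and h == th:
            return list(images)
        x1 = random.randint(0, w - tw)
        y1 = random.randint(0, h - th)
        return [im.crop((x1, y1, x1 + tw, y1 + th)) for im in images]

# usage in a dataset, with img and label being PIL images of the same size:
# img, label = MultiImageRandomCrop((256, 256))(img, label)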
I gave it a shot at implementing a generic random transform that could be applied to several inputs (even, for example, images + bounding boxes). An example implementation can be found here.
It's a bit rough (and maybe a bit too complicated), but it could handle the cases we have mentioned in this thread. |
@fmassa does your proposal also cover transforms which depend on both the input and the target image? This would be helpful when, for example, you want to crop your input image to the size of the target image. |
@bodokaiser Yes, this should be handled as well.

class MyJointOp(object):
    def __call__(self, input, target):
        # perform something on input and target
        return input, target

and then you would use it as follows (using my transforms), supposing that your dataset outputs an input/target pair:

flip_gen = mytransforms.RandomFlipGenerator()
mytransforms.Compose([
    MyJointOp(),  # takes all inputs/targets
    [transforms.ColorAugmentation(), None],  # color augmentation on the input, no operation on the target
    flip_gen,  # gets a random seed for the flip transform and returns the identity
    [transforms.RandomFlip(flip_gen), transforms.RandomFlip(flip_gen)],  # apply the same flip to input and target
    ...
])
|
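Since the linked implementation isn't inlined here, this is a rough reconstruction of how such a generator/consumer pair could work (an illustration, not @fmassa's actual code):

import random
from PIL import Image

class RandomFlipGenerator(object):
    """Draws the flip decision once per sample and passes its input through unchanged."""
    def __init__(self, p=0.5):
        self.p = p
        self.flip = False

    def __call__(self, x):
        self.flip = random.random() < self.p
        return x

class RandomFlip(object):
    """Applies whatever flip decision the shared generator drew (PIL images assumed)."""
    def __init__(self, generator):
        self.generator = generator

    def __call__(self, img):
        if self.generator.flip:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img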
@fmassa Do you plan to merge your implementation into torchvision? |
@bodokaiser for the moment I'll only merge the part that separates the random parameters from the transforms, so that you can apply the same random transform to different inputs / targets. |
@fmassa I understand the complexity argument; moreover, supporting the same random parameters for both transforms is already a huge plus. I think for my problems I will end up just moving some preprocessing to |
Hi guys, for the same issue, I proposed a similar solution by sharing the seed across the input and target transform functions in Keras: keras-team/keras#3338 . As mentioned above, that's too complicated to handle; my implementation is buggy and not thread-safe (the input and target can become mis-synchronised). When moving to torch, I found that an easy and robust way to implement this kind of transformation is by merging and splitting image channels. The idea is to use one transform to handle both the input and target images. Here is my implementation:

transform = EnhancedCompose([
    Merge(),                # merge input and target along the channel axis
    ElasticTransform(),
    RandomRotate(),
    Split([0, 1], [1, 2]),  # split back into 2 images
    [CenterCropNumpy(size=input_shape), CenterCropNumpy(size=target_shape)],
    [NormalizeNumpy(), None],
    [Lambda(to_tensor), Lambda(to_tensor)]
])

class EnhancedCompose(object):
    """Composes several transforms together, supporting separate transformations for multiple inputs."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            if isinstance(t, collections.Sequence):
                assert isinstance(img, collections.Sequence) and len(img) == len(t), \
                    "size of image group and transform group does not match"
                tmp_ = []
                for i, im_ in enumerate(img):
                    if callable(t[i]):
                        tmp_.append(t[i](im_))
                    else:
                        tmp_.append(im_)
                img = tmp_
            elif callable(t):
                img = t(img)
            elif t is None:
                continue
            else:
                raise Exception('unexpected type')
        return img

class Merge(object):
    """Merge a group of images along a given axis."""
    def __init__(self, axis=-1):
        self.axis = axis

    def __call__(self, images):
        if isinstance(images, collections.Sequence) or isinstance(images, np.ndarray):
            assert all([isinstance(i, np.ndarray) for i in images]), 'only numpy arrays are supported'
            shapes = [list(i.shape) for i in images]
            for s in shapes:
                s[self.axis] = None
            assert all([s == shapes[0] for s in shapes]), 'shapes must be the same except along the merge axis'
            return np.concatenate(images, axis=self.axis)
        else:
            raise Exception("obj is not a sequence (list, tuple, etc)")

class Split(object):
    """Split an image into individual images along a given axis."""
    def __init__(self, *slices, **kwargs):
        assert isinstance(slices, collections.Sequence)
        slices_ = []
        for s in slices:
            if isinstance(s, collections.Sequence):
                slices_.append(slice(*s))
            else:
                slices_.append(s)
        assert all([isinstance(s, slice) for s in slices_]), 'slices must consist of slice instances'
        self.slices = slices_
        self.axis = kwargs.get('axis', -1)

    def __call__(self, image):
        if isinstance(image, np.ndarray):
            ret = []
            for s in self.slices:
                sl = [slice(None)] * image.ndim
                sl[self.axis] = s
                ret.append(image[tuple(sl)])  # index with a tuple (list indexing is deprecated in numpy)
            return ret
        else:
            raise Exception("obj is not a numpy array")

Also note that, by doing this, I would propose to implement all the transformations with numpy and scipy.ndimage in torchvision, which are more powerful than PIL, and also to get rid of the limitations on channel count and image mode that PIL can handle. This implementation can also support more than two image pairs, meaning we can have multiple inputs and outputs, for example some kind of sample weight map which needs to be transformed at the same time. I have been using my implementation for a while and it has helped me a lot. |
@oeway This idea is great for my current project, which requires an uncertain number of targets per image. The transform functions seem to be different from those in torchvision. Could you provide the whole implementation for this project? I would like to give it a try. |
@lwye Thanks for your interest. The interface of the transform functions is the same as torchvision's ( |
@oeway Great. It will be more flexible to operate on numpy in my case. Look forward to it. |
So does anyone have any ideas about how to perform this transforms.ColorAugmentation()? Any link to an unmerged PR / fork is OK. Thanks in advance. |
@catalystfrank you can find color augmentation transforms in here. |
@oeway one issue I see with that approach concerns the image-resizing random transforms: while the input image typically uses bilinear interpolation, the discretely-labelled target uses a nearest-neighbour assignment. |
Once we can process different channels in one function, we can make a dedicated transform for that: for example, we can pass a list of interpolation methods, one per channel, or one method for all channels. That shouldn't be an issue.
|
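As a sketch of that suggestion (the class name and the use of scipy.ndimage.zoom are assumptions, not the commenter's code), here is a resize for a merged (H, W, C) array that takes one interpolation order per channel, e.g. bilinear for image channels and nearest-neighbour for label channels:

import numpy as np
from scipy.ndimage import zoom

class ResizePerChannel(object):
    """Resize a merged (H, W, C) numpy array with a separate spline order per channel."""
    def __init__(self, size, orders):
        self.size = size      # (new_h, new_w)
        self.orders = orders  # one interpolation order per channel (1 = bilinear, 0 = nearest)

    def __call__(self, arr):
        h, w, c = arr.shape
        assert c == len(self.orders), 'need one interpolation order per channel'
        zh, zw = self.size[0] / float(h), self.size[1] / float(w)
        channels = [zoom(arr[..., i], (zh, zw), order=self.orders[i]) for i in range(c)]
        return np.stack(channels, axis=-1)

# e.g. ResizePerChannel((256, 256), orders=[1, 1, 1, 0])  # RGB channels bilinear, label channel nearest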
@lwye and others interested in my solution: here is a standalone image transform module for dense prediction tasks that I used for my project; at the end you can find some code showing how to use the module. It's compatible with torchvision's transform function interface, but there is no dependency on torch functions, so you can use it with other DL libraries as well. You may find bugs or have ideas for improvement; in that case, please comment in the gist or here. |
Any progress on this thread? @oeway's method is great for e.g. segmentation tasks, but for detection, where targets are not images, I think @fmassa's proposal of using a list of transforms is more general. Will that (or something similar) be merged into core? Or, as mentioned, is it up to users to subclass their datasets? |
@blackyang I need to write some tests for #115 to verify that it works properly even in multi-threaded settings, but I've been lacking time to do it lately. |
EDIT: modified to work with torchvision 0.7

I've solved this issue this way in my Cityscapes dataset wrapper:

def __getitem__(self, index):
    img = Image.open(self.data[index]).convert('RGB')
    target = Image.open(self.data_labels[index])

    seed = np.random.randint(2147483647)  # make a seed with numpy generator
    random.seed(seed)                     # apply this seed to img transforms
    torch.manual_seed(seed)               # needed for torchvision 0.7
    if self.transform is not None:
        img = self.transform(img)

    random.seed(seed)                     # apply this seed to target transforms
    torch.manual_seed(seed)               # needed for torchvision 0.7
    if self.target_transform is not None:
        target = self.target_transform(target)

    target = torch.ByteTensor(np.array(target))
    return img, target
|
In my segmentation application I have made some transformation functions that either accept an (image, label) pair or a single image, and that also work with (transformation_image, transformation_label) pairs, allowing me to use
which expect an (image, label) pair as input. I wrote the code without trying to be generic, so it needs some work, but maybe it's a direction to pursue, rather than concatenating the image and label along a channel. |
Another important feature that would be nice to have is a parameter for the constructors, or for the entire transformation pipeline, that would accept a RandomState instead of setting global seeds or using global random generators. This is a very common design mistake, also seen in Keras and many other frameworks, where they keep setting the global random seed everywhere. |
For whoever is still interested: I wrote a little hack that augments two images with the same transformation:
Setting the same np.random seed will give you the same uniformly sampled values before the two color-jitters are applied. EDIT: apparently newer versions of torchvision have started to use
|
@lpuglia It should just be noted that this will give you the same random seed in each worker. See also here:
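A common mitigation (an illustrative sketch, not from the linked thread) is to give each DataLoader worker its own seed via worker_init_fn:

import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # torch.initial_seed() already differs per worker; propagate it to the other RNGs
    seed = torch.initial_seed() % 2**32
    np.random.seed(seed)
    random.seed(seed)

# my_dataset is a placeholder for whichever Dataset is being used
# loader = DataLoader(my_dataset, batch_size=4, num_workers=4, worker_init_fn=worker_init_fn)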
It does not work for me. The ColorJitter still performs differently for multiple images. Why? |
Figured it out. Another random seed also needs to be set.
|
I find this helpful in image segmentation, https://github.com/albu/albumentations/blob/master/notebooks/example_kaggle_salt.ipynb |
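For readers who don't open the notebook, the general albumentations pattern for segmentation looks roughly like this (a sketch assuming the library's image=/mask= call convention; the concrete transforms and array shapes are illustrative):

import albumentations as A
import numpy as np

image = np.zeros((512, 512, 3), dtype=np.uint8)  # placeholder input image
mask = np.zeros((512, 512), dtype=np.uint8)      # placeholder label mask

aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomCrop(height=256, width=256),
])

# the same spatial transform is applied to both, nearest-neighbour on the mask
out = aug(image=image, mask=mask)
image_aug, mask_aug = out["image"], out["mask"]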
Hello, I really think that if each transform had its own RNG object, we could provide it with a seed and that would solve the problem. Something along the lines of
Here is a more complete code example |
|
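Since the referenced snippet isn't shown above, here is a minimal illustration of the per-transform RNG idea (entirely illustrative, not the commenter's code):

import numpy as np
from PIL import Image

class SeededRandomHorizontalFlip(object):
    """Horizontal flip driven by a private RNG instead of the global random module."""
    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = np.random.RandomState(seed)

    def __call__(self, img):
        if self.rng.rand() < self.p:
            return img.transpose(Image.FLIP_LEFT_RIGHT)
        return img

# two pipelines built with the same seed draw identical random numbers,
# so the input and target are always flipped (or not) together
flip_input = SeededRandomHorizontalFlip(p=0.5, seed=42)
flip_target = SeededRandomHorizontalFlip(p=0.5, seed=42)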
Another possibility, without having to change the seed, is to save and load the RNG state. In this example I transform both the origin and the target, applying the same instance of a random transformation. For me the only seed making a difference was the
|
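A minimal sketch of that save/restore idea, assuming transforms that draw from torch's global RNG (for older torchvision, which uses Python's random module, random.getstate()/random.setstate() plays the same role):

import torch

def apply_jointly(transform, img, target):
    """Run the same transform on img and target with identical random draws."""
    state = torch.get_rng_state()   # save the global torch RNG state
    img = transform(img)
    torch.set_rng_state(state)      # rewind it before transforming the target
    target = transform(target)
    return img, target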
I think I have a simple solution:
|
In some scenarios (like semantic segmentation), we might want to apply the same random transform to both the input and the GT labels (cropping, flip, rotation, etc).
I think we can get this behaviour emulated in a segmentation dataset class by resetting the random seed before calling the transform for the labels.
This sounds a bit fragile though.
One other possibility is to have the transforms accept both inputs and targets as arguments.
Do you have any better solutions?