Training becomes very slow with these transforms. #426

Open
shivanference opened this issue Feb 29, 2024 · 10 comments

Comments

@shivanference

I am training a CNN model with Augraphy and some other transforms. When I include just 4-5 Augraphy transforms with 10-20% probability, my training becomes ~10 times slower.

When I checked htop, I noticed that the load average shoots up very high when these transforms are used.

I tried a few things, such as reducing num_workers, but nothing helped speed up the training.

Please guide me on how I can overcome this issue.

@kwcckw (Collaborator) commented Mar 1, 2024

Hi, could you include a snippet of the code showing how you are using Augraphy in your training?

@shivanference (Author)

Sure, the flow is as follows.

Importing transforms

from augraphy.augmentations import ShadowCast, ReflectedLight, Folding....

Creating a wrapper around it

import torch

class GenericTransforms:
    """Applies the wrapped transform with the given probability."""
    def __init__(self, transform, prob):
        self.transform = transform
        self.prob = prob

    def __call__(self, img):
        if torch.rand(1) < self.prob:
            return self.transform(img)
        return img
  
Folding_ = GenericTransforms(Folding(), prob=0.05)
ReflectedLight_ = GenericTransforms(ReflectedLight(), prob=0.1)
ShadowCast_ = GenericTransforms(ShadowCast(), prob=0.2)
# ....

Then I compose the transforms using torchvision.transforms

def _compose_transforms_from_config(transform_config):

    preprocessing_transforms = []
    transform_map = {'ReflectedLight': ReflectedLight_, 'Folding': Folding_, 'ShadowCast': ShadowCast_,...}
    for transform in transform_config:
        trans_type = transform_map[transform['Type']]
        transform_instance = trans_type(**transform['Kwargs'])
        preprocessing_transforms.append(transform_instance)
    preprocess_transforms = transforms.Compose(preprocessing_transforms)
    return preprocess_transforms

Then these transforms are used in the dataset class.

def __getitem__(self, idx):
    img = self.read_image(idx)
    x = self.transforms(ecg)
    return x, self.targets[idx]

Note: I am using 5-6 other custom transforms in the same manner, but as soon as I include the Augraphy transforms, training becomes too slow, with the load average shooting up. RAM usage stays under control. I am training on a 20-core machine.

Please let me know if any other information is required.

@kwcckw (Collaborator) commented Mar 1, 2024

Looking at the benchmark results:
https://github.com/sparkfish/augraphy/tree/dev/benchmark

ReflectedLight is one of the slowest augmentations. You could try removing it and see whether the speed improves. If speed is a concern, you may want to use only the augmentations with a higher Img/sec value.
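
If it helps to narrow things down, here is a rough standalone way to measure Img/sec for a single augmentation on your own images (a sketch, not taken from the benchmark repo; the augmentation, image size, and run count are placeholders):

import time
import numpy as np
from augraphy.augmentations import ReflectedLight

aug = ReflectedLight(p=1)  # p=1 so every call actually applies the effect
img = np.random.randint(0, 255, (1000, 1000, 3), dtype=np.uint8)  # substitute an image of your own size

runs = 10
start = time.time()
for _ in range(runs):
    aug(img)
print(f"{runs / (time.time() - start):.2f} img/sec")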

@shivanference (Author)

I did that; I only kept the augmentations for which Img/sec was more than or around 1, but it did not help much with the speed.

@kwcckw (Collaborator) commented Mar 1, 2024

> I did that; I only kept the augmentations for which Img/sec was more than or around 1, but it did not help much with the speed.

Could you let me know roughly what your image size is? Then I can try to reproduce this on my end with the code above.

@shivanference (Author)

Thanks. The image size is around 900x1100.

@shivanference (Author)

I have narrowed down the issue.

As I mentioned, the load average shoots up with these transforms. When I limit the threads used by NumPy and OpenCV to 1, the transforms run 5-6 times faster.

import os
import cv2

# Limit BLAS/OpenMP thread pools (the env vars must be set before NumPy is imported)
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
cv2.setNumThreads(1)

But the thread limiting somehow does not apply to the subprocesses, so fetching data from the DataLoader is still slow because it spawns multiple workers.
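
One way to make the limits take effect in the DataLoader worker processes as well is a worker_init_fn, which is a standard DataLoader argument that runs once inside each spawned worker. This is only a minimal sketch (limit_worker_threads is a hypothetical helper name); the environment variables usually take effect only if set before NumPy is first imported, so inside an already-running worker the cv2.setNumThreads and torch.set_num_threads calls are the ones that reliably apply:

import cv2
import torch
from torch.utils.data import DataLoader

def limit_worker_threads(worker_id):
    # Runs once inside every spawned DataLoader worker process,
    # so the thread limits also apply to the subprocesses.
    cv2.setNumThreads(1)
    torch.set_num_threads(1)

# loader = DataLoader(dataset, batch_size=32, num_workers=8,
#                     worker_init_fn=limit_worker_threads)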

@kwcckw (Collaborator) commented Mar 2, 2024

> When I limit the threads used by NumPy and OpenCV to 1, the transforms run 5-6 times faster.

Okay. It looks like the code you provided above is not complete; what would transform_config be?

Then in x = self.transforms(ecg), is preprocess_transforms what is being used?

On your 20-core machine, do you also use multiple GPUs?

@shivanference (Author)

A minor correction in the code:

This is how the transforms are defined:

import torch
from augraphy.augmentations import Moire

class Moire_:

    def __init__(self, prob=0.15):
        self.moire = Moire()
        self.prob = prob

    def __call__(self, img):
        if torch.rand(1) < self.prob:
            return self.moire(img)
        return img

This is what the transform config looks like:

- Type: ReflectedLight
  Kwargs: {}
- Type: DirtyDrum
  Kwargs: {}
- Type: Folding
  Kwargs: {}

We use 2 GPUs to train.
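
For clarity, here is a minimal sketch (assumed, not taken from the thread) of how these class-based wrappers would plug into the earlier _compose_transforms_from_config, so that trans_type(**transform['Kwargs']) constructs one wrapper per config entry; ReflectedLight_, DirtyDrum_, and Folding_ are assumed analogues of the Moire_ class above:

from torchvision import transforms

# Assumed analogues of Moire_, one wrapper class per augmentation type
transform_map = {'ReflectedLight': ReflectedLight_, 'DirtyDrum': DirtyDrum_, 'Folding': Folding_}

def _compose_transforms_from_config(transform_config):
    # Each config entry {'Type': ..., 'Kwargs': {...}} builds one wrapper instance
    preprocessing_transforms = [transform_map[t['Type']](**t['Kwargs']) for t in transform_config]
    return transforms.Compose(preprocessing_transforms)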

@kwcckw (Collaborator) commented Mar 2, 2024

I tried it on Colab with an image size of (1100, 1100, 3), but I only see about a 30% increase in processing time with Augraphy, and that is with a probability of 1 for all 3 augmentations.

Here are the notebooks:
https://drive.google.com/drive/folders/1kaUWqVY5xKhKzDJP2zyiDoyOWtgtpVQU?usp=sharing

There is probably some overhead from the custom augmentation functions when running with multiple GPUs or multiple cores. Have you tried other custom augmentation functions instead of Augraphy?
