
Classifier training - Mosaic data augmentation #4432

Closed · AlexeyAB opened this issue Dec 2, 2019 · 31 comments
@AlexeyAB (Owner) commented Dec 2, 2019

Related to: #4264

Use for Classifier training:

Run training with the flag -show_imgs to see how the images are changed (shown in separate windows and saved to files aug_... .jpg) and how the labels are changed (see the console).
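For example, a classifier training run with this flag might look like the following sketch (the .data/.cfg file names here are placeholders, not taken from this thread):

    ./darknet classifier train cfg/imagenet1k.data cfg/darknet19.cfg -show_imgs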

[image]

@AlexeyAB (Owner, Author) commented Dec 2, 2019

@WongKinYiu I also implemented Mosaic data augmentation for the Classifier.

@WongKinYiu (Collaborator) commented Dec 2, 2019

@AlexeyAB No available GPU 🐧🐧🐧
Which combination of data augmentations is suggested?

@Look4-you commented Dec 3, 2019

@AlexeyAB Hi.

  1. During training, is one of these data augmentation methods (like hue, mosaic, saturation) randomly selected to process each batch of images?

  2. Besides, jitter is set at each [yolo] layer; does it apply to the feature map?

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@WongKinYiu

Maybe cutmix=1 mosaic=1


I would recommend training the smallest model (for quick comparison) with each approach in turn and comparing the %Top1 accuracy gain.
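For reference, a minimal sketch of the corresponding [net]-section options in the cfg (all other [net] parameters omitted):

    [net]
    cutmix=1
    mosaic=1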

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@Look4-you

During training, is one of these data augmentation methods (like hue, mosaic, saturation) randomly selected to process each batch of images?

Every data augmentation method occurs randomly if enabled.


Besides, jitter is set at each [yolo] layer; does it apply to the feature map?

random= and jitter= are used only from the last [yolo] layer:
jitter - resizes the input image
random - resizes the network
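As a sketch, both options go in the last [yolo] section of the cfg; the values below are the common YOLOv3 defaults, shown only for illustration:

    [yolo]
    ...
    jitter=.3    # randomly crop/resize the input image during training
    random=1     # resize the whole network (see below: every 10 iterations, from 1/1.4 to 1.4)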

@Look4-you

@AlexeyAB Thanks a lot!

@WongKinYiu (Collaborator)

I would recommend training the smallest model (for quick comparison) with each approach in turn and comparing the %Top1 accuracy gain.

@AlexeyAB OK,

I will do experiments with the smallest model first.
Thanks for your suggestion.

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@WongKinYiu Also, you can try to train with a large mini_batch (if you have 32-256 GB of CPU RAM): #4386

@WongKinYiu (Collaborator) commented Dec 3, 2019

@AlexeyAB Hmm... I think my GPU schedule is full until next year.

I will do more comparisons if I can borrow more GPUs/machines.
Currently I have borrowed a Titan RTX and a Tesla V100 to compare with the results of my Titan X and 1080 Ti at different mini_batch sizes.

I think I can do an experiment with a large mini_batch using a single 1080 Ti and 64 GB of CPU RAM next week.

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@WongKinYiu If you use CPU RAM #4386 to increase the mini_batch size, then the bottleneck is PCIe, so it doesn't require a high-end GPU; you can use a GTX 1060/1070. So if you have 128-256 GB of CPU RAM, you can set a mini_batch size 4-8x larger than on a Titan RTX 24 GB or a Tesla V100 16/32 GB.
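In darknet cfgs, mini_batch = batch / subdivisions, so increasing mini_batch means raising batch or lowering subdivisions. A sketch with illustrative values only (optimized_memory is the cfg option from #4386, discussed later in this thread):

    [net]
    batch=128
    subdivisions=4        # mini_batch = batch / subdivisions = 32
    optimized_memory=3    # keep most activations in CPU RAM (see #4386)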

@WongKinYiu (Collaborator)

@AlexeyAB I know, but I would like it to have only one control factor.

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@WongKinYiu

I know, but I would like it to have only one control factor.

What is the control factor?

@WongKinYiu (Collaborator) commented Dec 3, 2019

@AlexeyAB I hope the only difference is the mini_batch size, so I want to run the experiment on the same machine, same GPU, ... (maybe it is a controlled variable?)

@AlexeyAB (Owner, Author) commented Dec 3, 2019

@WongKinYiu

I hope the only difference is the mini_batch size, so I want to run the experiment on the same machine, same GPU, ... (maybe it is a controlled variable?)

Yes.
Also, if you use CPU-RAM + GPU-processing, then this is still a controllable factor.

@AlexeyAB (Owner, Author) commented Dec 4, 2019

@WongKinYiu Also, I added blur=1 for Classifier training: #3320 (comment)

@Look4-you commented Dec 6, 2019

@AlexeyAB Hi.
How does random work to resize the network from the last [yolo] layer?
I know that random=1 randomly resizes the network every 10 iterations from 1/1.4 to 1.4 (the data augmentation parameter is used only from the last layer), but how?

@AlexeyAB (Owner, Author) commented Dec 6, 2019

@Look4-you This thread is about Mosaic data augmentation for the Classifier, not about random=, which can be used only for the Detector. Please create a new issue.

@WongKinYiu (Collaborator) commented Dec 22, 2019

@AlexeyAB

Can I set the following?

mosaic=1
cutmix=1
blur=1
label_smooth_eps=0.1

@AlexeyAB (Owner, Author)

@WongKinYiu Yes.

You can set:

  • for Classifier:
    mosaic=1
    cutmix=1
    blur=1
    label_smooth_eps=0.1

  • for Detector:
    mosaic=1
    blur=1
    label_smooth_eps=0.1

@WongKinYiu (Collaborator)

@AlexeyAB Thank you very much.

@AlexeyAB (Owner, Author)

@WongKinYiu

Also try to train the Classifier with

[net]
mosaic=1

but change this line:

d.y.vals[i][j] = d.y.vals[i][j] * s1 + d2.y.vals[i][j] * s2 + d3.y.vals[i][j] * s3 + d4.y.vals[i][j] * s4;

to these 2 lines and recompile:

const float max_s = max_val_cmp(s1, max_val_cmp(s2, max_val_cmp(s3, s4)));
d.y.vals[i][j] = d.y.vals[i][j] * s1 / max_s + d2.y.vals[i][j] * s2 / max_s + d3.y.vals[i][j] * s3 / max_s + d4.y.vals[i][j] * s4 / max_s;
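The effect of the change: in the original line the four tile labels are blended by their area fractions s1..s4 (which sum to 1), while the modified line rescales by the largest fraction so the dominant tile keeps a label weight of 1.0. A standalone sketch of the arithmetic (not darknet code; the s1..s4 values are made up for illustration):

    #include <stdio.h>

    /* two-way float max, mirroring darknet's max_val_cmp() */
    static float max2(float a, float b) { return a > b ? a : b; }

    int main(void) {
        float s1 = 0.40f, s2 = 0.30f, s3 = 0.20f, s4 = 0.10f; /* tile area fractions */
        const float max_s = max2(s1, max2(s2, max2(s3, s4)));
        /* original soft-label weight of a tile is s_k; modified weight is s_k / max_s */
        printf("tile1: %.2f -> %.2f\n", s1, s1 / max_s); /* 0.40 -> 1.00 */
        printf("tile2: %.2f -> %.2f\n", s2, s2 / max_s); /* 0.30 -> 0.75 */
        printf("tile3: %.2f -> %.2f\n", s3, s3 / max_s); /* 0.20 -> 0.50 */
        printf("tile4: %.2f -> %.2f\n", s4, s4 / max_s); /* 0.10 -> 0.25 */
        return 0;
    }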

@WongKinYiu (Collaborator)

@AlexeyAB

Hello,

The memory leak problem is very serious when I set

mosaic=1
cutmix=1
blur=1
label_smooth_eps=0.1

or even

mosaic=1
cutmix=1

or even with OPENCV disabled together with the above settings.

I tried to modify the code at https://github.com/AlexeyAB/darknet/blob/master/src/data.c#L1510,
but it did not solve the problem.

@WongKinYiu (Collaborator)

https://github.com/AlexeyAB/darknet/blob/master/src/data.c#L1531
This line checks whether the mixup mode is set as:

  • mosaic = 1 (mixup mode 3)
  • mosaic = 1, cutmix = 1 (mixup mode 4)

However, https://github.com/AlexeyAB/darknet/blob/master/src/data.c#L1548
re-assigns the mixup mode:

  • mosaic = 1 (mixup mode 3)
  • cutmix = 1 (mixup mode 2)

https://github.com/AlexeyAB/darknet/blob/master/src/data.c#L1640
So the free_data in this line does not behave as expected:

  • d3 and d4 have content even though the mixup mode has been re-assigned to 2

So I changed this part from

        if (mixup == 3) {
            free_data(d3);
            free_data(d4);
        }

to

        free_data(d3);
        free_data(d4);

but I think the better way is to keep a copy of the original mixup mode (see the sketch below).
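A sketch of that suggested fix, assuming the surrounding code in data.c roughly matches the description above (mixup, d3, d4, and free_data are the names discussed in this thread; the mixup_orig name is hypothetical):

    /* keep the originally requested mode before any per-batch re-assignment */
    const int mixup_orig = mixup;   /* 3 = mosaic, 4 = mosaic + cutmix */

    /* ... mixup may later be re-assigned to 2 (cutmix) for this batch ... */

    /* free d3/d4 whenever mosaic was originally enabled, regardless of
       which mode was actually chosen for this particular batch */
    if (mixup_orig == 3 || mixup_orig == 4) {
        free_data(d3);
        free_data(d4);
    }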

@AlexeyAB (Owner, Author)

@WongKinYiu Hi, Thanks!

I fixed this bug: b8605bd

It seems that mosaic gives a Top1/Top5 improvement: https://github.com/WongKinYiu/CrossStagePartialNetworks

Also, did you try to use mosaic with such a modification? #4432 (comment)

@WongKinYiu (Collaborator)

Not yet; I'm trying to solve the problem with optimized_memory = 1.

The memory usage with optimized_memory = 1 is as follows:
[image]
However, the expected usage is:
[image]

@AlexeyAB (Owner, Author)

@WongKinYiu

Are you planning to use optimized_memory = 1 or optimized_memory = 3?


I didn't find a very simple solution for this.

  • either we should pass the net.optimized_memory parameter to the make_...() function of each layer (make_convolutional(), make_shortcut(), ...) to suppress memory allocation for output_gpu, delta_gpu, activation_input_gpu

  • or we can just free these arrays after these make_...() functions - but then there will be such a surge in memory consumption:

    darknet/src/parser.c

    Lines 1263 to 1294 in b8605bd

    // futher GPU-memory optimization: net.optimized_memory == 2
    if (net.optimized_memory >= 2 && params.train && l.type != DROPOUT)
    {
        l.optimized_memory = net.optimized_memory;
        if (l.output_gpu) {
            cuda_free(l.output_gpu);
            //l.output_gpu = cuda_make_array_pinned(l.output, l.batch*l.outputs); // l.steps
            l.output_gpu = cuda_make_array_pinned_preallocated(NULL, l.batch*l.outputs); // l.steps
        }
        if (l.activation_input_gpu) {
            cuda_free(l.activation_input_gpu);
            l.activation_input_gpu = cuda_make_array_pinned_preallocated(NULL, l.batch*l.outputs); // l.steps
        }
        if (l.x_gpu) {
            cuda_free(l.x_gpu);
            l.x_gpu = cuda_make_array_pinned_preallocated(NULL, l.batch*l.outputs); // l.steps
        }
        // maximum optimization
        if (net.optimized_memory >= 3 && l.type != DROPOUT) {
            if (l.delta_gpu) {
                cuda_free(l.delta_gpu);
                //l.delta_gpu = cuda_make_array_pinned_preallocated(NULL, l.batch*l.outputs); // l.steps
                //printf("\n\n PINNED DELTA GPU = %d \n", l.batch*l.outputs);
            }
        }
        if (l.type == CONVOLUTIONAL) {
            set_specified_workspace_limit(&l, net.workspace_size_limit); // workspace size limit 1 GB
        }
    }

@WongKinYiu (Collaborator)

@AlexeyAB Thanks,

Currently I also cannot find a good way to deal with it.
OK, I'll take a look at the modified mosaic first.

@AlexeyAB (Owner, Author)

@WongKinYiu

optimized_memory = 1 optimizes memory consumption very poorly.

Anyway, for significant optimization you should use optimized_memory = 3; in that case the CPU-memory consumption will matter much more than the GPU-memory consumption, and this issue (the surge in consumption) will not be so significant.
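A sketch of enabling the aggressive mode in the [net] section; optimized_memory is confirmed by this thread, while the workspace-limit key name is an assumption based on the net.workspace_size_limit field in the parser code quoted above:

    [net]
    optimized_memory=3            # maximum optimization: also frees delta_gpu (per the parser.c excerpt above)
    workspace_size_limit_MB=1024  # cap the convolutional workspace (assumed key name)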

@WongKinYiu (Collaborator)

#4432 (comment) got NaN.

@BernoGreyling

Hi @AlexeyAB,

I think there might be a bug with the mosaic flag on a Detector with the combination of settings I am using. The bounding boxes overlap from one image to another, for example:

[image]

My config has the following [net] params:

    [net]
    # Testing
    #batch=1
    #subdivisions=1
    # Training
    batch=64
    subdivisions=16
    width=736
    height=1280
    channels=3
    momentum=0.9
    decay=0.0005
    angle=0
    saturation = 1.5
    exposure = 1.5
    hue=.1
    mixup=0
    mosaic=1
    #blur=1
    letter_box=1

Is mosaic supposed to work with Detectors, or only with Classifiers? From the discussion at the top it looks like it should work?

Thanks!

@AlexeyAB (Owner, Author) commented Jan 4, 2020

@BernoGreyling

  • Do you use the latest repository?
  • Do you get this issue for all images or only for some images?
  • mosaic=1 is supported for both the Classifier and the Detector, and it improves accuracy; there is a separate issue for mosaic for the Detector: Detector - Mosaic data augmentation #4264
