This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Why did I get the mAP value -1? #43

Closed
auroua opened this issue Oct 27, 2018 · 6 comments

Comments


auroua commented Oct 27, 2018

❓ Questions and Help

After training on coco2017 dataset, I got the following output:

2018-10-26 07:57:49,972 maskrcnn_benchmark.inference INFO: Total inference time: 0:30:09.844855 (0.08900146815007108 s / img per device, on 2 devices)
2018-10-26 07:57:57,928 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2018-10-26 07:57:57,928 maskrcnn_benchmark.inference INFO: Preparing bbox results
2018-10-26 07:58:06,302 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=7.88s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=84.73s).
Accumulating evaluation results...
DONE (t=22.14s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
2018-10-26 08:00:20,193 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', -1.0), ('AP50', -1.0), ('AP75', -1.0), ('APs', -1.0), ('APm', -1.0), ('APl', -1.0)]))])

I used the config file e2e_faster_rcnn_R_50_FPN_1x.yaml and changed the lr to 0.01.


fmassa commented Oct 27, 2018

Hum, this is weird.

Can you show me the full command that you used for running the experiment?
Also, how did you modify paths_catalog.py to load coco2017 instead of coco2014?

Thanks!


auroua commented Oct 28, 2018

The paths_catalog.py contained a mistake; I have fixed it and now get a result, but the mAP is only around 22. I trained on coco_train2017 and tested on coco_val2017, using the default config in e2e_faster_rcnn_R_50_FPN_1x.yaml with the lr changed to 0.005 and MAX_ITER set to 50000, on two GPUs with 2 images per GPU. How can I get the reported mAP? Thanks~
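
For anyone who hits the same -1 values: the fix amounts to making the coco_2017 entries in paths_catalog.py point at the right image directories and annotation files, and referencing exactly those names from DATASETS.TRAIN / DATASETS.TEST in the yaml. A rough sketch of what the entries look like (the dict layout and paths are from memory and may differ between versions of the repo):

```python
# maskrcnn_benchmark/config/paths_catalog.py -- sketch, not the exact file
class DatasetCatalog(object):
    DATA_DIR = "datasets"
    DATASETS = {
        # coco2017 entries; adjust img_dir/ann_file to wherever the data was
        # actually extracted (the paths below are an assumption).
        "coco_2017_train": {
            "img_dir": "coco/train2017",
            "ann_file": "coco/annotations/instances_train2017.json",
        },
        "coco_2017_val": {
            "img_dir": "coco/val2017",
            "ann_file": "coco/annotations/instances_val2017.json",
        },
    }
```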


fmassa commented Oct 28, 2018

Hi,

In order to reproduce the results with fewer than 8 GPUs, you do indeed need to change the learning rate (which you already did), but you also need to increase the number of iterations from the default by a factor of 4x, and stretch the learning-rate schedule by the same factor. The default schedule assumes 8 GPUs with 2 images each (16 images per batch); with 2 GPUs and 2 images each you only see 4 images per iteration, so you need 4x as many iterations to cover the same amount of data.
So you should use 90000 * 4 = 360000 iterations, and change the lr schedule steps to [240000, 320000].
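
Concretely, for a 2 GPU x 2 images setup the scaled solver settings would look roughly like this (a sketch using the yacs config object; the same KEY VALUE pairs can also be appended to the tools/train_net.py command line as overrides):

```python
# Sketch: linearly-scaled solver settings for 2 GPUs x 2 images per GPU
# (the 1x defaults in e2e_faster_rcnn_R_50_FPN_1x.yaml assume 16 images per batch).
from maskrcnn_benchmark.config import cfg

cfg.merge_from_file("configs/e2e_faster_rcnn_R_50_FPN_1x.yaml")
cfg.merge_from_list([
    "SOLVER.IMS_PER_BATCH", 4,         # 2 GPUs * 2 images, down from 16
    "SOLVER.BASE_LR", 0.005,           # 0.02 / 4, linear scaling rule
    "SOLVER.MAX_ITER", 360000,         # 90000 * 4
    "SOLVER.STEPS", (240000, 320000),  # (60000, 80000) * 4
])
```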

Check the single-GPU training section of the README for more information.

I'm closing the issue as it doesn't seem to be a bug, but please let me know if you have other questions.

fmassa closed this as completed on Oct 28, 2018

auroua commented Oct 28, 2018

Thanks for your kind reply.


auroua commented Nov 1, 2018

I followed your advice and got the following results on the coco2017 val dataset:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.587
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.405
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.483
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.508
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.643

Thanks!

@gongzaizhou

> Hi,
>
> In order to reproduce the results with fewer than 8 GPUs, you do indeed need to change the learning rate (which you already did), but you also need to increase the number of iterations from the default by a factor of 4x, and stretch the learning-rate schedule by the same factor.
> So you should use 90000 * 4 = 360000 iterations, and change the lr schedule steps to [240000, 320000].
>
> Check the single-GPU training section of the README for more information.
>
> I'm closing the issue as it doesn't seem to be a bug, but please let me know if you have other questions.

I also have the same problem.
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_faster_rcnn_R_101_FPN_1x.yaml" OUTPUT_DIR "./save_models/8gpus_101/" 2>&1 | tee train_8-101.log &

What was the mistake in paths_catalog.py?
