Calculating mean Average Recall (mAR), mean Average Precision (mAP) and F1-Score #2513
Comments
Hello, did this method work for you?
Hi @sohinimallick !
Big thanks for this! It's working on my end so far. Edit: no it's not, whoops. I'm getting an error when calling … I have a feeling that it's either the fact that I'm using a newer version of TF, or the … Here's my code for reference 👇

```python
def evaluate_model(dataset, model, cfg, list_iou_thresholds=None):
    if list_iou_thresholds is None:
        list_iou_thresholds = np.arange(0.5, 1.01, 0.1)
    APs = []
    ARs = []
    for image_id in dataset.image_ids:
        image, image_meta, gt_class_id, gt_bbox, gt_mask = modellib.load_image_gt(dataset, cfg, image_id)
        scaled_image = modellib.mold_image(image, cfg)
        sample = np.expand_dims(scaled_image, 0)
        yhat = model.detect(sample, verbose=0)
        r = yhat[0]
        AP, precisions, recalls, overlaps = utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                                                             r["rois"], r["class_ids"], r["scores"], r["masks"],
                                                             iou_threshold=0.5)
        AR = compute_ar(r["rois"], gt_bbox, list_iou_thresholds)  # compute_ar from the original post
        ARs.append(AR)
        APs.append(AP)
    mAP = np.mean(APs)
    mAR = np.mean(ARs)
    f1_score = 2 * ((mAP * mAR) / (mAP + mAR))
    return mAP, mAR, f1_score

evaluate_model(dataset, model, config)
```
I'm using the Colab environment to train my models. I run this command (magic cell): …
It returns an environment configured to work with TensorFlow 1.15.2 (Colab maintains a stable version of both TensorFlow 1 and 2). I believe the TensorFlow version may be the problem, but I also noticed that your function …
@wiktor-jurek I solved this by setting USE_MINI_MASK = False in both the inference and training configs.
BTW @WillianaLeite... do you have any suggestions on how to output the number of TP/FP?
Hello @WillianaLeite, I have a question.
Hello @WillianaLeite, I tried the code you wrote in my own work. I have 5 classes in my dataset. Results: …
Bingo. That made the difference. Thanks.
Hello @sain0722, I had the same question regarding mold_image and [image]. Have you received an answer? Or do you know why it is necessary to mold the images before detection?
@sain0722 I believe mold_image does the normalization, i.e. it subtracts the dataset mean pixel from the image.
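For reference, mold_image in matterport's model.py is essentially the following (paraphrased from the repo, so double-check your local copy):

```python
import numpy as np

def mold_image(images, config):
    """Convert to float32 and subtract the dataset mean pixel (config.MEAN_PIXEL).
    Expects RGB images."""
    return images.astype(np.float32) - config.MEAN_PIXEL
```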
Hi, does anyone know how to calculate the mAP for bounding boxes? Most of the calculations I found focus on instance masks, but how do I do it for plain detection? Thanks for any help.
The problem is that the compute_ap function does it with masks while compute_recall does it with bboxes, so it does not work.
Hey, actually, if you are using the matterport TF 2.0 version from here, then you have to set USE_MINI_MASK = False inside your config and then do the rest as described.
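Something along these lines, for example (a hypothetical config subclass; the class name and NUM_CLASSES are placeholders for your own values):

```python
from mrcnn.config import Config

class EvalConfig(Config):      # hypothetical name
    NAME = "eval"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1         # batch size of 1, so model.detect() takes one image
    NUM_CLASSES = 1 + 5        # background + your classes (placeholder value)
    USE_MINI_MASK = False      # keep full-size ground-truth masks for compute_ap()
```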
@sain0722 @CZ2021 I found that in the model.py file, in the detect function, …
@WillianaLeite Hi! While doing my master's thesis I ran into the same problems! I found this issue and have currently followed the same path as you!
I understood the same as you @felipetobars. I think it's not necessary to use mold_image before calling the detection function. I did some testing and got better results by not using mold_image beforehand.
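For context, detect() in matterport's model.py already calls self.mold_inputs() (which applies mold_image) on the raw images, so the usual call is simply the following; model and image here stand for your loaded inference model and an unmolded RGB image:

```python
# Pass the raw RGB image; detect() resizes it and subtracts the mean pixel itself.
results = model.detect([image], verbose=0)
r = results[0]  # dict with "rois", "class_ids", "scores", "masks"
```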
I also got better results without using the function, @guilhermemarim.
Hello, regarding `def detect(self, images, verbose=0):` why exactly do you mold the image before giving it to the detect() method? The way you do it, the input image gets molded twice, right?
Are you aware that the weighted-average or micro-average recall is just another name for the ordinary accuracy score? Or that the macro-average recall (equal weight per class, irrespective of the imbalance in the number of instances) is just another name for the balanced accuracy score? Compare them for yourself; they match exactly:
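A quick check with scikit-learn (the labels below are made up just to illustrate the equivalence):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Toy multi-class labels, invented for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0, 2])

# Micro-averaged recall equals plain accuracy.
print(recall_score(y_true, y_pred, average="micro"))   # 0.7
print(accuracy_score(y_true, y_pred))                  # 0.7

# Macro-averaged recall equals balanced accuracy.
print(recall_score(y_true, y_pred, average="macro"))   # ~0.656
print(balanced_accuracy_score(y_true, y_pred))         # ~0.656
```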
@WillianaLeite thank you so much, it works on my project.
My question is what needs to be passed into evaluate_model. Thank you.
Hey @andreaceruti ! Facing the same issue atm, did you get it to work with detectron2? Thanks in advance :)
Hi @WillianaLeite, thanks for providing your code. I have a question regarding the compute_ap() function from mrcnn that you may be able to answer: does it compute the AP and mAP based on the boxes or based on the segmentation? I made a formula by hand myself that uses the segmentation, and I get different results from the compute_ap() built into mrcnn. However, I do not know whether I am doing something wrong in my function or whether they are just using different inputs. Thanks and regards!
@WillianaLeite I'm sorry, but how is the … ? I mean, even if we want to get the AP across all the thresholds, why don't we use compute_ap_range instead? But back to the main point: the recall formula looks off to me, and I would appreciate any response, especially if I'm looking at it the wrong way.
I also just observed that the … Would appreciate any response to this, and if I'm wrong, please let me know as well.
Hi @WillianaLeite, I believe that your formula for computing the F1-score is not accurate and is not applicable to your model. Mean average precision, or average precision for a single class, is computed as an estimate of the area under the precision-recall curve. This unification is done because the precision and recall metrics are inversely proportional and change when you alter the IoU threshold. Furthermore, the F1-score formula is used for binary classification tasks, not for object detection or segmentation. You are better off sticking to mAP and AR scores to compare your different models.
Hello.

```python
from build.lib.mrcnn.model import load_image_gt, mold_image

def evaluate_model(Dataset, model, cfg, list_iou_thresholds=None):
    ...

evaluate_model(Dataset, model, config)
```

Tips for reporting errors: …
Hi guys!
I've been looking for a long time for the correct way to calculate the F1-score using the Mask-RCNN lib. I created several issues (2178, 2165, 2187, 2189), studied for a long time, and I believe I found the right approach. Before presenting the code I used, let me go over the settings I used.
Calculating mean Average Precision (mAP)
To calculate the mAP, I used the compute_ap function available in the utils.py module. For each image I call compute_ap, which returns the Average Precision (AP), and I add it to a list. After going through all the images, I average the Average Precisions to get the mAP.
The parameters are the ground-truth boxes, class IDs and masks of the image, plus the rois, class_ids, scores and masks returned by model.detect().
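A minimal sketch of that loop, assuming the usual matterport imports (mrcnn.model as modellib and mrcnn.utils as utils); this is a sketch of the idea rather than the exact code:

```python
import numpy as np
import mrcnn.model as modellib
from mrcnn import utils

def evaluate_model(dataset, model, cfg):
    """Mean Average Precision at IoU 0.5 over every image in the dataset."""
    APs = []
    for image_id in dataset.image_ids:
        # Ground truth for this image
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            modellib.load_image_gt(dataset, cfg, image_id)
        # Run detection (mold_image subtracts the dataset mean pixel)
        scaled_image = modellib.mold_image(image, cfg)
        sample = np.expand_dims(scaled_image, 0)
        r = model.detect(sample, verbose=0)[0]
        # Per-image AP at IoU threshold 0.5
        AP, precisions, recalls, overlaps = utils.compute_ap(
            gt_bbox, gt_class_id, gt_mask,
            r["rois"], r["class_ids"], r["scores"], r["masks"],
            iou_threshold=0.5)
        APs.append(AP)
    return np.mean(APs)  # mAP over the dataset
```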
Calculating mean Average Recall (mAR)
To calculate the mAR I used the post An Introduction to Evaluation Metrics for Object Detection as the mathematical basis.
The calculation of the mAR is similar to the mAP, except that instead of analyzing precision vs. recall, we analyze how the recall behaves across different IoU thresholds. In that post, Average Recall is defined as:
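Written out (reconstructed here from the definition in the cited post), it is twice the area under the recall-vs-IoU curve for IoU between 0.5 and 1:

$$\mathrm{AR} = 2 \int_{0.5}^{1} \mathrm{recall}(o)\,\mathrm{d}o$$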
In code, what we need to do is create a function that calculates the Average Recall, and then follow an approach similar to the mAP: go through each of the images, calculate its Average Recall, add it to a list, and at the end take the average to obtain the mAR.
Basically, we call the compute_recall function of the utils.py module for each of the thresholds defined in the formula (see the sketch after the parameter list below).
Where:
pred_boxes: the coordinates of the predicted bounding boxes;
gt_boxes: the coordinates of the ground-truth bounding boxes;
list_iou_thresholds: the list of IoU thresholds that will be used.
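A sketch of such a helper, approximating the integral above with np.trapz over the chosen thresholds (again a sketch of the idea, not necessarily the exact original code):

```python
import numpy as np
from mrcnn import utils

def compute_ar(pred_boxes, gt_boxes, list_iou_thresholds):
    """Average Recall for one image: recall integrated over the IoU thresholds."""
    recalls = []
    for iou in list_iou_thresholds:
        # utils.compute_recall returns (recall, indices of matched predictions)
        recall, _ = utils.compute_recall(pred_boxes, gt_boxes, iou)
        recalls.append(recall)
    # AR = 2 * area under the recall-vs-IoU curve on [0.5, 1.0]
    return 2.0 * np.trapz(recalls, x=list_iou_thresholds)
```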
Now let's add mAR to our evaluate_model function (the complete function is sketched at the end of the next section).
Calculating F1-Score
Now that we have our mAP and mAR, we just apply the F1-score formula. Let's add it to our evaluate_model function.
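Putting everything together, the complete function could look like this (again a sketch under the same assumptions, with compute_ar being the helper sketched above):

```python
import numpy as np
import mrcnn.model as modellib
from mrcnn import utils

def evaluate_model(dataset, model, cfg, list_iou_thresholds=None):
    """Return (mAP, mAR, F1) computed over the whole dataset."""
    if list_iou_thresholds is None:
        list_iou_thresholds = np.arange(0.5, 1.01, 0.1)
    APs, ARs = [], []
    for image_id in dataset.image_ids:
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            modellib.load_image_gt(dataset, cfg, image_id)
        scaled_image = modellib.mold_image(image, cfg)
        sample = np.expand_dims(scaled_image, 0)
        r = model.detect(sample, verbose=0)[0]
        AP, precisions, recalls, overlaps = utils.compute_ap(
            gt_bbox, gt_class_id, gt_mask,
            r["rois"], r["class_ids"], r["scores"], r["masks"],
            iou_threshold=0.5)
        AR = compute_ar(r["rois"], gt_bbox, list_iou_thresholds)
        APs.append(AP)
        ARs.append(AR)
    mAP = np.mean(APs)
    mAR = np.mean(ARs)
    f1_score = 2 * (mAP * mAR) / (mAP + mAR)
    return mAP, mAR, f1_score

# Usage:
# mAP, mAR, f1 = evaluate_model(dataset, model, config)
```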
This was the way I found to calculate mAP, mAR and F1-score; what do you think? I believe I am on the right path, but I am not an expert in the area and had a lot of difficulty reaching this result, so I welcome any kind of feedback. I hope to contribute in some way!