
Mismatch between ultralytics v4.0 predictions and model created from update_module_state_from_ultralytics #114

Closed
Tomakko opened this issue Jun 11, 2021 · 13 comments · Fixed by #117 or #118
Labels: bug / fix (Something isn't working), question (Further information is requested)

Comments

@Tomakko (Contributor) commented Jun 11, 2021

🐛 Bug

Hi! :)

I am currently training a model with ultralytics v4.0 and want to compile it to TVM, using yolov5-rt-stack as an intermediate step. I am comparing the outputs of the ultralytics v4.0 model and the model created from update_module_state_from_ultralytics for the same image.

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

  1. Load the ultralytics model:

import torch
from models.experimental import attempt_load
from utils.general import check_img_size, non_max_suppression, scale_coords

model = attempt_load('.last.pt', map_location='cpu')  # load FP32 model
model.eval()

with torch.no_grad():
    # img: the pre-processed image (see the pre-processing snippet later in this thread)
    pred = model(torch.from_numpy(img))[0]


# Apply NMS
conf_thres = 0.1
iou_thres = 0.45
pred = non_max_suppression(pred, conf_thres, iou_thres, agnostic=True)[0]
print(pred)
# [tensor([[2.11597e+02, 1.37264e+02, 3.22576e+02, 2.64709e+02, 2.02256e-01, 5.00000e+00],
#          [2.87690e+02, 1.53040e+02, 3.20757e+02, 2.62483e+02, 1.43088e-01, 5.00000e+00],
#          [2.90143e+02, 1.66800e+02, 3.20759e+02, 2.59762e+02, 1.10790e-01, 5.00000e+00]])]
  2. Converted model:
from yolort.utils import update_module_state_from_ultralytics

model = update_module_state_from_ultralytics(
    arch='yolov5s', version='v4.0', num_classes=10,
    custom_path_or_model='.last.pt', set_fp16=False,
)
model.eval()

with torch.no_grad():
    pred = model(torch.from_numpy(img))
print(pred)
# (tensor([[139.78711,  49.54448, 394.38562, 352.42865],
#          [266.29352,  77.71037, 342.15277, 337.81256],
#          [283.28748, 115.52697, 326.03149, 323.86658]]), tensor([0.20226, 0.14309, 0.06404]), tensor([5, 5, 5]))
  3. Compare predictions

Somehow the classes and scores seem to be the same (at least for the first two bboxes), but the bboxes themselves have different coordinates. Do you have an idea where the mismatch lies? As far as I can see, ultralytics does not do any post-processing other than non_max_suppression. Do you have a test script where you tested and compared the outputs of update_module_state_from_ultralytics?
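
For reference, a minimal comparison sketch (not part of the original report): it assumes `pred_ultra` holds the [N, 6] tensor from step 1 (x1, y1, x2, y2, confidence, class), `pred_yolort` holds the (boxes, scores, labels) tuple from step 2, and the detections are in the same score order.

import torch

# pred_ultra: [N, 6] tensor after non_max_suppression (step 1)
# pred_yolort: (boxes, scores, labels) tuple returned by the converted model (step 2)
boxes_u, scores_u, labels_u = pred_ultra[:, :4], pred_ultra[:, 4], pred_ultra[:, 5].long()
boxes_y, scores_y, labels_y = pred_yolort

n = min(len(boxes_u), len(boxes_y))
print("max box diff   :", (boxes_u[:n] - boxes_y[:n]).abs().max().item())
print("max score diff :", (scores_u[:n] - scores_y[:n]).abs().max().item())
print("labels equal   :", torch.equal(labels_u[:n], labels_y[:n]))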

Thanks!

Edited by @zhiqwang for code formatting

@Tomakko added the "bug / fix" label on Jun 11, 2021
@zhiqwang (Owner) commented Jun 11, 2021

Hi @Tomakko

Thanks for reporting this bug. There is a similar bug reported in #92; you can check it for more details.

And this notebook is used to verify the inference results against ultralytics/yolov5, but it only works with releases 0.2.x; there is a plan to update this notebook to 0.4.x.

Currently I can't guarantee when this will be done, but I promise to complete it as soon as possible, and any contributions are welcome here.


EDIT: the updated notebook is available here.

@zhiqwang self-assigned this on Jun 11, 2021
@Tomakko (Contributor, Author) commented Jun 14, 2021

Hi @zhiqwang,

In the previous discussions it was reported that the outputs look good in general, with only some minor differences. However, in my case the bbox coordinates are substantially different.

Is the approach under point 2 in my original post above still the way to go with the master branch of yolov5-rt-stack, or do I need to revert to v0.2?

Just for completeness, I am feeding in an example image as follows:

import cv2
import torch
from utils.datasets import letterbox                         # ultralytics helper
from yolort.utils.image_utils import read_image_to_tensor    # yolort helper (import path may differ across versions)

img_raw = cv2.imread(path)
img = letterbox(img_raw, new_shape=640)[0]   # resize and pad to 640, keeping aspect ratio
img = read_image_to_tensor(img, False)       # HWC numpy image -> CHW float tensor in [0, 1] (False: keep fp32)
img = img.to('cpu')

with torch.no_grad():
    model_out = model(img[None])             # add batch dimension

@Tomakko (Contributor, Author) commented Jun 14, 2021

I have also tried loading the model this way:

import torch

from yolort.models import yolov5s
from yolort.utils import update_module_state_from_ultralytics

model = update_module_state_from_ultralytics(
    arch='yolov5s', version='v4.0', num_classes=10,
    custom_path_or_model='.last.pt', set_fp16=False,
)
torch.save(model.state_dict(), 'yolov5s_updated.pt')

model = yolov5s(pretrained=False, score_thresh=0.10, num_classes=10)
ckpt = torch.load('yolov5s_updated.pt', map_location='cpu')
model.model.load_state_dict(ckpt)

model.eval()
model = model.to('cpu')

However, with this workflow the outputs are completely different from those of the ultralytics model, including the scores.
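
As a quick sanity check that the state dict really transferred, a hedged sketch (not from the original thread; `ultra_model` stands for the model loaded with attempt_load in the first post and `model` for the yolort model above) compares order-independent statistics of the two parameter sets:

def param_count(m):
    return sum(p.numel() for p in m.parameters())

def param_checksum(m):
    # order-independent checksum over all weights
    return sum(float(p.detach().float().pow(2).sum()) for p in m.parameters())

print("ultralytics:", param_count(ultra_model), param_checksum(ultra_model))
print("yolort     :", param_count(model.model), param_checksum(model.model))

If the counts or checksums differ noticeably, the mismatch is in the weight conversion; if they agree, the difference has to come from pre-/post-processing or from fixed constants such as the anchors (which is what it turns out to be further down).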

@zhiqwang (Owner) commented:
Hi @Tomakko

You can just use the master branch. We will upload an example tutorial to show what happens while updating the weights of ultralytics/YOLOv5 to yolort.

@Tomakko (Contributor, Author) commented Jun 14, 2021

Hi @zhiqwang,

really looking forward to it!

I am currently on your master branch.

@zhiqwang (Owner) commented Jun 14, 2021

Hi, @Tomakko

You can check this notebook: https://github.com/zhiqwang/yolov5-rt-stack/blob/master/notebooks/how-to-align-with-ultralytics-yolov5.ipynb. The only trick here is the different image pre-processing operations; we use the same pre-processing in this notebook to ensure that the results of the two inferences are consistent.

And ultralytics/yolov5#3054 (comment) is a good resource on the different dataloaders (pre-processing) in ultralytics' YOLOv5.

I believe this notebook will solve your problem, and as such I'm closing this issue. Feel free to file a new issue if you have further questions.
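
For readers who cannot open the notebook, the alignment idea boils down to something like the following hedged sketch. The names `ultra_model` and `yolort_model` stand for the two models loaded earlier in the thread, 'test.jpg' is any test image, and the import paths for letterbox and read_image_to_tensor are assumed from the ultralytics and yolort repos of that time; the notebook remains the authoritative reference.

import cv2
import torch
from utils.datasets import letterbox                         # ultralytics repo
from utils.general import non_max_suppression                 # ultralytics repo
from yolort.utils.image_utils import read_image_to_tensor     # yolort

img_raw = cv2.imread('test.jpg')
img = letterbox(img_raw, new_shape=640)[0]   # identical letterbox resize/pad for both models
img = read_image_to_tensor(img, False)       # identical tensor conversion for both models

with torch.no_grad():
    # ultralytics: raw head output, NMS applied explicitly
    pred_ultra = non_max_suppression(ultra_model(img[None])[0], 0.1, 0.45, agnostic=True)[0]
    # yolort: decoding and NMS happen inside the model
    pred_yolort = yolort_model(img[None])

With the pre-processing made identical like this, any remaining gap must come from the model weights or from fixed constants such as the anchors.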

@zhiqwang removed their assignment on Jun 14, 2021
@zhiqwang added the "question" label on Jun 14, 2021
@Tomakko (Contributor, Author) commented Jun 15, 2021

Hi @zhiqwang, thank you very much for the notebook. You are awesome :)

It works fine for me with the pretrained model; however, it unfortunately does not work with a custom-trained ultralytics v4.0 model with 10 classes. I only modified your code with respect to loading the image (from local disk via cv2) and loading the ultralytics model with 10 classes. The ultralytics model's predictions look fine when plotted on the image.

I have zipped together the notebook I am using, the set of weights, and an example image for you: https://1drv.ms/u/s!Airq-8SFFd5omVcMjMRvASO_5Ucr?e=zAFocW

One more thing I noticed is that the centers of the boxes are equal between yolort and ultralytics; only the ultralytics boxes are much larger. Might there be some problem with the anchors when we only have 10 classes?

@Tomakko (Contributor, Author) commented Jun 15, 2021

Finally found the issue! The ultralytics training script auto-evolves the anchors by default at the beginning of training. They are different from the fixed anchors you define in _yolov5_darknet_pan. I will load the actual anchors there and report back if it works out!
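
For anyone hitting the same issue, a hedged sketch of how to inspect the anchors that the trained checkpoint actually uses (attribute names follow the Detect head of ultralytics YOLOv5 v4.0; `ultra_model` is the model loaded with attempt_load above):

detect = ultra_model.model[-1]                          # Detect() is the last module in the ultralytics model
anchors_px = detect.anchor_grid.view(detect.nl, -1, 2)  # anchors in pixels, one row of pairs per detection layer
print("anchors used by the trained checkpoint:")
print(anchors_px)
# These per-checkpoint values (rather than the fixed COCO anchors hard-coded in
# _yolov5_darknet_pan) are what need to end up in the yolort model for the box
# decoding to match.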

@zhiqwang (Owner) commented:
Hi @Tomakko

Very glad to hear that you've found the source of this problem, and contributions to support the auto-evolved anchors feature are welcome here.

@zhiqwang (Owner) commented:
The ultralytics training script auto-evolves the anchors by default at the beginning of training.

@Tomakko, there is a new issue #119 about this problem; let's move the discussion there.

@Root970103 commented:
Sorry to disturb you; I'm also trying to compile the model with TVM. I've updated to the latest version of TVM, but it still does not compile correctly. I cannot find a solution, can you give some advice? Thanks! @zhiqwang @Tomakko

@zhiqwang (Owner) commented Jun 21, 2021

@Root970103 commented Jun 21, 2021

Thanks to @zhiqwang, I had overlooked the required versions of PyTorch and torchvision.

It is noted in the tutorial: https://github.com/zhiqwang/yolov5-rt-stack/blob/master/notebooks/export-relay-inference-tvm.ipynb

Currently, we only test TVM with PyTorch 1.7. Other versions may be unstable.

Now the TVM model can be exported correctly.
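
For context, a minimal, hedged sketch of the generic TorchScript-to-Relay flow that the linked notebook builds on (a tiny stand-in module is used here instead of the actual yolort model; see the notebook for the real export):

import torch
import torch.nn as nn
import tvm
from tvm import relay

class TinyNet(nn.Module):
    """Stand-in module; the notebook traces/scripts the yolort model instead."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x).relu()

model = TinyNet().eval()
inp = torch.rand(1, 3, 640, 640)
script_module = torch.jit.trace(model, inp)

# TorchScript -> Relay, then build for a CPU target
mod, params = relay.frontend.from_pytorch(script_module, [("input0", (1, 3, 640, 640))])
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)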
