understand model output #5304
👋 Hello @Kieran31, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at [email protected].

Requirements

Python>=3.6.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

```bash
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
```

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@Kieran31 see PyTorch Hub tutorial for full inference examples on trained custom models.

Simple Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as `model`:

```python
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
```

YOLOv5 Tutorials
@glenn-jocher thanks. What does `size` do here? Does it chop a height=640, width=1280 RGB image into two 640x640 images?

Lines 243 to 252 in 30e4c4f
I have a question with the same concept, and I printed the output of line 298 (export.pt):
@jbattab you might want to start at the beginning and read the YOLO papers, which explain everything well:
@glenn-jocher thank you. I figured out the `size` argument. Also, line 250 indicates the input image can be a tensor.

Lines 243 to 252 in 30e4c4f

However, the output of `output = model(torch.zeros(8, 3, 512, 512))` is not a `models.common.Detections` but a tuple. This is what I raised in my first question. Could you please explain it?
@Kieran31 all pytorch models input and output torch tensors. YOLOv5 PyTorch Hub models are AutoShape() classes that wrap a pytorch model and handle inputs and outputs. It's up to you to determine an appropriate --img-size suitable for your deployment requirements.
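To make that distinction concrete, here is a minimal sketch of the two call paths for a Hub-loaded model (the return types follow the behavior described in this thread):

```python
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Filename/URI/PIL/OpenCV/numpy inputs go through AutoShape:
# letterboxing, normalization, inference, and NMS are handled for you
results = model('https://ultralytics.com/images/zidane.jpg')
print(type(results))  # models.common.Detections

# A torch tensor is assumed to be already preprocessed, so AutoShape
# passes it straight to the wrapped pytorch model, returning raw output
raw = model(torch.zeros(1, 3, 640, 640))
print(type(raw))  # tuple
```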
Let me ask it in a different way.
@glenn-jocher sorry, maybe I'm not clear enough. I tried all the input types: filename, URI, OpenCV, PIL, np, multiple. All of them give a models.common.Detections object, while a torch tensor gives a tuple.

Lines 243 to 252 in 30e4c4f
@Kieran31 yes this is the default behavior. This allows AutoShape models to be used in val.py and detect.py type workflows, where more traditional pytorch dataloaders are used that already preprocess the inputs (letterboxing, resizing, etc.).

@jbattab see PyTorch Hub tutorial: YOLOv5 Tutorials
@glenn-jocher Thanks for your explanation.

Lines 255 to 257 in 30e4c4f

I didn't find the solution in the PyTorch Hub tutorial. If there is one, I'd appreciate you pointing it out to me.
@Kieran31 torch inputs create torch outputs because in a traditional torch workflow the dataloader has already padded and collated all images into a batch, and the batch itself does not supply sufficient information to invert these letterboxing operations. Basically you would be attempting to run postprocessing without running preprocessing, which is impossible because postprocessing depends on info generated by preprocessing.
@glenn-jocher

Lines 173 to 185 in a4fece8

Lines 149 to 151 in a4fece8

Lines 182 to 183 in a4fece8

As shown at line 174 of val.py and line 151 of detect.py, the model output for a torch tensor is a tuple: output[0] is used for NMS and output[1] is used for loss calculation. So if I want to recover the predicted xywh, do I just need to pass the whole output[0] to non_max_suppression?

Also,
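That idea in a minimal sketch, assuming you run from the yolov5 repo root so utils.general.non_max_suppression is importable (thresholds shown are the repo defaults):

```python
import torch
from utils.general import non_max_suppression  # yolov5 repo utility

# load the raw (non-AutoShape) model so tensor inputs hit the pytorch path
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)
model.eval()
with torch.no_grad():
    pred, train_out = model(torch.zeros(8, 3, 512, 512))  # tuple output

# NMS returns one [n, 6] tensor per image: (x1, y1, x2, y2, conf, cls)
detections = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)
# note: boxes are in the letterboxed network input's pixel coordinates
```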
Any update on this?

Lines 243 to 252 in 30e4c4f

output is a tuple of length 2.
@minhtcai
Lines 177 to 183 in 8df64a9
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Hi,
@hamedmh

Lines 177 to 183 in 8df64a9

out is torch.Size([1, 15120, 85])
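For reference, that row count decomposes as sketched below, assuming a 384×640 letterboxed input (the actual size is whatever the dataloader produced) and YOLOv5's 3 anchors per cell at strides 8/16/32:

```python
# 15120 = 3 anchors x (grid cells at strides 8, 16, 32)
h, w = 384, 640  # assumed letterboxed input size
n = 3 * sum((h // s) * (w // s) for s in (8, 16, 32))
print(n)  # 15120
# 85 = 4 box coords (xywh) + 1 objectness + 80 COCO class scores
```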
@Kieran31 Thank you for the answer.
@hamedmh

Lines 177 to 183 in 8df64a9

For the first one, I don't know. I'm not an Ultralytics member, and this issue has been closed, so I'm not sure the Ultralytics team will receive your questions here. My suggestion would be to open a new issue and/or read the yolov5 paper.
@Kieran31 Thank you for the explanation.
What about an exported model? I finetuned yolov5s and exported it for mobile (torchscript). How do I use the model on an iOS device if I don't have access to all the utility methods for image preprocessing?
Hi @mladen-korunoski, actually the main ops used in the pre-processing are interpolation and padding, and torch provides both ops, so I guess you can just use torchscript to implement the pre-processing. Check the following as an example.
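The linked example didn't survive extraction; below is a minimal letterbox sketch in pure torch ops (interpolate + pad) that should be scriptable. The 640 target size and 114 gray pad value follow YOLOv5 defaults; the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def letterbox(img: torch.Tensor, size: int = 640) -> torch.Tensor:
    # img: [3, H, W] float tensor in [0, 1]
    c, h, w = img.shape
    r = size / max(h, w)             # scale so the long side equals `size`
    nh, nw = int(h * r), int(w * r)
    img = F.interpolate(img.unsqueeze(0), size=(nh, nw), mode='bilinear',
                        align_corners=False).squeeze(0)
    pad_h, pad_w = size - nh, size - nw
    # pad (left, right, top, bottom) with YOLOv5's gray value
    img = F.pad(img, (pad_w // 2, pad_w - pad_w // 2,
                      pad_h // 2, pad_h - pad_h // 2), value=114 / 255)
    return img

scripted = torch.jit.script(letterbox)  # usable from TorchScript/mobile
```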
Hi! I was able to convert the model from yolov5 to Neuron with the following code:

Now that I am trying to test and compare, the tensor outputs differ from yolo as follows:

Neuron Yolov5 Model:

Yolov5 (this one):

Is there something wrong when converting the model or running inference? The label and also the accuracy seem to be as expected, but the tensors do not. I followed @jluntamazon's pull request (#2953) but was not able to see the difference.
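For context, the conversion code above was lost in extraction; an AWS Neuron trace of a YOLOv5 model typically looks roughly like this sketch (package and call per the torch-neuron SDK; the input shape and output filename are assumptions):

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin; registers torch.neuron

# raw (non-AutoShape) model so the trace sees plain tensor inputs
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)
model.eval()

# Trace with a fixed example input; Neuron compiles the supported ops
example = torch.zeros(1, 3, 640, 640)
model_neuron = torch.neuron.trace(model, example_inputs=[example])
model_neuron.save('yolov5s_neuron.pt')  # hypothetical output path
```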
@Kieran31 Hi, it might be late for a question, but I have some things to ask you. I am new to object detection, and I'm still confused about computing the loss from train_out.
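For what it's worth, a minimal sketch of how the repo's own val.py computes loss from the second tuple element (ComputeLoss lives in utils/loss.py; `imgs` and `targets` here are placeholders for your batch):

```python
from utils.loss import ComputeLoss  # yolov5 repo utility

# assumes `model` has its training hyperparameters attached (model.hyp),
# as it does inside train.py / val.py
compute_loss = ComputeLoss(model)

pred, train_out = model(imgs)  # imgs: [batch, 3, H, W] tensor
loss, loss_items = compute_loss([x.float() for x in train_out], targets)
# targets: [num_labels, 6] = (image_idx, class, x, y, w, h), normalized
# loss_items: (box, obj, cls) loss components
```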
Can you please explain how I can get the class of the objects present in each anchor?
Hey team, I have a question about the output shape of my model. After the training process for expiry date detection with yolov5, I got an output like this:

index | xmin | ymin | xmax | ymax | confidence | class | name | path

And I converted the .pt file for usage in CoreML, and it says that YOLOv5 gives an output of shape (1, 25200, 8). The input image size is 640x640. How could I understand the output? If I print the first 8 elements of the output array, it shows me this:

Could someone give an explanation? Thanks
@doppelvincent hi there! The model's output tensor shape is [1, 25200, 8], representing the predicted bounding boxes and their attributes. It contains 25200 entries that correspond to bounding box predictions. Each prediction is composed of 8 values: [x_center, y_center, width, height, objectness, class_0_confidence, class_1_confidence, class_2_confidence]. In your output, the values seem to be in the correct order and format. These values represent the predicted bounding box attributes, such as its center coordinates, width, height, objectness score, and class confidences. You can extract and interpret these values for each bounding box to understand the model's predictions. If you need further assistance in interpreting the output or in integrating it into CoreML, feel free to ask. Good luck with your expiry date detection project!
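As a quick sanity check, the 25200 follows from the 640×640 input and the three detection strides (a sketch of the arithmetic, assuming the default 3 anchors per grid cell):

```python
s = 640
n = 3 * sum((s // stride) ** 2 for stride in (8, 16, 32))
print(n)  # 3 * (6400 + 1600 + 400) = 25200
# and 8 = 4 box coords + 1 objectness + 3 class confidences
```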
@glenn-jocher hello! When I use yolov5-7.0, before adding --train in export.py, the first exported onnx is:

And now, when I use the onnx file to test accuracy with val.py, the first onnx gets an error:

the second onnx succeeds:

Loading runs\train\WI_PRW_SSW_SSM_20231127\weights\best.onnx for ONNX Runtime inference...

@glenn-jocher I also hit the same issue when using yolov5-6.2.
@dengxiongshi Thanks for reaching out. It looks like you're encountering issues with using the ONNX model for validation after the export process. To best troubleshoot this, I recommend the following steps:

Regarding your query about reshaping and concatenating the three outputs from the first export into a single output, the process may involve reshaping the outputs to ensure compatibility and then concatenating them along the appropriate dimension. If the issue persists, I recommend posting your detailed question on the YOLOv5 GitHub repository: https://github.com/ultralytics/yolov5. The community and the Ultralytics team will be better equipped to assist with debugging and resolving the issues you're facing. Let me know if I can help you further with any of these steps!
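A sketch of that reshape-and-concatenate idea (shapes assume a 640×640 input and 80 classes; a --train export yields three per-scale tensors like these):

```python
import torch

# Hypothetical raw head outputs: [batch, 3 anchors, grid_h, grid_w, 5 + classes]
outs = [torch.zeros(1, 3, 80, 80, 85),
        torch.zeros(1, 3, 40, 40, 85),
        torch.zeros(1, 3, 20, 20, 85)]

flat = [o.reshape(o.shape[0], -1, o.shape[-1]) for o in outs]
merged = torch.cat(flat, dim=1)
print(merged.shape)  # torch.Size([1, 25200, 85])
# note: --train outputs are pre-sigmoid/pre-decode, so this only merges shapes
```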
Hi @glenn-jocher. For the three steps, I have checked the corresponding environment and dependencies and there is no problem. I submitted an issue here.
@dengxiongshi great to hear that you've checked the environment and dependencies thoroughly. I see that you've also raised an issue on the YOLOv5 GitHub repository. Our team will assist you there to address the ONNX export and validation concerns effectively. Feel free to reach out if you have any further queries or need additional assistance. Good luck with resolving the issue!
I found out what the 85 means. I wonder if I understood correctly. @glenn-jocher
@mandal4 hello! It looks like you've figured out the output dimensions correctly. The 85 in the output tensor [1, 25556, 85] corresponds to the number of classes plus the bounding box coordinates and the objectness score for each prediction. In YOLOv5, the output tensor typically has the shape [batch_size, number_of_anchors, 4 + 1 + number_of_classes], where:

In your case, with 21 classes, the output would be 4 (bbox) + 1 (objectness) + 21 (classes) = 26. However, you're seeing 85 because the weights you're running still use the default 80 COCO classes (4 + 1 + 80 = 85).

If you have any further questions or need clarification, feel free to ask. Good job on diving into the code to understand the model's output! 👍
@glenn-jocher hello! I agree with your answer about the output tensor [1, 25556, 85], but I still have some questions. As you said, the last 80 values are the probabilities of the classes. But I find that they do not sum to 1:
@Kegard hello again! The values you're seeing in the output tensor are raw logits, not probabilities. They do not sum to 1 because they have not been passed through a softmax function.

In YOLOv5, during inference, these logits are typically passed through a sigmoid function to convert them to objectness scores and class confidences, which are separate from each other. The objectness score indicates the likelihood that the bounding box contains any object, while the class confidences represent the likelihood of each class being present in the bounding box. These confidences are not mutually exclusive and are not meant to sum to 1 across all classes. Instead, each class confidence is independent and represents the model's confidence that a particular class is detected within the bounding box.

If you want to convert the raw logits to probabilities that sum to 1 for the class predictions, you would apply a softmax function to the class logits. However, this is not the standard practice for YOLO models, as they treat object detection as a multi-label classification problem, where each bounding box can potentially belong to multiple classes with independent probabilities.

I hope this clarifies your question! If you need further assistance, feel free to ask.
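The sigmoid-vs-softmax distinction in one small sketch:

```python
import torch

logits = torch.randn(80)                  # raw class outputs for one box
independent = torch.sigmoid(logits)       # YOLO-style: per-class, need not sum to 1
exclusive = torch.softmax(logits, dim=0)  # mutually exclusive alternative
print(independent.sum().item(), exclusive.sum().item())  # != 1.0 vs 1.0
```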
Hi @glenn-jocher and everyone, I'm trying to deal with an exported yolov5n.tflite and inference servers. The output I receive from processing an image has a shape of [1, 25200, 85]. This is a sample of the output:

About the dimensions, I already understood that:

I am reading all the posts related to this topic, but I'm still not able to turn that output into something I can use and understand, similar to the output obtained when just using yolov8 through the ultralytics module (this may be due to my lack of knowledge, since I am just beginning with these topics). As I say, I'm still reading all the previous information related to this, but any help about which steps I should follow would be appreciated. Thank you in advance!
Hi there! 👋 It sounds like you're on the right track with understanding the output of your YOLOv5n.tflite model. To make sense of these outputs and convert them into a more usable form (bounding boxes, class IDs, and scores), you'll typically need to apply some post-processing steps. Here’s a brief overview:
In pseudo-code, your process might look something like this (made runnable here with torchvision's NMS; note that the standard TFLite export already applies sigmoid and box decoding, in which case the sigmoid step should be skipped):

```python
import torch
from torchvision.ops import nms

# Assuming outputs is your model output with shape [1, 25200, 85]

# Sigmoid the objectness score and class predictions
# (skip if your export already emits values in [0, 1])
outputs[..., 4:] = torch.sigmoid(outputs[..., 4:])

# Apply a threshold to filter out low-confidence predictions
conf_threshold = 0.25
outputs = outputs[0]                               # drop batch dim -> [25200, 85]
outputs = outputs[outputs[:, 4] > conf_threshold]

# Convert [x_center, y_center, w, h] to [x1, y1, x2, y2] for NMS
xy, wh = outputs[:, :2], outputs[:, 2:4]
boxes = torch.cat((xy - wh / 2, xy + wh / 2), dim=1)

# Final score = objectness * best class confidence, then class-agnostic NMS
class_conf, classes = outputs[:, 5:].max(dim=1)
scores = outputs[:, 4] * class_conf
nms_threshold = 0.45
keep = nms(boxes, scores, nms_threshold)
boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
# boxes, scores, and classes are your final, usable outputs
```

This process should help you glean more actionable insights from your model's predictions. Keep experimenting and studying; you're doing great so far! Feel free to ask if you have more questions. Happy coding!
Hi @glenn-jocher, thank you so much for your response. I've been trying to apply the steps you proposed, but I'm still obtaining an output that makes no sense. After applying the sigmoid, most of the objectness values are almost 1.0, which makes no sense. I'm thinking maybe it is related to the fact that the output of the net is quantized, so I probably should dequantize it first (I'm working with an inference server running on an embedded arm64 system). Does that make sense? Do you know how I could approach dequantizing the output of the net before applying the sigmoid? Thank you!
Hi there! Yes, it absolutely makes sense that if you're working with a quantized model, the outputs could be in a quantized format. Before applying sigmoid functions or any further processing, you would indeed need to dequantize these outputs to floating-point values, which can significantly affect your post-processing steps. The approach to dequantization depends on the framework you're using. Generally, if the model was quantized using TensorFlow Lite, the output tensor carries a scale and zero point for this purpose.

Here's a simplified example in Python for dequantization:

```python
def dequantize(quantized_value, scale, zero_point):
    # Convert a quantized value back to floating point
    return scale * (quantized_value - zero_point)
```

You'd need to apply this function to your model outputs using the appropriate scale and zero_point values. Keep in mind, the details may vary depending on your precise setup and framework. If you're using a different environment or library, they might provide built-in methods to handle dequantization more seamlessly. Hope this helps you move forward! Let me know if you have any more questions. Happy coding! 😊
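In the TFLite Python API specifically, the scale and zero point live on the output details; a minimal sketch (the model filename is hypothetical):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolov5n-int8.tflite')  # hypothetical file
interpreter.allocate_tensors()
out = interpreter.get_output_details()[0]

# ... set input with interpreter.set_tensor(...) and run interpreter.invoke() ...

quantized = interpreter.get_tensor(out['index'])
scale, zero_point = out['quantization']  # (0.0, 0) means the tensor is not quantized
dequantized = scale * (quantized.astype(np.float32) - zero_point)
```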
When I convert the YOLO v8 model weights to int16 and validate, I'm getting 0 accuracy, but with float32 model weights, I'm getting 0.87 accuracy.
Hello @madasuvenky, thank you for reaching out and providing details about your issue. It sounds like you're experiencing a significant drop in accuracy when converting your YOLOv8 model weights to int16. This is indeed unusual and suggests there might be an issue with the quantization process. To help us investigate further, could you please provide a minimum reproducible code example? This will allow us to better understand the steps you're taking and identify any potential issues. You can find guidelines on creating a minimum reproducible example here. Ensuring we can reproduce the bug is crucial for us to provide an effective solution. Additionally, please make sure you are using the latest versions.

Quantization can be tricky, especially when dealing with different data types. If you haven't already, you might want to check the scale and zero-point values used during the quantization process, as incorrect values can lead to significant accuracy drops. Here's a brief example of how you might dequantize your model outputs if you're using TensorFlow Lite:

```python
def dequantize(quantized_value, scale, zero_point):
    return scale * (quantized_value - zero_point)

# Example usage
quantized_output = ...  # Your quantized model output
scale = ...             # Scale factor from your model
zero_point = ...        # Zero point from your model
dequantized_output = dequantize(quantized_output, scale, zero_point)
```

Feel free to share more details or any specific error messages you're encountering. We're here to help!
❔Question

Hi team,

I trained the model on 512x512 images. Now I want to do detection on a huge image, for example 5000x5000. So I chopped the huge image into 512x512 images with a tiler and created a dataloader with batch size = 8.

Say my input_batch is of shape [8, 3, 512, 512].

Now I have difficulty understanding the model output. Can someone help me interpret these?

output is a tuple of length 2.
output[0] is a tensor of size [8, 16128, 6].
output[1] is a list of length 3.
output[1][0] is a tensor of size [8, 3, 64, 64, 6]
output[1][1] is a tensor of size [8, 3, 32, 32, 6]
output[1][2] is a tensor of size [8, 3, 16, 16, 6]

Additional context

I didn't find a tool for integrating multiple image detection results. If this repo does have one, please tell me.

Thanks very much.
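For readers landing here, those shapes decompose as sketched below, assuming YOLOv5's default 3 anchors per cell at strides 8/16/32 and a single-class model (6 = 4 box coords + 1 objectness + 1 class score):

```python
# output[1] holds the per-scale grids for a 512x512 input:
# 512/8 = 64, 512/16 = 32, 512/32 = 16 cells per side, 3 anchors each
per_scale = [3 * g * g for g in (64, 32, 16)]  # [12288, 3072, 768]
print(sum(per_scale))  # 16128 -> rows of output[0], all scales flattened
```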