# Question about training img size and postprocessing #14182
Hi, when I change the training image size, what does that mean for the postprocessing? Most AI chip companies in China do quantization in their own way to fit their chips, and the postprocessing layer is not friendly to their quantization, so they require the postprocessing layer to be removed. Thanks
---
@kronee0516 hi there,

Great questions! Let me address them one by one:

### Training Image Size

When you change the `imgsz` training argument, you change the resolution that images are resized to before being fed to the network. In YOLOv5 and YOLOv8, the model's input size is indeed set by the `imgsz` parameter. Regarding your specific use case with a camera capturing images at 768x384, you can indeed train a model with this input size to avoid rescaling. Simply set the `imgsz` argument accordingly when training, as sketched below.
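For instance, a minimal sketch (the checkpoint name and dataset YAML path are placeholders; `imgsz` sets the longer image side, and `rect=True` is one way to train on non-square images without padding them all the way to a square):

```python
from ultralytics import YOLO

# Start from a pretrained checkpoint (placeholder name)
model = YOLO('yolov8n.pt')

# Train at the camera's native long side; 'data.yaml' is a placeholder path
model.train(data='data.yaml', imgsz=768, rect=True, epochs=100)
```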
### Postprocessing and ONNX Export

For the postprocessing layer, it's common for AI chip companies to require custom quantization methods. To export a YOLO model to ONNX without the postprocessing layer, you can use the `nms=False` argument of `model.export()`:

```python
from ultralytics import YOLO

# Load your trained model
model = YOLO('path/to/your/model.pt')

# Export the model to ONNX without postprocessing
model.export(format='onnx', simplify=True, dynamic=True, opset=12, nms=False)
```

In this example:

- `format='onnx'` selects the ONNX export format.
- `simplify=True` simplifies the exported graph.
- `dynamic=True` enables dynamic input shapes.
- `opset=12` pins the ONNX opset version for toolchain compatibility.
- `nms=False` keeps non-maximum suppression out of the exported graph, leaving postprocessing to your own pipeline.
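Once exported this way, the ONNX graph ends at the raw prediction tensor, so decoding and NMS become your (or the chip vendor's) responsibility. As a quick sanity check before quantization, a minimal sketch with `onnxruntime` (the model path is a placeholder, and the dummy array stands in for a letterboxed, 0-1 normalized frame):

```python
import numpy as np
import onnxruntime as ort

# Load the exported model (placeholder path)
session = ort.InferenceSession('path/to/your/model.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name

# Dummy NCHW float32 input in place of a real preprocessed image
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

# With nms=False the graph produces a single raw output tensor
(pred,) = session.run(None, {input_name: dummy})
print(pred.shape)  # e.g. (1, 84, 8400) for an 80-class model
```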
For more detailed guidance, you can refer to our Model Export Documentation.

### Additional Resources

For further insights and tips on model training, you might find our Model Training Tips Guide helpful. If you encounter any issues or have further questions, please provide a reproducible example to help us assist you better. You can find more information on creating a minimum reproducible example here. Hope this helps! 😊
---
Hi @kronee0516,
I'm glad to hear that the previous response was helpful! Let's dive into your additional questions regarding the export and output structure of YOLOv8.
### YOLOv8 Output Structure
In YOLOv8, the output structure has been streamlined compared to YOLOv5. Instead of having multiple outputs for different strides, YOLOv8 consolidates the outputs into a single tensor. This tensor has the shape `(1, n+4, 8400)`, where:

- `n` is the number of classes.
- `4` represents the bounding box coordinates (x, y, width, height).
- `8400` is the total number of predictions, which is a result of the feature map sizes and strides combined (for a 640x640 input: 80x80 + 40x40 + 20x20 = 8400).
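To make the layout concrete, here is a rough NumPy sketch of how host-side code might unpack that tensor (names and thresholds are illustrative; boxes come out in input-image coordinates, and a standard NMS step, e.g. `cv2.dnn.NMSBoxes`, should follow):

```python
import numpy as np

def decode_predictions(pred, conf_thres=0.25):
    """Unpack a raw YOLOv8 output of shape (1, n+4, 8400)."""
    p = pred[0].T                      # -> (8400, n+4)
    boxes_xywh = p[:, :4]              # cx, cy, w, h
    scores = p[:, 4:]                  # per-class scores, shape (8400, n)

    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)
    keep = confidences > conf_thres
    boxes_xywh = boxes_xywh[keep]

    # Convert cx, cy, w, h -> x1, y1, x2, y2
    boxes = np.empty_like(boxes_xywh)
    boxes[:, 0] = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2
    boxes[:, 1] = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2
    boxes[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2
    boxes[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2

    return boxes, class_ids[keep], confidences[keep]
```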
### Strides and Anchor Boxes

In YOLOv8, the concept of anchor boxes has been removed: the detection head is anchor-free, predicting box centers and sizes directly at each feature-map cell across the three stride levels (8, 16, and 32).
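To see where the 8,400 predictions come from, a quick check (assuming the default 640x640 input and strides of 8, 16, and 32):

```python
imgsz = 640
strides = (8, 16, 32)
cells = [(imgsz // s) ** 2 for s in strides]  # [6400, 1600, 400]
print(sum(cells))                             # 8400 predictions, one per cell
```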