
Fix MixConv2d() remove shortcut + apply depthwise #5410

Merged
merged 1 commit into master from fix/mixconv on Oct 30, 2021

Conversation

@glenn-jocher (Member) commented on Oct 30, 2021

MixConv2d fixes:

  1. Apply depthwise convolutions per paper
  2. Remove shortcut (causing errors, not in paper)
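
For reference, a minimal sketch of what the layer looks like after these two changes (illustrative only, assuming the YOLOv5-style (c1, c2, k, s) signature; not the exact code in models/experimental.py):

import math
import torch
import torch.nn as nn

class MixConvSketch(nn.Module):  # hypothetical name, sketch of the fixed MixConv2d
    def __init__(self, c1, c2, k=(3, 5), s=1):
        super().__init__()
        n = len(k)
        # split the c2 output channels roughly evenly across the kernel sizes
        c_ = [c2 // n + (1 if i < c2 % n else 0) for i in range(n)]
        # fix 1: each branch is a (near-)depthwise grouped conv, groups = gcd(c1, branch channels)
        self.m = nn.ModuleList([
            nn.Conv2d(c1, c, ks, s, ks // 2, groups=math.gcd(c1, c), bias=False)
            for ks, c in zip(k, c_)])
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        # fix 2: concatenate the branch outputs; no residual shortcut is added
        return self.act(self.bn(torch.cat([m(x) for m in self.m], 1)))

With c1=128, c2=256, k=(3, 5) this works out to 128·3·3 + 128·5·5 conv weights plus 2·256 BatchNorm parameters = 4864, consistent with the first row of the profile output below.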

Validation test:

import torch

from utils.torch_utils import profile
from models.experimental import MixConv2d
from models.common import Conv

m1 = MixConv2d(128, 256, (3, 5), 1)  # fixed layer: mixed 3x3/5x5 depthwise kernels
m2 = Conv(128, 256, 3, 1)  # standard Conv baseline
results = profile(input=torch.randn(16, 128, 80, 80), ops=[m1, m2], n=3)  # 3 runs per op

YOLOv5 🚀 v6.0-39-g3d9a368 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        4864      0.9961         0.684         4.922         12.76       (16, 128, 80, 80)       (16, 256, 80, 80)
      295424        60.5         0.990         9.727         8.917       (16, 128, 80, 80)       (16, 256, 80, 80)

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced neural network layers with SiLU activation and optimized MixConv2d implementation.

📊 Key Changes

  • Replaced the LeakyReLU activation with SiLU (the Swish-1 activation function) in the affected network layers for potentially improved performance.
  • Refined the MixConv2d layer with a cleaner convolution strategy: grouped convolutions use the greatest common divisor (GCD) of the channel counts, and output channels are distributed over the different kernels based on their size (see the sketch after this list).
  • Kept the new SiLU activation inplace, preserving the memory efficiency of the previously used activations.
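
The channel-distribution idea can be illustrated with a small sketch (hypothetical helper names; the exact logic in models/experimental.py may differ): either give every kernel size roughly the same number of output channels, or give larger kernels fewer channels so each branch holds roughly the same number of weights.

import numpy as np

def split_equal_channels(c2, k):
    # roughly the same number of channels per kernel size
    idx = np.floor(np.linspace(0, len(k) - 1e-6, c2))
    return [int((idx == g).sum()) for g in range(len(k))]

def split_equal_params(c2, k):
    # channels proportional to 1 / k_i^2, so each branch carries a similar weight count
    w = 1.0 / np.array(k, dtype=float) ** 2
    c = np.round(c2 * w / w.sum()).astype(int)
    c[-1] += c2 - c.sum()  # absorb rounding drift so the split sums to c2
    return c.tolist()

print(split_equal_channels(256, (3, 5)))  # [128, 128]
print(split_equal_params(256, (3, 5)))    # [188, 68] -> 188*9 ≈ 68*25 weights per branch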

🎯 Purpose & Impact

  • Enhanced Model Performance: The switch to SiLU activation may improve training results, as SiLU often outperforms traditional activations such as LeakyReLU.
  • Optimized Convolutional Layers: The improved MixConv2d is more computationally efficient and distributes channels across the different convolutional kernel sizes in a more principled way.
  • Consistent Memory Efficiency: Keeping the activation inplace holds the memory footprint down, which benefits users with limited computing resources.

The changes can result in more accurate models that are efficient in both operation and resource utilization, benefiting a wide range of users, from researchers to industry professionals implementing YOLOv5 in their systems.

@glenn-jocher linked an issue on Oct 30, 2021 that may be closed by this pull request
@glenn-jocher changed the title to Fix MixConv2d() remove shortcut + apply depthwise on Oct 30, 2021
@glenn-jocher self-assigned this on Oct 30, 2021
@glenn-jocher merged commit 5d4258f into master on Oct 30, 2021
@glenn-jocher deleted the fix/mixconv branch on October 30, 2021 at 11:38
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
Development

Successfully merging this pull request may close these issues.

About the use of MixConv2d
1 participant