The Canny edge detector uses a multi-stage algorithm to detect a wide range of edges in images and has been widely applied in various computer vision systems.
- Raffaele Meloni (GitHub, LinkedIn)
- Leslie Xu (GitHub, LinkedIn)
- Amirreza Movahedin (GitHub, LinkedIn)
Deliverable | Location |
---|---|
Python implementation (software) | py-canny/ |
HLS implementation (hardware) | hls-canny/ |
Bitstreams | hls-canny/deliverables |
Jupyter notebook for Pynq-Z1 and Pynq-Z2 | jupyter-canny/ |
For instructions on setting up HLS and the Vivado IP integrator, please refer to the setup file. For setting up the Jupyter notebook, refer to the Jupyter notebook file, and for the Python implementation, refer to the Python implementation file.
The Canny edge detection algorithm has 5 stages (plus a preprocessing stage):
- Grayscale conversion - Converting the input image to grayscale.
- Noise Reduction - Blurring the input image with a Gaussian filter to remove noise.
- Gradient calculation - Calculating the intensity gradients of the image.
- Non-maximum suppression - Thinning edges by keeping only pixels that are local maxima along the gradient direction.
- Double threshold - Applying a double threshold to determine potential edges.
- Edge Tracking by Hysteresis - Pruning weak edges that are not connected to strong edges.
The tests on software and hardware have been performed under the following conditions:
- The software version was tested on an Asus ROG Zephyrus G14 GA401Q (2020) with 16 GB RAM and an AMD Ryzen 7 CPU (3200 MHz), running Ubuntu 20.04 LTS.
- Both the hardware and software versions were tested with an input resolution of 1280x720 pixels.
Implementation | Speed (single image) | Framerate | LUTs (%) | FFs (%) | MUXs (%) | Total On-chip Power (W) |
---|---|---|---|---|---|---|
Software | 0.81s | 1.24fps (estimated) | NA | NA | NA | NA |
Hardware | NA (Directly tested with target framerate) | 60fps | 30.98 | 24.12 | 2.23 | 2.014 |
These numbers show that the hardware implementation outperforms its software counterpart by roughly a factor of 48 (60 fps versus ~1.24 fps).
The software implementation of the Canny edge detection algorithm is done in Python. The code is tested on the PYNQ-Z1 and PYNQ-Z2 boards. Each stage of the algorithm is implemented in a separate file. A source directory contains all the input images which are read and processed by the code. In the canny.py file, the images are processed, shown and stored for each substep.
The hardware implementation of the Canny edge detection algorithm is done in HLS. The code is tested on the PYNQ-Z1 and PYNQ-Z2 boards. The hardware realization allows implementing a video processing pipeline. The following sections explain how Canny edge detection is implemented in HLS.
Each pipeline stage is implemented with latency in mind. Divisions and floating-point operations are avoided where possible, since they introduce significant latency that might exceed the timing requirement (7 ns).
A video processing pipeline is a sequence of processing blocks connected in a chain, each responsible for a specific step. The HDMI modules handle video frames as streams, so FIFOs are used between pipeline steps for flow control. The following diagram shows the high-level block diagram of the Canny edge detection algorithm in hardware.
The hardware handles a stream of pixels, so a full frame can never be stored and iterated over in its entirety. We therefore implemented a line buffer and a sliding window from scratch to perform convolution. The line buffer stores the input frame line by line, and the sliding window is a 2D array that moves over the line buffer to perform the convolution.
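This streaming scheme can be sketched in software as follows (illustrative Python, not the repository's HLS code; all names are our own):

```python
import numpy as np

def stream_windows(frame, k=3):
    """Yield (row, col, window) for every full k-by-k window, buffering
    only k lines at a time instead of the whole frame, as the hardware
    line buffer does."""
    h, w = frame.shape
    line_buf = np.zeros((k, w), dtype=frame.dtype)  # last k lines of the stream
    win = np.zeros((k, k), dtype=frame.dtype)       # k-by-k sliding window
    for r in range(h):
        for c in range(w):
            # shift the incoming pixel into this column of the line buffer
            line_buf[:-1, c] = line_buf[1:, c]
            line_buf[-1, c] = frame[r, c]
            # slide the window one column to the left, append the new column
            win[:, :-1] = win[:, 1:]
            win[:, -1] = line_buf[:, c]
            # a valid window exists once k lines and k columns have arrived
            if r >= k - 1 and c >= k - 1:
                yield r, c, win.copy()

frame = np.arange(25, dtype=np.uint8).reshape(5, 5)
first = next(stream_windows(frame))  # first full window, anchored at (2, 2)
```

The first yielded window equals the top-left 3x3 block of the frame, confirming that the buffer reconstructs neighborhoods correctly from a pixel stream.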
It is a preprocessing step that converts the input image to grayscale. The HLS code is implemented in the grayscale.cpp file. There are multiple ways to convert an image to grayscale. The HLS code uses the following formula:
Gray = Red ∗ 0.25 + Green ∗ 0.5 + Blue ∗ 0.25
We chose this approximation over other well-known formulas because it avoids floating-point operations and divisions. In fact, it can be implemented in hardware with shift operations, which are particularly well suited to FPGAs:
Gray = (Red + (Green << 1) + Blue) >> 2
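A minimal sketch of this shift-based conversion (illustrative Python; the actual HLS code in grayscale.cpp may differ in types and interfaces):

```python
import numpy as np

def grayscale(rgb):
    """Shift-based grayscale: Gray = (R + 2*G + B) >> 2.
    Widening to uint16 prevents overflow before the shift."""
    r = rgb[..., 0].astype(np.uint16)
    g = rgb[..., 1].astype(np.uint16)
    b = rgb[..., 2].astype(np.uint16)
    return ((r + (g << 1) + b) >> 2).astype(np.uint8)

pixel = np.array([[[200, 100, 40]]], dtype=np.uint8)
# (200 + 200 + 40) >> 2 = 110
```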
Input image | Grayscale image |
---|---|
The Canny algorithm is extremely sensitive to noise, so reducing noise first makes edges easier to detect reliably. Gaussian blur is a common technique for reducing noise in images using a convolution kernel.
However, the exact Gaussian kernel contains floating-point coefficients, which are not well suited to a hardware implementation. We therefore use an integer approximation that offers a good trade-off between accuracy and performance, exploiting shift operations and integer arithmetic: $$\begin{equation*} K_{approx} = \frac{1}{256} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \end{equation*}$$
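A software sketch of the convolution with this integer kernel (illustrative Python; edge padding at the borders is our assumption, not necessarily the repository's choice):

```python
import numpy as np

# The kernel is the outer product of the binomial row [1, 4, 6, 4, 1];
# its weights sum to 256, so the normalization is a right shift by 8.
ROW = np.array([1, 4, 6, 4, 1], dtype=np.int32)
KERNEL = np.outer(ROW, ROW)

def gaussian_blur(gray):
    """5x5 integer Gaussian blur with edge padding (illustrative)."""
    h, w = gray.shape
    out = np.zeros_like(gray)
    padded = np.pad(gray.astype(np.int32), 2, mode='edge')
    for y in range(h):
        for x in range(w):
            acc = np.sum(padded[y:y+5, x:x+5] * KERNEL)
            out[y, x] = acc >> 8  # divide by 256 via shift
    return out

flat = np.full((7, 7), 100, dtype=np.uint8)  # a constant image is unchanged
```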
Input image (grayscaled) | Gaussian blur image |
---|---|
In this step of the pipeline, the intensity and direction of the edges are detected. An edge is a sudden change in the intensity of the image. The Sobel operator is used to calculate the intensity gradient of the image. Two kernels are convolved with the image to calculate the gradient in the horizontal (x) and vertical (y) directions. $$\begin{equation*} K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad K_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \end{equation*}$$
The convolution produces two derivatives in the x and y directions ($G_x$ and $G_y$), which are combined into the gradient magnitude.
The gradient direction is calculated using an approximation of the tangent function:
- Completely horizontal: 0° or 180° (from -22.5° to 22.5°)
- Completely vertical: 90° or 270° (from 67.5° to 112.5°)
- Positive slope: 45° or 225° (from 22.5° to 67.5°)
- Negative slope: 135° or 315° (from 112.5° to 157.5°)
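The gradient computation and the 4-way direction quantization above can be sketched as follows (illustrative Python; the |Gx| + |Gy| magnitude approximation is our assumption, chosen to avoid a square root):

```python
import numpy as np

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
KY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])

def sobel(gray):
    """Return gradient magnitude and quantized direction per pixel.
    Direction codes: 0 = horizontal, 1 = vertical, 2 = positive slope,
    3 = negative slope (our own encoding)."""
    g = gray.astype(np.int32)
    h, w = g.shape
    mag = np.zeros((h, w), dtype=np.int32)
    dirs = np.zeros((h, w), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = g[y-1:y+2, x-1:x+2]
            gx = int(np.sum(win * KX))
            gy = int(np.sum(win * KY))
            mag[y, x] = abs(gx) + abs(gy)  # |Gx| + |Gy| approximation
            angle = np.degrees(np.arctan2(gy, gx)) % 180
            if angle < 22.5 or angle >= 157.5:
                dirs[y, x] = 0  # horizontal (0 or 180 degrees)
            elif 67.5 <= angle < 112.5:
                dirs[y, x] = 1  # vertical (90 or 270 degrees)
            elif 22.5 <= angle < 67.5:
                dirs[y, x] = 2  # positive slope (45 or 225 degrees)
            else:
                dirs[y, x] = 3  # negative slope (135 or 315 degrees)
    return mag, dirs

# A vertical step edge produces a purely horizontal gradient.
step = np.zeros((5, 5), dtype=np.uint8)
step[:, 3:] = 255
m, d = sobel(step)
```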
Input image (gaussian blur) | Sobel image |
---|---|
Ideally, the final image should have thin edges, so the non-maximum suppression step is used to thin the edges. The gradient direction determines the direction of the edge, and the gradient magnitude determines its strength.
Based on the gradient direction, the two neighboring pixels along that direction are compared with the processed pixel. If either of the two neighbors has a higher intensity, the (weaker) output pixel corresponding to the processed pixel is set to 0; otherwise, the processed pixel retains its value.
The line buffer and the sliding window are used for the comparison in this step.
After this stage, the edges in the output image are thinner than in the previous one.
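A minimal sketch of this comparison (illustrative Python; the per-direction neighbor offsets are our own convention, matching the direction codes 0 = horizontal, 1 = vertical, 2 = positive slope, 3 = negative slope):

```python
import numpy as np

# The two neighbours to compare against, per quantized gradient direction.
OFFSETS = {
    0: ((0, -1), (0, 1)),   # horizontal gradient: left/right neighbours
    1: ((-1, 0), (1, 0)),   # vertical gradient: up/down neighbours
    2: ((-1, 1), (1, -1)),  # positive-slope diagonal
    3: ((-1, -1), (1, 1)),  # negative-slope diagonal
}

def non_max_suppression(mag, dirs):
    """Keep a pixel only if it is a local maximum along its gradient."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            (dy1, dx1), (dy2, dx2) = OFFSETS[int(dirs[y, x])]
            if mag[y, x] >= mag[y+dy1, x+dx1] and mag[y, x] >= mag[y+dy2, x+dx2]:
                out[y, x] = mag[y, x]  # local maximum: keep
            # otherwise suppressed (stays 0)
    return out

# A 3-pixel-wide vertical ridge is thinned down to its 1-pixel crest.
mag = np.zeros((5, 5), dtype=np.int32)
mag[:, 1:4] = [10, 50, 10]
dirs = np.zeros((5, 5), dtype=np.uint8)  # all horizontal gradients
thin = non_max_suppression(mag, dirs)
```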
Input image (sobel) | Non-maximum suppression image |
---|---|
Three kinds of edges are detected in the image: strong, weak and non-relevant:
- The strong edges contribute to the final edge.
- The non-relevant edges do not contribute to the final edge.
- The weak edges contribute to the final edge only if they are connected to a strong edge. (This is checked in the next step)
Two thresholds are used to classify the edges:
- The high threshold separates strong edges from weak edges.
- The low threshold separates weak edges from non-relevant edges.
Neither line buffer nor sliding window are used in this step. Each pixel is processed independently, compared with the thresholds and the output is generated.
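A sketch of this per-pixel classification (illustrative Python; the threshold values and the 255/128 markers for strong/weak pixels are our own choices, not taken from the repository):

```python
import numpy as np

STRONG, WEAK = 255, 128  # illustrative marker values

def double_threshold(mag, low=40, high=90):
    """Classify each pixel independently against the two thresholds."""
    out = np.zeros_like(mag)
    out[mag >= high] = STRONG                 # strong edge
    out[(mag >= low) & (mag < high)] = WEAK   # weak edge (decided later)
    return out                                # below low: non-relevant (0)

mag = np.array([[10, 50, 200]], dtype=np.int32)
levels = double_threshold(mag)
```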
Input image (non-maximum-suppression) | Double threshold image |
---|---|
This step connects weak edges to strong edges. If the input pixel is strong or non-relevant, it is passed through unchanged. If the input pixel is weak, all 8 of its neighboring pixels are checked: if none of them is strong, the output pixel corresponding to the input becomes non-relevant (black); otherwise it becomes strong (white).
-------------
| | | |
-------------
| | x | | // x is the input pixel compared with the 8 neighboring pixels
-------------
| | | |
-------------
This step uses the line buffer and the sliding window to check the 8 neighboring pixels.
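A single-pass software sketch of this check (illustrative Python; the 255/128 strong/weak markers mirror the double-threshold sketch and are our own convention):

```python
import numpy as np

STRONG, WEAK = 255, 128  # illustrative marker values

def hysteresis(img):
    """Promote a weak pixel to strong if any of its 8 neighbours is
    strong; otherwise drop it to non-relevant. Strong and non-relevant
    pixels pass through unchanged."""
    h, w = img.shape
    out = img.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if img[y, x] == WEAK:
                window = img[y-1:y+2, x-1:x+2]  # 3x3 neighbourhood
                out[y, x] = STRONG if (window == STRONG).any() else 0
    return out

img = np.zeros((3, 6), dtype=np.uint8)
img[1, 1] = STRONG
img[1, 2] = WEAK  # adjacent to a strong pixel: promoted
img[1, 4] = WEAK  # isolated: discarded
res = hysteresis(img)
```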
Input image (double threshold) | Edge tracking by hysteresis image (Final result) |
---|---|
Canny image | Canny image (without non-maximum suppression) |
---|---|
Input image | CV2 Canny image | Our Canny image |
---|---|---|