**image** is the Torch7 distribution package for processing images. It contains a wide variety of functions, divided into the following categories:

- Saving and loading images as JPEG, PNG, PPM and PGM;
- Simple transformations like translation, scaling and rotation;
- Parameterized transformations like convolutions and warping;
- Graphical user interfaces like display and window;
- Color space conversions from and to RGB, YUV, Lab, and HSL;
- Tensor constructors for creating the Lenna and Fabio test images and Gaussian and Laplacian kernels.

Note that unless specified otherwise, this package deals with images of size `nChannel x height x width`.

The image format is determined from the `filename`'s extension suffix. Supported formats are JPEG, PNG, PPM and PGM. The returned `res` Tensor has size `nChannel x height x width`, where `nChannel` is 1 (greyscale) or 3 (usually RGB or YUV).
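
As an illustration of "determined from the extension suffix", the sketch below maps a filename to one of the supported formats. `format_from_filename` and the `FORMATS` table are hypothetical helpers, not part of the package, and the exact set of extension spellings the real loader accepts may differ.

```lua
-- Hypothetical helper (not part of the image package): map a filename's
-- extension suffix to one of the supported formats, case-insensitively.
local FORMATS = { jpg = 'JPEG', jpeg = 'JPEG', png = 'PNG', ppm = 'PPM', pgm = 'PGM' }

local function format_from_filename(filename)
  -- Capture everything after the last '.', provided it contains no '/'.
  local ext = filename:match('%.([^./]+)$')
  if not ext then return nil end
  return FORMATS[ext:lower()]  -- nil for unsupported extensions
end

print(format_from_filename('lena.JPG'))   -- JPEG
print(format_from_filename('notes.txt'))  -- nil
```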
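
Usage:

```lua
require 'torch'
require 'image'

-- Path to a JPEG file on disk (placeholder; replace with your own).
local imfile = 'path/to/image.jpg'

-- Read the raw (still compressed) bytes of the file into a ByteTensor.
local fin = torch.DiskFile(imfile, 'r')
fin:binary()
fin:seekEnd()
-- File positions are 1-based, so the end position is size + 1.
local file_size_bytes = fin:position() - 1
fin:seek(1)
local img_binary = torch.ByteTensor(file_size_bytes)
fin:readByte(img_binary:storage())
fin:close()

-- Then, when you're ready, decompress the ByteTensor into an image Tensor:
local im = image.decompressJPG(img_binary)
```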

Rescales the height and width of image `src`. Variable `size` is a number or a string specifying the size of the result image. When `size` is a number, it specifies the maximum height or width of the output. When it is a string like `WxH`, `MAX` or `^MIN`, it specifies, respectively, the output width and height, the maximum height or width, or the minimum height or width of the output.
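
The size rules above can be sketched as a small function that returns the output height and width. `scaled_size` is hypothetical, and it assumes that `WxH` means width by height, that the number, `MAX` and `^MIN` forms preserve the aspect ratio, and that results are rounded to the nearest pixel; the package's own rounding may differ.

```lua
-- Hypothetical helper (not part of the image package): compute the output
-- (height, width) implied by a size argument. Assumptions: 'WxH' is width x
-- height, aspect ratio is preserved otherwise, and results are rounded.
local function round(x) return math.floor(x + 0.5) end

local function scaled_size(height, width, size)
  size = tostring(size)
  -- 'WxH': explicit width and height
  local w, h = size:match('^(%d+)x(%d+)$')
  if w then return tonumber(h), tonumber(w) end
  -- '^MIN': the smaller side becomes MIN
  local min = size:match('^%^(%d+)$')
  if min then
    local s = tonumber(min) / math.min(height, width)
    return round(height * s), round(width * s)
  end
  -- number or 'MAX': the larger side becomes MAX
  local max = tonumber(size)
  assert(max, 'unrecognized size: ' .. size)
  local s = max / math.max(height, width)
  return round(height * s), round(width * s)
end

print(scaled_size(200, 100, 50))       -- height 50, width 25
print(scaled_size(200, 100, '^50'))    -- height 100, width 50
print(scaled_size(200, 100, '64x32'))  -- height 32, width 64
```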

Rescales the height and width of image `src` to fit the dimensions of Tensor `dst`.

If list `dst` is provided, with or without Tensors, it is used to store the output images. Otherwise, returns a new `res` list of Tensors. Internally, this function makes use of functions `image.gaussian`, `image.scale` and `image.convolve`.

## Parameterized transformations ##

This section includes functions for performing transformations on images that require parameter Tensors, like a warp `field` or a convolution `kernel`.

### [res] image.warp([dst,]src,field,[mode,offset,clamp]) ###

Warps image `src` (of size `KxHxW`) according to flow field `field`. The latter has size `2xHxW`, where the first dimension holds the `(y,x)` components of the flow field. String `mode` can take on values [lanczos](https://en.wikipedia.org/wiki/Lanczos_resampling), [bicubic](https://en.wikipedia.org/wiki/Bicubic_interpolation), [bilinear](https://en.wikipedia.org/wiki/Bilinear_interpolation) (the default), or *simple*. When `offset` is true (the default), `(x,y)` is added to the flow field. The `clamp` variable specifies how to handle the interpolation of samples that fall off the input image. Permitted values are strings *clamp* (the default) or *pad*. If `dst` is specified, it is used to store the result of the warp. Otherwise, returns a new `res` Tensor.

### [res] image.convolve([dst,] src, kernel, [mode]) ###

Convolves Tensor `kernel` over image `src`. For a `kH x kW` kernel and an `H x W` image, valid string values for argument `mode` are:

* *full*: the `src` image is effectively zero-padded so that every overlap of `kernel` with `src` contributes; `res` has size `(H+kH-1) x (W+kW-1)`;
* *valid* (the default): only positions where `kernel` fits entirely inside `src` are kept; `res` has size `(H-kH+1) x (W-kW+1)`, i.e. for odd kernels it loses `math.floor(kH/2)` rows and `math.floor(kW/2)` columns on each side;
* *same*: performs a *full* convolution, then crops out the central portion so that `res` has the same size as `src`.

Note that this function internally uses [torch.conv2](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.conv.dok). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.lcn(src, [kernel]) ###

Local contrast normalization (LCN) on a given `src` image using kernel `kernel`.
If `kernel` is not given, a default `9x9` Gaussian is used (see [image.gaussian](#image.gaussian)). To prevent border effects, the image is first globally contrast normalized (GCN) by subtracting the global mean and dividing by the global standard deviation. Then the image is locally contrast normalized using the following equation:

```
res = (src - lm(src)) / sqrt( lm(src*src) - lm(src)*lm(src) )
```

where `lm(x)` is the local mean of each pixel in the image (i.e. `image.convolve(x,kernel)`), `src*src` is the element-wise square, and `sqrt(x)` is the element-wise square root of `x`. The denominator is thus the local standard deviation. In other words, LCN performs local subtractive and divisive normalization.

Note that this implementation differs from the LCN layer defined on page 3 of *What is the Best Multi-Stage Architecture for Object Recognition?*.
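
The two LCN steps can be sketched in pure Lua on a greyscale image stored as a table of rows rather than a Torch Tensor. The helper names are hypothetical; local means use *valid* mode (so the output shrinks by `math.floor(k/2)` on each side), the kernel is assumed symmetric and normalized to sum to 1, and clamping the local standard deviation below at 1 is an assumption of this sketch.

```lua
-- Global mean and sample standard deviation of all pixels.
local function mean_std(img)
  local n, sum = 0, 0
  for _, row in ipairs(img) do
    for _, v in ipairs(row) do n, sum = n + 1, sum + v end
  end
  local mean, ss = sum / n, 0
  for _, row in ipairs(img) do
    for _, v in ipairs(row) do ss = ss + (v - mean) ^ 2 end
  end
  return mean, math.sqrt(ss / (n - 1))
end

-- Local mean lm(x) in "valid" mode. The kernel is assumed symmetric and to sum
-- to 1, so correlation and convolution give the same result.
local function local_mean(img, kernel)
  local kh, kw = #kernel, #kernel[1]
  local out = {}
  for i = 1, #img - kh + 1 do
    out[i] = {}
    for j = 1, #img[1] - kw + 1 do
      local acc = 0
      for a = 1, kh do
        for b = 1, kw do
          acc = acc + kernel[a][b] * img[i + a - 1][j + b - 1]
        end
      end
      out[i][j] = acc
    end
  end
  return out
end

local function lcn(src, kernel)
  -- 1. Global contrast normalization (GCN).
  local mean, std = mean_std(src)
  local g, sq = {}, {}
  for i, row in ipairs(src) do
    g[i], sq[i] = {}, {}
    for j, v in ipairs(row) do
      g[i][j] = (v - mean) / std
      sq[i][j] = g[i][j] ^ 2
    end
  end
  -- 2. Subtract the local mean, divide by the local standard deviation.
  local lm, lmsq = local_mean(g, kernel), local_mean(sq, kernel)
  local di, dj = math.floor(#kernel / 2), math.floor(#kernel[1] / 2)
  local res = {}
  for i = 1, #lm do
    res[i] = {}
    for j = 1, #lm[1] do
      local var = math.max(lmsq[i][j] - lm[i][j] ^ 2, 0)  -- clip rounding noise
      local sd = math.max(math.sqrt(var), 1)               -- assumed floor of 1
      -- g[i+di][j+dj] is the pixel at the centre of window (i, j).
      res[i][j] = (g[i + di][j + dj] - lm[i][j]) / sd
    end
  end
  return res
end
```

With a `5x5` input and a `3x3` box kernel (all entries `1/9`), the result is `3x3`, matching the *valid* size rule of `image.convolve`.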

### [res] image.erode(src, [kernel, pad]) ###

Performs a [morphological erosion](https://en.wikipedia.org/wiki/Erosion_(morphology)) on binary (zeros and ones) image `src` using the odd-dimensioned binary kernel `kernel`. The default kernel is a `3x3` matrix of ones. Number `pad` is the value assumed outside the image boundary when performing the operation. The default is 1.

### [res] image.dilate(src, [kernel, pad]) ###

Performs a [morphological dilation](https://en.wikipedia.org/wiki/Dilation_(morphology)) on binary (zeros and ones) image `src` using the odd-dimensioned binary kernel `kernel`. The default kernel is a `3x3` matrix of ones. Number `pad` is the value assumed outside the image boundary when performing the operation. The default is 0.

## Graphical User Interfaces ##

The following functions, except for [image.toDisplayTensor](#image.toDisplayTensor), require package [qtlua](https://github.com/torch/qtlua) and can only be accessed via the `qlua` Lua interpreter (as opposed to the [th](https://github.com/torch/trepl) or luajit interpreter).

### [res] image.toDisplayTensor(input, [...]) ###

Optional arguments `[...]` expand to `padding`, `nrow`, `scaleeach`, `min`, `max`, `symmetric`, `saturate`. Returns a single `res` Tensor that contains a grid of all the images in `input`. The latter can be either a table of image Tensors of size `height x width` (greyscale) or `nChannel x height x width` (color), or a single Tensor of size `batchSize x nChannel x height x width`, `nChannel x height x width` (where `nChannel` is 1 or 3), `batchSize x height x width`, or `height x width`.

When `scaleeach=false` (the default), all detected images are compressed with successive calls to `image.minmax`:

```lua
image.minmax{tensor=input[i], min=min, max=max, symm=symmetric, saturate=saturate}
```

`padding` specifies the number of padding pixels between images. The default is 0. `nrow` specifies the number of images per row. The default is 6.

Note that arguments can also be specified as key-value arguments (in a table).
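
To see how `nrow` and `padding` shape the grid, the sketch below computes one plausible output geometry. `grid_size` is hypothetical, and it assumes each grid cell is an image plus `padding` pixels and that the number of columns never exceeds the number of images; the package's exact layout may differ.

```lua
-- Hypothetical helper (not part of the image package): grid geometry for
-- n images of size h x w laid out nrow per row. Assumption: each cell is the
-- image plus `padding` pixels.
local function grid_size(n, h, w, nrow, padding)
  nrow = nrow or 6        -- documented default
  padding = padding or 0  -- documented default
  local cols = math.min(nrow, n)    -- never more columns than images
  local rows = math.ceil(n / cols)  -- enough rows to hold all images
  return rows * (h + padding), cols * (w + padding), rows, cols
end

-- 10 images of 32x32 with the defaults: 2 rows of up to 6 images.
print(grid_size(10, 32, 32))  -- 64, 192, 2, 6
```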

### [res] image.display(input, [...]) ###

Optional arguments `[...]` expand to `zoom`, `min`, `max`, `legend`, `win`, `x`, `y`, `scaleeach`, `gui`, `offscreen`, `padding`, `symm`, `nrow`. Displays `input` image(s) with optional saturation and zooming. The `input`, which is either a Tensor of size `HxW`, `KxHxW` or `Kx3xHxW`, or a list, is first prepared for display by passing it through [image.toDisplayTensor](#image.toDisplayTensor):

```lua
input = image.toDisplayTensor{
   input=input, padding=padding, nrow=nrow, saturate=saturate,
   scaleeach=scaleeach, min=min, max=max, symmetric=symm
}
```

The resulting `input` is displayed using [qtlua](https://github.com/torch/qtlua). The displayed image is zoomed by a factor of `zoom`. The default is 1. If `gui=true` (the default), the graphical user interface (GUI) is an interactive window that lets the user zoom in or out. This can be turned off for a faster display. `legend` is a legend to be displayed, which defaults to `image.display`. `win` is an optional qt window descriptor. If `x` and `y` are given, they are used to offset the image. Both default to 0. When `offscreen=true`, rendering (to generate images) is performed offscreen.

### [window, painter] image.window([...]) ###

Creates a window context for images. Optional arguments `[...]` expand to `hook_resize`, `hook_mousepress`, `hook_mousedoublepress`. These default to `nil`, but may correspond to commensurate qt objects.

## Color Space Conversions ##

This section includes functions for performing conversions between different color spaces.

### [res] image.rgb2lab([dst,] src) ###

Converts a `src` RGB image to [Lab](https://en.wikipedia.org/wiki/Lab_color_space). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.rgb2yuv([dst,] src) ###

Converts an RGB image to YUV. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.yuv2rgb([dst,] src) ###

Converts a YUV image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.rgb2y([dst,] src) ###

Converts an RGB image to Y (discarding U and V). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.rgb2hsl([dst,] src) ###

Converts an RGB image to [HSL](https://en.wikipedia.org/wiki/HSL_and_HSV). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.hsl2rgb([dst,] src) ###

Converts an HSL image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.rgb2hsv([dst,] src) ###

Converts an RGB image to [HSV](https://en.wikipedia.org/wiki/HSL_and_HSV). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.hsv2rgb([dst,] src) ###

Converts an HSV image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.

### [res] image.rgb2nrgb([dst,] src) ###

Converts an RGB image to normalized RGB.

### [res] image.y2jet([dst,] src) ###

Converts an L-level greyscale image (values 1 to L) into an L-level jet heat map. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor. This is particularly helpful for understanding the magnitude of the values of a matrix, or for easily spotting peaks in a scalar field (like probability densities over a 2D area). For example, you can run it as:

```lua
image.display{image=image.y2jet(torch.linspace(1,10,10)), zoom=50}
```
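
For intuition about `image.rgb2yuv`, `image.yuv2rgb` and `image.rgb2y`, here is a per-pixel sketch. It assumes the standard analog (BT.601) YUV coefficients; the constants used by the package may differ, and the function names here are local stand-ins, not the package's Tensor API.

```lua
-- Per-pixel RGB <-> YUV, assuming the standard analog BT.601 coefficients.
local function rgb2yuv(r, g, b)
  local y = 0.299 * r + 0.587 * g + 0.114 * b       -- luma (what rgb2y keeps)
  local u = -0.14713 * r - 0.28886 * g + 0.436 * b  -- blue-difference chroma
  local v = 0.615 * r - 0.51499 * g - 0.10001 * b   -- red-difference chroma
  return y, u, v
end

-- Approximate inverse of the matrix above.
local function yuv2rgb(y, u, v)
  return y + 1.13983 * v,
         y - 0.39465 * u - 0.58060 * v,
         y + 2.03211 * u
end

-- Round trip for pure red: recovers (1, 0, 0) up to about 1e-5.
print(yuv2rgb(rgb2yuv(1, 0, 0)))
```

Because the forward and inverse constants are rounded, a round trip is exact only to a few decimal places, which is usually negligible for display purposes.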

### [res] image.gaussian([size, sigma, amplitude, normalize, [...]]) ###

Returns a 2D Gaussian kernel of size `height x width`. Optional arguments `[...]` expand to `width`, `height`, `sigma_horz`, `sigma_vert`, `mean_horz`, `mean_vert` and `tensor`.

The default value of `height` and `width` is `size`, where the latter has a default value of 3. The amplitude of the Gaussian (its maximum value) is `amplitude`. The default is 1. When `normalize=true`, the kernel is normalized to have a sum of 1. This overrides the `amplitude` argument. The default is `false`.

The default value of the horizontal and vertical standard deviations `sigma_horz` and `sigma_vert` of the Gaussian kernel is `sigma`, where the latter has a default value of 0.25. The default values for the corresponding means `mean_horz` and `mean_vert` are 0.5. Both the standard deviations and the means are relative to kernels of unit width and height, where the top-left corner is the origin. In other words, a mean of 0.5 is the center of the kernel, while a standard deviation of 0.25 is a quarter of its size. When `tensor` is provided (a 2D Tensor), `height`, `width` and `size` are ignored, and `tensor` is used to store the returned Gaussian kernel.

Note that arguments can also be specified as key-value arguments (in a table).
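
The relative-units convention is easiest to see in code. The pure-Lua sketch below builds the kernel as a table of rows, taking key-value arguments in a table as described above. The mapping of a relative mean to the 1-based pixel position `mean * size + 0.5` is an assumption of this sketch, as is the name `gaussian`; it is not the package's implementation.

```lua
-- Pure-Lua sketch of a 2D Gaussian kernel (table of rows). Means and standard
-- deviations are relative to a unit-sized kernel with the origin at the
-- top-left corner; the +0.5 centring on 1-based indices is an assumption.
local function gaussian(args)
  args = args or {}
  local size = args.size or 3
  local sigma = args.sigma or 0.25
  local height, width = args.height or size, args.width or size
  local sh, sv = args.sigma_horz or sigma, args.sigma_vert or sigma
  local mh, mv = args.mean_horz or 0.5, args.mean_vert or 0.5
  local amplitude = args.amplitude or 1
  local cx, cy = mh * width + 0.5, mv * height + 0.5  -- centre in pixel units
  local k, sum = {}, 0
  for i = 1, height do
    k[i] = {}
    for j = 1, width do
      local dx = (j - cx) / (sh * width)   -- horizontal distance in sigmas
      local dy = (i - cy) / (sv * height)  -- vertical distance in sigmas
      k[i][j] = amplitude * math.exp(-(dx * dx + dy * dy) / 2)
      sum = sum + k[i][j]
    end
  end
  if args.normalize then  -- entries sum to 1, overriding amplitude
    for i = 1, height do
      for j = 1, width do k[i][j] = k[i][j] / sum end
    end
  end
  return k
end

local k = gaussian()  -- 3x3, sigma 0.25: centre is 1, edges ~0.41
print(k[2][2], k[1][2])
```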

### [res] image.gaussian1D([size, sigma, amplitude, normalize, mean, tensor]) ###

Returns a 1D Gaussian kernel of size `size`, mean `mean` and standard deviation `sigma`. `size`, `sigma` and `mean` default to 3, 0.25 and 0.5, respectively. The amplitude of the Gaussian (its maximum value) is `amplitude`. The default is 1. When `normalize=true`, the kernel is normalized to have a sum of 1. This overrides the `amplitude` argument. The default is `false`. Both the standard deviation and the mean are relative to a kernel of unit size. In other words, a mean of 0.5 is the center of the kernel, while a standard deviation of 0.25 is a quarter of its size. When `tensor` is provided (a 1D Tensor), `size` is ignored, and `tensor` is used to store the returned Gaussian kernel.

Note that arguments can also be specified as key-value arguments (in a table).

### [res] image.laplacian([size, sigma, amplitude, normalize, [...]]) ###

Returns a 2D [Laplacian](https://en.wikipedia.org/wiki/Blob_detection#The_Laplacian_of_Gaussian) kernel of size `height x width`. When used in a 2D convolution, the Laplacian of an image highlights regions of rapid intensity change and is therefore often used for edge detection (ref.: [Laplacian/Laplacian of Gaussian](http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm)). Optional arguments `[...]` expand to `width`, `height`, `sigma_horz`, `sigma_vert`, `mean_horz`, `mean_vert`.

The default value of `height` and `width` is `size`, where the latter has a default value of 3. The amplitude of the Laplacian (its maximum value) is `amplitude`. The default is 1. When `normalize=true`, the kernel is normalized to have a sum of 1. This overrides the `amplitude` argument. The default is `false`.

The default value of the horizontal and vertical standard deviations `sigma_horz` and `sigma_vert` of the Laplacian kernel is `sigma`, where the latter has a default value of 0.25. The default values for the corresponding means `mean_horz` and `mean_vert` are 0.5. Both the standard deviations and the means are relative to kernels of unit width and height, where the top-left corner is the origin. In other words, a mean of 0.5 is the center of the kernel, while a standard deviation of 0.25 is a quarter of its size.
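
Since the kernel's maximum value is `amplitude`, one shape consistent with the description is the "Mexican hat", i.e. a negated and rescaled Laplacian of Gaussian, `amplitude * (1 - r) * exp(-r)` with `r = (dx^2 + dy^2) / 2` measured in standard deviations. The sketch below uses the same relative-units convention as a Gaussian kernel; the exact scaling used by `image.laplacian` is an assumption, the function name is a local stand-in, and the `normalize` option is omitted for brevity.

```lua
-- Pure-Lua sketch of a Mexican-hat kernel (negated, rescaled LoG) as a table
-- of rows. Relative means/sigmas as for the Gaussian; the exact scaling and
-- centring convention of image.laplacian are assumptions.
local function laplacian(args)
  args = args or {}
  local size = args.size or 3
  local sigma = args.sigma or 0.25
  local height, width = args.height or size, args.width or size
  local sh, sv = args.sigma_horz or sigma, args.sigma_vert or sigma
  local mh, mv = args.mean_horz or 0.5, args.mean_vert or 0.5
  local amplitude = args.amplitude or 1
  local cx, cy = mh * width + 0.5, mv * height + 0.5  -- centre in pixel units
  local k = {}
  for i = 1, height do
    k[i] = {}
    for j = 1, width do
      local dx = (j - cx) / (sh * width)
      local dy = (i - cy) / (sv * height)
      local r = (dx * dx + dy * dy) / 2
      k[i][j] = amplitude * (1 - r) * math.exp(-r)  -- peak = amplitude at r = 0
    end
  end
  return k
end

local k = laplacian()  -- positive centre, small positive edges, negative corners
print(k[2][2], k[1][2], k[1][1])
```

The negative ring around the positive centre is what makes such a kernel respond to rapid intensity changes rather than to flat regions.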
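
To install or update the package with LuaRocks:

```bash
$ luarocks install image
```

A short interactive session that loads and displays the Lenna and Fabio test images (`image.display` requires the `qlua` interpreter):

```lua
> require 'image'
> l = image.lena()
> image.display(l)
> f = image.fabio()
> image.display(f)
```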