How a fully-connected (FC) layer works
- 32×32 image data → 1024×1 flattened vector
- With 10 classes, create a 1024×10 weight array
- Multiply the flattened vector by the weight array to get the output class scores, which are used for classification
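The FC pipeline above can be sketched in NumPy. The shapes follow the notes (32×32 input, 10 classes); the random image and weights are placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((32, 32))        # 32x32 image data (single channel assumed)
x = image.reshape(-1)               # flatten -> 1024-dim vector
W = rng.random((1024, 10)) * 0.01   # one weight column per class

scores = x @ W                      # matrix multiply -> 10 class scores
predicted_class = int(np.argmax(scores))

print(x.shape, W.shape, scores.shape)
```

The classification decision is just the index of the largest score.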

Issue 1
- The number of weights to learn scales very poorly with the size of the image
- We want the neural networks we use for computer vision to work well with larger images as well

Issue 2
- Do we really need this many weights at all?
- Matrix multiplication learns a separate weight for each input pixel
- In many images, only a few regions are particularly important for telling what's inside the image → a separate weight per pixel is overkill

Issue 3
- Putting an image into an FC layer loses the image's spatial information in the 3-D → 1-D flattening step

Instead of handling the entire image at once, flatten a local patch and matrix-multiply it

Process
- Extract a 5×5 patch as a 25-dimensional vector
- Take a dot product to get a one-dimensional output
- Slide the 5×5 patch across the entire image and repeat the process above

The input and output are not exactly the same size, but the 3-dimensional tensor structure is preserved.
The operation shown so far is linear; in practice a non-linearity is applied after each layer.
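The sliding-patch process can be sketched with naive loops in NumPy (single filter, stride 1, no padding; the 7×7 input and averaging kernel are arbitrary examples):

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide a KxK patch over the image; dot product at each position."""
    H, W = image.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = image[i:i+K, j:j+K].reshape(-1)  # e.g. 5x5 -> 25-dim vector
            out[i, j] = patch @ kernel.reshape(-1)   # dot product -> one scalar
    return out

img = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((5, 5)) / 25.0          # 5x5 averaging filter
y = conv2d_single(img, k)
print(y.shape)                      # (3, 3): slightly smaller than the input
```

Note how the output is smaller than the input, which is exactly the size mismatch the notes mention.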
- Strides
- Convolutions can subsample the image by jumping across some locations - called ‘stride’
What should we do if, as in the example above, the stride is so large that the filter runs off the image when it moves? → Padding
- Padding
- Padding solves the problem of filters running out of image
- Done by adding extra rows/cols to the input (usually set to 0)
- ‘SAME’ padding is illustrated here for filter=(3,3) with stride=(2,2)
- Using no padding is called ‘VALID’ padding
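The two modes give different output sizes. A quick sketch, assuming the TensorFlow-style convention that ‘SAME’ output size is ceil(size / stride) regardless of filter size:

```python
import math

def valid_out(size, f, s):
    # 'VALID': no padding; the filter must stay entirely inside the image
    return (size - f) // s + 1

def same_out(size, s):
    # 'SAME': enough zero-padding is added that the output size
    # depends only on the stride (assumed ceil convention)
    return math.ceil(size / s)

# filter=(3,3) with stride=(2,2), as in the illustration above, on a 7-wide input
print(valid_out(7, 3, 2))  # 3
print(same_out(7, 2))      # 4
```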
- Input: WxHxD volume
- Parameters:
- K filters, each with size (F, F)
- commonly set to powers of 2 (e.g. 32, 64, 128)
- ... moving at stride (S, S)
- commonly (5,5), (3,3), (2,2), (1,1)
- ... with padding P
- ‘SAME’ sets it automatically
- Output
- W’ = (W - F + 2P) / S + 1
- H’ = (H - F + 2P) / S + 1
- Each filter has (F * F * D) parameters
- K * (F * F * D) total in the layer
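These formulas can be checked directly. The 32×32×3 input with 32 filters of size 3×3, stride 1, and padding 1 is an assumed example, not from the notes:

```python
def conv_layer_summary(W, H, D, K, F, S, P):
    """Output shape and parameter count for a conv layer, per the formulas above."""
    W_out = (W - F + 2 * P) // S + 1   # W' = (W - F + 2P) / S + 1
    H_out = (H - F + 2 * P) // S + 1   # H' = (H - F + 2P) / S + 1
    params_per_filter = F * F * D      # each filter has (F * F * D) parameters
    total_params = K * params_per_filter
    return (W_out, H_out, K), total_params

shape, n_params = conv_layer_summary(32, 32, 3, K=32, F=3, S=1, P=1)
print(shape, n_params)  # (32, 32, 32) 864
```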
- X_col (27 × 9)
- 27 = 3 × 3 × 3 (one extracted patch)
- 9 = the number of patch positions in the input tensor
- W (32 × 27)
- 32 filters
- the # of weights per filter (3 × 3 filter, 3-channel input) = 27 (3 × 3 × 3)
- W @ X_col = (32 × 27) @ (27 × 9) = (32 × 9)
- Reshape to (3 × 3 × 32)
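A sketch of this im2col trick with the shapes above (stride 1, no padding, channels-first layout; the random input and filters are placeholders):

```python
import numpy as np

def im2col(x, F):
    """x: (D, H, W) input; extract every FxF patch as one column."""
    D, H, W = x.shape
    out_h, out_w = H - F + 1, W - F + 1
    cols = np.zeros((D * F * F, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i+F, j:j+F].reshape(-1)
    return cols

rng = np.random.default_rng(0)
x = rng.random((3, 5, 5))            # 3-channel 5x5 input -> 9 patch positions
W = rng.random((32, 27))             # 32 filters, 27 = 3*3*3 weights each

X_col = im2col(x, 3)                 # (27, 9)
out = (W @ X_col).reshape(32, 3, 3)  # (32 x 27) @ (27 x 9) = (32 x 9), then per-filter 3x3 maps
out = out.transpose(1, 2, 0)         # reorder to (3, 3, 32) as in the notes
print(X_col.shape, out.shape)
```

This turns the whole convolution into one large matrix multiply, which is why the trick is used in practice.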
- Increasing the receptive field (dilated convolutions)
- Decreasing the size of the tensor
- pooling
- 1×1 convolutions
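A 1×1 convolution is just a per-position matrix multiply over the depth axis: it mixes channels without touching the spatial dimensions, so it can shrink the tensor's depth. A minimal sketch (the 8×8×64 feature map and 64→16 reduction are assumed shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8, 64))   # HxWxD feature map, D = 64
W = rng.random((64, 16))     # a 1x1 conv with 16 filters = one 64->16 matrix

y = x @ W                    # applied independently at every (h, w) position
print(y.shape)               # (8, 8, 16): same spatial size, smaller depth
```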