How a fully-connected (FC) layer works
- 32×32 image data → 1024×1 flattened vector
- With 10 classes, create a 1024×10 weight array
- Multiply the flattened vector by the weight array to get the output class scores, which are used for classification
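The FC pipeline above can be sketched in NumPy. The shapes follow the notes (32×32 input, 10 classes); the random image and weights are placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((32, 32))        # 32x32 image data (single channel assumed)
x = image.reshape(-1)               # flatten -> 1024-dim vector
W = rng.random((1024, 10)) * 0.01   # one weight column per class

scores = x @ W                      # matrix multiply -> 10 class scores
predicted_class = int(np.argmax(scores))

print(x.shape, W.shape, scores.shape)
```

The classification decision is just the index of the largest score.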

Issue 1
- The number of weights to learn scales very poorly with the size of the image
- We want the neural networks we use for computer vision to work well with larger images as well

Issue 2
- Do we really need this many weights at all?
- Matrix multiplication learns a separate weight for each input pixel
- In many images, only a few regions are particularly important for telling what's inside the image → a separate weight per pixel is overkill

Issue 3
- Putting an image into an FC layer loses the image's spatial information in the 3-D → 1-D flattening step

Instead of handling the entire image at once, flatten a local patch and matrix-multiply it

Process
- Extract a 5×5 patch as a 25-dimensional vector
- Take a dot product to get a one-dimensional output
- Slide the 5×5 patch across the entire image and repeat the process above

The input and output are not exactly the same size, but the 3-dimensional tensor structure is preserved.
The operation shown so far is linear; in practice a non-linearity is applied after each layer.
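The sliding-patch process can be sketched with naive loops in NumPy (single filter, stride 1, no padding; the 7×7 input and averaging kernel are arbitrary examples):

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide a KxK patch over the image; dot product at each position."""
    H, W = image.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = image[i:i+K, j:j+K].reshape(-1)  # e.g. 5x5 -> 25-dim vector
            out[i, j] = patch @ kernel.reshape(-1)   # dot product -> one scalar
    return out

img = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((5, 5)) / 25.0          # 5x5 averaging filter
y = conv2d_single(img, k)
print(y.shape)                      # (3, 3): slightly smaller than the input
```

Note how the output is smaller than the input, which is exactly the size mismatch the notes mention.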
- Strides
- Convolutions can subsample the image by jumping across some locations - called ‘stride’
What should we do if, as in the example above, the stride is so large that the filter runs off the image when it moves? → Padding
- Padding
- Padding solves the problem of filters running out of image
- Done by adding extra rows/cols to the input (usually set to 0)
- ‘SAME’ padding is illustrated here for filter=(3,3) with stride=(2,2)
- Using no padding is called ‘VALID’ padding
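The two modes give different output sizes. A quick sketch, assuming the TensorFlow-style convention that ‘SAME’ output size is ceil(size / stride) regardless of filter size:

```python
import math

def valid_out(size, f, s):
    # 'VALID': no padding; the filter must stay entirely inside the image
    return (size - f) // s + 1

def same_out(size, s):
    # 'SAME': enough zero-padding is added that the output size
    # depends only on the stride (assumed ceil convention)
    return math.ceil(size / s)

# filter=(3,3) with stride=(2,2), as in the illustration above, on a 7-wide input
print(valid_out(7, 3, 2))  # 3
print(same_out(7, 2))      # 4
```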
- Input: WxHxD volume
- Parameters:
- K filters, each with size (F, F)
- commonly set to powers of 2 (e.g. 32, 64, 128)
- ... moving at stride (S, S)
- commonly (5,5), (3,3), (2,2), (1,1)
- ... with padding P
- ‘SAME’ sets it automatically
- Output
- W’ = (W - F + 2P) / S + 1
- H’ = (H - F + 2P) / S + 1
- Each filter has (F * F * D) parameters
- K * (F * F * D) total in the layer
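These formulas can be checked directly. The 32×32×3 input with 32 filters of size 3×3, stride 1, and padding 1 is an assumed example, not from the notes:

```python
def conv_layer_summary(W, H, D, K, F, S, P):
    """Output shape and parameter count for a conv layer, per the formulas above."""
    W_out = (W - F + 2 * P) // S + 1   # W' = (W - F + 2P) / S + 1
    H_out = (H - F + 2 * P) // S + 1   # H' = (H - F + 2P) / S + 1
    params_per_filter = F * F * D      # each filter has (F * F * D) parameters
    total_params = K * params_per_filter
    return (W_out, H_out, K), total_params

shape, n_params = conv_layer_summary(32, 32, 3, K=32, F=3, S=1, P=1)
print(shape, n_params)  # (32, 32, 32) 864
```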
- X_col (27 × 9)
- 27 = 3 × 3 × 3 (one extracted patch)
- 9 = the number of patch positions in the input tensor
- W (32 × 27)
- 32 filters
- the # of weights per filter (3 × 3 filter, 3-channel input) = 27 (3 × 3 × 3)
- W @ X_col = (32 × 27) @ (27 × 9) = (32 × 9)
- Reshape to (3 × 3 × 32)
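A sketch of this im2col trick with the shapes above (stride 1, no padding, channels-first layout; the random input and filters are placeholders):

```python
import numpy as np

def im2col(x, F):
    """x: (D, H, W) input; extract every FxF patch as one column."""
    D, H, W = x.shape
    out_h, out_w = H - F + 1, W - F + 1
    cols = np.zeros((D * F * F, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i+F, j:j+F].reshape(-1)
    return cols

rng = np.random.default_rng(0)
x = rng.random((3, 5, 5))            # 3-channel 5x5 input -> 9 patch positions
W = rng.random((32, 27))             # 32 filters, 27 = 3*3*3 weights each

X_col = im2col(x, 3)                 # (27, 9)
out = (W @ X_col).reshape(32, 3, 3)  # (32 x 27) @ (27 x 9) = (32 x 9), then per-filter 3x3 maps
out = out.transpose(1, 2, 0)         # reorder to (3, 3, 32) as in the notes
print(X_col.shape, out.shape)
```

This turns the whole convolution into one large matrix multiply, which is why the trick is used in practice.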
- Increasing the receptive field (dilated convolutions)
- Decreasing the size of the tensor
- pooling
- 1×1 convolutions
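A 1×1 convolution is just a per-position matrix multiply over the depth axis: it mixes channels without touching the spatial dimensions, so it can shrink the tensor's depth. A minimal sketch (the 8×8×64 feature map and 64→16 reduction are assumed shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8, 64))   # HxWxD feature map, D = 64
W = rng.random((64, 16))     # a 1x1 conv with 16 filters = one 64->16 matrix

y = x @ W                    # applied independently at every (h, w) position
print(y.shape)               # (8, 8, 16): same spatial size, smaller depth
```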