LeNet
input => conv & pooling => FC layer
conv filter = 5x5 , stride=1
pooling layer = 2x2, stride=2
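A minimal PyTorch sketch of this conv/pool => FC pattern; the channel and FC sizes (6, 16, 120, 84, 10) follow the classic LeNet-5 configuration, and a 1x32x32 grayscale input is assumed (ReLU is used here for simplicity).

```python
# Minimal LeNet-style sketch: conv + pool blocks followed by FC layers.
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 1x32x32 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5, stride=1),   # -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```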
Case Studies
AlexNet ILSVRC'12
why so much better than before: the first deep-learning / ConvNet approach
first large-scale CNN
performed well on the ImageNet classification task
Input: 227x227x3
First layer (Conv1): 96 11x11 filters, stride=4
output volume: [55x55x96]
parameters: (11x11x3)x96 ≈ 35K
Second layer (Pool1): 3x3 filters, stride=2
output volume: [27x27x96]
...
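A small sketch that checks these numbers with the standard output-size formula out = (W - F + 2P)/S + 1 (no padding here):

```python
# Sketch: conv/pool output-size formula checked against the AlexNet numbers above
# (input 227x227x3, Conv1 = 96 11x11 filters stride 4, Pool1 = 3x3 stride 2).
def conv_output_size(w, f, stride, pad=0):
    return (w - f + 2 * pad) // stride + 1

conv1 = conv_output_size(227, 11, 4)      # 55 -> output volume 55x55x96
pool1 = conv_output_size(conv1, 3, 2)     # 27 -> output volume 27x27x96
conv1_params = 11 * 11 * 3 * 96           # 34,848 (~35K), ignoring biases
print(conv1, pool1, conv1_params)         # 55 27 34848
```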
VGG ILSVRC'14
~138M parameters (VGG-16)
why use small filters? (3x3 conv)
a stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer
the smaller the filter, the fewer the parameters, so the network can be made deeper
receptive field: 3x3 (first layer), 5x5 (second), 7x7 (third) => same receptive field as a single 7x7 layer, but deeper
-> more non-linearity, fewer parameters (see the parameter sketch after this list)
what is the effective receptive field of three 3x3 conv (stride 1) layers? => 7x7
why so many filters in a conv layer?
each filter produces one feature map
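A quick sketch of the parameter comparison behind this: three stacked 3x3 conv layers vs. one 7x7 conv layer, both mapping C channels to C channels (C = 64 is just an illustrative choice, biases ignored):

```python
# Parameter count: stacked 3x3 convs vs a single 7x7 conv with the same
# effective receptive field.
C = 64                               # assumed channel count for illustration
three_3x3 = 3 * (3 * 3 * C * C)      # 27 * C^2
one_7x7 = 7 * 7 * C * C              # 49 * C^2
print(three_3x3, one_7x7)            # 110592 200704 -> fewer params, same receptive field
```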
GoogLeNet ILSVRC'14
~5M parameters
uses the Inception module - a good local network topology
network within a network => local topology
many different filters receive the same input in parallel - all outputs are concatenated along the depth dimension into one tensor
problem: computational cost
example: naive Inception module on a 28x28x256 input
1x1 conv, 128 filters => 28x28x128
3x3 conv, 192 filters => 28x28x192
5x5 conv, 96 filters => 28x28x96
3x3 pool => 28x28x256
-------------------------
total (filter concatenation) => 28x28x672
depth grows too much => add 1x1 conv (bottleneck) layers to reduce the depth at intermediate points (sketch below)
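A PyTorch sketch of an Inception-style module with 1x1 bottleneck convs; the branch output widths (128/192/96) follow the example above, while the 1x1 reduce width and the pool-projection width (both 64 here) are illustrative rather than the exact GoogLeNet values:

```python
# Inception-style module: parallel branches on the same input, with 1x1
# "bottleneck" convs before the expensive 3x3/5x5 convs and after the pool.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 128, kernel_size=1)        # 1x1 conv
        self.branch3 = nn.Sequential(                               # 1x1 reduce -> 3x3 conv
            nn.Conv2d(in_ch, 64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(                               # 1x1 reduce -> 5x5 conv
            nn.Conv2d(in_ch, 64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(64, 96, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(                           # 3x3 pool -> 1x1 conv
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 64, kernel_size=1),
        )

    def forward(self, x):
        # concatenate all branch outputs along the depth (channel) dimension
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 256, 28, 28)
print(InceptionModule()(x).shape)   # torch.Size([1, 480, 28, 28])
```

With the 1x1 projection after the pooling branch, the concatenated depth here is 480 instead of the 672 of the naive module above.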
train with the losses of the intermediate auxiliary classifiers combined (averaged) with the main loss
(auxiliary classification outputs to inject additional gradient at lower layers)
if the gradient is propagated via the chain rule only from the very end, it shrinks toward 0 when the network is deep -> hence the auxiliary classifiers
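A minimal sketch of combining the main loss with the auxiliary-classifier losses; plain averaging is used to match the note above, whereas the original GoogLeNet paper weights the auxiliary losses by 0.3:

```python
# Combine the main classifier loss with the auxiliary-classifier losses so
# extra gradient is injected at the lower layers during backprop.
import torch
import torch.nn.functional as F

def total_loss(main_logits, aux_logits_list, targets):
    losses = [F.cross_entropy(main_logits, targets)]
    losses += [F.cross_entropy(aux, targets) for aux in aux_logits_list]
    return torch.stack(losses).mean()

logits = torch.randn(8, 1000)                       # main classifier output
aux = [torch.randn(8, 1000), torch.randn(8, 1000)]  # two auxiliary heads
targets = torch.randint(0, 1000, (8,))
print(total_loss(logits, aux, targets))
```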
ResNet
the reason very deep plain networks perform worse is not overfitting
the problem is optimization: deeper models are harder to optimize
solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping
residual mapping instead of direct mapping
instead of having the layers learn H(x) directly, make them learn F(x) = H(x) - x
the skip connection has no weights; it passes the input straight to the output as an identity mapping
the actual layers only need to learn the change (delta),
i.e. the residual with respect to the input x
the final output is input x + residual: H(x) = F(x) + x
rather than learning the full mapping, the layers learn only this small change
(to build a deeper model) copy the learned layers from the shallower model and set the additional layers to identity mappings
why is learning the residual easier? it is simply the authors' hypothesis
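A minimal PyTorch sketch of a basic residual block: the weighted layers compute the residual F(x) and the parameter-free skip connection adds x back, so the block outputs F(x) + x (the channel count 64 is illustrative):

```python
# Basic residual block: two 3x3 convs learn F(x); the identity skip adds x.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)                                        # H(x) = F(x) + x

x = torch.randn(1, 64, 56, 56)
print(BasicBlock()(x).shape)   # torch.Size([1, 64, 56, 56])
```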
Also...
NiN (Network in Network), 2014
mlpconv layer
an MLP (a few FC layers) is stacked inside each conv layer
mlpconv layer with a micronetwork inside each conv layer to compute more abstract features for local patches
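A short sketch of an NiN-style mlpconv block: the micronetwork applied to each local patch can be written as 1x1 convolutions, which act like a small FC network applied at every spatial position (the channel widths here are illustrative):

```python
# mlpconv sketch: a normal conv followed by 1x1 convs (per-pixel FC layers).
import torch
import torch.nn as nn

def mlpconv(in_ch, out_ch, kernel_size, stride=1, padding=0):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),   # per-pixel FC layer
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),   # per-pixel FC layer
    )

x = torch.randn(1, 3, 32, 32)
print(mlpconv(3, 96, kernel_size=5, padding=2)(x).shape)  # torch.Size([1, 96, 32, 32])
```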