

CS231n - Lec9. CNN Architecture


LeNet 

input => conv & pooling => FC layer

conv filter = 5x5, stride=1

pooling layer = 2x2, stride=2
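
A minimal PyTorch sketch of this conv => pool => FC pattern; channel counts and FC sizes here are assumptions for illustration, not necessarily the exact LeNet-5 configuration:

```python
import torch
import torch.nn as nn

# LeNet-style sketch: conv/pool stages followed by FC layers.
# Channel counts and FC sizes are assumptions for illustration.
class LeNetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 5x5 conv, stride 1
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 2x2 pool, stride 2
            nn.Conv2d(6, 16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNetSketch()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```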

 

Case Studies

AlexNet  ILSVRC'12

Why so much better than before: the first deep-learning / ConvNet approach

First large-scale CNN

Performed very well on the ImageNet classification task

 

Input: 227x227x3

First layer (Conv1): 96 11x11 filters, stride 4

output vol - [55x55x96]

parameters - (11x11x3)x96 = 35k

Second layer (Pool1): 3x3 filters, stride 2

output vol - [27x27x96]

...
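
A quick sketch of the arithmetic behind these numbers, using the usual output-size formula (W - F + 2P)/S + 1 (the helper name is just for illustration):

```python
def conv_output_size(w, f, s, p=0):
    """Spatial output size for input width w, filter size f, stride s, padding p."""
    return (w - f + 2 * p) // s + 1

# AlexNet Conv1: 227x227x3 input, 96 filters of 11x11, stride 4
print(conv_output_size(227, 11, 4))   # 55  -> output volume 55x55x96
print(11 * 11 * 3 * 96)               # 34848 weights (~35K), plus 96 biases

# Pool1: 3x3 filters, stride 2 on the 55x55x96 volume
print(conv_output_size(55, 3, 2))     # 27  -> output volume 27x27x96, no parameters
```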

 

VGG  ILSVRC'14

138M parameters (VGG16)

Why use small filters? (3x3 conv)

A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer

Smaller filters mean fewer parameters, so the network can be made deeper

Receptive field: 3x3 after the first layer, 5x5 after the second, 7x7 after the third => same receptive field as a single 7x7 conv, but with more layers

-> more non-linearity, fewer parameters

What is the effective receptive field of three 3x3 conv (stride 1) layers? => 7x7

Why so many filters in a conv layer?

Each filter produces one feature map
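
A small sketch comparing the parameter counts, assuming C input and C output channels per layer and ignoring biases:

```python
C = 64  # assumed channel count, just for illustration

three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 conv layers: 27*C^2
one_7x7   = 7 * 7 * C * C         # a single 7x7 conv layer:       49*C^2

print(three_3x3, one_7x7)  # 110592 vs 200704: same 7x7 receptive field, fewer params
```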

GoogLeNet  ILSVRC'14

5M parameters

Uses the Inception module - a good local network topology

network within a network => local topology

Several different filters operate on the same input in parallel; all outputs are concatenated along the depth dimension into one tensor

Problem: computational cost

 

Naive Inception module example - outputs for a 28x28x256 input:

1x1 conv => 28x28x128

3x3 conv => 28x28x192

5x5 conv => 28x28x96

3x3 pool => 28x28x256

-------------------------

total  => 28x28x672 

Depth grows too much; 1x1 conv "bottleneck" layers are added to reduce the depth in between
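
A hedged PyTorch sketch of an Inception-style module with 1x1 bottlenecks before the 3x3/5x5 convs and after the pooling branch; the branch widths are illustrative, not GoogLeNet's exact numbers:

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated along depth.
    1x1 'bottleneck' convs reduce depth before the expensive 3x3 and 5x5 convs."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1),               # bottleneck
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),               # bottleneck
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),               # reduce pool-branch depth
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 256, 28, 28)
print(InceptionSketch(256)(x).shape)  # torch.Size([1, 256, 28, 28]): 64+128+32+32 channels
```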

 

The network is trained with the losses from the intermediate auxiliary classifiers combined (averaged) with the main loss

(auxiliary classification output to inject additional gradient at lower layers)

If gradients are only propagated back from the very last layer via the chain rule, they can shrink toward zero in a very deep network -> hence the auxiliary classifiers
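
A minimal sketch of combining the auxiliary losses with the main loss during training; the 0.3 weight is the value commonly cited for GoogLeNet and should be treated as an assumption here:

```python
import torch.nn.functional as F

def total_loss(main_logits, aux_logits_list, targets, aux_weight=0.3):
    """Main cross-entropy plus weighted auxiliary-classifier losses,
    so gradient is injected at intermediate depths as well."""
    loss = F.cross_entropy(main_logits, targets)
    for aux_logits in aux_logits_list:
        loss = loss + aux_weight * F.cross_entropy(aux_logits, targets)
    return loss
```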

ResNet  ILSVRC'15

What happens when we keep stacking deeper layers on a plain CNN?

The reason deep plain networks perform worse is not overfitting

The problem is optimization: deeper models are harder to optimize

Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping

Residual mapping instead of direct mapping

Instead of having the layers learn H(x) directly, let them learn the residual F(x) = H(x) - x

The skip connection has no weights; it passes the input straight through to the output as an identity mapping

The actual layers only need to learn the change (delta),

i.e., the residual with respect to the input x

The final output is input x + residual: H(x) = F(x) + x

Rather than learning the full mapping, the layers learn only this small change

Copy the learned layers from the shallower model and set the additional layers to identity mappings

Why is learning the residual easier? It is just the authors' hypothesis
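
A hedged PyTorch sketch of a basic residual block: the stacked convs learn F(x), and the parameter-free skip connection adds x back, so the block outputs H(x) = F(x) + x:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convs learn the residual F(x); the parameter-free skip
    connection adds the identity, so the block outputs H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.residual(x) + x)   # identity skip: no extra weights

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```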

 

Also...

NiN (Network in Network), 2014

mlpconv layer

Stacks an MLP (a few FC layers) inside each conv layer

mlpconv layer: a micro-network within each conv layer computes more abstract features for local patches
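
A small sketch of an mlpconv layer; the per-patch MLP is typically implemented as 1x1 convolutions stacked after an ordinary conv (channel counts here are assumptions):

```python
import torch
import torch.nn as nn

# mlpconv sketch: a conv layer followed by a small per-pixel MLP,
# implemented with 1x1 convolutions (channel sizes are illustrative).
mlpconv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),   # FC layer applied to each local patch
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),
    nn.ReLU(inplace=True),
)

print(mlpconv(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```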
