

CS231n - Lec9. CNN Architecture


LeNet 

input => conv & pooling => FC layer

conv filter = 5x5, stride=1

pooling layer = 2x2, stride=2
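
A minimal PyTorch sketch of this conv => pool => FC pattern; channel counts and FC sizes here are assumptions for illustration, not necessarily the exact LeNet-5 configuration:

```python
import torch
import torch.nn as nn

# LeNet-style sketch: conv/pool stages followed by FC layers.
# Channel counts and FC sizes are assumptions for illustration.
class LeNetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 5x5 conv, stride 1
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 2x2 pool, stride 2
            nn.Conv2d(6, 16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNetSketch()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```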

 

Case Studies

AlexNet  ILSVRC'12

Why so much better than before: the first deep-learning / ConvNet approach

First large-scale CNN

Performed very well on the ImageNet classification task

 

Input: 227x227x3

First layer (Conv1): 96 11x11 filters, stride 4

output vol - [55x55x96]

parameters - (11x11x3)x96 = 35k

Second layer (Pool1): 3x3 filters, stride 2

output vol - [27x27x96]

...
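
A quick sketch of the arithmetic behind these numbers, using the usual output-size formula (W - F + 2P)/S + 1 (the helper name is just for illustration):

```python
def conv_output_size(w, f, s, p=0):
    """Spatial output size for input width w, filter size f, stride s, padding p."""
    return (w - f + 2 * p) // s + 1

# AlexNet Conv1: 227x227x3 input, 96 filters of 11x11, stride 4
print(conv_output_size(227, 11, 4))   # 55  -> output volume 55x55x96
print(11 * 11 * 3 * 96)               # 34848 weights (~35K), plus 96 biases

# Pool1: 3x3 filters, stride 2 on the 55x55x96 volume
print(conv_output_size(55, 3, 2))     # 27  -> output volume 27x27x96, no parameters
```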

 

VGG  ILSVRC'14

138M parameters (VGG16)

Why use small filters? (3x3 conv)

A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer

Smaller filters mean fewer parameters, so the network can be made deeper

Receptive field: 3x3 after the first layer, 5x5 after the second, 7x7 after the third => same receptive field as a single 7x7 conv, but with more layers

-> more non-linearity, fewer parameters

What is the effective receptive field of three 3x3 conv (stride 1) layers? => 7x7

Why so many filters in a conv layer?

Each filter produces one feature map
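
A small sketch comparing the parameter counts, assuming C input and C output channels per layer and ignoring biases:

```python
C = 64  # assumed channel count, just for illustration

three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 conv layers: 27*C^2
one_7x7   = 7 * 7 * C * C         # a single 7x7 conv layer:       49*C^2

print(three_3x3, one_7x7)  # 110592 vs 200704: same 7x7 receptive field, fewer params
```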

GoogLeNet  ILSVRC'14

5M parameters

Uses the Inception module - a good local network topology

network within a network => local topology

Several different filters operate on the same input in parallel; all outputs are concatenated along the depth dimension into one tensor

Problem: computational cost

 

Naive Inception module example - outputs for a 28x28x256 input:

1x1 conv => 28x28x128

3x3 conv => 28x28x192

5x5 conv => 28x28x96

3x3 pool => 28x28x256

-------------------------

total  => 28x28x672 

Depth grows too much; 1x1 conv "bottleneck" layers are added to reduce the depth in between
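
A hedged PyTorch sketch of an Inception-style module with 1x1 bottlenecks before the 3x3/5x5 convs and after the pooling branch; the branch widths are illustrative, not GoogLeNet's exact numbers:

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated along depth.
    1x1 'bottleneck' convs reduce depth before the expensive 3x3 and 5x5 convs."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1),               # bottleneck
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),               # bottleneck
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),               # reduce pool-branch depth
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 256, 28, 28)
print(InceptionSketch(256)(x).shape)  # torch.Size([1, 256, 28, 28]): 64+128+32+32 channels
```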

 

The network is trained with the losses from the intermediate auxiliary classifiers combined (averaged) with the main loss

(auxiliary classification output to inject additional gradient at lower layers)

If gradients are only propagated back from the very last layer via the chain rule, they can shrink toward zero in a very deep network -> hence the auxiliary classifiers
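
A minimal sketch of combining the auxiliary losses with the main loss during training; the 0.3 weight is the value commonly cited for GoogLeNet and should be treated as an assumption here:

```python
import torch.nn.functional as F

def total_loss(main_logits, aux_logits_list, targets, aux_weight=0.3):
    """Main cross-entropy plus weighted auxiliary-classifier losses,
    so gradient is injected at intermediate depths as well."""
    loss = F.cross_entropy(main_logits, targets)
    for aux_logits in aux_logits_list:
        loss = loss + aux_weight * F.cross_entropy(aux_logits, targets)
    return loss
```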

ResNet  ILSVRC'15

What happens when we keep stacking deeper layers on a plain CNN?

The reason deep plain networks perform worse is not overfitting

The problem is optimization: deeper models are harder to optimize

Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping

Residual mapping instead of direct mapping

Instead of having the layers learn H(x) directly, let them learn the residual F(x) = H(x) - x

The skip connection has no weights; it passes the input straight through to the output as an identity mapping

The actual layers only need to learn the change (delta),

i.e., the residual with respect to the input x

The final output is input x + residual: H(x) = F(x) + x

Rather than learning the full mapping, the layers learn only this small change

Copy the learned layers from the shallower model and set the additional layers to identity mappings

Why is learning the residual easier? It is just the authors' hypothesis
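
A hedged PyTorch sketch of a basic residual block: the stacked convs learn F(x), and the parameter-free skip connection adds x back, so the block outputs H(x) = F(x) + x:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convs learn the residual F(x); the parameter-free skip
    connection adds the identity, so the block outputs H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.residual(x) + x)   # identity skip: no extra weights

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```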

 

Also...

NiN (Network in Network), 2014

mlpconv layer

Stacks an MLP (a few FC layers) inside each conv layer

mlpconv layer: a micro-network within each conv layer computes more abstract features for local patches
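
A small sketch of an mlpconv layer; the per-patch MLP is typically implemented as 1x1 convolutions stacked after an ordinary conv (channel counts here are assumptions):

```python
import torch
import torch.nn as nn

# mlpconv sketch: a conv layer followed by a small per-pixel MLP,
# implemented with 1x1 convolutions (channel sizes are illustrative).
mlpconv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),   # FC layer applied to each local patch
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),
    nn.ReLU(inplace=True),
)

print(mlpconv(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```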
