

CS231n- Lec6. Training Neural Networks 1


Part 1 

Activation Functions

what if the input to a neuron is always positive (e.g. sigmoid outputs)?

the neuron computes f = w·x + b and passes f through the activation function

the upstream gradient dL/df is a single scalar for that neuron, and the local gradient is df/dw = x

so dL/dw = (dL/df) * x; since every component of x is positive, every component of dL/dw shares the sign of dL/df

=> W can only move in one common direction per update (all components increase or all decrease), which gives zig-zag update paths; this is why we want zero-centered inputs
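A minimal numpy sketch of the point above (the upstream scalar dL/df and the input values are made up): with all-positive x, every component of dL/dw ends up with the same sign.

import numpy as np

# Minimal sketch: one neuron f = w.x with all-positive inputs x.
# dL/df is a single upstream scalar, so dL/dw = (dL/df) * x shares
# one sign across every component whenever x > 0.
np.random.seed(0)
x = np.abs(np.random.randn(5))        # all-positive inputs (e.g. sigmoid outputs)
dL_df = -0.7                          # made-up upstream gradient (scalar)

dL_dw = dL_df * x                     # local gradient df/dw = x
print(dL_dw)                          # every entry is negative here
print(np.unique(np.sign(dL_dw)))      # -> [-1.]: all weights pushed the same way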

Data Preprocessing

zero mean: subtract the per-dimension mean

normalize => every dimension contributes on a comparable scale

images: usually only zero-centering (subtract the mean image or a per-channel mean), not full normalization, since pixels already live on the same scale

can data preprocessing fix the sigmoid's non-zero-centered output problem?

only at the first layer; deeper layers receive sigmoid outputs, which are all positive again
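A short numpy sketch of these preprocessing steps; the toy data shapes and the placeholder image batch are assumptions for illustration.

import numpy as np

# Sketch of the preprocessing steps above, assuming X has shape (N, D).
X = np.random.randn(1000, 3) * [5.0, 0.1, 50.0] + [10.0, -2.0, 0.0]

X_zero = X - X.mean(axis=0)               # zero mean per dimension
X_norm = X_zero / X_zero.std(axis=0)      # unit variance per dimension

# For images we typically stop at zero-centering, e.g. subtracting a
# per-channel mean computed over the training set (placeholder data here).
images = np.random.randint(0, 256, size=(10, 32, 32, 3)).astype(np.float32)
channel_mean = images.mean(axis=(0, 1, 2))    # one mean per RGB channel
images_centered = images - channel_mean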

Weight Initialization

what happens if we initialize all parameters to 0 (or any identical constant)?

with w = 0 every neuron computes the same output, gets the same gradient, and receives the same update

=> all neurons stay identical forever; the symmetry breaking we want never happens

Q: but couldn't the loss make backprop affect different neurons differently?

A: no; because the neurons are wired with identical weights, they receive identical gradients
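A toy numpy sketch of this symmetry problem (the layer sizes and the constant 0.1 are arbitrary): every hidden neuron computes the same value and gets the same gradient, so nothing ever differentiates them.

import numpy as np

# Initialize every weight to the same constant. All hidden neurons then
# compute the same value and receive the same gradient, so they never
# diverge from each other. Zero init is the extreme case of this.
np.random.seed(0)
X = np.random.randn(4, 3)
y = np.random.randn(4, 1)

W1 = np.full((3, 5), 0.1)            # identical weights for every hidden neuron
W2 = np.full((5, 1), 0.1)

h = np.tanh(X @ W1)                  # every hidden column is identical
out = h @ W2
dout = 2 * (out - y) / len(X)        # gradient of mean squared error
dW2 = h.T @ dout
dW1 = X.T @ ((dout @ W2.T) * (1 - h ** 2))

print(np.allclose(h, h[:, :1]))      # True: all neurons output the same thing
print(np.allclose(dW1, dW1[:, :1]))  # True: all neurons get the same update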

 

Small random numbers

W = 0.01 * np.random.randn(D,H)

works okay for small networks, but causes problems in deeper networks

activations collapse toward 0 in deeper layers (each layer multiplies by tiny weights, and tanh keeps them small), so the gradients vanish as well

what if we use larger weights instead, e.g. 1.0 * np.random.randn(D,H)? activations get pushed to 1 or -1

with tanh this means saturation, and the gradients are again nearly zero
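A rough reproduction of the activation-statistics experiment from the lecture; the layer widths, depth, and the two scales are assumptions for illustration.

import numpy as np

# Push random data through a deep tanh stack and watch the hidden
# activations for a small (0.01) vs. large (1.0) random init.
np.random.seed(0)
x = np.random.randn(1000, 500)

for scale in (0.01, 1.0):
    h = x
    stds = []
    for _ in range(10):
        W = scale * np.random.randn(500, 500)
        h = np.tanh(h @ W)
        stds.append(h.std())
    print(f"scale={scale}: per-layer stds {np.round(stds, 3)}")
    # scale=0.01 -> stds shrink toward 0 (activations die, gradients vanish)
    # scale=1.0  -> |h| is pinned near 1 (tanh saturates, gradients also vanish)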

Xavier initialization (Glorot 2010)

sample the weights from a standard Gaussian and scale them by the number of inputs (divide by sqrt(fan_in)), so the variance of the output matches the variance of the input

W = np.random.randn(fan_in, fan_out)/np.sqrt(fan_in)

but it doesn't work well with ReLU: ReLU zeroes out half of the distribution, so the variance is halved at every layer, activations keep shrinking, and more and more units end up deactivated
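A small sketch comparing Xavier scaling with the ReLU-aware variant that divides by sqrt(fan_in/2) (the He et al. 2015 fix mentioned in the lecture); widths and depth are arbitrary.

import numpy as np

# Compare Xavier vs. He-style scaling on a deep ReLU stack. With Xavier,
# each ReLU halves the variance, so activations keep shrinking; dividing
# by sqrt(fan_in / 2) compensates for the halving.
np.random.seed(0)
x = np.random.randn(1000, 500)

for name, denom in (("xavier", np.sqrt(500)), ("he", np.sqrt(500 / 2))):
    h = x
    for _ in range(10):
        W = np.random.randn(500, 500) / denom
        h = np.maximum(0, h @ W)                  # ReLU
    print(name, "std after 10 layers:", round(float(h.std()), 4))
    # xavier -> activations shrink toward 0; he -> they keep a healthy spread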

Batch Normalization

consider a batch of activations at some layer

we want to make each dimension unit Gaussian

usually inserted after FC or conv layers and before the nonlinearity

improves gradient flow through the network

allows higher learning rates

reduces the strong dependence on initialization

acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe 

BN normalizes the input to the layer, not the layer's weights

unit Gaussian => x_hat = (x - batch mean) / batch std, followed by a learnable scale and shift: y = gamma * x_hat + beta

Q: if we add the learnable scale and shift (gamma, beta) and train them, couldn't the network just learn to undo the normalization (an identity mapping), so BN effectively disappears?

A: in practice this doesn't happen; the learned gamma and beta rarely recover the exact identity
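A minimal sketch of the BN forward pass described above, training mode only (at test time a running mean and variance would be used instead); the shapes and epsilon value are assumptions.

import numpy as np

# Training-mode batch norm forward pass: per-dimension unit-Gaussian
# normalization followed by the learnable scale (gamma) and shift (beta).
def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                       # per-dimension batch mean
    var = x.var(axis=0)                       # per-dimension batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # unit Gaussian per dimension
    return gamma * x_hat + beta               # learned scale and shift

x = 3.0 * np.random.randn(64, 100) + 5.0      # a batch of pre-activations
out = batchnorm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
print(out.mean(), out.std())                  # roughly 0 and 1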

Babysitting the Learning Process

sanity check: take a small subset of the data; the model must be able to overfit it (training loss driven near zero)
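A rough sketch of this sanity check on made-up data (20 random samples, a tiny two-layer net, hand-picked learning rate and step count); on real data you would slice off a small piece of the training set the same way.

import numpy as np

# Overfit a tiny made-up dataset with a small 2-layer softmax classifier.
np.random.seed(0)
X = np.random.randn(20, 10)
y = np.random.randint(0, 3, size=20)                  # 3 fake classes

W1 = 0.1 * np.random.randn(10, 50); b1 = np.zeros(50)
W2 = 0.1 * np.random.randn(50, 3);  b2 = np.zeros(3)
lr = 0.5

for step in range(500):
    h = np.maximum(0, X @ W1 + b1)                    # ReLU hidden layer
    scores = h @ W2 + b2
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(20), y]).mean()        # softmax cross-entropy

    dscores = p.copy(); dscores[np.arange(20), y] -= 1; dscores /= 20
    dW2 = h.T @ dscores; db2 = dscores.sum(axis=0)
    dh = dscores @ W2.T; dh[h <= 0] = 0
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

print("final training loss:", round(float(loss), 4))  # should fall far below log(3) ~= 1.1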

Hyperparameter Optimization

learning rate (search it on a log scale, coarse first, then fine)

cross-validation: train on the training set, evaluate on the validation set
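A sketch of coarse random search over hyperparameters on a log scale; train_and_eval is a hypothetical placeholder for a real training-plus-validation run, and the sampling ranges are assumptions.

import numpy as np

np.random.seed(0)

def train_and_eval(lr, reg):
    # hypothetical stand-in: train on the training set with these
    # hyperparameters and return accuracy on the validation set
    return np.random.rand()

results = []
for _ in range(20):
    lr = 10 ** np.random.uniform(-6, -2)      # sample learning rate on a log scale
    reg = 10 ** np.random.uniform(-5, 0)      # sample regularization strength too
    results.append((train_and_eval(lr, reg), lr, reg))

for val_acc, lr, reg in sorted(results, reverse=True)[:3]:
    print(f"val_acc={val_acc:.3f}  lr={lr:.2e}  reg={reg:.2e}")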

 
