Part 1
Activation Functions
what if the input to a neuron is always positive (e.g., sigmoid outputs)?
the neuron computes f = w * x + b and passes f through the activation function
chain rule: dL/dw = (dL/df) * (df/dw), and df/dw = x
if every x_i > 0, every component of dL/dw has the same sign as dL/df
=> all weights of the neuron move in the same direction each update (all increase or all decrease) - inefficient zig-zag paths toward the optimum
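A tiny NumPy sketch of this sign constraint (the numbers are illustrative):

```python
import numpy as np

x = np.array([0.2, 0.7, 1.3])          # all-positive inputs (e.g., sigmoid outputs)
dL_df = -0.5                           # some scalar upstream gradient for this neuron

dL_dw = dL_df * x                      # chain rule: dL/dw = dL/df * x
print(dL_dw)                           # every component shares the sign of dL/df
```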
Data Preprocessing
zero mean
normalize => every dimension contributes fairly
images: zero-centering only; pixel values are already on a comparable scale, so normalization is usually skipped
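A minimal sketch of both steps in NumPy (X is a stand-in data matrix):

```python
import numpy as np

X = np.random.rand(100, 3072)          # stand-in data matrix: N examples x D features

X -= np.mean(X, axis=0)                # zero-center each dimension
X /= np.std(X, axis=0)                 # normalize (usually skipped for images)
```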
can data preprocessing solve the sigmoid problem above?
only at the first layer - deeper layers receive sigmoid outputs, which are all positive again
Weight Initialization
if we initialize all params = 0, what happens?
with W = 0, all neurons compute the same output, get the same gradient, and receive the same update
=> every neuron looks the same; the symmetry breaking we want never happens
but couldn't the loss affect backprop differently per neuron?
no - the neurons are connected by identical weights, so their gradients are identical too
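A small NumPy sketch of the symmetry, using constant nonzero weights so the effect is visible (all shapes and names are illustrative):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 4)
W1 = np.full((4, 3), 0.5)              # every weight identical (same symmetry as all-zero)
W2 = np.full((3, 2), 0.5)

h = np.tanh(X @ W1)                    # all 3 hidden units compute the same value
dscores = np.random.randn(5, 2)        # stand-in for the upstream loss gradient
dh = dscores @ W2.T
dW1 = X.T @ (dh * (1 - h ** 2))        # tanh backprop
print(dW1)                             # every column identical -> identical updates forever
```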
Small random numbers
W = 0.01 * np.random.randn(D,H)
okay for small networks, but problems with deeper networks
activations shrink toward 0 layer by layer, so the gradients vanish
what if we use larger weights instead? activations saturate at 1 or -1
with tanh, saturated units again mean (near-)zero gradients
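A rough reconstruction of the lecture-style experiment (the 0.01 scale matches the note above; depth and layer width are assumptions):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(1000, 500)                  # batch of unit-Gaussian inputs
for layer in range(10):                         # 10-layer tanh net
    W = 0.01 * np.random.randn(500, 500)        # small random init
    x = np.tanh(x.dot(W))
    print(f"layer {layer}: mean {x.mean():+.5f}, std {x.std():.5f}")
# std collapses toward 0 -> activations (and their gradients) vanish
```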
Xavier initialization (Glorot 2010)
sample weights from a standard Gaussian, then scale by 1/sqrt(fan_in) so the output variance matches the input variance
W = np.random.randn(fan_in, fan_out)/np.sqrt(fan_in)
but it doesn't work well with ReLU - ReLU zeroes half of the units, halving the variance at each layer, so activations shrink toward 0 and neurons deactivate
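A common remedy, not spelled out in these notes, is He initialization (He et al. 2015), which restores the variance with an extra factor of 2:

```python
import numpy as np

fan_in, fan_out = 500, 500             # illustrative layer sizes
# He initialization: the extra /2 compensates for ReLU zeroing half the units
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
```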
Batch Normalization
consider a batch of activations at some layer
goal: make each dimension unit Gaussian
usually inserted after FC or conv layers and before the nonlinearity
improve gradient flow
allow higher learning rates
reduces the strong dependence on initialization
acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe
we normalize the input of a layer, not its weights
per dimension over the batch: x_hat = (x - mean) / std
Q: if we add learnable scale and shift parameters and train them, couldn't the network just learn the identity mapping, making BN effectively disappear?
A: it could (scale = std, shift = mean recovers the identity), but in practice this doesn't happen
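A minimal sketch of the training-time forward pass, assuming (N, D) inputs and the usual eps for numerical stability:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch; gamma, beta: (D,) learnable scale and shift
    mu = x.mean(axis=0)                        # per-dimension batch mean
    var = x.var(axis=0)                        # per-dimension batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # unit-Gaussian normalization
    return gamma * x_hat + beta                # learnable scale and shift

x = np.random.randn(64, 100)
out = batchnorm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # ~0 mean, ~1 std per dimension
```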
Babysitting the Learning Process
sanity check: train on a small subset of the data - the model must be able to overfit it (training accuracy -> 100%)
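A toy sketch of this sanity check with a small two-layer net on 20 made-up examples (all sizes and the learning rate are assumptions):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(20, 50)                     # tiny subset: 20 examples
y = np.random.randint(0, 3, 20)                 # 3 made-up classes

W1 = 0.01 * np.random.randn(50, 64); b1 = np.zeros(64)
W2 = 0.01 * np.random.randn(64, 3);  b2 = np.zeros(3)

for step in range(1000):
    h = np.maximum(0, X @ W1 + b1)              # ReLU hidden layer
    scores = h @ W2 + b2
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
    loss = -np.log(p[np.arange(20), y]).mean()
    dscores = p.copy()
    dscores[np.arange(20), y] -= 1
    dscores /= 20                               # gradient of mean cross-entropy
    dh = dscores @ W2.T
    dh[h <= 0] = 0                              # ReLU backprop
    W2 -= 1.0 * (h.T @ dscores); b2 -= 1.0 * dscores.sum(axis=0)
    W1 -= 1.0 * (X.T @ dh);      b1 -= 1.0 * dh.sum(axis=0)

print(f"final loss: {loss:.4f}")                # should drop toward 0 as the net memorizes
```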
Hyperparameter Optimization
most important: learning rate (and regularization strength)
cross-validation: train on the training set, evaluate on the validation set
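A minimal sketch of random search on a log scale (the ranges are illustrative):

```python
import numpy as np

for trial in range(10):
    lr  = 10 ** np.random.uniform(-5, -1)      # sample learning rate log-uniformly
    reg = 10 ** np.random.uniform(-4, 0)       # regularization strength, same idea
    print(f"trial {trial}: lr={lr:.2e}, reg={reg:.2e}")
    # ...train with (lr, reg), record validation accuracy, then refine the range
```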