How to compute gradients
once we can express a function as a computational graph, we can use backpropagation:
recursively apply the chain rule to compute the gradient of the output with respect to every input
how does backpropagation work?
ex) f(x,y,z) = (x+y)z (x=-2, y=5, z=-4)
let q = x + y, so f = q * z (forward: q = 3, f = -12)
df/dq = z = -4, df/dz = q = 3
df/dy = df/dq * dq/dy = -4 * 1 = -4 <= with CHAIN RULE
df/dx = df/dq * dq/dx = -4 * 1 = -4 <= with CHAIN RULE
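A minimal sketch of this example in Python (the node name q is just a label for the intermediate sum x + y):

```python
# forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass: walk the graph from the output back to the inputs
df_dq = z            # mul gate: gradient is the other input -> -4
df_dz = q            # mul gate:                             ->  3
df_dx = df_dq * 1.0  # add gate: dq/dx = 1, pass it through  -> -4
df_dy = df_dq * 1.0  # add gate: dq/dy = 1                   -> -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```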
at each gate, downstream gradient = local gradient * upstream gradient
(local gradients can be computed during the forward pass; upstream gradients arrive during the backward pass)
sigmoid gate : d(sigmoid)/dx = (1 - sigmoid) * sigmoid
add gate : gradient distributor (every input receives the upstream gradient unchanged)
max gate : gradient router (the larger input receives the gradient as-is, the other receives 0)
mul gate : gradient switcher (each input receives the upstream gradient scaled by the other input's value)
when x feeds into multiple nodes q_i, the branch gradients add up: df/dx = SUM_i (df/dq_i * dq_i/dx) (see the sketch below)
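As a sketch (the helper names below are mine, not CS231n code), the backward behavior of each gate given an upstream gradient dout:

```python
import math

def sigmoid_backward(x, dout):
    s = 1.0 / (1.0 + math.exp(-x))
    return (1.0 - s) * s * dout        # d(sigmoid)/dx = (1 - s) * s

def add_backward(dout):
    return dout, dout                  # distributor: both inputs get dout

def max_backward(x, y, dout):
    return (dout, 0.0) if x >= y else (0.0, dout)  # router: winner takes it

def mul_backward(x, y, dout):
    return y * dout, x * dout          # switcher: scaled by the other input

# branching: if x fed two nodes, its total gradient sums over both branches
dx1, _ = mul_backward(2.0, 3.0, 1.0)
dx2, _ = add_backward(1.0)
dx = dx1 + dx2                         # df/dx = SUM over branches
```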
Neural Network
Linear score function : f = W * x
2-layer neural network : f = W2 * max( 0 , W1 * x )
activation function : the max(0, ·) is ReLU; a 1-hidden-layer neural net is the same thing as a 2-layer neural net (counting weight layers)
W1 * x => hidden layer => W2 produces the class scores
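A minimal NumPy sketch of this forward pass; the shapes (3072-d input as in CIFAR-10, 100 hidden units, 10 classes) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3072)                 # flattened input image
W1 = 0.01 * rng.standard_normal((100, 3072))  # first weight layer
W2 = 0.01 * rng.standard_normal((10, 100))    # second weight layer

h = np.maximum(0.0, W1 @ x)   # hidden layer: ReLU(W1 * x)
scores = W2 @ h               # class scores: W2 * h
print(scores.shape)           # (10,)
```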
(in the lecture, students asked something like "where does it check that it's a horse?", but I'm not sure what they meant..)
various activation functions can be used (a few sketched below)
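For example, a few standard ones as a quick sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)             # the one used above

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope for negative inputs
```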