Loss Function
Loss function (cost function): a loss function evaluates how well your machine learning algorithm models the given data set; it is a measurement of how good the model is at predicting the expected outcome
Linear Classifier
To do:
1. Define a loss function that quantifies our unhappiness with the scores across the training data
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization)
Multiclass SVM loss (Support Vector Machine):
L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)
the +1 is a safety margin
how do we choose the +1? it doesn't matter much; it washes out with the overall scale of W
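A minimal NumPy sketch of this per-example loss (my own illustration; `scores` is one example's score vector and `y` the index of its correct class):

import numpy as np

def svm_loss_i(scores, y, margin=1.0):
    # hinge: s_j - s_{y_i} + margin, clamped at 0
    margins = np.maximum(0, scores - scores[y] + margin)
    margins[y] = 0  # skip the correct class so the minimum loss is 0
    return np.sum(margins)

# correct class 0 already beats the others by more than the margin -> loss 0
print(svm_loss_i(np.array([10.0, 2.0, 1.0]), y=0))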
Q1. What happens to the loss if the car scores change a bit?
if we jiggle the scores slightly, the loss will not change (the correct class score already beats the others by more than the margin)
Q2. What is the min/max possible loss?
min = 0, max = infinity (hinge shape: __/ )
Q3. At initialization W is small so all s ≈ 0. What is the loss?
(number of classes) − 1
=> a useful sanity check in practice
Q4. What if the sum were over all classes? (including j = y_i)
the loss increases by 1
nothing changes significantly, but by convention we omit the correct class so that the minimum loss is 0
Q5. What if we used the mean instead of the sum?
doesn't change anything; it just rescales the loss
Q6. What if we used a squared term: L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)^2 ?
this is a different algorithm; it trades off errors in a non-linear way
when should we choose linear vs. squared?
squared hinge loss: we really don't want any large violations (they are penalized quadratically), but small violations are tolerable
Suppose we found a W such that the loss is 0. Is this W unique?
No! 2W, 3W, 4W, ... also give loss 0
minimizing the loss on the training data alone is not really what we care about (we care about performance on new data)
add regularization to keep the model simple (Occam's Razor: the simplest explanation is best)
lambda: regularization strength
regularization keeps the fit from becoming overly complex, e.g. a high-degree polynomial (L1 pushes individual weights to exactly 0; L2 pushes the sum of squared weights toward 0)
L2 is usually preferred over L1 (L1 may zero out features we actually want, while L2 keeps every feature in play with small weights)
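A rough sketch of the full loss with L2 regularization added (the SVM data loss plus lam * sum(W^2); the value of lam here is just illustrative):

import numpy as np

def full_loss(W, X, y, lam=0.1):
    # data loss: average multiclass SVM loss over the N training examples
    scores = X.dot(W)                          # (N, C) class scores
    correct = scores[np.arange(len(y)), y]     # score of each example's correct class
    margins = np.maximum(0, scores - correct[:, None] + 1.0)
    margins[np.arange(len(y)), y] = 0
    data_loss = margins.sum() / len(y)
    # L2 regularization: penalize large weights so the model stays simple
    reg_loss = lam * np.sum(W * W)
    return data_loss + reg_loss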
Softmax Classifier
with the linear classifier (SVM loss) we never said what the scores mean.
but for multinomial logistic regression (softmax), the scores do have a meaning: unnormalized log probabilities of the classes
exponentiate the scores and normalize them to get probabilities (the softmax function, a multiclass generalization of the sigmoid)
probability = 1 → -log(1) = 0 → loss = 0
probability = 0.01 → -log(0.01) ≈ 4.6 → loss is large
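A minimal sketch of this softmax (cross-entropy) loss for one example; the max-subtraction line is a standard numerical-stability trick, not something from these notes:

import numpy as np

def softmax_loss_i(scores, y):
    shifted = scores - np.max(scores)                  # stability; doesn't change the probabilities
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # exponentiate and normalize
    return -np.log(probs[y])                           # -log of the correct class probability

print(softmax_loss_i(np.array([3.2, 5.1, -1.7]), y=0))  # example scores; prints about 2.04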
Q1. What is the min/max possible loss L_i?
min: 0 (-log 1), max: infinity (-log 0)
to actually drive the loss to 0, the correct class score would have to be +∞ and all incorrect class scores −∞
so in practice we will never reach exactly 0 loss
Q2. Usually at initialization W is small, so all s ≈ 0. What is the loss?
-log(1/C) = log(C); also a useful sanity check
Q3. Suppose I take a datapoint and jiggle it a bit (changing its scores slightly). What happens to the loss in each case?
the SVM only wants the correct score to stay higher than the others by the margin, so jiggling doesn't affect its loss. Softmax, however, always wants to push the correct score toward +∞ and the wrong scores toward −∞, so jiggling does change its loss.
Optimization
how to minimize loss?
Strategy #1 : Random search
Strategy #2 : Follow the slope - GRADIENT DESCENT
numerical gradient : easy to write but slow, approximate
compute the gradient dW from how the loss changes when W is perturbed
W → W + h → dW (sometimes used for debugging: the gradient check)
analytic gradient : use calculus, fast, exact, but error-prone
in practice: derive the analytic gradient, then check your implementation against the numerical gradient (gradient check)
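A sketch of that gradient check using centered differences (f is any function of W returning the loss; the helper name is mine):

import numpy as np

def numerical_gradient(f, W, h=1e-5):
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = W[ix]
        W[ix] = old + h; fxph = f(W)        # f(W + h)
        W[ix] = old - h; fxmh = f(W)        # f(W - h)
        W[ix] = old                         # restore
        grad[ix] = (fxph - fxmh) / (2 * h)  # centered difference
        it.iternext()
    return grad

# then compare element-wise against the analytic gradient: the relative error should be tiny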
once we know the gradient, we use GRADIENT DESCENT
weight += - step_size * weight_grad
the minus sign steps toward the minimum; step_size is the learning rate
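Expanded into a tiny loop, this is vanilla (full-batch) gradient descent; evaluate_gradient is a placeholder for whatever computes the analytic gradient:

while True:
    weights_grad = evaluate_gradient(loss_fn, data, weights)  # analytic gradient over the full data
    weights += -step_size * weights_grad                      # step toward lower loss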
Stochastic Gradient Descent(SGD)
computing the loss and gradient over the entire training set for every W update is too slow. Instead, split the data into minibatches (32, 64, 128, ...) and update W once per minibatch, repeating over the data
update W using the gradient computed on each minibatch
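The minibatch version only changes which data the gradient is computed on; sample_training_data is again a placeholder helper:

while True:
    data_batch = sample_training_data(data, batch_size=256)   # e.g. 32 / 64 / 128 / 256 examples
    weights_grad = evaluate_gradient(loss_fn, data_batch, weights)
    weights += -step_size * weights_grad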
For Images
1. Color Histogram
2. Histogram of Oriented Gradient (HoG)
3. bag of words
crop the image into patches and run unsupervised learning / clustering on them; this extracts a vocabulary of edges, angles, colors, etc. => when a new image comes in, compare it against this vocabulary to see which features it contains
CNN: instead of extracting hand-designed features and feeding them in, the network learns to extract features from the input image by itself.
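As a small illustration, the simplest of these hand-designed features is a color histogram; this is a per-channel variant with an arbitrary bin count:

import numpy as np

def color_histogram(img, bins=8):
    # img: (H, W, 3) RGB array with values in [0, 255]
    # one histogram per channel, concatenated into a single feature vector
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    return np.concatenate(feats).astype(np.float32)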