All Posts

(57)
CS231n - Lec9. CNN Architecture LeNet input => conv & pooling => FC layer conv filter = 5x5, stride=1 pooling layer = 2x2, stride=2 Case Studies AlexNet 2012 ILSVRC'12 why so much better than before: first deep learning and conv net approach First large-scale CNN Performs well on the ImageNet classification task Input: 227x227x3 FirstLayer(Conv1): 96 11x11 filters, stride=4 output vol - [55x55x96] parameters - (11x11x3)x96 = 35k SecondLaye..
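A quick sanity check of the Conv1 numbers quoted above (a minimal sketch; the layer sizes come from the excerpt, everything else is illustrative):

def conv_output_size(input_size, filter_size, stride, pad=0):
    # spatial output size of a conv layer: (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * pad) // stride + 1

# AlexNet Conv1: 227x227x3 input, 96 filters of 11x11, stride 4, no padding
out = conv_output_size(227, 11, stride=4)   # 55 -> output volume 55x55x96
params = (11 * 11 * 3) * 96                 # 34,848 ~= 35k weights (biases excluded)
print(out, params)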
CS231n - Lec8. Deep Learning Software CPU vs GPU CUDA (NVIDIA only) write C-like code that runs directly on the GPU higher-level APIs: cuDNN, cuBLAS, etc. OpenCL similar to CUDA, but runs on anything; usually slower cuDNN much faster than unoptimized CUDA Deep Learning Frameworks Caffe / Caffe2 Theano/TensorFlow Torch/PyTorch The point of deep learning frameworks (1) Easily build big computational graphs (2) Easily compute gradients in computatio..
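A minimal sketch of points (1) and (2) using PyTorch autograd, one of the frameworks named above (assumes PyTorch is installed; the values are illustrative):

import torch

# (1) build a small computational graph
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y + x ** 2

# (2) compute gradients automatically via backprop through the graph
z.backward()
print(x.grad, y.grad)   # dz/dx = y + 2x = 7, dz/dy = x = 2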
CS231n - Lec7. Training Neural Networks 2 Fancier Optimization Stochastic Gradient Descent while True: weights_grad = evaluate_gradient(loss_fun, data, weights) weights += -step_size * weights_grad Problems with SGD (1) Loss function has a high condition number: ratio of largest to smallest singular value of the Hessian matrix is large (sensitive to one direction and not sensitive to the other) very slow progress along sha..
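A minimal, runnable sketch of the condition-number problem and of SGD with momentum, which the lecture presents as one of the "fancier" fixes (the quadratic loss, rho, and step_size here are illustrative, not from the post):

import numpy as np

# Badly conditioned quadratic loss f(w) = 0.5 * (w0^2 + 100 * w1^2); its gradient is [w0, 100 * w1].
def grad(w):
    return np.array([w[0], 100.0 * w[1]])

w = np.array([1.0, 1.0])
v = np.zeros_like(w)
rho, step_size = 0.9, 0.005
for _ in range(200):
    v = rho * v + grad(w)     # accumulate "velocity" in consistently downhill directions
    w -= step_size * v        # step along the velocity instead of the raw gradient
print(w)                      # approaches [0, 0] despite the 100:1 curvature ratio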
CS231n - Lec6. Training Neural Networks 1 Part 1 Activation Functions what if the input to a neuron is always positive? x * weight, then pass through the activation function dL/df (activation func), df/dW = x, so W always moves in the same direction Data Preprocessing zero mean, normalize => every dimension contributes fairly image: zero-centering only, not normalization can data preprocessing solve the sigmoid problem? only at the first step Weight Initialization if we..
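A minimal sketch of the two preprocessing steps mentioned above (X is a made-up N x D data matrix):

import numpy as np

X = np.random.randn(100, 3) * 5.0 + 10.0         # hypothetical data, nonzero mean and varied scale

X_zero_centered = X - X.mean(axis=0)             # zero mean per dimension (typically all that's done for images)
X_normalized = X_zero_centered / X.std(axis=0)   # also scale so every dimension contributes fairly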
CS231n - Lec5. Convolutional Neural Networks neural network convolutional neural network history: perceptron implemented in hardware => Mark I perceptron machine (update rule different from backprop) Adaline and Madaline (first multilayer perceptron network) (no backprop yet) Rumelhart proposed the first backprop => chain rule & update rule Yann LeCun's CNN, backprop, gradient-based learning for NN 2012 Hinton lab: acoustic modeling, speech recognition 2012 AlexNet:..
CS231n - Lec4. Backpropagation and Neural Networks How to compute: once we can express it as a computational graph, we can use backpropagation, recursively applying the chain rule to compute the gradient how does backpropagation work? ex) f(x,y,z) = (x+y)z (x=-2, y=5, z=-4) df/dz = 3 df/dy = df/dq * dq/dy = -4 df/dx = df/dq * dq/dx = -4 hidden layer => classified by W2 (students asked something like how you can tell where the horse is, not sure what that meant..) various activation functions can be used
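The worked example above, done by hand with the chain rule (a minimal sketch that just reproduces the numbers quoted in the excerpt):

# f(x, y, z) = (x + y) * z with x = -2, y = 5, z = -4
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # q = 3
f = q * z          # f = -12

# backward pass (chain rule)
df_dz = q                  # 3
df_dq = z                  # -4
df_dx = df_dq * 1.0        # dq/dx = 1  -> -4
df_dy = df_dq * 1.0        # dq/dy = 1  -> -4
print(df_dx, df_dy, df_dz)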
CS231n - Lec3. Loss Functions and Optimization Loss Function Loss function (cost function): a loss function is a method of evaluating how well your machine learning algorithm models your data set; a measurement of how good your model is at predicting the expected outcome Linear Classifier Todo! 1. Define a loss function that quantifies our unhappiness with the scores across the training data 2. Come up with a way of efficiently finding t..
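The excerpt cuts off before a concrete loss; as one example, a minimal sketch of the multiclass SVM (hinge) loss covered in this lecture, written in an unvectorized style (the scores and label below are made-up values):

import numpy as np

def svm_loss_single(scores, correct_class, margin=1.0):
    # sum of hinge losses over all incorrect classes for one example
    loss = 0.0
    for j in range(len(scores)):
        if j == correct_class:
            continue
        loss += max(0.0, scores[j] - scores[correct_class] + margin)
    return loss

print(svm_loss_single(np.array([3.2, 5.1, -1.7]), correct_class=0))  # 2.9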
CS231n - Lec2. Image Classification First classifier: Nearest Neighbor Classifier Train O(1), Predict O(N) Setting hyperparameters #1 Choose hyperparameters that work best on the data #2 Split data into train and test, choose hyperparameters that work best on test data #3 Split data into train, val, and test; choose hyperparameters on val and evaluate on test #4 Cross-Validation: split data into folds, try each fold as validation ..
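A minimal sketch of the nearest-neighbor classifier described above, using L1 distance (the tiny training set is illustrative):

import numpy as np

class NearestNeighbor:
    # Train is O(1): just memorize the data. Predict is O(N) per query:
    # compare the query against every training example.
    def train(self, X, y):
        self.X_train, self.y_train = X, y

    def predict(self, x):
        distances = np.sum(np.abs(self.X_train - x), axis=1)  # L1 (Manhattan) distance
        return self.y_train[np.argmin(distances)]

nn = NearestNeighbor()
nn.train(np.array([[0.0, 0.0], [10.0, 10.0]]), np.array([0, 1]))
print(nn.predict(np.array([1.0, 1.0])))   # 0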
