Machine Learning & Deep Learning
Measuring performance for classification: the confusion matrix
Estimating future performance: Holdout method. Typically, 2/3 of the data is used for training and 1/3 for testing. The holdout is repeated several times and the best model is kept; the test data must not influence model construction. However, because many models are built on randomly chosen training data and the best one is then picked using the test data, the test performance reported by the holdout method is not a fair estimate.
Estimating future performance: to fix this problem with the holdout method, the full dataset is split into training, validation, and test sets. Validation data: used to improve the model and make the final model selection. Test data: used exactly once, in the final evaluation of future prediction (or classification) performance. A sketch of such a three-way split is shown below.
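Below is a minimal sketch of such a three-way split, assuming scikit-learn is available; the toy data and the 60/20/20 proportions are illustrative assumptions, not part of the slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in data (assumed): 100 samples, 3 fields, binary class labels.
rng = np.random.default_rng(0)
X, y = rng.random((100, 3)), rng.integers(0, 2, 100)

# 60% training, 20% validation, 20% test (example proportions).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# The validation set guides model improvement and final model selection;
# the test set is touched exactly once, for the final performance estimate.
```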
Performance Evaluation
Neural Networks
Neural Networks
Logistic Regression vs. Neural Networks
Neural Networks
AND/OR problem
Multilayer Perceptrons: "No one on earth had found a viable way to train." (Marvin Minsky, 1969)
Backpropagation
Backpropagation: A dataset
Fields            class
1.4   2.7   1.9   0
3.8   3.4   3.2   0
6.4   2.8   1.7   1
4.1   0.1   0.2   0
etc …
Backpropagation: Training the neural network on this dataset; the three fields are the inputs and the class is the target output.
Backpropagation: The initial weight values are set randomly.
Backpropagation: The training examples are fed in one at a time (e.g., the input 1.4, 2.7, 1.9).
Backpropagation: The output value is computed from each node's activation (input 1.4, 2.7, 1.9 → output 0.8).
Backpropagation: The computed output is compared with the true target output (output 0.8 vs. target class 0 → error 0.8).
Backpropagation: The weights are adjusted according to the error.
Backpropagation: The next training example is fed in (input 6.4, 2.8, 1.7).
Backpropagation: The output value is computed from each node's activation (input 6.4, 2.8, 1.7 → output 0.9).
Backpropagation: The computed output is compared with the true target output (output 0.9 vs. target class 1 → error -0.1).
Backpropagation: The weights are adjusted according to the error.
Backpropagation: Weight adjustment is repeated until the error falls below a threshold.
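To make the loop above concrete, here is a minimal sketch of training a tiny network with backpropagation on the four example rows from the slides. The 3-2-1 network size, sigmoid activations, squared-error loss, learning rate, and stopping threshold are all assumptions chosen for illustration.

```python
import numpy as np

# Example rows from the slides: three input fields and a class label.
X = np.array([[1.4, 2.7, 1.9], [3.8, 3.4, 3.2], [6.4, 2.8, 1.7], [4.1, 0.1, 0.2]])
y = np.array([0.0, 0.0, 1.0, 0.0])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(2)   # initial weights set randomly
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rho = 0.5                                   # learning rate (assumed)
for epoch in range(10000):
    total_error = 0.0
    for x, t in zip(X, y):                  # feed examples one at a time
        h = sigmoid(x @ W1 + b1)            # hidden activations
        o = sigmoid(h @ W2 + b2)            # network output
        err = o - t                         # compare with the true class
        delta_o = err * o * (1 - o)         # backpropagate the error ...
        delta_h = (delta_o @ W2.T) * h * (1 - h)
        W2 -= rho * np.outer(h, delta_o); b2 -= rho * delta_o   # ... and adjust weights
        W1 -= rho * np.outer(x, delta_h); b1 -= rho * delta_h
        total_error += (err ** 2).item()
    if total_error < 1e-3:                  # stop once the error is below a threshold
        break
```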
Backpropagation: Node operations. An input node simply passes on the signal it receives; an output node computes the weighted sum and applies the activation function.
Backpropagation example: this perceptron has w = (1, 1)^T and b = -0.5, so its decision line is d(x) = x1 + x2 - 0.5 = 0. What about the remaining samples b, c, d?
Backpropagation: Perceptron learning example, the AND classification problem. Inputs 1 (bias), x1, x2 feed a perceptron with unknown weights ('?') producing output y. Samples: a = (0,0)^T, b = (1,0)^T, c = (0,1)^T, d = (1,1)^T, with targets t_a = -1, t_b = -1, t_c = -1, t_d = 1.
Backpropagation: Step 1, Step 2 (Eq. (4.2)). Parameter set Θ = {w, b}. How should we define J(Θ), the measure of classifier quality? Let Y be the set of misclassified samples: J(Θ) is always positive, J(Θ) = 0 when Y is empty, and J(Θ) grows as |Y| grows.
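The properties listed (always positive, zero when Y is empty, growing with |Y|) match the standard perceptron criterion; assuming that is what Eq. (4.2) refers to, a plausible form is:

```latex
% Perceptron criterion over the misclassified sample set Y (assumed form of Eq. (4.2)).
% Each misclassified x_k has t_k and w^T x_k + b of opposite sign, so every term is positive.
J(\Theta) = \sum_{x_k \in Y} -\, t_k \left( \mathbf{w}^{\top} x_k + b \right)
```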
Artificial Neural Networks: Step 3, find Θ such that J(Θ) = 0. Gradient descent method: move the current solution in the direction of the negative gradient, -∂J/∂Θ, multiplied by the learning rate ρ so that it moves a little at a time.
Backpropagation: Algorithm sketch. Set an initial solution; until the stopping condition is satisfied, move the current solution a little at a time in the downhill (negative-gradient) direction. Formula needed by the algorithm: Θ(t+1) = Θ(t) - ρ ∂J/∂Θ, where ρ is the learning rate.
Artificial Neural Networks: worked example with initial solution w(0) = (-0.5, 0.75)^T, b(0) = 0.375.
① d(x) = -0.5 x1 + 0.75 x2 + 0.375, misclassified set Y = {a, b}
② d(x) = -0.1 x1 + 0.75 x2 + 0.375, Y = {a}
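A sketch of this perceptron learning procedure on the AND samples, assuming the perceptron criterion above and a batch gradient-descent update; the learning rate and iteration cap are assumed values, and the starting point matches the worked example.

```python
import numpy as np

# AND samples from the slides, with targets in {-1, +1}.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)   # a, b, c, d
t = np.array([-1, -1, -1, 1], dtype=float)

w, b = np.array([-0.5, 0.75]), 0.375   # initial solution from the worked example
rho = 0.2                              # learning rate (assumed)

for step in range(100):
    d = X @ w + b                       # d(x) = w^T x + b
    Y = np.where(np.sign(d) != t)[0]    # misclassified sample set
    if len(Y) == 0:                     # J(Theta) = 0: stop
        break
    # The gradient of J = -sum_{k in Y} t_k (w^T x_k + b) is -sum t_k x_k (and -sum t_k for b),
    # so the downhill step adds rho * t_k * x_k for each misclassified sample.
    w += rho * (t[Y][:, None] * X[Y]).sum(axis=0)
    b += rho * t[Y].sum()

print(w, b)   # a weight vector and bias that separate the AND samples
```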
Artificial Neural Networks, Deep Networks: each non-output layer acts as an auto-encoder that produces an abstracted feature, so the network is a hierarchy of feature layers (input layer, hidden layers, output layer). Feature abstraction becomes stronger toward the output layer.
Artificial Neural Networks, Deep Networks: Learning. The multi-layer network is not trained all at once; training is carried out step by step, one layer at a time.
Feature detectors
What is each of the nodes doing?
Hidden layer nodes become self-organised feature detectors. (Figure: one hidden unit's incoming weights laid out over the input pixel grid, contrasting strong positive weights with low/zero weights.)
What does this unit detect? A feature that responds strongly to the pixels in the top row.
What does this unit detect? A feature that responds strongly to a dark region in the top-left corner.
Deep Neural Networks, feature abstraction: one layer contains features that detect lines at particular positions; the next layer contains features that detect contours by combining those line-level features; and so on.
Deep Neural Networks Feature abstraction
Backpropagation
Breakthrough in 2006 & 2007 by Hinton & Bengio
Breakthrough
Breakthrough
Image Recognition Demo Toronto Deep Learning - http://deeplearning.cs.toronto.edu/
Speech Recognition
Deep Learning Vision. For students: it is not too late to become a world expert, and it is not too complicated. For practitioners: it is accurate enough to be used in practice, there are many ready-to-use tools such as TensorFlow, and many easy and simple programming languages such as Python.
Problems in Deep Learning: the activation function problem (tied to the backpropagation process) and weight initialization.
Solving the XOR problem
Solving the XOR problem How can we get W & b from the training data?
Solving the XOR problem
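As a sketch of why a two-layer network can solve XOR at all, here is a minimal forward pass with hand-chosen weights; the specific W and b values are assumptions for illustration (one hidden unit acts like OR, the other like NAND, and the output unit ANDs them). How such weights are obtained from the training data is the backpropagation question that follows.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)        # unit step activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-chosen weights (assumed): hidden unit 1 ~ OR, hidden unit 2 ~ NAND,
# output unit ~ AND of the two, so the network computes XOR = OR AND NAND.
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])
W2 = np.array([[1.0],
               [1.0]])
b2 = np.array([-1.5])

h = step(X @ W1 + b1)                   # hidden layer
y = step(h @ W2 + b2)                   # output layer
print(y.ravel())                        # -> [0. 1. 1. 0.], the XOR truth table
```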
Backpropagation: how does the weight w affect the cost function? Input x, weight w = ?, Cost = ŷ - y.
Backpropagation: using the chain rule
Backpropagation: using the chain rule
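A tiny worked example of the chain rule on a single neuron ŷ = w·x + b, assuming a squared-error cost for concreteness (the slides write the error as ŷ - y); all numeric values are illustrative.

```python
# Forward pass for one training example (illustrative values).
x, y = 1.0, 2.0            # input and true target
w, b = 0.5, 0.0            # current parameters

y_hat = w * x + b          # prediction
cost = (y_hat - y) ** 2    # squared-error cost (assumed form)

# Backward pass via the chain rule: dcost/dw = dcost/dy_hat * dy_hat/dw.
dcost_dyhat = 2 * (y_hat - y)   # derivative of (y_hat - y)^2 w.r.t. y_hat
dyhat_dw = x                    # derivative of w*x + b w.r.t. w
dyhat_db = 1.0                  # derivative of w*x + b w.r.t. b
dcost_dw = dcost_dyhat * dyhat_dw
dcost_db = dcost_dyhat * dyhat_db

# Gradient-descent update with learning rate rho (assumed value).
rho = 0.1
w -= rho * dcost_dw
b -= rho * dcost_db
```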
Activation function: sigmoid ?
Deep network -> poor result
Vanishing gradient: when the gradient is backpropagated, its value becomes weaker and weaker as it moves toward the input layer. Why? The sigmoid function is the problem.
Vanishing gradient: is the sigmoid function the problem? ReLU (Rectified Linear Unit): max{0, z}.
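A small numeric sketch of why the sigmoid causes vanishing gradients while ReLU does not: the sigmoid's derivative is at most 0.25, so multiplying such factors across many layers shrinks the backpropagated gradient, whereas ReLU's derivative is 1 for positive inputs. The layer count and pre-activation value below are assumptions (layer weights are ignored for simplicity).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)              # at most 0.25 (at z = 0)

def d_relu(z):
    return float(z > 0)             # 1 for positive inputs, 0 otherwise

z = 0.5                             # a typical pre-activation value (assumed)
print(d_sigmoid(z) ** 10)           # ~5e-7: the gradient all but vanishes after 10 layers
print(d_relu(z) ** 10)              # 1.0: the gradient survives
```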
Performance
Activation Functions Leaky ReLU
Performance [Mishkin et al. 2015]
Weight Initialization
Weight Initialization: Hinton et al. (2006), "A Fast Learning Algorithm for Deep Belief Nets" => Restricted Boltzmann Machine (encoding and decoding).
RBM Deep Learning: pre-training step
RBM Deep Learning: fine-tuning step
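A rough sketch of the layer-wise pre-training idea, using a tied-weight autoencoder for each layer's encoding/decoding as a stand-in for the RBM (a simplification, not Hinton's contrastive-divergence algorithm); layer sizes, epochs, and learning rate are assumed values. The resulting weights would then initialize the network for the supervised fine-tuning step.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, epochs=200, rho=0.1):
    """Train one layer to encode its input and decode it back (tied weights)."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)                    # encoding
        X_rec = sigmoid(H @ W.T + b_v)              # decoding
        d_rec = (X_rec - X) * X_rec * (1 - X_rec)   # reconstruction-error gradient
        d_hid = (d_rec @ W) * H * (1 - H)
        W -= rho * (X.T @ d_hid + d_rec.T @ H) / len(X)
        b_h -= rho * d_hid.mean(axis=0)
        b_v -= rho * d_rec.mean(axis=0)
    return W, b_h

# Pre-train two layers one at a time on unlabeled data (toy data assumed),
# each layer learning from the abstracted features of the layer below it.
X = rng.random((100, 8))
W1, b1 = pretrain_layer(X, 4)
W2, b2 = pretrain_layer(sigmoid(X @ W1 + b1), 2)
```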
Weight Initialization: Xavier/He initialization makes sure the weights are "just right", not too small and not too big, using the number of inputs (fan_in) and outputs (fan_out).
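A minimal sketch of the two initializations, assuming Gaussian draws; the fan_in and fan_out sizes are example values.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128          # example layer sizes (assumed)

# Xavier (Glorot) initialization: variance scaled by both fan_in and fan_out.
W_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# He initialization: variance scaled by fan_in only, commonly paired with ReLU.
W_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```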
Avoiding overfitting: Regularization and Dropout. Regularization: target function = cost + λ·‖w‖² (an L2 penalty on the weights). Dropout: nodes are dropped only during learning; at prediction time all nodes are used.
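A minimal dropout sketch using the common "inverted dropout" convention, where kept activations are scaled by 1/keep_prob during learning so that prediction can simply use all nodes unchanged; the keep probability and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5                     # probability of keeping a node (assumed)

def hidden_with_dropout(h, training):
    # During learning, randomly drop nodes and rescale the survivors;
    # during prediction, all nodes are used and nothing is dropped.
    if training:
        mask = (rng.random(h.shape) < keep_prob) / keep_prob
        return h * mask
    return h

h = np.array([0.2, 0.9, 0.4, 0.7])
print(hidden_with_dropout(h, training=True))    # some entries zeroed, the rest scaled up
print(hidden_with_dropout(h, training=False))   # unchanged at prediction time
```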
Designing a Deep Network: feed-forward NN, Convolutional NN, Recurrent NN, ??? NN
Convolutional NN
Convolutional NN
Convolutional NN
Convolutional NN
Recurrent NN: for sequence data (or time-series data). We understand sentences based on the previous words plus the current word; a plain NN/CNN cannot learn sequence data in this way. A minimal recurrent step is sketched below.
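A minimal sketch of a single recurrent step, showing how the current input and the previous hidden state are combined to carry sequence information forward; the tanh activation and the shapes are assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    # The new hidden state depends on the current input AND the previous
    # hidden state, which is how the network remembers earlier words.
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3                       # example sizes (assumed)
Wx = rng.normal(size=(n_in, n_hidden))
Wh = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):      # a toy sequence of 5 inputs
    h = rnn_step(x_t, h, Wx, Wh, b)         # the same weights are reused at every step
```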
Recurrent NN
Recurrent NN
Recurrent NN
Recurrent NN
RNN applications: language modeling, speech recognition, machine translation, conversation modeling, image/video captioning, image/music/dance generation.
RNN structures
RNN structures. Training RNNs is very challenging!