Machine Learning & Deep Learning

Measuring performance for classification
confusion matrix
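As a rough illustration of what a confusion matrix counts, the sketch below tallies the four cells for a binary classifier. The label lists are made up for the example, and the helper name `confusion_matrix` is hypothetical rather than taken from any library.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["TP" if a == 1 else "FP"] += 1
        else:
            counts["FN" if a == 1 else "TN"] += 1
    return counts

# Made-up labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
```

Accuracy is only one summary of the four cells; other measures can be derived from the same counts.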

Estimating future performance
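Holdout method
Typically, 2/3 of the full data is used for training and 1/3 for testing.
The holdout is repeated several times and the best model is taken.
Test data must not influence how the model is built.
However, several models are built on randomly chosen training data and the best one is then picked against the test data, so the test performance reported by the hold-out method is not a fair estimate.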

Estimating future performance
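To address the problem with the holdout method, the full data set is split into training, validation, and test sets.
Validation data: used to improve the model and make the final selection.
Test data: used "once", at the final evaluation stage, to estimate future prediction (or classification) performance.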
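A minimal sketch of such a three-way split is shown below. The 60/20/20 proportions, the fixed seed, and the function name `three_way_split` are assumptions for illustration; the slides do not prescribe them.

```python
import random

def three_way_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into training, validation and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # touched once, at the very end
    return train, val, test

train, val, test = three_way_split(range(100))
```

Model selection would compare candidates on `val` only, leaving `test` for the single final evaluation.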

Performance Evaluation

Neural Networks

Logistic Regression vs. Neural Networks

Neural Networks

AND/OR problem

Multilayer Perceptrons
No one on earth had found a viable way to train. Marvin Minsky, 1969

Backpropagation

Backpropagation
A dataset
Fields          class
1.4 2.7 1.9     0
3.8 3.4 3.2     0
etc …

Backpropagation
Training the neural network
Fields          class
1.4 2.7 1.9     0
etc …

Backpropagation
Initial weight values are set randomly.

Backpropagation
Training data are presented one at a time.
Input: 1.4 2.7 1.9

Backpropagation
The output value is computed from the activation result of each node.

Backpropagation
The computed output is compared with the actual correct output.
Input: 1.4 2.7 1.9, class 0, error 0.8

Backpropagation
The weights are adjusted according to the error value.

Backpropagation
Another new training example is presented.
Input: 6.4 2.8 1.7

Backpropagation
The output value is computed from the activation result of each node.

Backpropagation
The computed output is compared with the actual correct output.
Input: 6.4 2.8 1.7, class 1, error -0.1

Backpropagation
The weights are adjusted according to the error value.

Backpropagation
Weight adjustment is repeated until the error falls below a threshold.
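The loop described on these slides (random initial weights, one example at a time, compare output with target, adjust, repeat until the error is small) can be sketched as follows. This is a deliberately simplified stand-in: it trains a single sigmoid unit rather than a multilayer network, and the learning rate, threshold, epoch cap, and seed are assumed values. The three data rows are the ones that appear in the slides' sample table, and the error is taken as output minus target, which matches the signs shown there.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, fields):
    return sigmoid(sum(w * x for w, x in zip(weights, fields)) + bias)

def train_single_unit(data, lr=0.05, threshold=0.2, max_epochs=10000, seed=0):
    """Adjust the weights one example at a time until every error is small."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]  # random start
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        worst = 0.0
        for fields, target in data:
            error = predict(weights, bias, fields) - target
            # Nudge each weight against the error.
            weights = [w - lr * error * x for w, x in zip(weights, fields)]
            bias -= lr * error
            worst = max(worst, abs(error))
        if worst < threshold:   # stop once the largest error is below the threshold
            break
    return weights, bias

data = [([1.4, 2.7, 1.9], 0), ([3.8, 3.4, 3.2], 0), ([6.4, 2.8, 1.7], 1)]
weights, bias = train_single_unit(data)
```

A real multilayer network would propagate the error back through the hidden layers as well; the stopping logic stays the same.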

Backpropagation
Node operations
Input node: simply passes on the signal it receives.
Output node: computes the weighted sum and then the activation function.

Backpropagation
Example: this perceptron has w=(1,1)^T and b=-0.5, so its decision line is d(x) = x1 + x2 - 0.5 = 0.
What about the remaining samples b, c, and d?
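To answer the question for all four samples, the perceptron can be evaluated directly. The sketch below uses the sample points a to d listed on the perceptron-learning slide and assumes a step activation that outputs +1 on or above the decision line and -1 below it; the slides do not state the activation explicitly.

```python
def decision(w, b, x):
    """d(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    # Assumed step activation: +1 on or above the line, -1 below it.
    return 1 if decision(w, b, x) >= 0 else -1

w, b = (1, 1), -0.5
points = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
outputs = {name: classify(w, b, x) for name, x in points.items()}
```

Under that assumption only a falls on the negative side of the line, so this particular perceptron behaves like an OR gate.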

Backpropagation
Perceptron learning
Example: AND classification problem
a=(0,0)^T, b=(1,0)^T, c=(0,1)^T, d=(1,1)^T
ta= tb= tc= td=1

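Backpropagation
Step 1   Step 2
Eq. (4.2)
Parameter set Θ={w, b}
How should J(Θ), which measures the quality of the classifier, be defined?
Y: the set of misclassified samples
J(Θ) is never negative
If Y is the empty set, J(Θ)=0
The larger |Y| is, the larger J(Θ) is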

Artificial Neural Networks
Step 3: find Θ such that J(Θ)=0.
Gradient descent method: move the current solution in the downhill (negative gradient) direction, a little at a time, scaled by the learning rate ρ.

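Backpropagation
Algorithm sketch
Set an initial solution.
Until the stopping condition is met, move the current solution a little at a time in the downhill direction.
Formulas needed for the algorithm
Learning rate: move a little at a time in the downhill direction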
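The formulas themselves did not survive in this transcript. One standard choice that satisfies the properties listed for J(Θ) is the perceptron criterion; the block below is a sketch under that assumption, with t_k denoting the target (+1 or -1) of sample x_k and h the iteration index, both of which are notation introduced here.

```latex
\begin{aligned}
J(\Theta) &= \sum_{x_k \in Y} (-t_k)\,\bigl(w^{T} x_k + b\bigr) \\
\frac{\partial J}{\partial w} &= \sum_{x_k \in Y} (-t_k)\, x_k,
\qquad
\frac{\partial J}{\partial b} = \sum_{x_k \in Y} (-t_k) \\
w(h+1) &= w(h) + \rho \sum_{x_k \in Y} t_k\, x_k,
\qquad
b(h+1) = b(h) + \rho \sum_{x_k \in Y} t_k
\end{aligned}
```

Each misclassified sample contributes a positive term to J, so J is zero exactly when Y is empty, and the update is the general rule Θ(h+1) = Θ(h) - ρ ∂J/∂Θ written out for w and b.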

w(0)=(-0.5,0.75)^T, b(0)=0.375
① d(x) = -0.5x1 + 0.75x2 + 0.375, Y={a, b}
② d(x) = -0.1x1 + 0.75x2 + …, Y={a}
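The two iterations above can be reproduced with a short script. Two things here are assumptions, not values given in the transcript: the targets are OR-style labels (a is -1, the others +1), and the learning rate is ρ = 0.4. They are chosen because together they reproduce the misclassified sets Y={a, b} and Y={a} and the move of the first weight from -0.5 to -0.1.

```python
def d(w, b, x):
    return w[0] * x[0] + w[1] * x[1] + b

def misclassified(w, b, samples, targets):
    """Names of samples whose sign of d(x) disagrees with the target."""
    return [n for n, x in samples.items() if targets[n] * d(w, b, x) <= 0]

def update(w, b, wrong, samples, targets, rho):
    """One batch step: add rho * t_k * x_k for every misclassified sample."""
    for n in wrong:
        w = (w[0] + rho * targets[n] * samples[n][0],
             w[1] + rho * targets[n] * samples[n][1])
        b = b + rho * targets[n]
    return w, b

samples = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
targets = {"a": -1, "b": 1, "c": 1, "d": 1}   # assumed OR-style labels
rho = 0.4                                      # assumed learning rate

w0, b0 = (-0.5, 0.75), 0.375
Y0 = misclassified(w0, b0, samples, targets)
w1, b1 = update(w0, b0, Y0, samples, targets, rho)
Y1 = misclassified(w1, b1, samples, targets)
```

Under these assumptions the bias ends the first step unchanged, because the two misclassified targets cancel.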

Deep Networks
An abstracted feature
Non-output layer = Auto-encoder
Input layer, hidden layer, output layer
Hierarchical feature layers: feature abstraction becomes stronger toward the output layer.

Deep Networks
Learning: a multi-layer network is not trained all at once; training is carried out step by step, one layer at a time.

Feature detectors

What is each of the nodes doing?

Hidden layer nodes become self-organised feature detectors
[Figure: input pixels numbered 1 … 63; legend: strong +ve weight, low/zero weight]

What does this unit detect?
A feature that responds strongly to the pixels in the top row.
[Figure: input pixels numbered 1 … 63; legend: strong + weight, low/zero weight]

What does this unit detect?
A feature that responds strongly to a dark region in the top left corner.
[Figure: input pixels numbered 1 … 63; legend: strong + weight, low/zero weight]
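What "detecting" means for a single unit can be illustrated with a weighted sum: strong weights on some pixels, zero weights elsewhere. The sketch below assumes an 8x8 input flattened row by row and uses two made-up test images; the slides only show the pixel numbering and the weight legend.

```python
def unit_response(weights, image):
    """Weighted sum of pixel values for one hidden unit."""
    return sum(w * p for w, p in zip(weights, image))

SIDE = 8  # assumed 8x8 input, flattened row by row
N = SIDE * SIDE

# Strong positive weights on the top row, zero everywhere else.
top_row_weights = [1.0 if i < SIDE else 0.0 for i in range(N)]

# Two made-up images: a bar along the top row, and one along the bottom row.
top_bar = [1.0 if i < SIDE else 0.0 for i in range(N)]
bottom_bar = [1.0 if i >= N - SIDE else 0.0 for i in range(N)]

top_response = unit_response(top_row_weights, top_bar)
bottom_response = unit_response(top_row_weights, bottom_bar)
```

The unit responds strongly to the first image and not at all to the second, which is the sense in which it acts as a top-row detector.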

Deep Neural Networks
Feature abstraction
A layer of features for lines at specific positions
A layer of features that detect contours using the line-level features

Deep Neural Networks Feature abstraction

Backpropagation

Breakthrough in 2006 & 2007 by Hinton & Bengio

Breakthrough

Image Recognition Demo
Toronto Deep Learning -

Speech Recognition

Deep Learning Vision
Students: Not too late to be a world expert. Not too complicated.
Practitioner: Accurate enough to be used in practice. Many ready-to-use tools such as TensorFlow. Many easy & simple programming languages such as Python.

Activation function problem
Problems of Deep Learning
Activation function problem: related to the backpropagation process
Weight initialization

Solving the XOR problem

How can we get W & b from the training data?

Backpropagation
The effect of w on the cost function
w = ?, input x, Cost = ŷ - y

Backpropagation: using the chain rule
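The chain rule can be checked on a single weight. The sketch below assumes one sigmoid unit with prediction ŷ = sigmoid(w·x + b) and a squared-error cost (ŷ - y)²; both are illustrative choices, as are the numeric values. The analytic gradient is compared with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    y_hat = sigmoid(w * x + b)
    return (y_hat - y) ** 2          # assumed squared-error cost

def dcost_dw(w, b, x, y):
    """Chain rule: dcost/dw = dcost/dy_hat * dy_hat/dz * dz/dw."""
    y_hat = sigmoid(w * x + b)
    dcost_dyhat = 2.0 * (y_hat - y)
    dyhat_dz = y_hat * (1.0 - y_hat)
    dz_dw = x
    return dcost_dyhat * dyhat_dz * dz_dw

w, b, x, y = 0.5, -0.2, 1.5, 1.0     # arbitrary illustrative values
analytic = dcost_dw(w, b, x, y)
eps = 1e-6
numeric = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
```

In a deeper network the same product simply gains one factor per layer between the weight and the cost.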

Activation function: sigmoid ?

Deep network -> poor result

Vanishing gradient
When gradient values are back-propagated, they become weaker the further they travel toward the input layer.
=> The sigmoid function is the problem.

Vanishing gradient: sigmoid function?
ReLU: Rectified Linear Unit, max{0, z}
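A rough way to see why sigmoid causes the gradient to vanish and ReLU does not: the sigmoid derivative is at most 0.25, so a product of such factors across layers shrinks quickly, while the ReLU derivative is 1 for positive inputs. The sketch below looks only at the activation derivatives and ignores the weights, which is a simplification; the depth of 10 is an arbitrary choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # never larger than 0.25

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def backprop_factor(grad_fn, z, depth):
    """Product of `depth` activation derivatives, all evaluated at z."""
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(z)
    return factor

sigmoid_factor = backprop_factor(sigmoid_grad, 0.0, 10)   # best case for sigmoid
relu_factor = backprop_factor(relu_grad, 1.0, 10)         # active ReLU units
```

Even in its best case (z = 0) the sigmoid factor after ten layers is below one millionth, whereas the ReLU factor is unchanged.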

Performance

Activation Functions Leaky ReLU
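For reference, Leaky ReLU keeps a small slope for negative inputs instead of clamping them to zero. The slope value 0.01 below is a commonly used default and an assumption here; the slide does not give one.

```python
def leaky_relu(z, alpha=0.01):
    """max(alpha * z, z) for 0 < alpha < 1: a small slope for negative inputs."""
    return z if z > 0 else alpha * z

positive = leaky_relu(2.0)
negative = leaky_relu(-2.0)
```

Because the negative side still has a nonzero derivative, a unit that receives negative inputs can continue to learn.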

Performance [Mishkin et al. 2015]

Weight Initialization

Hinton et al. (2006) “A Fast Learning Algorithm for Deep Belief Nets” => Restricted Boltzmann Machine encoding decoding

RBM Deep Learning : pre-training step

RBM Deep Learning : fine tuning step

Xavier/He initialization
Makes sure the weights are “just right”, not too small, not too big
Uses the number of inputs (fan_in) and outputs (fan_out)
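One common form of each scheme is sketched below: Xavier (Glorot) uniform, which draws from a range set by fan_in and fan_out, and He normal, which uses a standard deviation set by fan_in. These are widely used variants, not necessarily the exact formulas on the original slide, and the layer sizes and function names are illustrative.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    """Zero-mean Gaussian with standard deviation sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)                   # fixed seed, for repeatability only
w_xavier = xavier_uniform(256, 128, rng)
w_he = he_normal(256, 128, rng)
```

Larger layers get smaller initial weights in both schemes, which keeps the scale of the signal roughly stable from layer to layer.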

Avoiding overfitting
Regularization: Target function = cost + λ Σ w²
Dropout: applied only during learning; at prediction time all nodes are used
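Both ideas fit in a few lines. The sketch below adds an L2 penalty to a cost value and applies dropout only when a `training` flag is set. The rescaling of kept units by 1 / keep_prob ("inverted dropout") is an assumption added here so that prediction needs no adjustment; the regularization strength, keep probability, and sample numbers are also made up.

```python
import random

def l2_objective(cost, weights, lam):
    """Target function = cost + lam * sum of squared weights."""
    return cost + lam * sum(w * w for w in weights)

def dropout(activations, keep_prob, training, rng):
    """Drop units only while training; pass everything through at prediction."""
    if not training:
        return list(activations)
    # Inverted dropout (an assumption): rescale kept units by 1 / keep_prob.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]
train_out = dropout(hidden, keep_prob=0.5, training=True, rng=rng)
predict_out = dropout(hidden, keep_prob=0.5, training=False, rng=rng)
objective = l2_objective(cost=0.3, weights=[1.0, -2.0], lam=0.01)
```

The `training` flag is the important part: forgetting to switch it off at prediction time is a common source of noisy results.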

Designing a Deep Network
Forward NN
Convolutional NN
Recurrent NN
??? NN

Convolutional NN

Convolutional NN

Convolutional NN

Recurrent NN
For sequence data (or time series data)
We understand a sentence based on the previous words plus the current word.
A plain NN or CNN cannot learn sequence data.
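The "previous words + current word" idea corresponds to a hidden state that is carried from one step to the next. The sketch below is a scalar, vanilla tanh recurrence with made-up weights, which is one simple form of RNN cell and an assumption here; it shows that the same inputs in a different order lead to a different final state.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """New state from the current input and the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0                      # initial state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

states_a = run_rnn([1.0, 0.0, 0.0])   # signal arrives first
states_b = run_rnn([0.0, 0.0, 1.0])   # same values, signal arrives last
```

A feed-forward network that sees each input in isolation has no such state, so it cannot tell these two sequences apart step by step.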

Recurrent NN

RNN applications
Language modeling
Speech recognition
Machine translation
Conversation modeling
Image/Video captioning
Image/Music/Dance generation

RNN structures

RNN structures Training RNNs is very challenging !
