Introduction of Deep Learning

Dong-Hyun Kwak

Table of Contents
Artificial Neural Networks: Rate Coding / Spiking
Perceptron
XOR: Non-linear Problem
Multi-layer Perceptron: Universal Function Approximator
Non-linear Activation Function
Logistic Regression
Gradient Descent
Momentum: Per-dimension Learning Rate
Error Back-propagation: Chain Rule
Why Deep?
Gradient Vanishing Problem: RBM Layer-wise Pretraining, ReLU
Regression / Binary Classification / Multi-class Classification: Linear Regression + Least Mean Square / Softmax + Cross-Entropy

Artificial Neural Networks – Spiking Neuron
The output is a Poisson process (a spike train). This is the model mainly studied in computational neuroscience, where neurons are modeled in order to analyze real (human) neurons.
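A minimal sketch of a spiking neuron's output, assuming a homogeneous Poisson process with a constant firing rate (`rate_hz`, `duration_s` and `seed` are hypothetical parameters, not from the slide). The gaps between spikes of such a process are exponentially distributed with mean 1 / `rate_hz`.

```python
import random

def poisson_spike_train(rate_hz, duration_s, seed=0):
    """Return spike times (in seconds) of a homogeneous Poisson process.

    Inter-spike intervals are drawn from an exponential distribution
    with mean 1 / rate_hz, which is what makes the process Poisson.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)          # time of the first spike
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)     # add the next inter-spike interval
    return spikes

spikes = poisson_spike_train(rate_hz=20.0, duration_s=1.0)
print(len(spikes), spikes[:3])
```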

Artificial Neural Networks – Rate Coding Neuron
However, viewed in the frequency domain, it can process the same amount of information as the spiking neuron.
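A rough sketch of the rate-coding view, under assumptions not on the slide: the neuron outputs a firing rate (here a sigmoid of the weighted input scaled by a hypothetical `max_rate_hz`), and a reader that only counts spikes over a window recovers that rate. The spike count in a window of length T is drawn as Poisson(rate × T).

```python
import math
import numpy as np

def rate_neuron(x, w, b, max_rate_hz=100.0):
    """Rate-coding neuron: the output is a firing rate, not individual spikes."""
    z = float(np.dot(w, x) + b)
    return max_rate_hz / (1.0 + math.exp(-z))

def decode_rate(rate_hz, window_s, rng):
    """Draw a Poisson spike count over window_s and decode it back into a rate."""
    count = rng.poisson(rate_hz * window_s)
    return count / window_s

rng = np.random.default_rng(0)
r = rate_neuron(np.array([0.5, -0.2]), np.array([1.0, 2.0]), 0.1)
for window in (0.1, 1.0, 100.0):
    print(window, round(r, 2), decode_rate(r, window, rng))
```

Longer windows give estimates closer to the true rate, which is the sense in which the rate itself carries the information.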

Perceptron
Source: https://blog.dbrgn.ch/2013/3/26/perceptrons-in-python/
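The perceptron can be sketched with the classic Rosenblatt update rule. The training loop below is our own illustration, not code from the linked post: it learns AND, which is linearly separable, but never fits XOR no matter how many epochs it runs.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Rosenblatt rule: w += lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step activation
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

w, b = train_perceptron(X, y_and)
print("AND:", predict(X, w, b))   # matches y_and

w, b = train_perceptron(X, y_xor, epochs=1000)
print("XOR:", predict(X, w, b))   # never matches y_xor: XOR is not linearly separable
```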

Perceptron - XOR

Perceptron: Linearly Non-Separable Problems

Multi-layer Perceptron
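One way to see why a hidden layer solves XOR, using hand-picked weights (an illustrative assumption; in practice the weights are learned): one hidden unit computes OR, the other computes AND, and the output fires for "OR but not AND", which is XOR.

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

# Hidden layer: unit 1 computes OR, unit 2 computes AND (hand-picked weights)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
# Output unit: OR and not AND, which is XOR
W2 = np.array([1.0, -1.0])
b2 = -0.5

def mlp_xor(X):
    h = step(X @ W1.T + b1)
    return step(h @ W2 + b2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(mlp_xor(X))   # → [0 1 1 0]
```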

Multi-layer Perceptron: Universal Function Approximation
In the mathematical theory of artificial neural networks, the universal approximation theorem states[1] that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate any continuous function on compact subsets of ℝⁿ to arbitrary accuracy, under mild assumptions on the activation function.
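The theorem only guarantees existence, but a constructive sketch gives the intuition. Assumptions: ReLU hidden units (swapped in for the sigmoid of the classic proof) and the target sin(x) on [0, 2π]. Hidden unit i computes relu(x − t_i) for knot t_i, and the output weights are the changes in slope between knots, so the network reproduces the piecewise-linear interpolant of f. Adding hidden units shrinks the error.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def build_one_hidden_layer(f, a, b, n_hidden):
    """One-hidden-layer ReLU net that linearly interpolates f at n_hidden + 1 knots on [a, b]."""
    knots = np.linspace(a, b, n_hidden + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)
    # Output weight of hidden unit i = change of slope at knot i
    out_w = np.concatenate(([slopes[0]], np.diff(slopes)))
    hidden_b = -knots[:-1]          # hidden unit i computes relu(x - knots[i])
    out_b = values[0]

    def net(x):
        h = relu(x[:, None] + hidden_b[None, :])   # hidden layer, input weight 1 per unit
        return h @ out_w + out_b
    return net

xs = np.linspace(0.0, 2 * np.pi, 1000)
for n in (4, 16, 64):
    net = build_one_hidden_layer(np.sin, 0.0, 2 * np.pi, n)
    print(n, np.max(np.abs(net(xs) - np.sin(xs))))
```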

Multi-layer Perceptron: Universal Function Approximation
[여창준a] [1:54 PM] Measurable functions.
[여창준a] [1:54 PM] You need to know things like the Lebesgue integral.
[여창준a] [1:54 PM] Oh, and you also need to know what a Borel set is,
[여창준a] [1:54 PM] what an Lp space is,
[여창준a] [1:54 PM] and how to integrate over it.
[여창준a] [1:54 PM] You have to know all of this to read that paper.

Activation Function

Sigmoid Function: its output lies in (0, 1), so it can be interpreted as a probability.
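A short illustration of the probabilistic reading: the sigmoid squashes any real input into (0, 1), and its inverse, the log-odds, returns the original weighted sum. Treating the output as P(y = 1 | x) is the interpretation logistic regression uses.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)
print(p)                       # read as P(y = 1 | x)
print(1 - p)                   # P(y = 0 | x)
# The log-odds recover z, so the weighted sum can be read as a logit
print(np.log(p / (1 - p)))
```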

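What if the hidden layer has no activation function? → The 2-layer network (with a sigmoid output) is equivalent to logistic regression: the two linear maps compose into one, W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2), so only the output sigmoid remains.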
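A numerical check of the claim, using random weights as a hypothetical example: a linear hidden layer followed by a sigmoid output computes exactly the same function as a single logistic-regression unit with W = W2·W1 and b = W2·b1 + b2.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # "hidden" layer, no activation
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_no_activation(x):
    h = W1 @ x + b1                 # no non-linearity in the hidden layer
    return sigmoid(W2 @ h + b2)

# The same function as a single logistic-regression unit
W = W2 @ W1
b = W2 @ b1 + b2

def logistic_regression(x):
    return sigmoid(W @ x + b)

x = rng.normal(size=3)
print(two_layer_no_activation(x), logistic_regression(x))
```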

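Gradient Descent
Take the partial derivative of the loss function with respect to W (the parameters) to obtain the gradient with respect to W.
Use the gradient to update W:
W* = W − λ Loss'(W), where λ is the learning rate.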
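A minimal sketch of the update rule W* = W − λ Loss'(W), assuming a noise-free linear regression problem with mean squared error as the loss (the data and λ = 0.1 are illustrative choices).

```python
import numpy as np

def loss(W, X, y):
    return np.mean((X @ W - y) ** 2)

def loss_grad(W, X, y):
    # Partial derivative of the mean squared error with respect to W
    return 2.0 * X.T @ (X @ W - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_W = np.array([2.0, -1.0])
y = X @ true_W

W = np.zeros(2)
lam = 0.1                                  # learning rate λ
for _ in range(200):
    W = W - lam * loss_grad(W, X, y)       # W* = W − λ Loss'(W)
print(W, loss(W, X, y))
```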

Gradient Descent

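Problems with Gradient Descent
1) Local optima → Momentum (in effect, a per-dimension learning rate)
2) Divergence → Gradient Decaying
Momentum update: V = µV' + λ Loss'(W), W* = W − V, where V' is the previous update and µ is the momentum coefficient.
- Momentum keeps part of the direction the parameters were already moving in, which counteracts divergence.
- Adagrad, by contrast, assigns a different learning rate to each dimension globally.
- Gradient decaying can be used together with any optimizer.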
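A sketch of the momentum update on a hypothetical quadratic loss that is 100 times steeper along W[1] than along W[0]. This toy loss has a single minimum, so it does not demonstrate escaping local optima; it shows the per-dimension effect instead, since momentum accelerates the flat direction where plain gradient descent crawls. The `decay` argument is our assumed reading of "Gradient Decaying" as a step size that shrinks over time.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: Loss(W) = 0.5 * (W[0]**2 + 100 * W[1]**2)
A = np.array([1.0, 100.0])

def loss_grad(W):
    return A * W                      # Loss'(W)

def optimize(lam, mu, steps, decay=0.0):
    """Momentum: V = mu * V' + lam_t * Loss'(W);  W* = W - V.

    mu = 0 reduces to plain gradient descent. decay > 0 shrinks the step as
    lam_t = lam / (1 + decay * t), an assumed reading of "gradient decaying".
    """
    W = np.array([1.0, 1.0])
    V = np.zeros_like(W)
    for t in range(steps):
        lam_t = lam / (1.0 + decay * t)
        V = mu * V + lam_t * loss_grad(W)
        W = W - V
    return W

print(np.linalg.norm(optimize(lam=0.019, mu=0.0, steps=200)))  # plain GD
print(np.linalg.norm(optimize(lam=0.019, mu=0.9, steps=200)))  # with momentum
```

With these settings, the momentum run ends far closer to the minimum after 200 steps than plain gradient descent with the same λ.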

Problem with Gradient Descent: slow → Stochastic Gradient Descent
(The randomness also tends to give better convergence. However, each subset must reflect the distribution of the full data well enough.)
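A minimal mini-batch SGD sketch, assuming synthetic linear-regression data, a batch size of 32 and λ = 0.05 (all illustrative). Shuffling every epoch supplies the random element; each batch gradient is only an estimate of the full gradient, which is why a batch has to be representative of the whole data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
true_W = np.array([1.5, -2.0, 0.5])
y = X @ true_W + 0.1 * rng.normal(size=len(X))

def minibatch_grad(W, Xb, yb):
    # Gradient of the mean squared error on this mini-batch only
    return 2.0 * Xb.T @ (Xb @ W - yb) / len(yb)

W = np.zeros(3)
lam, batch_size = 0.05, 32
for epoch in range(5):
    order = rng.permutation(len(X))            # new random batches every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        W -= lam * minibatch_grad(W, X[idx], y[idx])
print(W)
```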

Error Back-propagation
Chain Rule: Delta (δ) = the node's error, i.e., the derivative of the loss with respect to that node's weighted input
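A sketch of back-propagation for one sample through a two-layer sigmoid network with squared-error loss (the architecture and loss are assumptions for illustration). `delta2` and `delta1` are the nodes' errors; the chain rule passes them backward through W2 and the sigmoid derivative, and a finite-difference check confirms one of the resulting gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(params, x, t):
    """One sample, 2-layer net, loss = 0.5 * (y - t)^2."""
    W1, b1, W2, b2 = params
    # Forward pass
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    loss = 0.5 * np.sum((y - t) ** 2)
    # Backward pass: delta = dLoss / d(weighted input) = the node's error
    delta2 = (y - t) * y * (1 - y)            # output nodes
    delta1 = (W2.T @ delta2) * h * (1 - h)    # chain rule through W2 and the sigmoid
    grads = (np.outer(delta1, x), delta1, np.outer(delta2, h), delta2)
    return loss, grads

rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 2)), rng.normal(size=3),
          rng.normal(size=(1, 3)), rng.normal(size=1)]
x, t = np.array([0.5, -1.0]), np.array([1.0])
loss, grads = forward_backward(params, x, t)

# Finite-difference check of one weight in W1
eps = 1e-6
params[0][0, 1] += eps
loss_plus, _ = forward_backward(params, x, t)
params[0][0, 1] -= eps
print(grads[0][0, 1], (loss_plus - loss) / eps)
```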

Deep Layer

Gradient Vanishing
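Why gradients vanish with sigmoid units, as a short sketch: the sigmoid's derivative is at most 0.25, and back-propagation multiplies one such factor per layer. Assuming weights of magnitude at most 1, the signal reaching the first layer shrinks at least geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, reached at z = 0

# Back-propagated factor through n sigmoid layers with unit weights,
# in the best case where every weighted input is 0
for n in (1, 5, 10, 20):
    print(n, sigmoid_grad(0.0) ** n)
```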

Gradient Vanishing 1) Layer-wise Pretraining

Gradient Vanishing: why pretraining helps
1) It provides initial weights closer to the global optimum.
2) It tightens the coupling between layers, so gradients propagate better.
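The outline mentions RBM layer-wise pretraining. Below is a minimal sketch of one binary RBM trained with one-step contrastive divergence (CD-1); the toy data, layer size and hyperparameters are hypothetical. In greedy layer-wise pretraining, the hidden activations `H` of a trained RBM become the training data for the next RBM, and the stacked weights then initialize a deep network that is fine-tuned with back-propagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, lr=0.1, epochs=50, seed=0):
    """Binary RBM trained with 1-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = V.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    a = np.zeros(n_visible)   # visible bias
    b = np.zeros(n_hidden)    # hidden bias
    for _ in range(epochs):
        # Positive phase: hidden probabilities and samples given the data
        ph = sigmoid(V @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one Gibbs step back to the visible layer, then to hidden
        pv = sigmoid(h @ W.T + a)
        ph2 = sigmoid(pv @ W + b)
        # CD-1 updates: data correlations minus reconstruction correlations
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        a += lr * np.mean(V - pv, axis=0)
        b += lr * np.mean(ph - ph2, axis=0)
    return W, a, b

def reconstruction_error(V, W, a, b):
    pv = sigmoid(sigmoid(V @ W + b) @ W.T + a)
    return np.mean((V - pv) ** 2)

# Toy binary data: two repeated 6-bit patterns (hypothetical)
V = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 50, dtype=float)
W, a, b = train_rbm(V, n_hidden=2, epochs=1000)
H = sigmoid(V @ W + b)        # features that would feed the next RBM in the stack
print(reconstruction_error(V, W, a, b))
```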

Gradient Vanishing 2) ReLU
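A sketch of why ReLU helps: its derivative is exactly 1 for active units, so the chain rule does not shrink the gradient as it passes through them (units with z ≤ 0 pass no gradient at all), whereas each sigmoid layer contributes a factor of at most 0.25.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative is 1 for active units (z > 0) and 0 otherwise
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(z), relu_grad(z))
# Through 20 active ReLU layers the local derivative factor stays 1
print(relu_grad(np.array([1.0])) ** 20)
```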

Output Node
1) Regression → Weighted Sum + Least Mean Square
2) Classification → Softmax + Cross-Entropy
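A sketch of the two output configurations, with illustrative numbers: for regression the output node is the plain weighted sum scored with mean squared error; for classification the logits go through softmax and are scored with cross-entropy. A convenient property of this pairing is that the gradient with respect to the logits reduces to p − y.

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(p, y_onehot):
    return -np.sum(y_onehot * np.log(p), axis=-1).mean()

# Regression: the output is the plain weighted sum (no activation)
h = np.array([0.2, -1.0, 0.5])
w, b = np.array([1.0, 0.5, 2.0]), 0.1
y_hat = h @ w + b
print(mse(np.array([y_hat]), np.array([1.0])))

# Classification: logits -> softmax probabilities -> cross-entropy
logits = np.array([[2.0, 1.0, 0.1]])
y = np.array([[1.0, 0.0, 0.0]])
p = softmax(logits)
print(p, cross_entropy(p, y))
# Gradient of softmax + cross-entropy with respect to the logits
print(p - y)
```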

THANK YOU
