Machine Learning to Deep Learning


Machine Learning to Deep Learning
Tutorial code: https://github.com/leejaymin/TensorFlowLecture
2017. 04. 20, Jemin Lee (leejaymin@cnu.ac.kr)
Homepage: https://leejaymin.github.io/index.html
Scope: the history, the algorithms with a reasonably deep understanding, practical implementation and applications built on that, and deep learning in the systems area.

Machine Learning

What is machine learning? "Machine learning is a field of artificial intelligence that develops algorithms and techniques enabling computers to learn." - Wikipedia. "Field of study that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel, 1959. Data mining aims at extracting knowledge or insight; machine learning focuses on handling new data, so it can be hard to explain properly how a problem was solved. Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Resource classification: machine learning sits between statistics (probability, regression, Bayesian networks) and computer science / artificial intelligence (neural networks, computer vision, natural language processing), and overlaps with data mining (recommendation, pattern recognition, association analysis, big data). It relies heavily on algorithms from statistics, so it is sometimes called statistical machine learning, while the computer-science side takes a more practical approach. Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

AI ⊃ Machine Learning ⊃ Deep Learning. AI also covers linguistics, brain science, search, and robotics; it is not yet general intelligence. Deep learning is machine learning that uses neural networks. Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Change over time: in 2012, Geoffrey Hinton's team published the image-classification paper "ImageNet Classification with Deep Convolutional Neural Networks". Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Categories of Machine Learning

Learning methods: supervised learning, unsupervised learning, and reinforcement learning. Training/learning data. Supervised learning: stock-price prediction, spam-mail classification; unsupervised learning: customer segmentation. Deep learning is an algorithm, not a learning method; there is also deep reinforcement learning. Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Table of Contents. Fundamental Machine Learning: Linear Regression (Gradient Descent Algorithm, optimization); Logistic Regression (Single Neuron = Perceptron: Sigmoid/logistic function, Convexity, Cross Entropy, Decision Boundary); Multi-layer Perceptron (Hidden Layer: Backpropagation algorithm). Deep Neural Network Breakthrough: Rebirth of the Neural Network, renamed DNN (DNN, ReLU, Pre-training, Dropout); Convolutional Neural Network (CNN); Recurrent Neural Network (RNN).

Linear Regression: Theory, Gradient Descent

Which hypothesis is better? Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Cost function [1/2]: how well does the line fit our (training) data? Cost function == loss function == objective function == error function. The error of one example is $H(x) - y$; the mean squared error squares the error of every training example, sums them, divides by the number of training examples, and divides by 2 so the derivative is clean: $\mathrm{cost}(W,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)^2$.
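To make the cost concrete, here is a minimal NumPy sketch (the function name, data, and parameter values are illustrative, not from the lecture code):

import numpy as np

def mse_cost(W, b, x, y):
    # Mean squared error over m training examples, divided by 2 for a clean derivative.
    m = len(x)
    errors = (W * x + b) - y             # H(x) - y for every example
    return np.sum(errors ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(mse_cost(1.0, 0.0, x, y))          # 0.0: this hypothesis fits the data perfectly
print(mse_cost(0.5, 0.0, x, y))          # larger cost for a worse hypothesis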

Cost function [2/2]. Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Goal: minimize the cost. Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Gradient Descent Algorithm. Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Gradient Descent Algorithm: minimize the cost function. For a given cost function cost(W, b), it finds the W, b that minimize the cost, and it can be applied to more general functions cost(w1, w2, …). Where you start can determine which minimum you end up in; with perfect convexity there is only one. Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Formal definition. Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim
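As a sketch of the update rule W := W - alpha * dCost/dW for the one-variable case above (learning rate, step count, and data are arbitrary illustrative choices):

import numpy as np

def gradient_descent(x, y, alpha=0.01, steps=1000):
    # Minimize the MSE cost of H(x) = W*x + b by repeatedly stepping against the gradient.
    W, b = 0.0, 0.0
    m = len(x)
    for _ in range(steps):
        error = (W * x + b) - y                # H(x) - y
        W -= alpha * np.sum(error * x) / m     # partial derivative w.r.t. W
        b -= alpha * np.sum(error) / m         # partial derivative w.r.t. b
    return W, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(gradient_descent(x, y))                  # approaches W = 2, b = 0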

Gradient Descent Intuition[1/2] Coursera, Machine Learning, Andrew Ng

Gradient Descent Intuition[2/2] Coursera, Machine Learning, Andrew Ng

Linear Regression: Example

Linear Regression with One Variable. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new branch. You'd like to use this data to help you select which city to expand to next. The file ex1data1.txt contains the dataset for our linear regression problem: the first column is the population of a city and the second column is the profit of a food truck in that city; a negative value for profit indicates a loss. Coursera, Machine Learning, Andrew Ng

Data distribution

Regression result

A Single Neuron (Single Neuron, Perceptron, Logistic Regression)

Perceptron (1/2) http://cs231n.stanford.edu/slides/winter1516_lecture4.pdf

Perceptron (2/2). Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Decision Boundary: output > 0.5 → circle; output < 0.5 → triangle

Why is a nonlinear function such as the sigmoid needed? Stacking linear layers collapses into one linear layer: $y = w_2 t + b_2 = w_2(w_1 x + b_1) + b_2 = w_2 w_1 x + (w_2 b_1 + b_2) = w x + b$. With a nonlinearity $\varphi$ in between, $y = w_2\,\varphi(t) + b_2$, and the collapse no longer happens. Differentiable nonlinear functions include the sigmoid, ReLU, and hyperbolic tangent (tanh). Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar
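A quick numeric check of the collapse described above (the weights are arbitrary example values):

w1, b1 = 3.0, 1.0
w2, b2 = 2.0, -4.0
x = 5.0

t = w1 * x + b1                      # first linear layer
y_two_layers = w2 * t + b2           # second linear layer on top

w, b = w2 * w1, w2 * b1 + b2         # the equivalent single linear layer
y_one_layer = w * x + b

print(y_two_layers, y_one_layer)     # 28.0 28.0: no expressive power is gained without a nonlinearity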

Cost Function: Cross Entropy[1/2]

Cost Function: Cross Entropy[2/2] Coursera, Machine Learning, Andrew Ng

Cross Entropy and Gradient Descent. The combined cost function: $Cost = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y\log \hat{y} + (1-y)\log(1-\hat{y})\,\right]$. Gradient descent: differentiating this logistic cost gives the same form as the derivative of the linear-regression cost, $\frac{\partial Cost}{\partial w} = -\frac{1}{m}\sum_{i=1}^{m}\left(y-\hat{y}\right)x$. Coursera, Machine Learning, Andrew Ng
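A minimal NumPy sketch of one gradient-descent step on this cross-entropy cost (the data and step size are illustrative placeholders):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_step(w, b, X, y, alpha=0.5):
    # One gradient step on Cost = -(1/m) * sum[y*log(y_hat) + (1-y)*log(1-y_hat)].
    m = len(y)
    y_hat = sigmoid(X @ w + b)                  # predictions in (0, 1)
    cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    grad_w = X.T @ (y_hat - y) / m              # same form as the linear-regression gradient
    grad_b = np.mean(y_hat - y)
    return w - alpha * grad_w, b - alpha * grad_b, cost

X = np.array([[0.2, 0.1], [0.9, 0.8], [0.1, 0.4], [0.8, 0.9]])
y = np.array([0.0, 1.0, 0.0, 1.0])
w, b = np.zeros(2), 0.0
for _ in range(200):
    w, b, cost = cross_entropy_step(w, b, X, y)
print(w, b, cost)                               # the cost decreases as w, b separate the two classes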

A Single Neuron: Example

The problem to solve: predict whether a student will be admitted to university from past applicants' scores on two exams and their outcomes, e.g. 100, 99 = admitted (1); 50, 70 = rejected (0). Code: https://github.com/leejaymin/TensorFlowLecture/blob/master/0.1.Fundamental_Neural_Network/ex2/ex2_jemin.ipynb Coursera, Machine Learning, Andrew Ng

Result

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Multi-layer Neural Network (Multi-layer Perceptron): Theory, Backpropagation

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

First Dark Age (1969-1986, 17 years). Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Neural Network: The Second Spring

Neural Network Representation [1/2]: Layer 1 has 2 input nodes and Layer 2 has 4 activation nodes, so the dimension of $\theta^{(1)}$ is $4\times 3$: with $s_j = 2$ and $s_{j+1} = 4$, the weight matrix has size $s_{j+1} \times (s_j + 1) = 4\times 3$. Coursera, Machine Learning, Andrew Ng

Neural Network Representation [2/2]

Coursera, Machine Learning, Andrew Ng

New Cost function Coursera, Machine Learning, Andrew Ng

Backpropagation: gradient computation Coursera, Machine Learning, Andrew Ng

Deep Learning 6: Fully Connected Neural Network, Prof. Jeongmo Hong (홍정모)

Deep Learning 6: Fully Connected Neural Network, Prof. Jeongmo Hong (홍정모)

Deep Learning 6: Fully Connected Neural Network, Prof. Jeongmo Hong (홍정모)

Prof. Jeongmo Hong (홍정모), Dongguk University, http://blog.naver.com/atelierjpro/220773276384

Multi-layer Perceptron: Example

MNIST Database: handwritten digit data with 60,000 training images and 10,000 test images, preprocessed (size, label, color), created by Yann LeCun (a leading figure in deep learning, professor at NYU). http://yann.lecun.com/exdb/mnist/ Code: https://github.com/leejaymin/TensorFlowLecture/blob/master/0.1.Fundamental_Neural_Network/ex4/ex4_jemin.ipynb

Facebook AI Research (FAIR)

Neural Network Model Three layer Neural Net. Input layer= 20x20+1 Hidden layer= 25 Output layer= 10

Result Training set accuracy: 97%

Go deeper: Deep Neural Network, a.k.a. Deep Learning

Table of Contents. Fundamental Machine Learning: Linear Regression (Gradient Descent Algorithm, optimization); Logistic Regression (Single Neuron = Perceptron: Sigmoid/logistic function, Convexity, Cross Entropy, Decision Boundary); Multi-layer Perceptron (Hidden Layer: Backpropagation algorithm). Deep Neural Network Breakthrough: Rebirth of the Neural Network, renamed DNN (DNN, ReLU, Pre-training, Dropout); Convolutional Neural Network (CNN); Recurrent Neural Network (RNN).

A BIG problem: backpropagation just did not work well for normal neural networks with many layers, while other machine learning algorithms were on the rise: SVM, Random Forest, EM, etc.
Statistical method (Chou-Fasman) ~50%
Nearest neighbors ~50-60%
Neural networks ~75% (breakthrough!), 1986 (then a dark age for 20 years)
Support Vector Machine (SVM) ~75%
HMM ~75%
Random Forest ~75%
Deep Learning ~88%, 2006 (rebirth of neural networks with big data)

Rebirth of the Neural Network: The Third Spring. Research that did not give up through a 20-year winter (1986-2006); AI and deep learning revived, carried by big data.

Breakthroughs of Deep Learning [1/4]: Google intern. Phonemes: the smallest units of sound that distinguish meaning in a language.

Breakthroughs of Deep Learning [2/4]: winning competitions

Breakthroughs of Deep Learning [3/4]

Breakthroughs of Deep Learning [4/4]: ImageNet classification. Top rank: Krizhevsky, Hinton et al. (convnet), 16.4% top-5 error. Second rank: next best (non-convnet), 26.2% error.

Problems and solutions.
Problems: algorithmic issues, lack of training data, limited computing power.
Solutions:
Vanishing gradient -> ReLU (a new activation function)
Local minimum problem -> pre-training, Restricted Boltzmann Machine (RBM), batch normalization (ICML'15)
Overfitting -> dropout, big data
Lack of training data -> smartphones, IoT
Computing power -> GPGPU (NVIDIA), cloud computing (AWS, Azure)

Algorithm: Vanishing Gradient. The derivative of the sigmoid: with $z = wx + b$ and $g(z) = \frac{1}{1+\exp(-z)}$, $g'(z) = g(z)\,\big(1-g(z)\big)$, whose maximum is 0.25. The local gradient is therefore at most 0.25, so the magnitude shrinks to a quarter or less every time it passes through a layer. Yes you should understand backprop, Andrej Karpathy

Algorithm: Rectified Linear Unit (ReLU). $f(x) = 0$ if $x < 0$ and $f(x) = x$ if $x \ge 0$; its derivative is $f'(x) = 0$ if $x < 0$ and $1$ if $x \ge 0$ (ReLU vs. sigmoid). Yes you should understand backprop, Andrej Karpathy
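A small sketch comparing the local gradients of the two activations (this is the point of the slide: the sigmoid's gradient never exceeds 0.25, while ReLU's is 1 for positive inputs):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                 # maximum 0.25, reached at z = 0

def relu_grad(z):
    return (z >= 0).astype(float)        # 1 for non-negative inputs, 0 otherwise

z = np.linspace(-5, 5, 11)
print(sigmoid_grad(z).max())             # 0.25
print(relu_grad(z))                      # 0.0 or 1.0: the gradient does not shrink layer after layer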

Algorithm: Restricted Boltzmann Machine (RBM). In this model, RBMs are stacked one by one starting from the data layer at the bottom, updating all of the parameters along the way; drawn as a diagram, the model is a stack of RBMs. Restricted Boltzmann Machines - Ep. 6 (Deep Learning SIMPLIFIED)

Algorithm: Restricted Boltzmann Machine (RBM)

Algorithm: Pre-training. There is no need to use a complicated RBM for weight initialization; simple methods are fine, e.g. Xavier initialization [1]. [1] Understanding the difficulty of training deep feedforward neural networks, 2010

Weight Initialization: truncated normal initialization is used.
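A rough NumPy sketch of the two initializations mentioned here (layer sizes and the standard deviation are placeholders; the lecture code presumably uses TensorFlow's built-in initializers):

import numpy as np

n_in, n_out = 784, 256                   # placeholder layer sizes

# Xavier/Glorot-style initialization: the scale depends on fan-in and fan-out.
limit = np.sqrt(6.0 / (n_in + n_out))
W_xavier = np.random.uniform(-limit, limit, size=(n_in, n_out))

# Truncated normal initialization: redraw any sample beyond two standard deviations.
def truncated_normal(shape, stddev=0.1):
    samples = np.random.normal(0.0, stddev, size=shape)
    bad = np.abs(samples) > 2 * stddev
    while bad.any():
        samples[bad] = np.random.normal(0.0, stddev, size=int(bad.sum()))
        bad = np.abs(samples) > 2 * stddev
    return samples

W_trunc = truncated_normal((n_in, n_out))
print(W_xavier.std(), W_trunc.std())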

Algorithm: Overfitting[1/2] Dropout It is an extremely effective, simple and recently introduced regularization technique by Srivastava et al [1]. While training, dropout is implemented by only keeping a neuron active with some probability p (a hyper-parameter), or setting it to zero otherwise. [1] Journal of Machine Learning Research, Dropout: A Simple Way to Prevent Neural Networks from Overfitting
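A minimal sketch of (inverted) dropout during training, assuming a matrix of activations a and a keep probability p (names are illustrative):

import numpy as np

def dropout(a, p=0.5, training=True):
    # Keep each neuron with probability p and rescale by 1/p, so nothing changes at test time.
    if not training:
        return a
    mask = (np.random.rand(*a.shape) < p) / p
    return a * mask

a = np.ones((4, 5))                      # some hidden-layer activations
print(dropout(a, p=0.5))                 # roughly half the units zeroed, the rest scaled by 1/p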

Algorithm: Overfitting [2/2]. Big data: with enough data, the collected data approaches the real world, so the tendency to overfit becomes a major advantage.

Breakthroughs of Deep Learning

NIPS 2015-2016: 3,755 attendees at NIPS 2015; 1,838 paper submissions, 1,524 reviewers producing 10,625 reviews, resulting in 403 accepted papers (21.9%). Only 20 oral talks are selected among the 403 accepted papers (single-track). 7pm-12am (5 hr) poster session every day. https://twitter.com/Hassan_Sawaf/status/674012958981165056

Deep Learning's Pioneers. Google Scholar citation counts: Yann LeCun 44K, Geoffrey Hinton 160K, Yoshua Bengio 56K, Andrew Ng 72K. https://github.com/leehaesung/DeepLearningPioneers/wiki

TensorFlow playground http://playground.tensorflow.org/

Deep Learning Framework

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

Coursera, Machine Learning, Andrew Ng

Diagnosing bias vs. variance High Bias problem == underfitting High variance problem == overfitting Coursera, Machine Learning, Andrew Ng

Coursera, Machine Learning, Andrew Ng

Learning curves[1/3] Coursera, Machine Learning, Andrew Ng

Learning curves[2/3] Coursera, Machine Learning, Andrew Ng

Learning curves[3/3] Coursera, Machine Learning, Andrew Ng

Summarized bias vs. variance Coursera, Machine Learning, Andrew Ng

Coursera, Machine Learning, Andrew Ng

Table of Contents. Fundamental Machine Learning: Linear Regression (Gradient Descent Algorithm, optimization); Logistic Regression (Single Neuron = Perceptron: Sigmoid/logistic function, Convexity, Cross Entropy, Decision Boundary); Multi-layer Perceptron (Hidden Layer: Backpropagation algorithm). Deep Neural Network Breakthrough: Rebirth of the Neural Network, renamed DNN (DNN, ReLU, Pre-training, Dropout); Convolutional Neural Network (CNN); Recurrent Neural Network (RNN).

Image Classification: a core task in computer vision. Assume a given set of discrete labels {dog, cat, truck, plane, …}; the classifier outputs one of them, e.g. cat. CS231n: Convolutional Neural Networks for Visual Recognition

Challenges: assigning the correct label is hard. CS231n: Convolutional Neural Networks for Visual Recognition

Computer Vision Tasks. CS231n: Convolutional Neural Networks for Visual Recognition

Convolutional Neural Nets: image classification. Through pattern recognition they generalize what they have learned, so images from different environments are still classified well; a distinguishing property is sparse connectivity. Lee Seungeun, CNN 초보자가 만드는 초보자 가이드 (a CNN beginner's guide, with a bit of VGG)

8-1. Convolutional Neural Network, Prof. Jeongmo Hong (홍정모)

Filter: a filter is a rectangular kernel (patch) that encodes the feature to be detected; the receptive field is the filter-sized region of the image over which the feature is detected as the filter rolls across the image. Lee Seungeun, CNN 초보자가 만드는 초보자 가이드 (a CNN beginner's guide, with a bit of VGG)

Activation Map (Feature Map): tells whether the feature the filter is looking for is present in the given receptive field. Example: (50x30)+(20x30)+(50x30)+(50x30)+(50x30) = 6600 → the feature is there; a response of 0 → it is not. Lee Seungeun, CNN 초보자가 만드는 초보자 가이드 (a CNN beginner's guide, with a bit of VGG)

Preview. CS231n: Convolutional Neural Networks for Visual Recognition

Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

http://cs231n.github.io/convolutional-networks/

Let's look at other areas with the same filter (W). How many numbers can we get? Source: 모두를 위한 머신러닝 Season 1, Prof. Sung Kim

(Figure-only slides.) CS231n: Convolutional Neural Networks for Visual Recognition

Output image size calculation [1/2]: $o = \frac{i-f}{s} + 1 = \frac{4-3}{1} + 1 = 2$ for input (i) 4x4, filter (f) 3x3, stride (s) 1. https://github.com/vdumoulin/conv_arithmetic

Output image size calculation [2/2]: $o = \frac{i-f}{s} + 1 = \frac{5-3}{2} + 1 = 2$ for input (i) 5x5, filter (f) 3x3, stride (s) 2. https://github.com/vdumoulin/conv_arithmetic

Padding calculation: $o = \frac{i-f+2p}{s} + 1 = \frac{5-4+2\times 2}{1} + 1 = 6$ for input (i) 5x5, filter (f) 4x4, padding (p) 2, stride (s) 1. https://github.com/vdumoulin/conv_arithmetic
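The three examples above follow one formula; a small helper makes that explicit (square inputs and filters assumed):

def conv_output_size(i, f, s=1, p=0):
    # o = (i - f + 2p) / s + 1 for a square input of size i and a filter of size f.
    return (i - f + 2 * p) // s + 1

print(conv_output_size(4, 3, s=1))         # 2
print(conv_output_size(5, 3, s=2))         # 2
print(conv_output_size(5, 4, s=1, p=2))    # 6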

Generating the Activation Map. CS231n: Convolutional Neural Networks for Visual Recognition

(Figure-only slides.) CS231n: Convolutional Neural Networks for Visual Recognition

Pooling (subsampling): no weights are multiplied and no bias is added; the data read from the input map is simply reprocessed. Usually the pooling size and the stride are the same.

CS231n: Convolutional Neural Networks for Visual Recognition

CS231n: Convolutional Neural Networks for Visual Recognition

Fully Connected Layer (FC Layer) Contains neurons that connect to the entire input volume, as in ordinary Neural Networks CS231n: Convolutional Neural Networks for Visual Recognition

Average pooling: $o = \frac{i-f}{s} + 1 = \frac{5-3}{1} + 1 = 3$. ($\hat{y}$ is also written as h, f, etc.) Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar

Max pooling: $o = \frac{i-f}{s} + 1 = \frac{5-3}{1} + 1 = 3$. ($\hat{y}$ is also written as h, f, etc.) Source: 해커에게 전해들은 머신러닝, Hanbit Media real-time seminar
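A NumPy sketch of 3x3 average and max pooling with stride 1 on a 5x5 input, matching the o = (i - f)/s + 1 = 3 calculation above (a naive loop, for clarity only):

import numpy as np

def pool2d(x, f=3, s=1, mode="max"):
    # Pooling has no weights or biases; it only re-reads the input map.
    o = (x.shape[0] - f) // s + 1
    out = np.zeros((o, o))
    for r in range(o):
        for c in range(o):
            window = x[r * s:r * s + f, c * s:c * s + f]
            out[r, c] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
print(pool2d(x, mode="max").shape)          # (3, 3)
print(pool2d(x, mode="avg"))                # 3x3 map of window averages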

MNIST Example

The CNN architecture to implement. Code: plain version https://github.com/leejaymin/TensorFlowLecture/blob/master/5.CNN/CNNforMNIST.ipynb and with TensorBoard https://github.com/leejaymin/TensorFlowLecture/blob/master/5.CNN/CNNforMNIST-tensorboard.ipynb. Pipeline: Input (28x28) -> Conv 3x3 (28x28, 32 activation maps) -> POOL (14x14) -> Conv 3x3 (14x14, 64 activation maps) -> POOL (7x7) -> Conv 3x3 (7x7, 128 activation maps) -> POOL (4x4) -> FC (2048, 625) -> FC (625, 10).

Parameter updates

Implementations: a basic MLP with one hidden layer (testing accuracy ~92%); a 5-layer MLP with ReLU, sigmoid, and softmax (testing accuracy ~97%); pre-training and dropout with Xavier initialization and the new regularization (testing accuracy ~99%); and a convolutional neural network (testing accuracy 100%).

Break down the CNN in more detail.
Input: 28x28x1, memory: 784, weights: 0
Conv-32: 28x28x32, memory: 25k, weights: 3*3*1*32 = 288
Pool2: 14x14x32, memory: 6k, weights: 0
Conv-64: 14x14x64, memory: 12k, weights: 3*3*32*64 = 18,432
Pool2: 7x7x64, memory: 3k, weights: 0
Conv-128: 7x7x128, memory: 6k, weights: 3*3*64*128 = 73,728
Pool2: 4x4x128, memory: 2k, weights: 0
FC: 1x1x2048, memory: 2k, weights: 4*4*128*2048 = 4,194,304
FC: 1x1x10, memory: 10, weights: 2048*10 = 20,480
Total memory: 56.794k * 4 bytes ≈ 227.176k (forward pass only, per image)
Total params: 4,307,232 (~4.3M) parameters
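A quick sketch that recomputes the weight counts above, ignoring biases as the slide does (helper names are illustrative):

def conv_params(f, in_depth, out_depth):
    return f * f * in_depth * out_depth      # each filter spans the full input depth

def fc_params(n_in, n_out):
    return n_in * n_out

total = (conv_params(3, 1, 32)               # 288
         + conv_params(3, 32, 64)            # 18,432
         + conv_params(3, 64, 128)           # 73,728
         + fc_params(4 * 4 * 128, 2048)      # 4,194,304
         + fc_params(2048, 10))              # 20,480
print(total)                                 # 4,307,232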

Comparison. Basic MLP (one hidden layer): testing accuracy ~92%, memory (784 + 784*256 + 10) * 4 bytes = 805,992 (804k), params [256, 28x28+1] + [10, 256+1] = 203,530 (203k). Convolutional neural network: testing accuracy 100%, total memory 56.794k * 4 bytes ≈ 227.176k (forward pass only, per image), total params 4,307,232 (~4.3M). Accuracy improves by about 8%, at the cost of roughly 5x the memory and far more computation.

Speed.
Processor | FLOPS | Time, speedup
i7-4790k | 34 GFLOPS | 105 min 31 s, 1x
GTX-745 | |
AWS EC2, K520 | 2448 x 2 GFLOPS | 22 min 4 s, 4.78x
GTX-970 | 3494 GFLOPS | 9 min 10 s, 11.5x
GTX-1080 | 8990 GFLOPS | 5 min 27 s, 19.3x

VGGNet 16: runner-up of the ImageNet (ILSVRC) 2014 classification challenge

ImageNet challenge winners. https://culurciello.github.io/tech/2016/06/04/nets.html

Table of Contents. Fundamental Machine Learning: Linear Regression (Gradient Descent Algorithm, optimization); Logistic Regression (Single Neuron = Perceptron: Sigmoid/logistic function, Convexity, Cross Entropy, Decision Boundary); Multi-layer Perceptron (Hidden Layer: Backpropagation algorithm). Deep Neural Network Breakthrough: Rebirth of the Neural Network, renamed DNN (DNN, ReLU, Pre-training, Dropout); Convolutional Neural Network (CNN); Recurrent Neural Network (RNN) and LSTM.

RNN vs. CNN or DNN Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

RNN: usage Image Captioning Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

RNN: usage. Document classification. Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

RNN: usage Classifying video on frame level Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

RNN: usage Machine translation Forecasting weather, stock, or price Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

Recurrent networks offer a lot of flexibility: vanilla neural networks (one input, one output); image captioning (image -> sequence of words); sentiment classification (sequence of words -> sentiment); machine translation (sequence of words -> sequence of words); video classification on the frame level. CS231n: Convolutional Neural Networks for Visual Recognition

Sequence data: we do not understand a sentence from a single word alone; we understand it from the previous words plus the current word (a time series).

Recurrent Neural Network CS231n: Convolutional Neural Networks for Visual Recognition

Recurrent Neural Network CS231n: Convolutional Neural Networks for Visual Recognition

(Vanilla) Recurrent Neural Network The state consists of a single “hidden” vector h: CS231n: Convolutional Neural Networks for Visual Recognition

Feed-forward neural networks: inputs and outputs are independent. Recurrent neural networks: sequential inputs and outputs; unrolled over time, the network has inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $s_{t-1}, s_t, s_{t+1}$, and outputs $o_{t-1}, o_t, o_{t+1}$. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

Recurrent Neural Networks (RNN): $x_t$ is the input at time step $t$, $s_t$ is the hidden state at time $t$, and $o_t$ is the output at time $t$. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

Overall procedure: RNN. Initialization: all zeros, or random values (dependent on the activation function), or Xavier initialization [1]: random values in the interval $\left(-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right)$, where n is the number of incoming connections from the previous layer. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

Overall procedure: RNN. Initialization. Forward propagation: $s_t = f(U x_t + W s_{t-1})$, where $f$ is usually a nonlinearity such as tanh or ReLU, and $o_t = \mathrm{softmax}(V s_t)$. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks
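A minimal NumPy sketch of this forward pass, s_t = f(U x_t + W s_{t-1}) and o_t = softmax(V s_t) (dimensions and weights are arbitrary illustrative values):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    # The same U, W, V are reused at every time step.
    s = np.zeros(W.shape[0])
    outputs = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)         # hidden state s_t
        outputs.append(softmax(V @ s))       # output o_t
    return outputs, s

input_dim, hidden_dim, output_dim = 4, 8, 3
U = np.random.randn(hidden_dim, input_dim) * 0.1
W = np.random.randn(hidden_dim, hidden_dim) * 0.1
V = np.random.randn(output_dim, hidden_dim) * 0.1
xs = [np.random.randn(input_dim) for _ in range(5)]   # a length-5 input sequence
outputs, final_state = rnn_forward(xs, U, W, V)
print(outputs[-1])                           # probability distribution over 3 classes at the last step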

Overall procedure: RNN. Initialization. Forward propagation. Calculating the loss: with $y$ the labeled data and $o$ the output data, the cross-entropy loss is $L(y, o) = -\frac{1}{N}\sum_{n \in N} y_n \log(o_n)$. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

Overall procedure: RNN. Initialization. Forward propagation. Calculating the loss. Stochastic Gradient Descent (SGD): push the parameters in a direction that reduces the error; the directions are the gradients of the loss, $\frac{\partial L}{\partial U}$, $\frac{\partial L}{\partial V}$, $\frac{\partial L}{\partial W}$. Notice: the same function and the same set of parameters are used at every time step. Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

Overall procedure: RNN Initialization Forward Propagation Calculating the loss Stochastic Gradient Descent (SGD) Backpropagation Through Time (BPTT) Long-term dependencies → vanishing/exploding gradient problem Taegyun Jeon, Electricity price forecasting with Recurrent Neural Networks

(Vanilla) Recurrent Neural Network The same function and the same set of parameters are used at every time step http://colah.github.io/posts/2015-08-Understanding-LSTMs/

(Vanilla) Recurrent Neural Network http://colah.github.io/posts/2015-08-Understanding-LSTMs/

(Vanilla) Recurrent Neural Network

(Vanilla) Recurrent Neural Network

(Vanilla) Recurrent Neural Network

The problem of long-term dependencies http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Solutions. Exploding gradient: gradient clipping. Vanishing gradient: ReLU, and gating units such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit).

Standard RNN Simple tanh layer http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long Short-Term Memory (LSTM) http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTM Example

Many to one. Data dimension: 5, hidden dimension: 10, time steps: 7, output: 1. The inputs $x_1, x_2, x_3, \ldots, x_7$ (each of dimension 5) are fed through the LSTM cells (hidden dimension 10), and only the output $o_t$ of the last step is used.
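As a sketch of this many-to-one shape in Keras-style TensorFlow (the lecture code itself uses the TensorFlow 1.x RNN API; this only illustrates the dimensions: 7 time steps, 5 features, 10 hidden units, 1 output):

import tensorflow as tf                      # assumes TensorFlow 2.x with the Keras API

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(10, input_shape=(7, 5)),   # 7 time steps x 5 features; returns only the last hidden state
    tf.keras.layers.Dense(1),                       # one predicted value from the final step
])
model.compile(optimizer="adam", loss="mse")
model.summary()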

Stock prediction Alphabet Inc. [step: 499] loss: 0.49794697761535645 RMSE: 0.024303380399942398 https://github.com/hunkim/DeepLearningZeroToAll/

Thank you. (History, the algorithms with a reasonably deep understanding, practical implementation and applications based on that, and deep learning in the systems area.)

Backpropagation Through Time (BPTT): the RNN loss function.