Segmentation based on Deep learning

Presentation transcript:

Segmentation based on Deep learning 이 용 근

1. Fully Convolutional Networks for Semantic Segmentation, 2014

[2014] Fully Convolutional Networks for Semantic Segmentation: VGG-FCN

[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition

Recap: Krizhevsky's work (2012)
- 1.2 million images from the ImageNet LSVRC-2010 dataset, 1000 classes
- ReLU-based non-linearity
- 5 convolutional and 3 fully connected layers
- 60 million parameters
- GPU-based training

The paper deals with the problem of deeper architectures, building upon the work of Krizhevsky (2012) and Zeiler (2013). It is a very "experimental" paper.

11 to 19 layers: 3 are fully connected and the rest are convolutional.

Problems and solutions
- Vanishing gradient problem: partly handled by ReLUs
- Overfitting: augmentation, dropout
- Enormous training time: smart initialization of the network, GPUs and ReLUs


[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition

Method                                   | VOC-2007 (mean AP) | VOC-2012 (mean AP) | Caltech-101 (mean CR) | Caltech-256 (mean CR)
-----------------------------------------+--------------------+--------------------+-----------------------+----------------------
Zeiler & Fergus (Zeiler & Fergus, 2013)  | -                  | 79.0               | 86.5 ± 0.5            | 74.2 ± 0.3
Chatfield et al. (Chatfield et al., 2014) | 82.4               | 83.2               | 88.4 ± 0.6            | 77.6 ± 0.1
He et al. (He et al., 2014)              |                    |                    | 93.4 ± 0.5            |
Wei et al. (Wei et al., 2014)            | 81.5               | 81.7               |                       |
VGG Net-D (16 layers)                    | 89.3               | 89.0               | 91.8 ± 1.0            | 85.0 ± 0.2
VGG Net-E (19 layers)                    |                    |                    | 92.3 ± 0.5            | 85.1 ± 0.3
VGG Net-D & Net-E                        | 89.7               |                    | 92.7 ± 0.5            | 86.2 ± 0.3


[2014] Fully Convolutional Networks for Semantic Segmentation
Skip layer: used to preserve boundary detail. When the Pool3 prediction is produced from Pool3 and the Pool4 prediction from Pool4, a 1*1 convolution changes the channel size. The parameters of this convolution are learned together with the rest of the network during backpropagation.
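The skip-layer fusion described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: feature maps use a (channels, height, width) layout, nearest-neighbour upsampling stands in for the learned deconvolution, and the helper names `conv1x1`, `upsample2x`, and `fuse_skip` are hypothetical, not taken from the paper or any library.

```python
import numpy as np

def conv1x1(features, weight, bias):
    """1x1 convolution: a per-pixel linear map over channels.
    features: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]

def upsample2x(scores):
    """Nearest-neighbour 2x upsampling (assumed stand-in for a learned deconvolution)."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse_skip(coarse_scores, pool_features, weight, bias):
    """Add the upsampled coarse prediction to the 1x1-conv prediction
    computed from a shallower pooling layer."""
    return upsample2x(coarse_scores) + conv1x1(pool_features, weight, bias)

rng = np.random.default_rng(0)
num_classes = 34                                   # class count used in the slides
pool4 = rng.standard_normal((512, 4, 4))           # toy Pool4 feature map
coarse = rng.standard_normal((num_classes, 2, 2))  # toy coarse class scores
w = rng.standard_normal((num_classes, 512)) * 0.01
b = np.zeros(num_classes)

fused = fuse_skip(coarse, pool4, w, b)
print(fused.shape)  # (34, 4, 4)
```

In a real network `w` and `b` would receive gradients like any other layer, which is what the slide means by learning the 1*1 convolution during backpropagation.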

Fully Convolutional Network (implementation detail)
Build the VGG-16 network and load pre-trained weights (the pre-training took 3 weeks on four GTX Titan Black GPUs).

Fully Convolutional Network (implementation detail)

Upsampling layers:
- FCN32: 64*64*34, stride 32
- FCN8: 4*4*34, stride 2; 4*4*34, stride 2; 16*16*34, stride 8

                   | FCN32                 | FCN8
-------------------+-----------------------+-----------------------------------------------------------
Parameters         | VGG + 64*64*34        | VGG + 4*4*34 + 4*4*34 + 16*16*34
                   | ≈ VGG + 2^17          | ≈ VGG + 2^9 + 2^9 + 2^13
Computational cost | VGG + 64*64*34*(8*8)  | VGG + 4*4*34*(8*8) + 4*4*34*(8*8) + 16*16*34*(32*32)
                   | = VGG + 34*2^18       | = VGG + 34*(2^10 + 2^10 + 2^18)

FCN8 has far fewer upsampling parameters; the computational cost is about the same.
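The arithmetic on this slide can be checked with a short script. It follows the slide's own counting convention (kernel height * kernel width * number of classes per upsampling layer, multiplied by the input map size for the cost) and its input sizes of 8*8 and 32*32; the function names are hypothetical.

```python
def deconv_params(kernel, classes):
    """Parameter count as tallied on the slide: kernel * kernel * classes."""
    return kernel * kernel * classes

def deconv_cost(kernel, classes, in_h, in_w):
    """Cost as tallied on the slide: parameters times input positions."""
    return deconv_params(kernel, classes) * in_h * in_w

classes = 34

# FCN32: one 64x64 upsampling layer applied to an 8x8 map.
fcn32_params = deconv_params(64, classes)
fcn32_cost = deconv_cost(64, classes, 8, 8)

# FCN8: two 4x4 layers (8x8 input, as on the slide) and one 16x16 layer (32x32 input).
fcn8_params = 2 * deconv_params(4, classes) + deconv_params(16, classes)
fcn8_cost = 2 * deconv_cost(4, classes, 8, 8) + deconv_cost(16, classes, 32, 32)

print(fcn32_params, fcn8_params)  # 139264 9792
print(fcn32_cost, fcn8_cost)      # 8912896 8982528
```

The extra parameters differ by a factor of about 14, while the costs differ by less than 1%, which matches the slide's conclusion.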

Fully Convolutional Network results: natural images

2. Semantic Image Segmentation with deep convolutional nets and fully connected CRFs, 2014 (DeepLab-v1)

[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs

A CRF is used as a refinement step to extract detailed boundaries. The paper integrates the CNN output into the cost of the algorithm from [2011] Efficient inference in fully connected CRFs with Gaussian edge potentials.

[2011] Efficient inference in fully connected CRFs with Gaussian edge potentials
- A conventional CRF is conceptually a conditional MRF built on Markov chains.
- Long-range CRFs (fully connected CRF, dense CRF) were hard to use because of their computational cost, memory cost, and mathematical difficulties.
- Exact inference is not feasible, so approximate mean field inference is used.
- The paper computes this efficiently with a message passing method.

Symbols in the CRF energy
- P(x) is the output of the DCNN.
- The label compatibility is a Potts model.
- p is the pixel position.
- I is the image intensity.
- Sigma is the Gaussian kernel size.
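The symbols listed above can be tied together in a small sketch of the two energy terms. It assumes the usual dense-CRF form: a unary term -log P(x_i) from the DCNN and a pairwise term that, under the Potts model, is zero for equal labels and otherwise the sum of an appearance kernel (position and intensity) and a smoothness kernel (position only). The weights `w1`, `w2` and the three sigma defaults are illustrative values, not the paper's settings.

```python
import numpy as np

def unary(prob):
    """Unary potential: negative log of the DCNN label probability."""
    return -np.log(prob)

def pairwise(label_i, label_j, p_i, p_j, I_i, I_j,
             w1=4.0, w2=3.0, sigma_alpha=5.0, sigma_beta=10.0, sigma_gamma=3.0):
    """Pairwise potential with Potts compatibility (illustrative parameters)."""
    if label_i == label_j:          # Potts model: no penalty for equal labels
        return 0.0
    p_i, p_j = np.asarray(p_i, float), np.asarray(p_j, float)
    I_i, I_j = np.asarray(I_i, float), np.asarray(I_j, float)
    d_pos = np.sum((p_i - p_j) ** 2)    # squared distance between positions
    d_int = np.sum((I_i - I_j) ** 2)    # squared distance between intensities
    appearance = w1 * np.exp(-d_pos / (2 * sigma_alpha ** 2)
                             - d_int / (2 * sigma_beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * sigma_gamma ** 2))
    return appearance + smoothness

# Two neighbouring pixels with different labels.
similar = pairwise(0, 1, (10, 10), (10, 11), (100, 100, 100), (102, 100, 100))
different = pairwise(0, 1, (10, 10), (10, 11), (100, 100, 100), (200, 50, 50))
print(similar > different)  # True
```

Nearby pixels with similar intensity are penalized more for taking different labels than pixels separated by a strong intensity change, which is how the CRF pulls label boundaries toward image edges.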


[2014] Conditional Random Fields as Recurrent Neural Networks


3.1. Learning Deconvolution Network for Semantic Segmentation, 201505

[201505] Learning Deconvolution Network for Semantic Segmentation

Notes on the architecture figure:
- Up to this point the network is identical to VGG16.
- Batch normalization is attached to every layer (important!).
- Encoder side: max pooling here. The convolution layers are not pooling.
- Decoder side: the deconvolution layers are not unpooling.

Deconvolution network implementation

Deconvolution network implementation
b is the result of a deconvolution, c is the result of unpooling b, and d is the result of a deconvolution applied to c. Looking at it, d seems to be just a convolution of c, so why is it called deconvolution?
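One possible reading of this question, offered as an assumption and not as the paper's own answer: with stride 1, a deconvolution (transposed convolution) produces exactly the same values as an ordinary sliding-window correlation with the flipped kernel over a zero-padded input, so d can indeed look like a plain convolution of c. The difference lies in how the operation is defined, with each input value scattering a kernel-shaped patch into a larger output. The sketch below checks the equivalence in NumPy; both function names are hypothetical.

```python
import numpy as np

def transposed_conv2d(x, k):
    """Stride-1 transposed convolution: every input value scatters a
    scaled copy of the kernel into the (larger) output."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] + kh - 1, x.shape[1] + kw - 1))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i:i + kh, j:j + kw] += x[i, j] * k
    return out

def padded_correlation_flipped(x, k):
    """Sliding-window correlation with the flipped kernel over a zero-padded input."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    kf = k[::-1, ::-1]
    out = np.zeros((x.shape[0] + kh - 1, x.shape[1] + kw - 1))
    for a in range(out.shape[0]):
        for b in range(out.shape[1]):
            out[a, b] = np.sum(xp[a:a + kh, b:b + kw] * kf)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
k = rng.standard_normal((3, 3))
print(np.allclose(transposed_conv2d(x, k), padded_correlation_flipped(x, k)))  # True
```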

Deconvolution network implementation
Role of unpooling: it captures example-specific structures by tracing the original locations with strong activations back to image space, so it effectively reconstructs the detailed structure of an object in finer resolutions.
Role of deconvolution: learned filters in deconvolutional layers tend to capture class-specific shapes. Through deconvolutions, the activations closely related to the target classes are amplified while noisy activations from other regions are suppressed effectively.
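The unpooling step can be illustrated with a minimal NumPy sketch: max pooling records where each maximum came from, and unpooling writes each pooled value back to that recorded location, leaving zeros elsewhere. It assumes a single 2D map whose sides are divisible by the pool size; the function names are hypothetical.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also returns, for each window,
    the flat index (0 .. size*size-1) of the maximum."""
    h, w = x.shape
    blocks = (x.reshape(h // size, size, w // size, size)
               .transpose(0, 2, 1, 3)
               .reshape(h // size, w // size, size * size))
    return blocks.max(axis=2), blocks.argmax(axis=2)

def unpool(pooled, idx, size=2):
    """Place each pooled value back at its recorded location; other entries stay 0."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, size * size), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    blocks[rows, cols, idx] = pooled
    return (blocks.reshape(ph, pw, size, size)
                  .transpose(0, 2, 1, 3)
                  .reshape(ph * size, pw * size))

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 9, 1],
              [7, 0, 2, 3]])
pooled, idx = max_pool_with_indices(x)
restored = unpool(pooled, idx)
print(pooled)
# [[4 5]
#  [7 9]]
print(restored)
# [[0 0 0 0]
#  [0 4 0 5]
#  [0 0 9 0]
#  [7 0 0 0]]
```

The restored map is sparse, which is why the network follows each unpooling layer with deconvolution layers that densify it.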

3.2. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, 201511

[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
The encoders of existing approaches are mostly similar (VGG16). The decoders differ from model to model, and each has its own strengths and weaknesses.
Models compared: FCN8, DeconvNet, SegNet.

Max pooling provides more translation invariance and reduces signal noise, which makes the model robust for classification. At the same time, it loses spatial resolution. This lossy image representation is detrimental to segmentation performance, so boundary information has to be captured and stored during encoding. The two ways of doing this are skip layers and storing max pooling indices.

Benefits of using max pooling indices in the decoder:
- Boundary delineation improves.
- End-to-end training needs fewer parameters (other models need weight parameters for upsampling).
- The approach is easy to combine with other models without major modification.
Figure labels: DeconvNet, SegNet.

Difference 1 between DeconvNet and SegNet
"Their encoder network (DeconvNet) consists of the fully connected layers from the VGG-16 network which consists of about 90% of the parameters of their entire network. This makes training of their network very difficult and thus require additional aids such as the use of region proposals to enable training."
(Number of encoder parameters: 134M → 14.7M) → better memory efficiency, faster inference, and easier training.

One of the contributions of this paper is an extensive experimental comparison of SegNet and FCN8.
Choose the decoder variant (model) that suits the application in terms of memory, accuracy, and inference time.

Median frequency balancing: the loss function is weighted according to how often each class appears in the training set, and the least frequent class receives the largest weight.
Natural frequency balancing: uniform weights.
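A minimal sketch of median frequency balancing follows. It assumes the weight of a class is the median of all class frequencies divided by that class's frequency, and it simplifies the frequency to the plain fraction of training pixels carrying the class; the paper's exact frequency definition may differ. The function name is hypothetical.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    """Per-class loss weights: median class frequency / class frequency.
    Frequency here is the fraction of all training pixels with that label
    (a simplifying assumption). Classes that never occur get weight 0."""
    counts = np.zeros(num_classes)
    for labels in label_maps:
        counts += np.bincount(labels.ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    present = freq > 0
    weights = np.zeros(num_classes)
    weights[present] = np.median(freq[present]) / freq[present]
    return weights

# Toy training set: class 0 covers 6 pixels, class 1 covers 3, class 2 covers 1.
label_maps = [np.array([[0, 0, 0, 0, 0],
                        [0, 1, 1, 1, 2]])]
weights = median_frequency_weights(label_maps, num_classes=3)
print(weights)  # [0.5 1.  3. ]
```

The rarest class (class 2) gets the largest weight, as the slide states; natural frequency balancing would correspond to using a weight of 1 for every class.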

Analysis
- The best performance is achieved when encoder feature maps are stored in full.
- When memory during inference is constrained, compressed forms of encoder feature maps (dimensionality reduction, max pooling indices) can be stored and used with an appropriate decoder (e.g. SegNet type) to improve performance.
- Larger decoders increase performance for a given encoder network.
- The decoder must be learned (bilinear interpolation performs worst).
- SegNet uses far less memory than FCN, because FCN stores the encoder feature maps (roughly 11 times more memory).
- FCN, however, is faster than SegNet at inference, because its decoder contains almost no convolutions.
- Comparing FCN-Basic-NoAddition-NoDimReduction with SegNet-Basic shows that using a larger decoder is not enough; it is also important to capture encoder feature map information to learn better.

3.*. [for including contextual information] ReNet : A recurrent neural network based Alternative to convolutional networks, 201505


3.*. [for including contextual information] ParseNet : Looking wider to see better, 201506


3.*. [for including contextual information] Inside-Outside Net: Detection in Context with Skip Pooling and Recurrent Neural Networks, 201512


4. Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, 201511


[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
- The Domain Transform, which uses recursive filtering, takes two inputs: the coarse segmentation result and an edge map.
- The coarse segmentation result is the output of the DCNN.
- The edge map is predicted by a conv layer whose input is the upsampled and concatenated intermediate results from which the coarse segmentation is obtained.
- Because the Domain Transform is recursive, it is replaced by a GRU RNN.

Figure labels: 1*1 conv-layer; bilinear interpolation; [2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs.


Figure labels: Segmentation Prediction; Edge Prediction.

[2011] Domain Transform for Edge-aware Image and Video Processing
- sigma_s: sets the kernel size of the filter over the input spatial domain.
- sigma_r: sets the kernel size of the filter over the reference edge map.
- d_i: determines the amount of diffusion/smoothing. When it is very small there is full diffusion; when it is very large, diffusion stops.
- The filtering equation is asymmetric. To compensate, the 1D filter is applied twice along each axis: left to right and right to left, then top to bottom and bottom to top.
- g_i is the reference edge. The larger it is, the more likely the pixel is an edge; d_i becomes large and w_i is close to 0, so there is no diffusion.
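The behaviour of d_i, g_i, and w_i described above can be sketched for a single 1D signal. The sketch assumes the recursive form y_i = (1 - w_i) x_i + w_i y_{i-1}, with w_i = exp(-sqrt(2) / sigma_s) ** d_i and d_i = 1 + (sigma_s / sigma_r) * g_i; these formulas are consistent with the slide's description but are stated here as an assumption, and the function name is hypothetical.

```python
import numpy as np

def domain_transform_1d(x, g, sigma_s, sigma_r):
    """One left-to-right and one right-to-left recursive filtering pass.
    x: signal to smooth, g: reference edge strength per sample."""
    x = np.asarray(x, dtype=float)
    g = np.asarray(g, dtype=float)
    d = 1.0 + (sigma_s / sigma_r) * g            # large g (edge) -> large d
    w = np.exp(-np.sqrt(2.0) / sigma_s) ** d     # large d -> w close to 0
    y = x.copy()
    for i in range(1, len(y)):                   # left to right
        y[i] = (1 - w[i]) * y[i] + w[i] * y[i - 1]
    for i in range(len(y) - 2, -1, -1):          # right to left; w[i + 1] is the
        y[i] = (1 - w[i + 1]) * y[i] + w[i + 1] * y[i + 1]  # weight between i and i + 1
    return y

step = [0, 0, 0, 1, 1, 1]
no_edge = domain_transform_1d(step, [0, 0, 0, 0, 0, 0], sigma_s=10, sigma_r=0.01)
with_edge = domain_transform_1d(step, [0, 0, 0, 1, 0, 0], sigma_s=10, sigma_r=0.01)
print(np.allclose(with_edge, step))  # True: the edge blocks diffusion
print(np.allclose(no_edge, step))    # False: without an edge the step is blurred
```

Running the second pass in the opposite direction is what compensates for the asymmetry of a single recursive pass; a 2D filter would repeat the same pair of passes along columns.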


[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
The gated recurrent unit (GRU) RNN architecture

5.1. Multi-scale context aggregation by dilated convolution, 201511

5.2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 201606 (DeepLab-v2)

6. Learning to Segment Object Candidates, 201506 “Facebook AI Research (FAIR)” [DeepMask]

7. Learning to Refine Object Segment, 201603 “Facebook AI Research (FAIR)” [SharpMask]

8. A MultiPath Network for Object Detection, 201604 “Facebook AI Research (FAIR)” [MultiPathNet]