Segmentation based on Deep learning


1 Segmentation based on Deep learning
이 용 근

2 1. Fully Convolutional Networks for Semantic Segmentation, 2014

3 [2014] Fully Convolutional Networks for Semantic Segmentation
VGG-FCN

4 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Recap - Krizhevsky's work (2012):
- 1.2 million images, ImageNet LSVRC-2010 dataset with 1000 classes
- ReLU-based non-linearity
- 5 convolutional, 3 fully connected layers
- 60 million parameters
- GPU-based training

5 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Deals with the problem of deeper architectures, building upon the work of Krizhevsky (2012) and Zeiler (2013). A very "experimental" paper.

6 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
11 to 19 layers! 3 fully connected layers; the rest are convolutional.

7 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Problems and solutions:
- Vanishing gradient problem: partly handled by ReLUs
- Overfitting: augmentation, dropout
- Enormous training time: smart initialization of the network, GPU & ReLUs

8 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition

9 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Comparison on other datasets (mean AP for VOC, mean class recall (CR) for Caltech):

Method                          VOC-2007    VOC-2012    Caltech-101     Caltech-256
Zeiler & Fergus (2013)          -           79.0        86.5 ± 0.5      74.2 ± 0.3
Chatfield et al. (2014)         82.4        83.2        88.4 ± 0.6      77.6 ± 0.1
He et al. (2014)                -           -           93.4 ± 0.5      -
Wei et al. (2014)               81.5        81.7        -               -
VGG Net-D (16 layers)           89.3        89.0        91.8 ± 1.0      85.0 ± 0.2
VGG Net-E (19 layers)           -           -           92.3 ± 0.5      85.1 ± 0.3
VGG Net-D & Net-E               89.7        -           92.7 ± 0.5      86.2 ± 0.3

10 [2014] Very Deep Convolutional Networks for Large-Scale Image Recognition

11 [2014] Fully Convolutional Networks for Semantic Segmentation
VGG-FCN

12 [2014] Fully Convolutional Networks for Semantic Segmentation
VGG-FCN

13 [2014] Fully Convolutional Networks for Semantic Segmentation
- Skip layer: used to recover boundary detail.
- A 1*1 conv changes the channel size when computing Pool3 → Pool3 prediction and Pool4 → Pool4 prediction; these parameters are also learned during backpropagation (see the sketch below).
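A minimal PyTorch-style sketch of the skip-layer idea above. The channel counts (256/512/4096 from VGG16), the class count of 21, and the use of bilinear upsampling in place of the paper's learned deconvolutions are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCNSkipHead(nn.Module):
    """Illustrative FCN-8s-style head: 1x1 convs change the channel size of the
    pooled features to the number of classes, and upsampled coarse scores are
    summed with them. All 1x1-conv parameters are learned by backpropagation."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)   # pool3 -> pool3 prediction
        self.score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)   # pool4 -> pool4 prediction
        self.score_conv7 = nn.Conv2d(4096, num_classes, kernel_size=1)  # coarsest prediction

    def forward(self, pool3, pool4, conv7):
        s7 = self.score_conv7(conv7)
        s4 = self.score_pool4(pool4)
        s3 = self.score_pool3(pool3)
        s7_up = F.interpolate(s7, size=s4.shape[2:], mode="bilinear", align_corners=False)
        fused = s7_up + s4                                   # first skip fusion
        fused_up = F.interpolate(fused, size=s3.shape[2:], mode="bilinear", align_corners=False)
        return fused_up + s3                                 # second skip fusion; upsample 8x afterwards
```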

14 Fully Convolutional Network (implementation details)
Build the VGG16 network and load the pre-trained weights (the pre-training took 3 weeks on four GTX Titan Black GPUs); see the sketch below.
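A minimal modern sketch of this step using torchvision's pretrained VGG16 (a stand-in for the original Caffe workflow); converting the fully connected layers into convolutions is shown as the usual way the "fully convolutional" network is obtained, not the authors' exact code:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG16 with pre-trained ImageNet weights
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Reuse the convolutional feature extractor as-is
features = vgg.features

# Convert fc6/fc7 into convolutions so the network accepts arbitrary input sizes
fc6 = nn.Conv2d(512, 4096, kernel_size=7)
fc7 = nn.Conv2d(4096, 4096, kernel_size=1)
with torch.no_grad():
    fc6.weight.copy_(vgg.classifier[0].weight.view(4096, 512, 7, 7))
    fc6.bias.copy_(vgg.classifier[0].bias)
    fc7.weight.copy_(vgg.classifier[3].weight.view(4096, 4096, 1, 1))
    fc7.bias.copy_(vgg.classifier[3].bias)
```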

15 Fully Convolutional Network (implementation details)
Upsampling (deconvolution) filters:
- FCN32: 64*64*34, stride 32
- FCN8: 4*4*34 (stride 2), 4*4*34 (stride 2), 16*16*34 (stride 8)
Parameters beyond VGG:
- FCN32: 64*64*34 (= 139,264)
- FCN8: 4*4*34 + 4*4*34 + 16*16*34 (= 9,792)
Computational cost beyond VGG:
- FCN32: 64*64*34*(8*8)
- FCN8: 4*4*34*(8*8) + 4*4*34*(8*8) + 16*16*34*(32*32)
FCN8 has far fewer upsampling parameters; the computational cost is about the same (see the sketch below).
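A small sketch that reproduces the parameter counts above with transposed convolutions. The 34 classes follow the slide; groups=C (one filter per class) is an assumption so that the counts match the slide's per-channel arithmetic:

```python
import torch.nn as nn

C = 34  # number of classes assumed on the slide

# FCN-32s: a single 64x64 transposed convolution with stride 32
fcn32_up = nn.ConvTranspose2d(C, C, kernel_size=64, stride=32, groups=C, bias=False)

# FCN-8s: two 4x4 (stride 2) and one 16x16 (stride 8) transposed convolutions
fcn8_up = nn.ModuleList([
    nn.ConvTranspose2d(C, C, kernel_size=4, stride=2, groups=C, bias=False),
    nn.ConvTranspose2d(C, C, kernel_size=4, stride=2, groups=C, bias=False),
    nn.ConvTranspose2d(C, C, kernel_size=16, stride=8, groups=C, bias=False),
])

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fcn32_up))  # 64*64*34 = 139,264
print(count(fcn8_up))   # (4*4 + 4*4 + 16*16)*34 = 9,792
```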

16 Fully Convolutional Network results – natural images

17 Fully Convolutional Network results – natural images

18 2. Semantic Image Segmentation with deep convolutional nets and fully connected CRFs, 2014
(DeepLab-v1)

19 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
- A CRF is used as a fine-tuning step to extract detailed boundaries.
- The paper integrates the CNN output into the cost of the [2011] "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials" algorithm.

20 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
[2011] Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
- Conventional CRFs are conditional MRFs built on a Markov chain.
- Long-range CRFs (fully connected CRFs, dense CRFs) were hard to use because of computational cost, memory cost, and mathematical difficulties: exact inference is not feasible, so approximate mean-field inference is used.
- The [2011] paper computes the mean-field updates efficiently with a message-passing scheme.

21 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
- P(x) is the output of the DCNN.
- Potts model.
- p is position, I is image intensity.
- sigma is the Gaussian kernel size (see the energy written out below).
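For reference, the fully connected CRF energy these symbols belong to, as used in the DeepLab formulation (w_1, w_2 and the sigmas are hyperparameters):

```latex
\begin{aligned}
E(\mathbf{x}) &= \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j), \qquad
\theta_i(x_i) = -\log P(x_i) \\
\theta_{ij}(x_i, x_j) &= \mu(x_i, x_j)\Big[
  w_1 \exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\sigma_\alpha^2}
                 -\tfrac{\lVert I_i - I_j\rVert^2}{2\sigma_\beta^2}\Big)
+ w_2 \exp\!\Big(-\tfrac{\lVert p_i - p_j\rVert^2}{2\sigma_\gamma^2}\Big)\Big]
\end{aligned}
```

The first Gaussian kernel (appearance) depends on both pixel positions p and intensities I, the second (smoothness) only on positions; mu is the Potts label-compatibility term and P(x_i) is the DCNN's label probability.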

22 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

23 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

24 [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

25 [2014] Conditional Random Fields as Recurrent Neural Networks

26 [2014] Conditional Random Fields as Recurrent Neural Networks

27 [2014] Conditional Random Fields as Recurrent Neural Networks

28 3.1. Learning Deconvolution Network for Semantic Segmentation, 201505

29 [201505] Learning Deconvolution Network for Semantic Segmentation

30 [201505] Learning Deconvolution Network for Semantic Segmentation
Up to this point the architecture is identical to VGG16.

31 [2015] Learning Deconvolution Network for Semantic Segmentation
- Batch normalization is attached to every layer (important!!).
- In the decoder, the layers are unpooling (not max pooling) and deconvolution (not convolution), as sketched below.
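A minimal PyTorch-style sketch of one decoder stage as described above (channel counts, kernel sizes, and the example input are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One DeconvNet-style decoder stage: unpool with the encoder's max-pooling
    indices, then a deconvolution (transposed conv), followed by BN + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, pool_indices):
        x = self.unpool(x, pool_indices)   # place activations back at the max locations
        return self.relu(self.bn(self.deconv(x)))

# The matching encoder pooling must record its indices:
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
feat = torch.randn(1, 512, 16, 16)
pooled, idx = pool(feat)
out = DecoderStage(512, 512)(pooled, idx)
print(out.shape)  # torch.Size([1, 512, 16, 16]) -- back to the pre-pooling resolution
```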

32 Deconvolution network implementation

33 Deconvolution network implementation
(b) is the deconv output, (c) is the result of unpooling (b), and (d) is the result of applying deconv to (c). Looking at it, (d) seems to be just a plain conv of (c)... so why is it called a deconv?

34 Deconvolution network implementation
- Role of unpooling: captures example-specific structures by tracing the original locations of strong activations back to image space, effectively reconstructing the detailed structure of an object at finer resolutions.
- Role of deconvolution: the learned filters in deconvolutional layers tend to capture class-specific shapes; through deconvolutions, activations closely related to the target classes are amplified while noisy activations from other regions are effectively suppressed.

35 3.2. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, 201511

36 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- The encoder stage of existing approaches is mostly the same (VGG16).
- The decoder stage differs between models, each with pros and cons: FCN8, DeconvNet, SegNet.

37 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- Max pooling gives more translation invariance and reduces signal noise, making the model robust for classification.
- But at the same time it loses spatial resolution, and this lossy image representation is fatal to segmentation performance.
- Therefore boundary information must be captured and stored during encoding; the methods for this are skip layers and storing max-pooling indices (see the sketch below).
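A rough back-of-the-envelope sketch of the storage trade-off between the two methods (the 64-channel, 360x480 feature map size is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

# Assume one 64-channel encoder feature map at 360x480 (illustrative size)
feat = torch.randn(1, 64, 360, 480)
pooled, idx = F.max_pool2d(feat, 2, 2, return_indices=True)

feat_bytes = feat.numel() * feat.element_size()   # full map stored as float32
idx_bytes = idx.numel() * idx.element_size()      # one int64 index per pooled location
print(feat_bytes / 1e6, "MB for the full encoder feature map")   # ~44 MB
print(idx_bytes / 1e6, "MB for the pooling indices as int64")    # ~22 MB; far less if packed to 2 bits per 2x2 window
```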

38 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Benefits of using max-pooling indices in the decoder:
- Boundary delineation improves.
- Fewer parameters for end-to-end training (other models need weight parameters for upsampling).
- Easy to combine with other models without major modification.
(Figure: DeconvNet vs. SegNet decoders)

39 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Difference between DeconvNet and SegNet, part 1:
"Their encoder network (DeconvNet) consists of the fully connected layers from the VGG-16 network which consists of about 90% of the parameters of their entire network. This makes training of their network very difficult and thus require additional aids such as the use of region proposals to enable training."
SegNet drops these layers (encoder parameters: 134M → 14.7M), improving memory efficiency and inference speed; training also converges more easily.

40 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- One of the paper's contributions is an extensive experimental comparison of SegNet and FCN8.
- Choose the decoder variant (model) whose memory, accuracy, and inference time fit the application.

41 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- Median frequency balancing: the loss function is weighted according to the class frequencies in the training set; the rarest class gets the largest weight (see the sketch below).
- Natural frequency balancing: uniform weights.
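A small sketch of median frequency balancing under that definition (the function name and tensor shapes are illustrative; it assumes every class appears somewhere in the training labels):

```python
import torch

def median_frequency_weights(label_maps, num_classes):
    """freq(c) = pixels of class c / total pixels of the images that contain c;
    weight(c) = median(freq) / freq(c), so the rarest class gets the largest weight."""
    class_pixels = torch.zeros(num_classes)
    image_pixels = torch.zeros(num_classes)
    for lbl in label_maps:                      # each lbl: (H, W) long tensor of class ids
        for c in lbl.unique():
            class_pixels[c] += (lbl == c).sum()
            image_pixels[c] += lbl.numel()
    freq = class_pixels / image_pixels
    return freq.median() / freq

# Usage (hypothetical 12-class setup):
# weights = median_frequency_weights(train_labels, num_classes=12)
# criterion = torch.nn.CrossEntropyLoss(weight=weights)
```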

42 [201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Analysis from the SegNet vs. FCN8 comparison:
- The best performance is achieved when encoder feature maps are stored in full.
- When memory at inference time is constrained, compressed forms of the encoder feature maps (dimensionality reduction, max-pooling indices) can be stored and used with an appropriate decoder (e.g. a SegNet-type decoder) to improve performance.
- Larger decoders increase performance for a given encoder network.
- The decoder must be learned (bilinear interpolation performs worst).
- SegNet uses far less memory than FCN (about 11x), because FCN stores the encoder feature maps.
- However, FCN is faster than SegNet at inference, because its decoder has almost no convolutions.
- Comparing FCN-Basic-NoAddition-NoDimReduction vs. SegNet-Basic: using a larger decoder is not enough; it is also important to capture encoder feature map information to learn better.

43 3.*. [for including contextual information] ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks, 201505

44 [201505] ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks

45 3.*. [for including contextual information] ParseNet : Looking wider to see better, 201506

46 [201506] ParseNet: Looking Wider to See Better

47 3.*. [for including contextual information] Inside-Outside Net: Detection in Context with Skip Pooling and Recurrent Neural Networks, 201512

48 [201512] Inside-Outside Net: Detection in Context with Skip Pooling and Recurrent Neural Networks

49 4. Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, 201511

50 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform

51 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
- The domain transform, based on recursive filtering, takes two inputs: a coarse segmentation and an edge map.
- The coarse segmentation is the output of the DCNN.
- The edge map is predicted by a conv layer whose input is the upsampled and concatenated intermediate results used to obtain the coarse segmentation.
- Exploiting the recursive nature of the domain transform, it can be replaced by a GRU RNN.

52 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
(Architecture figure from [2014] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs; labels: 1*1 conv-layer, bilinear interpolation)

53 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform

54 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
(Figure labels: segmentation prediction, edge prediction)

55 [2011] Domain Transform for Edge-aware Image and Video Processing
- sigma_s: determines the filter kernel size in the input spatial domain.
- sigma_r: determines the filter kernel size with respect to the reference edge map.
- d_i: determines the amount of diffusion/smoothing; when it is very small there is full diffusion, and when it is very large diffusion stops.
- g_i is the reference edge value: the larger it is, the more likely the pixel is an edge, so d_i is large, w_i is nearly 0, and no diffusion occurs.
- Because the filtering equation is asymmetric, the 1D filter is applied twice per axis: left-right and top-bottom (see the equations below).
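For reference, the 1-D recursive filtering form of the domain transform that these symbols refer to (notation follows the domain-transform line of work; a sketch rather than the slide's exact equation):

```latex
y_i = (1 - w_i)\,x_i + w_i\,y_{i-1},\qquad
w_i = \exp\!\left(-\frac{\sqrt{2}\,d_i}{\sigma_s}\right),\qquad
d_i = 1 + \frac{\sigma_s}{\sigma_r}\,g_i
```

When g_i is large (a strong edge), d_i grows, w_i approaches 0 and the filter simply copies x_i, so diffusion stops at edges; when g_i is small, w_i is close to 1 and information diffuses along the scanline. The asymmetry of this recursion is why each 1-D pass is run in both directions.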

56 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform

57 [201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
The Gated Recurrent Unit (GRU) RNN architecture (standard equations below).
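For reference, a standard form of the GRU update the figure depicts (bias terms omitted; update-gate sign conventions vary slightly across papers). As noted on slide 51, the recursive domain-transform filter above can be related to this kind of gated recurrence:

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1})\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```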

58 5.1. Multi-Scale Context Aggregation by Dilated Convolutions, 201511

59 5.2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (DeepLab-v2)

60 6. Learning to Segment Object Candidates, 201506
“Facebook AI Research (FAIR)” [DeepMask]

61 7. Learning to Refine Object Segments, 201603
“Facebook AI Research (FAIR)” [SharpMask]

62 8. A MultiPath Network for Object Detection, 201604
“Facebook AI Research (FAIR)” [MultiPathNet]

