1
Segmentation based on Deep learning
이 용 근
2
1. Fully Convolutional Networks for Semantic Segmentation, 2014
3
[2014] fully convolutional networks for semantic segmentation
VGG-FCN
4
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Recap - Krizhevsky’s work (2012)
1.2 million images from the ImageNet LSVRC-2010 dataset, with 1000 classes
ReLU-based non-linearity
5 convolutional and 3 fully connected layers
60 million parameters
GPU-based training
5
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Deals with the problem of deeper architectures, building upon the work of Krizhevsky (2012) and Zeiler (2013).
Very “experimental” paper.
6
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
11 to 19 layers!
3 are fully connected and the rest are convolutional.
7
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Problems - solutions
Vanishing gradient problem - partly handled by ReLUs
Overfitting - augmentation, dropout
Enormous training time - smart initialization of the network, GPU & ReLUs
8
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
9
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
Method | VOC-2007 (mean AP) | VOC-2012 (mean AP) | Caltech-101 (mean CR) | Caltech-256 (mean CR)
Zeiler & Fergus (2013) | - | 79.0 | 86.5 ± 0.5 | 74.2 ± 0.3
Chatfield et al. (2014) | 82.4 | 83.2 | 88.4 ± 0.6 | 77.6 ± 0.1
He et al. (2014) | | | 93.4 ± 0.5 |
Wei et al. (2014) | 81.5 | 81.7 | |
VGG Net-D (16 layers) | 89.3 | 89.0 | 91.8 ± 1.0 | 85.0 ± 0.2
VGG Net-E (19 layers) | | | 92.3 ± 0.5 | 85.1 ± 0.3
VGG Net-D & Net-E | 89.7 | | 92.7 ± 0.5 | 86.2 ± 0.3
10
[2014] Very Deep Convolutional Networks for Large-Scale Image Recognition
11
[2014] fully convolutional networks for semantic segmentation
VGG-FCN
12
[2014] fully convolutional networks for semantic segmentation
VGG-FCN
13
[2014] fully convolutional networks for semantic segmentation
Skip layer: used to preserve boundary detail.
When producing the Pool3 prediction from Pool3 and the Pool4 prediction from Pool4, a 1*1 conv changes the channel size.
The parameters of this 1*1 conv are learned together with the rest of the network during backpropagation.
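The skip-layer fusion described on this slide can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names (`conv1x1`, `upsample2x`, `fuse_skip`) are hypothetical, and nearest-neighbour upsampling stands in for the learned deconvolution that the network actually uses.

```python
import numpy as np

def conv1x1(feat, weight, bias):
    """1*1 convolution: changes the channel size, leaves the spatial size alone.

    feat: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]

def upsample2x(score):
    """Nearest-neighbour 2x upsampling (assumed stand-in for a learned deconvolution)."""
    return score.repeat(2, axis=1).repeat(2, axis=2)

def fuse_skip(coarse_score, pool_feat, weight, bias):
    """Add a prediction made from a finer pooling layer to the upsampled coarse score."""
    skip_score = conv1x1(pool_feat, weight, bias)  # e.g. Pool4 -> Pool4 prediction
    return upsample2x(coarse_score) + skip_score

# Toy sizes (assumed): 3 classes, 8 feature channels, 2x2 coarse map, 4x4 finer map.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(3, 2, 2))
pool4 = rng.normal(size=(8, 4, 4))
w, b = rng.normal(size=(3, 8)), np.zeros(3)
print(fuse_skip(coarse, pool4, w, b).shape)
```

Because the 1*1 conv is an ordinary layer, its `weight` and `bias` receive gradients like any other parameter, which is what the slide means by learning them during backpropagation.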
14
Fully Convolutional Network (Implementation Details)
Build the VGG16 network and load the pre-trained weights (trained for 3 weeks on 4 GTX Titan Black GPUs).
15
Fully Convolutional Network (Implementation Details)
Upsampling layers
FCN32: 64*64*34, stride: 32
FCN8: 4*4*34, stride: 2; 4*4*34, stride: 2; 16*16*34, stride: 8
 | FCN32 | FCN8
Parameters | VGG + 64*64*34 ≈ VGG + 2^17 | VGG + 4*4*34 + 4*4*34 + 16*16*34 ≈ VGG
Computational cost | VGG + 64*64*34*(8*8) = VGG + 34*2^18 | VGG + 4*4*34*(8*8) + 4*4*34*(8*8) + 16*16*34*(32*32) = VGG + 34*(2^10 + 2^10 + 2^18)
FCN8 has far fewer parameters; the computational cost is similar.
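The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own counting convention (kernel side squared times 34 channels, multiplied by the input map size for cost); the helper names are hypothetical and the VGG part is left out because it is common to both models.

```python
CHANNELS = 34  # channel count used on the slide

def layer_params(kernel):
    """Parameters of one upsampling layer, counted as on the slide: k * k * channels."""
    return kernel * kernel * CHANNELS

def layer_cost(kernel, input_side):
    """Cost of one upsampling layer: its parameters times the number of input positions."""
    return layer_params(kernel) * input_side * input_side

# FCN32: a single 64x64 kernel applied to an 8x8 map.
fcn32_params = layer_params(64)
fcn32_cost = layer_cost(64, 8)

# FCN8: two 4x4 kernels on 8x8 maps and one 16x16 kernel on a 32x32 map (sizes as on the slide).
fcn8_params = layer_params(4) + layer_params(4) + layer_params(16)
fcn8_cost = layer_cost(4, 8) + layer_cost(4, 8) + layer_cost(16, 32)

print("extra parameters:", fcn32_params, "vs", fcn8_params)
print("extra cost ratio FCN8 / FCN32:", fcn8_cost / fcn32_cost)
```

The added parameters differ by roughly a factor of 14, while the added cost differs by less than 1%, which matches the slide's conclusion.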
16
Fully Convolutional Network results: natural images
17
Fully Convolutional Network results: natural images
18
2. Semantic Image Segmentation with deep convolutional nets and fully connected CRFs, 2014
(DeepLab-v1)
19
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
A CRF is used as a refinement step to recover detailed boundaries.
This paper integrates the CNN output into the cost of the algorithm from [2011] Efficient inference in fully connected CRFs with Gaussian edge potentials.
20
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
[2011] Efficient inference in fully connected CRFs with Gaussian edge potentials
A conventional CRF is, conceptually, a conditional MRF with Markov (short-range) dependencies.
Long-range CRFs (fully connected CRF, dense CRF) were hard to use because of their computational cost, memory cost, and mathematical difficulties.
Exact inference is not feasible, so approximate mean field inference is used.
The 2011 paper computes this efficiently with a message passing method.
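Mean field inference with message passing can be sketched on a tiny example. This is a naive O(N^2) illustration under stated assumptions, not the 2011 paper's algorithm: the efficient filtering that makes the method practical is not attempted here, a single Gaussian kernel is used, and the names `mean_field`, `feats`, `weight`, and `sigma` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, feats, weight=1.0, sigma=1.0, n_iters=5):
    """Approximate inference for a fully connected CRF with a Potts pairwise term.

    unary: (N, L) negative log-probabilities per pixel and label
    feats: (N, D) per-pixel features (for example positions) used by the Gaussian kernel
    """
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    kernel = weight * np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself
    q = softmax(-unary)
    for _ in range(n_iters):
        msg = kernel @ q  # message passing: every pixel hears from every other pixel
        # Potts model: label l is penalised by the kernel mass sitting on other labels.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        q = softmax(-unary - pairwise)
    return q

# Five pixels in a row; the middle one weakly disagrees with its neighbours.
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]])
positions = np.arange(5.0)[:, None]
print(mean_field(-np.log(probs), positions).argmax(axis=1))
```

The dense `kernel @ q` product is exactly the step that becomes infeasible for a full image, which is why an efficient message passing scheme matters.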
21
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
P(x) is the output of the DCNN.
The pairwise compatibility is a Potts model.
p is the pixel position.
I is the image intensity.
Sigma is the Gaussian kernel size.
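For reference, the symbols listed on this slide fit the fully connected CRF energy as it is commonly written for this model. The form below is a reconstruction under that assumption; the weights $w_1, w_2$ and the subscripts $\alpha, \beta, \gamma$ on the kernel sizes are notation chosen here, not taken from the slide.

```latex
\begin{align}
E(x) &= \sum_i \theta_i(x_i) + \sum_{i<j} \theta_{ij}(x_i, x_j),
\qquad \theta_i(x_i) = -\log P(x_i) \\
\theta_{ij}(x_i, x_j) &= \mu(x_i, x_j)\left[
  w_1 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\alpha^2}
                  -\frac{\lVert I_i - I_j\rVert^2}{2\sigma_\beta^2}\right)
+ w_2 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\sigma_\gamma^2}\right)\right] \\
\mu(x_i, x_j) &= [\,x_i \neq x_j\,] \quad \text{(Potts model)}
\end{align}
```

The first kernel depends on both position and intensity, so nearby pixels with similar intensity are pushed toward the same label; the second depends on position only and enforces smoothness.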
22
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
23
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
24
[2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
25
[2014] Conditional Random Fields as Recurrent Neural Networks
26
[2014] Conditional Random Fields as Recurrent Neural Networks
27
[2014] Conditional Random Fields as Recurrent Neural Networks
28
3.1. Learning Deconvolution Network for Semantic Segmentation, 201505
29
[201505] Learning Deconvolution Network for Semantic Segmentation
30
[201505] Learning Deconvolution Network for Semantic Segmentation
Up to this point the network is identical to VGG16.
31
[201505] Learning Deconvolution Network for Semantic Segmentation
Batch normalization (BN) is attached to every layer (important!!)
Here: max pooling
Deconvolution (this step is not unpooling)
Convolution (this step is not pooling)
32
Deconvolution network implementation
33
Deconvolution network implementation
b is the deconvolution output, c is the result of unpooling b, and d is the result of applying deconvolution to c.
Looking at it, d seems to be just a convolution of c. So why is it called deconvolution?
34
Deconvolution network implementation
Role of unpooling: captures example-specific structures by tracing the original locations with strong activations back to image space. It effectively reconstructs the detailed structure of an object in finer resolutions.
Role of deconvolution: learned filters in deconvolutional layers tend to capture class-specific shapes. Through deconvolutions, the activations closely related to the target classes are amplified while noisy activations from other regions are suppressed effectively.
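Unpooling with stored max pooling locations can be sketched in NumPy as follows. This is a minimal single-channel illustration with non-overlapping windows; the function names are hypothetical and no framework API is implied.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also returns where each maximum came from.

    x: (H, W) with H and W divisible by k. idx holds positions inside each window (0 .. k*k-1).
    """
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3).reshape(h // k, w // k, k * k)
    idx = blocks.argmax(axis=-1)
    pooled = np.take_along_axis(blocks, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def unpool(pooled, idx, k=2):
    """Place each pooled value back at its recorded location; everything else stays zero."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    np.put_along_axis(blocks, idx[..., None], pooled[..., None], axis=-1)
    return blocks.reshape(ph, pw, k, k).transpose(0, 2, 1, 3).reshape(ph * k, pw * k)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 8]])
pooled, idx = max_pool_with_indices(x)
print(pooled)
print(unpool(pooled, idx))
```

The unpooled map is sparse: only the locations of the strong activations are restored, which is why a learned convolution or deconvolution follows it to densify the result.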
35
3.2. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, 201511
36
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
The encoder stage of existing approaches is mostly the same (VGG16).
The decoder stage differs from model to model, and each has its own strengths and weaknesses.
Models compared: FCN8, DeconvNet, SegNet
37
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Max pooling gives more translation invariance and reduces signal noise, which makes the model robust for classification.
At the same time, however, it loses spatial resolution.
This lossy image representation is critically harmful to segmentation performance.
Boundary information therefore has to be captured and stored during encoding.
The ways to do this are skip layers and storing the max pooling indices.
38
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Benefits of using max pooling indices in the decoder
Boundary delineation improves.
Fewer parameters for end-to-end training (other models need weight parameters for upsampling).
Easy to combine with other models without major modification.
Models shown: DeconvNet, SegNet
39
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Difference 1 between DeconvNet and SegNet
“Their encoder network (DeconvNet) consists of the fully connected layers from the VGG-16 network which consists of about 90% of the parameters of their entire network. This makes training of their network very difficult and thus require additional aids such as the use of region proposals to enable training.”
(Number of encoder parameters: 134M → 14.7M)
Better memory efficiency, faster inference, and easier training.
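The 134M and 14.7M figures can be reproduced by counting VGG-16 parameters. This sketch assumes the standard VGG-16 configuration (13 convolutional layers with 3x3 kernels, then fully connected layers of width 4096 on a 7x7x512 input) and assumes that the 134M figure covers the convolutional layers plus the first two fully connected layers; neither assumption is stated on the slide.

```python
# (in_channels, out_channels) of the 13 convolutional layers of VGG-16
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV)
fc_total = fc_params(7 * 7 * 512, 4096) + fc_params(4096, 4096)

print(f"convolutional layers only:   {conv_total / 1e6:.1f}M")
print(f"with fully connected layers: {(conv_total + fc_total) / 1e6:.1f}M")
print(f"share held by FC layers:     {fc_total / (conv_total + fc_total):.0%}")
```

Dropping the fully connected layers is what takes the encoder from about 134M to about 14.7M parameters, and the fully connected share comes out close to the "about 90%" in the quotation.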
40
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
One of this paper's contributions is an extensive experimental comparison of SegNet and FCN8.
Choose the decoder variant (model) that suits the application, weighing memory, accuracy, and inference time.
41
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
One of this paper's contributions is an extensive experimental comparison of SegNet and FCN8.
Median frequency balancing: the loss function is weighted according to the class frequencies in the training set. The rarest class gets the largest weight.
Natural frequency balancing: uniform weights.
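Median frequency balancing can be sketched as below. The slide only says that weights follow class frequency; the specific definition used here (class frequency measured over the images in which the class appears, weight = median frequency / class frequency) is the commonly cited one and is an assumption, as is the function name.

```python
import numpy as np

def median_frequency_weights(label_maps, n_classes):
    """Per-class loss weights: median class frequency divided by each class frequency.

    label_maps: iterable of integer arrays holding class ids in [0, n_classes).
    """
    class_pixels = np.zeros(n_classes)
    image_pixels = np.zeros(n_classes)  # pixels of all images in which the class appears
    for labels in label_maps:
        counts = np.bincount(labels.ravel(), minlength=n_classes)
        class_pixels += counts
        image_pixels += (counts > 0) * labels.size
    freq = class_pixels / np.maximum(image_pixels, 1)
    present = freq > 0
    weights = np.zeros(n_classes)  # classes never seen keep weight 0
    weights[present] = np.median(freq[present]) / freq[present]
    return weights

# Two toy 2x2 label maps with three classes.
maps = [np.array([[0, 0], [0, 1]]), np.array([[0, 0], [2, 2]])]
print(median_frequency_weights(maps, n_classes=3))
```

Natural frequency balancing corresponds to using a weight of 1 for every class instead.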
42
[201511] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
One of this paper's contributions is an extensive experimental comparison of SegNet and FCN8.
Analysis
The best performance is achieved when encoder feature maps are stored in full.
When memory during inference is constrained, compressed forms of encoder feature maps (dimensionality reduction, max pooling indices) can be stored and used with an appropriate decoder (e.g. SegNet type) to improve performance.
Larger decoders increase performance for a given encoder network.
The decoder must be learned (bilinear interpolation performs worst).
SegNet uses far less memory than FCN, because FCN stores the encoder feature maps (a difference of about 11 times).
However, FCN is faster than SegNet at inference, because its decoder has almost no convolutions.
Comparing FCN-Basic-NoAddition-NoDimReduction with SegNet-Basic shows that using a larger decoder is not enough; it is also important to capture encoder feature map information to learn better.
43
3.*. [for including contextual information] ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks, 201505
44
[201505] ReNet : A recurrent neural network based Alternative to convolutional networks
45
3.*. [for including contextual information] ParseNet : Looking wider to see better, 201506
46
[201506] ParseNet : Looking wider to see better
47
3.*. [for including contextual information] Inside-Outside Net: Detection in Context with Skip Pooling and Recurrent Neural Networks, 201512
48
[201512] Inside-Outside Net: Detection in Context with Skip Pooling and Recurrent Neural Networks
49
4. Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform, 201511
50
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
51
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
The domain transform, implemented with recursive filtering, takes two inputs: the coarse segmentation result and an edge map.
The coarse segmentation result is the output of the DCNN.
The edge map is predicted by a conv layer whose input is the upsampled and concatenated intermediate results produced on the way to the coarse segmentation.
Because the domain transform is recursive in nature, it is replaced by a GRU RNN.
52
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Figure labels: 1*1 conv-layer; bilinear interpolation; [2014] Semantic Image Segmentation with deep convolutional nets and fully connected CRFs
53
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
54
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Figure labels: Segmentation Prediction; Edge Prediction
55
[2011] Domain Transform for Edge-aware Image and Video Processing
Sigma_s: sets the filter kernel size over the input spatial domain.
Sigma_r: sets the filter kernel size over the reference edge map.
d_i: determines the amount of diffusion/smoothing. When it is very small there is full diffusion; when it is very large, diffusion stops.
g_i is the reference edge: the larger it is, the more likely the pixel is an edge, so d_i is large, w_i is almost 0, and no diffusion takes place.
The filtering equation is asymmetric, so to overcome this the 1D filter is applied twice in each direction: left-right and top-bottom.
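One 1D pass of the recursive domain transform filter can be sketched as follows. The formulas assumed here are d_i = 1 + g_i * sigma_s / sigma_r, w_i = exp(-sqrt(2) * d_i / sigma_s), and y_i = (1 - w_i) * x_i + w_i * y_{i-1}; they are consistent with the symbol descriptions on this slide but are not printed on it. The function name and the choice of weight for the backward pass are also assumptions.

```python
import numpy as np

def dt_filter_1d(x, g, sigma_s, sigma_r):
    """Edge-aware recursive smoothing of a 1D signal x guided by edge strengths g."""
    d = 1.0 + g * sigma_s / sigma_r        # large on edges
    w = np.exp(-np.sqrt(2.0) * d / sigma_s)  # almost 0 on edges: diffusion stops there
    y = x.astype(float).copy()
    # Forward pass (left to right).
    for i in range(1, len(y)):
        y[i] = (1.0 - w[i]) * y[i] + w[i] * y[i - 1]
    # Backward pass (right to left) to compensate for the asymmetry of the forward pass.
    for i in range(len(y) - 2, -1, -1):
        y[i] = (1.0 - w[i + 1]) * y[i] + w[i + 1] * y[i + 1]
    return y

signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
edge = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # strong edge between index 2 and 3
print(dt_filter_1d(signal, edge, sigma_s=5.0, sigma_r=0.1))      # step is preserved
print(dt_filter_1d(signal, np.zeros(6), sigma_s=5.0, sigma_r=0.1))  # step is blurred
```

For a 2D image the same pass would be run along rows and then along columns. The recurrence y_i = (1 - w_i) * x_i + w_i * y_{i-1} has the same shape as a gated recurrent update, which is the link to the GRU formulation.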
56
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
57
[201511] Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
The gated recurrent unit (GRU) RNN architecture
58
5.1. Multi-scale context aggregation by dilated convolution, 201511
59
5.2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, (DeepLab-v2)
60
6. Learning to Segment Object Candidates, 201506
“Facebook AI Research (FAIR)” [DeepMask]
61
7. Learning to Refine Object Segment, 201603
“Facebook AI Research (FAIR)” [SharpMask]
62
8. A MultiPath Network for Object Detection, 201604
“Facebook AI Research (FAIR)” [MultiPathNet]