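Unlike existing material that only offers informative paper-review slides, this slide deck shows how the target problem was approached and how those approaches evolved, which naturally helps readers infer how to tackle the problem they are working on themselves.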
overview
Instance Segmentation
Outline
Introduction
Network Architecture
- FCN-driven Methods (Segmentation-first)
Instancecut CVPR17
Deep watershed CVPR17
Pixelwise instance segmentation with a dynamically instantiated network CVPR17
SGN CVPR17
- RCNN-driven Methods (Instance-first)
DeepMask NIPS15
MNC ECCV16
(InstanceFCN ECCV16)
Learning to Refine Object Segments ECCV16
(FCIS CVPR17)
Mask R-CNN
- Advanced Works
Cascade HTC
Sliding Window TensorMask
Panoptic Segmentation Panoptic FPN
Efficiency
YOLACT
Centermask
Augmentation & Regularization
InstaBoost
Mask Scoring R-CNN
Introduction
Definition
- Image Classification: Image level Classification
- Object Detection: Multi-object Localization + Classification
- Semantic Segmentation: Pixel-level Classification
- Instance Segmentation: Detection + Pixel-level Classification per Instance
Some Differences from Semantic Segmentation
- Differentiates individual objects within the same class.
- Essential to tasks such as counting the number of objects.
Some Differences from Object Detection
- A bounding box is a very coarse object boundary: many pixels irrelevant to the detected object are also included in the bounding box.
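To make the point about coarse boxes concrete, here is a minimal sketch that measures how much of a tight bounding box is actually covered by the object. The function name `bbox_fill_ratio` and the toy mask are made up for illustration; they do not come from any of the reviewed papers.

```python
def bbox_fill_ratio(mask):
    """Return (tight box, fraction of box pixels that belong to the object).

    mask is a 2D list of 0/1 values; the box is (top, left, bottom, right), inclusive.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no object pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
    box_area = (bottom - top + 1) * (right - left + 1)
    return (top, left, bottom, right), len(coords) / box_area


# A thin diagonal object: its tight box is mostly background.
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
box, ratio = bbox_fill_ratio(diagonal)
print(box, ratio)  # (0, 0, 3, 3) 0.25
```

In this toy case only a quarter of the box pixels belong to the object, which is the gap a per-instance mask closes.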
Network Architecture
RCNN-driven Methods
MNC
Mask R-CNN
Mask Scoring R-CNN
HTC
TensorMask
Panoptic FPN
FCN-driven Methods
Deep Mask
InstanceFCN
FCIS
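1.Contribution (the value of the paper from non-technical/technical perspectives, for example: 1) it is the winning method of the COCO Challenge 2017, 2) later works adopted this framework, 3) it was the first to solve ~ with deep learning, etc.)
2.Key idea (a more technical summary of the novelty, e.g. attaching a scoring head to solve what earlier methods handled poorly.)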
3.Detailed method (technical detail)
4.Results
Deep Mask: Learning to Segment Object Candidates_NIPS15
#FAIR, #483 cited #earliest_instance #Review: DeepMask
1. Contribution
- One of the earliest CNN approaches for Instance Segmentation
- DeepMask is an object-proposal-based instance segmentation method that beats other methods by a large margin while considering a smaller number of proposals.
- It generalizes to categories unseen during training
- The 2015 NIPS paper with more than 480 citations
2. Key Idea
- Unlike all previous approaches for generating segmentation proposals, this work does not rely on edges, superpixels, or any other form of low-level segmentation; it is the first to learn to generate segmentation proposals directly from raw image data.
- DeepMask jointly predicts a class-agnostic mask and an object score
3. Technical Details
3.1. Network Architecture
The above image illustrates an overall view of the model, called DeepMask. The top branch is responsible for predicting a high-quality object segmentation mask and the bottom branch predicts the likelihood that an object is present and satisfies the following two constraints:
- the patch contains an object roughly centered in the input patch
- the object is fully contained in the patch and in a given scale range
3.2. Joint Learning
The network is trained to jointly learn the pixel-wise segmentation map f_segm(x_k) at each location (i, j) and the predicted object score f_score(x_k): given an input patch x_k, the model jointly infers a pixel-wise segmentation mask and an object score. The loss function is a sum of binary logistic regression losses, one for each location of the segmentation network and one for the object score, over all training triplets (x_k, m_k, y_k).
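The joint loss described above can be sketched for a single training triplet as follows. This is a hedged illustration, not the authors' code: it assumes labels m_k and y_k take values in {-1, +1}, that the mask term is only active for positive patches (y_k = +1) and is averaged over the output locations, and that the score term is weighted by a factor `lam`. Those details follow my reading of the DeepMask paper and are not stated in the text above; the function names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)), i.e. the binary logistic loss on -z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def deepmask_loss(seg_logits, mask, score_logit, y, lam=1.0 / 32):
    """Joint loss for one triplet (x_k, m_k, y_k).

    seg_logits: 2D list, f_segm(x_k) at each output location (i, j)
    mask:       2D list of +1/-1 labels m_k, same shape as seg_logits
    score_logit: scalar f_score(x_k)
    y:          +1 if the patch contains a centered object, else -1
    lam:        weight of the score term (assumed value, for illustration)
    """
    n = sum(len(row) for row in seg_logits)
    seg_sum = sum(
        softplus(-m * f)
        for f_row, m_row in zip(seg_logits, mask)
        for f, m in zip(f_row, m_row)
    )
    # (1 + y) / 2 is 1 for positive patches and 0 for negative ones,
    # so negative patches contribute only through the score term.
    seg_term = (1 + y) / (2 * n) * seg_sum
    score_term = lam * softplus(-y * score_logit)
    return seg_term + score_term


zeros = [[0.0, 0.0], [0.0, 0.0]]
labels = [[1, -1], [-1, 1]]
print(deepmask_loss(zeros, labels, 0.0, y=+1))
print(deepmask_loss(zeros, labels, 0.0, y=-1))
```

Summing this quantity over all training triplets gives the total training objective under the stated assumptions.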
4. Results
4.1. MS COCO (Boxes & Segmentation Masks)
The above table shows results on the MS COCO dataset for both bounding box and segmentation proposals. AR is reported at different numbers of proposals (10, 100 and 1000), together with AUC (AR averaged across all proposal counts). For segmentation proposals, overall AUC is reported as well as AUC at different scales (small/medium/large objects indicated by superscripts S/M/L).
4.2. Fast R-CNN results on PASCAL
The above figure shows the mean average precision (mAP) for Fast R-CNN with varying numbers of proposals. Most notably, with just 100 DeepMask proposals Fast R-CNN achieves an mAP of 68.2% and outperforms the best results obtained with 2000 SelectiveSearch proposals (mAP of 66.9%). In other words, with 20× fewer proposals DeepMask outperforms SelectiveSearch.
MNC: Instance-aware Semantic Segmentation via Multi-task Network Cascades_CVPR16
#MSRA #He #712cited #oral #Review: MNC
1. Contribution
- Three Stages: Differentiating Instances, Estimating Masks, and Categorizing Objects.
- MNC won 1st place in the 2015 COCO segmentation challenge
- The 2016 CVPR paper with more than 710 citations
2. Key Idea
InstanceFCN: Instance-sensitive Fully Convolutional Networks_ECCV16
#MSRA #He #228cited #Review: InstanceFCN
1. Contribution
- Better than DeepMask and competitive with MNC (Multi-task Network Cascade).
- Competitive instance segment proposal results are obtained on both PASCAL VOC and MS COCO.
- The 2016 ECCV paper with more than 220 citations
2. Key Idea
- A Fully Convolutional Network (FCN) with instance-sensitive score maps.
- By using an FCN, instance-sensitive score maps are introduced and all fully connected (FC) layers are removed.
3. Technical Details
3.1. Network Architecture
- On top of the feature map, there are two fully convolutional branches, one for estimating segment instances and the other for scoring the instances.
- The idea is very similar to that of position-sensitive score maps in R-FCN. But R-FCN uses position-sensitive score maps for object detection while InstanceFCN uses instance-sensitive score maps for generating proposals.
3.2. Instance-Sensitive Score Maps
3.2.1. Compared with FCN
- In FCN (top), when two persons are too close, the generated score map can hardly separate them.
- However, in InstanceFCN (bottom), each score map is responsible for capturing a relative position of the object instance. For example, the top-left score map is responsible for capturing the top-left part of the object instance. After assembling, a separated person mask can be generated.
- Some examples of instance masks with k=3 as shown below:
3.2.2. Compared with DeepMask
- In DeepMask, FC layers are used, which makes the model large.
- In InstanceFCN, there are no FC layers, which makes the model more compact.
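The assembling step from 3.2.1 can be sketched as follows: a sliding window is split into k×k bins, and each bin copies its values from the score map responsible for that relative position. This is a simplified illustration under my own assumptions (row-major ordering of the k² score maps, window size divisible by k); `assemble_instance` is a hypothetical name, not an API from the paper's code.

```python
def assemble_instance(score_maps, k, top, left, m):
    """Assemble an m x m instance mask from k*k instance-sensitive score maps.

    score_maps: list of k*k 2D lists (H x W), ordered row-major over the
                k x k relative positions (index 0 = top-left part).
    top, left:  position of the sliding window in the score maps.
    m:          window size, assumed divisible by k.
    """
    if m % k != 0:
        raise ValueError("window size m must be divisible by k")
    bin_size = m // k
    out = []
    for i in range(m):
        row = []
        for j in range(m):
            # Which relative position (bin) does pixel (i, j) fall into?
            idx = (i // bin_size) * k + (j // bin_size)
            row.append(score_maps[idx][top + i][left + j])
        out.append(row)
    return out


# Toy example with k=2: score map n is filled with the constant n,
# so the assembled window shows which map each quadrant came from.
k, size = 2, 4
maps = [[[n] * size for _ in range(size)] for n in range(k * k)]
for row in assemble_instance(maps, k, top=0, left=0, m=4):
    print(row)
```

Because the assembling is a pure copy, it adds no parameters, which is consistent with the compactness argument in 3.2.2.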
4. Result - MS COCO Validation Set
FCIS: Fully Convolutional Instance-aware Semantic Segmentation_CVPR17
#MSRA, #CVPR2017spotlight #406cited #Review: FCIS
1. Implication
- The first fully convolutional end-to-end solution for instance segmentation
- By introducing position-sensitive inside/outside score maps, the convolutional representation is fully shared between the detection and segmentation sub-tasks, giving high accuracy and efficiency.
- FCIS won 1st place in the 2016 COCO segmentation challenge, outperforming the second-place entry by a relative 12% in accuracy. It also ranked 2nd on the 2016 COCO detection leaderboard at that time.
- Much faster than the previous winner (MNC: 1.4 s/image, FCIS: 0.24 s/image)
2. Position-Sensitive Inside/Outside Score Maps
- R-FCN produces position-sensitive score maps for object detection, while InstanceFCN produces instance-sensitive score maps for generating segment proposals. The position-sensitive inside/outside score maps are easier to understand once you have understood R-FCN and InstanceFCN.
- FCIS uses position-sensitive inside/outside score maps to perform object segmentation and detection jointly and simultaneously.
- Each score map is responsible for capturing a relative position of the object instance. For example, the top-left score map is responsible for capturing the top-left part of the object instance. After assembling, a separated person mask can be generated.
- Different from R-FCN and InstanceFCN, there are two sets of score maps.
- To assemble an ROI inside map, the top-left, top-center, top-right, … and bottom-right parts are taken from the corresponding position-sensitive inside score maps. The same is done for the position-sensitive outside score maps.
- Finally, two score maps are generated: an ROI inside map and an ROI outside map.
- Based on these two maps, there are two pathways. One is for the instance mask, where a pixel-wise softmax is used for the segmentation loss. The other is for the category likelihood, where the detection score is obtained by average pooling over all pixels' likelihoods. Thus, the convolutional representation is fully shared between the detection and segmentation sub-tasks.
- Some examples:
3. Network Architecture
- During training, ROI is positive if IoU with the nearest ground-truth is larger than 0.5. There are 3 loss terms: A softmax detection loss over C+1 categories, a softmax segmentation loss of ground-truth category only, and a bbox regression loss. The latter two are only effective on positive ROIs.
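The two pathways from section 2 can be sketched for a single ROI and a single category as below. This is a hedged simplification: the per-pixel max over the inside and outside scores before average pooling follows my reading of the FCIS paper and is not stated in the text above, the final softmax across the C+1 categories is omitted, and `fcis_roi_outputs` is a hypothetical name.

```python
import math


def fcis_roi_outputs(inside, outside):
    """Compute mask probabilities and a detection score for one ROI and category.

    inside, outside: 2D lists holding the assembled ROI inside/outside maps.
    Returns (mask_prob, det_score):
      mask_prob: per-pixel foreground probability from a softmax over
                 the (inside, outside) pair.
      det_score: average over pixels of max(inside, outside)
                 (assumption: max is taken before average pooling).
    """
    mask_prob = []
    per_pixel_max = []
    for in_row, out_row in zip(inside, outside):
        prob_row = []
        for a, b in zip(in_row, out_row):
            mx = max(a, b)
            # Subtract the max before exponentiating for numerical stability.
            ea, eb = math.exp(a - mx), math.exp(b - mx)
            prob_row.append(ea / (ea + eb))
            per_pixel_max.append(mx)
        mask_prob.append(prob_row)
    det_score = sum(per_pixel_max) / len(per_pixel_max)
    return mask_prob, det_score


mask_prob, det_score = fcis_roi_outputs([[2.0, 0.0]], [[0.0, 2.0]])
print(mask_prob, det_score)
```

In the toy call, the first pixel has a high inside score (likely foreground) and the second a high outside score (likely background inside the box); both still support the detection score, which is why the two sub-tasks can share one set of score maps.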
4. Result
- FAIRCNN: Actually it is the team name of MultiPathNet, 2nd place in 2015.
- MNC+++: MNC submission results which won the 1st place in 2015.
- G-RMI: 2nd place in 2016, by the Google Research and Machine Intelligence team. (This is not the approach that won the object detection challenge.)
- FCIS baseline: It’s already better than MultiPathNet and MNC.
- +Multi-scale testing: using pyramid of testing images, where the shorter sides are of {480, 576, 688, 864, 1200, 1400} pixels for testing.
- +horizontal flip: Flip the image horizontally and test again, then average the results.
- +multi-scale training: multi-scale training at the same scales as in multi-scale inference is applied.
- +ensemble: 6 networks are ensembled.
- Finally, FCIS with the above tricks is 3.8% (11% relatively) higher than G-RMI.
Mask R-CNN_ICCV17
#FAIR, #5290cited, #BestPaper, #Taeoh's GoodSlide, #he's iccv17tutorial
1. Implication
- Mask R-CNN = Faster R-CNN with FCN on RoIs
- Outperforms previous models on all tasks of the COCO challenges (instance segmentation, bounding-box object detection, person keypoint detection).
- Proposes RoIAlign (vs RoIPool) to preserve exact spatial location information during training. => improved performance
- Decouples mask prediction from class prediction and infers a binary mask for each class, without competition among classes. => improved performance
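The difference between RoIAlign and RoIPool comes down to how a fractional coordinate on the feature map is read. Below is a minimal sketch, assuming integer coordinates refer to feature-cell centers and using one sample at the center of each output bin; the snapping function is only a stand-in for RoIPool's quantization, and all function names are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Read feat at a fractional (y, x) by bilinear interpolation (RoIAlign-style)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def snapped_sample(feat, y, x):
    """Read feat after snapping (y, x) to the integer grid (quantization stand-in)."""
    return feat[int(math.floor(y))][int(math.floor(x))]


def roi_align(feat, roi, out_size):
    """Pool a continuous ROI (y1, x1, y2, x2) into out_size x out_size values,
    taking one bilinear sample at the center of each bin (simplification)."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    return [
        [
            bilinear_sample(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


feat = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear_sample(feat, 0.5, 0.5))  # 15.0
print(snapped_sample(feat, 0.5, 0.5))   # 0.0
print(roi_align(feat, (0.0, 0.0, 1.0, 1.0), 1))
```

The snapped read throws away the sub-cell offset, which is the kind of misalignment that matters for pixel-accurate masks.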
2. Network Architecture
3. Result
3.1 Ablation Study: Multinomial vs Binary Masks
3.2 Ablation Study: RoIPool vs RoIAlign
3.3 Instance Segmentation Results in COCO
Mask Scoring R-CNN_CVPR19
1. Implication
- Previous methods including Mask R-CNN treat the confidence of instance classification the same as the mask quality (measured with IoU, Intersection-over-Union) although they are usually not well correlated.
- The new method uses a network to learn the quality of the predicted instance masks via regression (measured with a MaskIoU score) and then penalizes the instance mask score if the classification score is high while the actual mask quality is low.
- Mask Scoring R-CNN demonstrates new SOTA results, consistently outperforming Mask R-CNN on the COCO benchmark for instance segmentation.
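The rescoring idea can be sketched as follows: the MaskIoU between a predicted mask and its ground truth is the regression target during training, and at inference the classification score is multiplied by the predicted MaskIoU. The helper names are hypothetical and the regression head itself is not modeled here; `predicted_mask_iou` simply stands for its output.

```python
def mask_iou(pred, gt):
    """Intersection-over-Union between two binary masks given as 2D lists."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 0.0


def mask_score(cls_score, predicted_mask_iou):
    """Rescore a mask: a confident class with a poor mask gets a lower score."""
    return cls_score * predicted_mask_iou


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
iou = mask_iou(pred, gt)          # regression target for the MaskIoU head
print(iou)                        # 0.5
print(mask_score(0.9, iou))
```

Here a 0.9 classification score is pulled down because only half of the predicted mask overlaps the ground truth.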
2. Network Architecture
3. Result
- The results show that no matter what backbone network is used, MS R-CNN can always outperform Mask R-CNN by more than one percent.
HTC: Hybrid Task Cascade for Instance Segmentation_CVPR19
#MMDet
1. Implication
- Hybrid Task Cascade (HTC) is a cascade for instance segmentation that fully leverages the reciprocal relationship between detection and segmentation.
- Won 1st place in the COCO 2018 Challenge Object Detection Task
2. Network Architecture
- Hybrid Task Cascade (HTC) is a new cascade architecture for instance segmentation. It interweaves box and mask branches for joint multi-stage processing and adopts a semantic segmentation branch to provide spatial context.
- This framework progressively refines mask predictions and integrates complementary features together in each stage.
3. Result: COCO test-dev
InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting_ICCV19
#augmentation
1. Implication
- This paper proposes random InstaBoost, an augmentation technique that pastes objects in the neighborhood of their original positions.
- It also proposes appearance consistency heatmap guided InstaBoost, which uses a probability map representing reasonable placements that align with real-world experience.
- The method is simple to implement, does not increase computational complexity, and is easily integrated into the training pipeline of any instance segmentation model
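The random variant described above can be illustrated with a heavily simplified, translation-only sketch: an instance is cut out and pasted back at a small random offset from its original position. Filling the vacated pixels with a constant is only a stand-in for proper background restoration, the heatmap-guided variant is not covered, and `paste_nearby` with its parameters is a hypothetical helper rather than the InstaBoost API.

```python
import random


def paste_nearby(image, mask, max_shift=1, background=0, rng=None):
    """Move one instance by a random offset within +/- max_shift pixels.

    image: 2D list of pixel values; mask: 2D list of 0/1 for the instance.
    Returns (new_image, new_mask, (dy, dx)).
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [row[:] for row in image]
    new_mask = [[0] * w for _ in range(h)]
    # Remove the instance from its original location (stand-in for inpainting).
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                out[r][c] = background
    # Paste the instance at the shifted location; pixels leaving the image are dropped.
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                nr, nc = r + dy, c + dx
                if 0 <= nr < h and 0 <= nc < w:
                    out[nr][nc] = image[r][c]
                    new_mask[nr][nc] = 1
    return out, new_mask, (dy, dx)


image = [[0] * 5 for _ in range(5)]
mask = [[0] * 5 for _ in range(5)]
image[2][2], mask[2][2] = 9, 1
new_image, new_mask, shift = paste_nearby(image, mask, rng=random.Random(0))
print(shift)
```

Because the mask is moved together with the pixels, the augmented sample stays correctly annotated without extra labeling.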
2. Result
2.1 InstaBoost-Demo
2.2 Instance Segmentation result on COCO test-dev
2.3 Object Detection results on COCO test-dev
TensorMask: A Foundation for Dense Object Segmentation_ICCV19
#FAIR
1. Implication
- TensorMask establishes the first dense sliding-window instance segmentation system that achieves results close to Mask R-CNN
- TensorMask treats dense instance segmentation as a prediction task over 4D tensors; the framework explicitly captures this geometry and enables novel operators on 4D tensors.
- Enabled by the TensorMask framework, the authors develop a pyramid structure over a scale-indexed list of 4D tensors, which they call a tensor bipyramid
2. Comparison with Mask R-CNN for instance segmentation on COCO test-dev
These results demonstrate that dense sliding-window methods can close the gap to ‘detect-then-segment’ systems
CenterMask: Real-Time Anchor-Free Instance Segmentation_CVPR20
1. Implication
2. Network Architecture
3. Result
3.1 CenterMask instance segmentation and detection performance on COCO test-dev2017
3.2 CenterMask with other backbones on COCO val2017.
Panoptic Segmentation(network, module)
PointRend: Image Segmentation as Rendering_arxiv
#FAIR
1. Implication
2. Network Architecture
3. Result
Panoptic Feature Pyramid Networks_CVPR19
#FAIR
1. Implication
2. Network Architecture
3. Result
UPSNet: A Unified Panoptic Segmentation Network_CVPR19
#Uber ATG
1. Implication
2. Network Architecture
3. Result
SOGNet: Scene Overlap Graph Network for Panoptic Segmentation_arxiv
1. Implication
2. Network Architecture
3. Result