Instance segmentation survey

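Unlike existing materials, which tend to be informational slides that only review papers, these slides show how the problem was approached and how those approaches evolved, so they naturally help you infer how to tackle the problem you are working on yourself.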
Overview

Instance Segmentation

Outline

Introduction

Network Architecture

- FCN-driven Methods (Segmentation-first)

InstanceCut CVPR17
Deep Watershed CVPR17
Pixelwise Instance Segmentation with a Dynamically Instantiated Network CVPR17
SGN CVPR17

- RCNN-driven Methods (Instance-first)

DeepMask NIPS15
MNC ECCV16
(InstanceFCN ECCV16)
Learning to Refine Object Segments ECCV16

(FCIS CVPR17)
Mask R-CNN

- Advanced Works

Cascade HTC
Sliding Window TensorMask
Panoptic Segmentation Panoptic FPN

Efficiency

YOLACT
CenterMask

Augmentation & Regularization

InstaBoost
Mask Scoring R-CNN

Introduction

Definition

Image Classification: Image-level Classification
Object Detection: Multi-object Localization + Classification
Semantic Segmentation: Pixel-level Classification
Instance Segmentation: Detection + Pixel-level Classification for each Instance

Some Differences from Semantic Segmentation

Differentiates individual objects within the same class.
Essential to tasks such as counting the number of objects.
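
To make the counting point concrete, here is a minimal sketch with made-up toy label maps (the arrays and the helper name `count_instances` are assumptions for illustration). A semantic map only says which classes are present, while an instance map keeps a separate id per object, so touching objects of the same class can still be counted.

```python
import numpy as np

# Toy 4x6 label maps; all values are made up for illustration.
# Semantic map: 0 = background, 1 = "person" (one id shared by every person pixel).
semantic = np.array([
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: 0 = background, 1 and 2 = two different people that touch each other.
instance = np.array([
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])


def count_instances(instance_map):
    """Count distinct non-background ids in an instance map."""
    ids = np.unique(instance_map)
    return int(np.sum(ids != 0))


# The semantic map can only tell us how many classes appear, not how many objects.
num_classes_present = int(np.sum(np.unique(semantic) != 0))
num_objects = count_instances(instance)
print("classes present:", num_classes_present, "objects:", num_objects)
```

Because the two people touch, even a connected-component pass over the semantic map would merge them into one blob, which is why per-instance ids are needed here.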

Some Differences from Object Detection

A bounding box is a very coarse object boundary; many pixels irrelevant to the detected object are also included in the box.
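
A small sketch of how coarse a box can be: the hypothetical helper `box_fill_ratio` below measures what fraction of the tight bounding box is actually covered by the object mask. The toy object (a thin diagonal stroke) is an assumption chosen to make the gap obvious.

```python
import numpy as np


def box_fill_ratio(mask):
    """Fraction of pixels inside the tight bounding box that belong to the object."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask contains no object pixels")
    box_area = (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)
    return float(mask.sum() / box_area)


# Toy object: a thin diagonal stroke in an 8x8 image.
mask = np.eye(8, dtype=bool)
ratio = box_fill_ratio(mask)
print(f"object pixels / box pixels = {ratio:.3f}")
```

For this stroke only 8 of the 64 box pixels belong to the object; the rest is background that a box-level detector would still attribute to it.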

Network Architecture

RCNN-driven Methods

MNC
Mask R-CNN
Mask Scoring R-CNN
HTC
TensorMask
Panoptic FPN

FCN-driven Methods

DeepMask
InstanceFCN
FCIS

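1. Contribution (the value of the paper from non-technical and technical perspectives, for example: 1) it is the winning method of the COCO Challenge 2017, 2) later works on ~ adopted this framework, 3) it was the first to solve ~ with deep learning, etc.)
2. Key idea (a summary of the novelty from a more technical perspective, e.g. attaching a scoring head solved what earlier methods handled poorly)
3. Detailed method (technical details)
4. Results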

DeepMask: Learning to Segment Object Candidates (NIPS15)

#FAIR, #483 cited #earliest_instance #Review: DeepMask

1. Contribution

One of the earliest CNN approaches to instance segmentation.
DeepMask is an object-proposal-based approach to instance segmentation that outperforms other methods by a large margin while using fewer proposals.
It generalizes to categories unseen during training.
A NIPS 2015 paper with more than 480 citations.

2. Key Idea

Unlike all previous approaches for generating segmentation proposals, this work does not rely on edges, superpixels, or any other form of low-level segmentation; it is the first to learn to generate segmentation proposals directly from raw image data.
DeepMask jointly predicts a class-agnostic segmentation mask and an object score.

3. Technical Details

3.1. Network Architecture

Model Architecture (Top), Positive Samples (Green, Left Bottom), Negative Samples (Red, Right Bottom)

The figure above gives an overall view of the model, called DeepMask. The top branch is responsible for predicting a high-quality object segmentation mask, and the bottom branch predicts the likelihood that an object is present and satisfies the following two constraints:

the patch contains an object roughly centered in the input patch
the object is fully contained in the patch and in a given scale range
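
The two constraints can be sketched as a simple label check. This is only an illustration: the function name `is_positive_patch`, the box format, and every default threshold (patch size, target object size, shift and scale tolerances) are assumptions, not values stated in these notes.

```python
import math


def is_positive_patch(obj_box, patch_size=224, target_size=128,
                      max_shift=16, max_log2_scale=0.25):
    """Check the two constraints for an object box (x0, y0, x1, y1) in patch coordinates.

    All default thresholds are illustrative assumptions.
    """
    x0, y0, x1, y1 = obj_box
    longer_side = max(x1 - x0, y1 - y0)
    if longer_side <= 0:
        return False

    # Constraint 1: the object is roughly centered in the input patch.
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    center = patch_size / 2.0
    centered = abs(cx - center) <= max_shift and abs(cy - center) <= max_shift

    # Constraint 2a: the object is fully contained in the patch.
    contained = x0 >= 0 and y0 >= 0 and x1 <= patch_size and y1 <= patch_size

    # Constraint 2b: the object size lies in a given scale range around target_size.
    in_scale = abs(math.log2(longer_side / target_size)) <= max_log2_scale

    return centered and contained and in_scale


print(is_positive_patch((48, 48, 176, 176)))   # centered, contained, right scale
print(is_positive_patch((0, 0, 128, 128)))     # right scale but far off-center
```

Patches that pass the check would get object label +1 and the rest -1, which is the label the score branch is trained to predict.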

3.2. Joint Learning

The network is trained to jointly learn the pixel-wise segmentation map f_segm(x_k) at each location (i, j) and the predicted object score f_score(x_k). Given an input patch x_k, the model jointly infers a pixel-wise segmentation mask and an object score. The loss function is a sum of binary logistic regression losses, one for each location of the segmentation network and one for the object score, summed over all training triplets (x_k, m_k, y_k), where m_k is the ground-truth mask and y_k is the object label.
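
As I recall the paper's formulation, the loss has the form L = sum_k [ (1 + y_k) / (2 * w * h) * sum_ij log(1 + exp(-m_k^ij * f_segm^ij(x_k))) + lambda * log(1 + exp(-y_k * f_score(x_k))) ], with mask pixels m_k^ij and labels y_k in {-1, +1}, so the factor (1 + y_k) / 2 switches the mask term off for negative patches. Below is a minimal NumPy sketch of that formula. The function name, the array shapes, and the default weight lambda = 1/32 are assumptions for illustration, not details taken from these notes.

```python
import numpy as np


def deepmask_loss(seg_logits, score_logits, masks, labels, lam=1.0 / 32):
    """Joint mask + objectness loss summed over K training patches.

    seg_logits:   (K, H, W) raw outputs of the segmentation branch
    score_logits: (K,)      raw outputs of the score branch
    masks:        (K, H, W) ground-truth masks with values in {-1, +1}
    labels:       (K,)      object labels with values in {-1, +1}
    """
    # log(1 + exp(-z)) computed stably as logaddexp(0, -z).
    # The mean over (H, W) implements the 1 / (w * h) normalization.
    seg_term = np.logaddexp(0.0, -masks * seg_logits).mean(axis=(1, 2))
    score_term = np.logaddexp(0.0, -labels * score_logits)
    # (1 + y_k) / 2 is 1 for positive patches and 0 for negative ones,
    # so the mask loss only contributes when the patch contains an object.
    positive = (1.0 + labels) / 2.0
    return float(np.sum(positive * seg_term + lam * score_term))


# Toy batch: one positive and one negative patch, all logits at zero.
masks = np.ones((2, 4, 4))
labels = np.array([1.0, -1.0])
loss = deepmask_loss(np.zeros((2, 4, 4)), np.zeros(2), masks, labels)
print(f"loss at zero logits: {loss:.4f}")
```

With all logits at zero every logistic term equals log 2, so the toy loss is log 2 for the positive patch's mask plus lambda * 2 * log 2 for the two scores; the negative patch contributes nothing to the mask term.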

4. Results

4.1. MS COCO (Boxes & Segmentation Masks)

Average Recall (AR) for Detection Boxes (Left) and Segmentation Masks (Right) on the MS COCO Validation Set (AR@n: the AR when n region proposals are generated; AUC^x: the AUC for objects of size x)

The table above shows results on the MS COCO dataset for both bounding box and segmentation proposals. It reports AR at different numbers of proposals (10, 100, and 1000) and also AUC (AR averaged across all proposal counts). For segmentation proposals, it reports the overall AUC and also the AUC at different scales (small/medium/large objects, indicated by superscripts S/M/L).
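
To make AR@n concrete, here is a rough sketch that computes it from a matrix of overlaps between ranked proposals and ground-truth objects. The function name `average_recall`, the input layout, and the IoU threshold grid (0.50 to 0.95 in steps of 0.05) are assumptions; the exact protocol behind the table may differ.

```python
import numpy as np


def average_recall(iou, n, thresholds=None):
    """AR@n from an IoU matrix of shape (num_proposals, num_gt).

    Rows must be sorted by descending proposal score.
    The default IoU threshold grid is an assumption.
    """
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    top = iou[:n]
    if top.shape[0] == 0 or top.shape[1] == 0:
        return 0.0
    # Best overlap each ground-truth object gets from the top-n proposals.
    best = top.max(axis=0)
    # Recall at one threshold = fraction of objects covered at that IoU.
    recalls = [(best >= t).mean() for t in thresholds]
    return float(np.mean(recalls))


# Toy example: 3 ranked proposals, 2 ground-truth objects (made-up overlaps).
iou = np.array([
    [0.92, 0.12],
    [0.20, 0.62],
    [0.30, 0.83],
])
for n in (1, 2, 3):
    print(f"AR@{n} = {average_recall(iou, n):.3f}")
```

AR can only stay the same or grow as n increases, since more proposals never lower the best overlap per object; that is why a method reaching high AR at small n needs fewer proposals.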

4.2. Fast R-CNN Results on PASCAL

The figure above shows the mean average precision (mAP) for Fast R-CNN with a varying number of proposals. Most notably, with just 100 DeepMask proposals, Fast R-CNN achieves an mAP of 68.2% and outperforms the best result obtained with 2000 SelectiveSearch proposals (mAP of 66.9%). In other words, DeepMask outperforms SelectiveSearch with 20× fewer proposals.