Need help with read-paper-list?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

160 Stars 34 Forks MIT License 122 Commits 0 Opened issues


Image segmentation-Object detection-Light-weight network

Services available


Need anything else?

Contributors list

# 668,004
122 commits


semantic segmentation/object detection/light-weight network/instance segmentation


  • ImageNet Classification with Deep Convolutional Neural Networks(AlexNet)
  • Very Deep Convolutional Networks For Large-Scale Image Recognition(VGG)
  • Network In Network(NIN)
  • Going Deeper with Convolutions(GoogleNet)
  • Deep Residual Learning for Image Recognition(ResNet)
  • Densely Connected Convolutional Networks(DenseNet)
  • Squeeze-and-Excitation Networks(SENet)
  • Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks(GENet)
  • Non-local Neural Networks
  • Convolutional Neural Networks with layer reuse(LruNet)
  • GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond(GCNet)
  • Rethinking ImageNet Pre-training
  • Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

light-weight network

  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size(SqueezeNet)
  • Mobilenets: Efficient convolutional neural networks for mobile vision applications(Mobilenet V1)
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices(ShuffleNet V1)
  • Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation(Mobilenet V2)
  • SqueezeNext: Hardware-Aware Neural Network Design(SqueezeNext)
  • CondenseNet: An Efficient DenseNet using Learned Group Convolutions(CondenseNet)
  • Pelee: A Real-Time Object Detection System on Mobile Devices(PeleeNet)
  • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design(ShuffleNet V2)
  • ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation(ESPNet)
  • ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions(ChannelNets)
  • ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network(ESPNetV2)
  • Interleaved Group Convolutions for Deep Neural Networks(IGCV1)
  • IGCV2: Interleaved Structured Sparse Convolutional Neural Networks(IGCV2)
  • IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks(IGCV3)
  • MnasNet: Platform-Aware Neural Architecture Search for Mobile(MnasNet)
  • FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search(FBNet)
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks(EfficientNet)
  • DiCENet: Dimension-wise Convolutions for Efficient Networks(DiCENet)
  • Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
  • An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

semantic segmentation

  • Fully Convolutional Networks for Semantic Segmentation(FCN)
  • SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation(SegNet)
  • U-Net: Convolutional Networks for Biomedical Image Segmentation(UNet)
  • Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs(Deeplab v1)
  • DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution,and Fully Connected CRFs(Deeplab v2)
  • Understanding Convolution for Semantic Segmentation(DUC)
  • Pyramid Scene Parsing Network(PSPNet)
  • Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network(GCN)
  • Rethinking Atrous Convolution for Semantic Image Segmentation(Deeplab v3)
  • DenseASPP for Semantic Segmentation in Street Scenes(DenseASPP
  • Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation(Deeplab v3plus
  • Context Encoding for Semantic Segmentation(EncNet)
  • Learning a Discriminative Feature Network for Semantic Segmentation(DFN)
  • Smoothed Dilated Convolutions for Improved Dense Prediction(SDC)
  • Pyramid Attention Network for Semantic Segmentation(PAN)
  • Exploring Context with Deep Structured models for Semantic Segmentation(FeatMap-Net)
  • ExFuse: Enhancing Feature Fusion for Semantic Segmentation(ExFuse)
  • Dilated Residual Networks(DRN)
  • Dual Attention Network for Scene Segmentation(DANet)
  • OCNet:Object Context Network for Scene Parsing(OCNet)
  • RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation(RefineNet)
  • Dense Relation Network: Learning Consistent And Context-Aware Prepresentation For Semantic Image Segmentation(DRN)
  • CCNet: Criss-Cross Attention for Semantic Segmentation(CCNet)
  • Unified Perceptual Parsing for Scene Understanding(UPerNet)
  • Tree-structured Kronecker Convolutional Networks for Semantic Segmentation(TKNet)
  • NeuroIoU: Learning a Surrogate Loss for Semantic Segmentation(NeuroIoU)
  • Decoders Matter for Semantic Segmentation:Data-Dependent Decoding Enables Flexible Feature Aggregation
  • GFF: Gated Fully Fusion for Semantic Segmentation(GFF
  • Learning Fully Dense Neural Networks for Image Semantic Segmentation(FDNet
  • ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation(ZigZagNet)
  • Adaptive Pyramid Context Network for Semantic Segmentation(APCNet)
  • Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation
  • ACFNet: Attentional Class Feature Network for Semantic Segmentation(ACFNet)
  • Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images
  • Dual Graph Convolutional Network for Semantic Segmentation
  • Global Aggregation then Local Distribution in Fully Convolutional Networks
  • Dynamic Multi-scale Filters for Semantic Segmentation
  • Unifying Training and Inference for Panoptic Segmentation
  • Semantic Flow for Fast and Accurate Scene Parsing
  • AlignSeg: Feature-Aligned Segmentation Networks
  • Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
  • Context Prior for Scene Segmentation
  • Learning Statistical Texture for Semantic Segmentation

fast/real-time segmentation

  • ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation(ENet)
  • ICNet for Real-Time Semantic Segmentation(ICNet)
  • BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation(BiSeNet)
  • LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation(LinkNet)
  • Rtseg: Real-Time Semantic Segmentation Comparative Study
  • Shuffleseg: Real-Time Semantic Segmentation Network(Shuffleseg)
  • ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation(ESPNet)
  • Light-Weight RefineNet for Real-Time Semantic Segmentation(Light-Weight RefineNet)
  • LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation(LinkNet)
  • D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction(D-LinkNet)
  • CGNet: A Light-weight Context Guided Network for Semantic Segmentation(CGNet)
  • Efficient ConvNet for Real-time Semantic Segmentation
  • A Comparative Study of Real-time Semantic Segmentation for Autonomous Driving
  • ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time(ContextNet)
  • ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network(ESPNetV2)
  • ShelfNet for Real-time semantic segmentation(ShelfNet)
  • ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation(ERFNet)
  • Concentrated-Comprehensive Convolutions for lightweight semantic segmentation(CCCNet)
  • DSNet for Real-Time Driving Scene Semantic Segmentation(DSNet)
  • Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation(EDANet)
  • Fast-SCNN: Fast Semantic Segmentation Network(Fast-SCNN)
  • Guided Upsampling Network for Real-Time Semantic Segmentation(GUN)
  • In Defense of Pre-trained ImageNet Architecturesfor Real-time Semantic Segmentation of Road-driving Images(SwiftNetRN)
  • Residual Pyramid Learning for Single-Shot Semantic Segmentation(RPNet)
  • DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation(DFANet)
  • DSNet: An Efficient CNN for Road Scene Segmentation(DSNet)
  • Spatial Sampling Network for Fast Scene Understanding
  • LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation
  • Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
  • Customizable Architecture Search for Semantic Segmentation
  • Semantic Flow for Fast and Accurate Scene Parsing
  • BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
  • ASNet: Aggregated Scale Transformations for Real-Time Semantic Segmentation

Deep object detection

  • Rich feature hierarchies for accurate object detection and semantic segmentation(R-CNN)
  • SSD: Single Shot MultiBox Detector(SSD)
  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(Faster R-CNN)
  • Feature Pyramid Networks for Object Detection(FPN)
  • Is Faster R-CNN Doing Well for Pedestrian Detection?(RPN_BF)
  • Training Region-based Object Detectors with Online Hard Example Mining(OHEM)
  • Receptive Field Block Net for Accurate and Fast Object Detection(RFBNet)
  • Focal Loss for Dense Object Detection(RetinaNet)
  • Single-Shot Refinement Neural Network for Object Detection(RefinDet)
  • PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection(PVANET)
  • Multi-label learning of part detectors for heavily occluded pedestrian detection(JL-TopS)
  • Graininess-aware Deep Feature Learning for Pedestrian Detection(GDFL)
  • M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network(M2Det)
  • CFENet: An Accurate and Efficient Single-Shot Object Detector for Autonomous Driving(CFENet)
  • ScratchDet: Training Single-Shot Object Detectors from Scratch(ScratchDet)
  • Pooling Pyramid Network for Object Detection(PPN
  • ThunderNet: Towards Real-time Generic Object Detection(ThunderNet)
  • Light-Weight RetinaNet for Object Detection
  • CornerNet: Detecting Objects as Paired Keypoints(CornerNet)
  • Bottom-up Object Detection by Grouping Extreme and Center Points(ExtremeNet)
  • RepPoints: Point Set Representation for Object Detection(RepPoints)
  • FCOS: Fully Convolutional One-Stage Object Detection(FCOS)
  • Mask-Guided Attention Network for Occluded Pedestrian Detection
  • Learning Rich Features at High-Speed for Single-Shot Object Detection.
  • Dynamic Anchor Feature Selection for Single-Shot Object Detection.
  • Contextual Attention for Hand Detection in the Wild
  • Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
  • Multiple Anchor Learning for Visual Object Detection
  • NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection
  • Is Sampling Heuristics Necessary in Training Deep Object Detectors?
  • Rethinking Classification and Localization for Object Detection
  • Multiple Anchor Learning for Visual Object Detection
  • Learning from Noisy Anchors for One-stage Object Detection
  • Learning a Unified Sample Weighting Network for Object Detection∗
  • D2Det: Towards High Quality Object Detection and Instance Segmentation
  • AugFPN: Improving Multi-scale Feature Learning for Object Detection
  • Scale-Equalizing Pyramid Convolution for Object Detection

Face Detection

  • S3FD: Single Shot Scale-invariant Face Detector(SFD)
  • FaceBoxes: A CPU Real-time Face Detector with High Accuracy(FaceBoxes)
  • Detecting Face with Densely Connected Face Proposal Network(DCFPN)
  • SSH: Single Stage Headless Face Detector(SSH)
  • DSFD: Dual Shot Face Detector(DSFD)
  • Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks(MTCNN)
  • PyramidBox: A Context-assisted Single Shot Face Detector(PyramidBox)
  • SRN:Selective Refinement Network for High Performance Face Detection(SRN)
  • Single Shot Attention-Based Face Detector(AFN)
  • Improved Selective Refinement Network for Face Detection(ISRN)
  • PyramidBox++: High Performance Detector for Finding Tiny Face(PyramidBox++)
  • RetinaFace: Single-stage Dense Face Localisation in the Wild(RetinaFace)

Instance segmentation

  • Fully Convolutional Instance-aware Semantic Segmentation(FCIS)
  • Instance-aware Semantic Segmentation via Multi-task Network Cascades(MNC)
  • Mask R-CNN
  • Mask Scoring R-CNN
  • Path Aggregation Network for Instance Segmentation(PANet)
  • RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free(RetinaMask)
  • YOLACT Real-time Instance Segmentation(YOLACT)
  • Parsing R-CNN for Instance-Level Human Analysis(Parsing R-CNN)
  • BlitzNet: A Real-Time Deep Network for Scene Understanding(BlitzNet)
  • Hybrid Task Cascade for Instance Segmentation(HTC)
  • Triply Supervised Decoder Networks for Joint Detection and Segmentation(TripleNet)
  • ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation(ZigZagNet)
  • Bounding Box Embedding for Single Shot Person Instance Segmentation
  • Shape-aware Feature Extraction for Instance Segmentation
  • Real-Time Panoptic Segmentation from Dense Detections
  • EmbedMask: Embedding Coupling for One-stage Instance Segmentation
  • PolyTransform: Deep Polygon Transformer for Instance Segmentation
  • SOLO: Segmenting Objects by Locations
  • RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
  • SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
  • YOLACT++:Better Real-time Instance Segmentation
  • SAIS: Single-stage Anchor-free Instance Segmentation
  • PolarMask: Single Shot Instance Segmentation with Polar Representation
  • BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation

Mutil-task learning

  • End-to-End Multi-Task Learning with Attention.
  • Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.
  • BlitzNet: A Real-Time Deep Network for Scene Understanding.
  • Triply Supervised Decoder Networks for Joint Detection and Segmentation
  • Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving.
  • Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation.
  • GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Network
  • MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
  • MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning
  • Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems
  • MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning
  • AP-MTL: Attention Pruned Multi-task Learning Model for Real-time Instrument Detection and Segmentation in Robot-assisted Surgery

non-deep object detection

  • Robust Real-Time Face Detection(Haar+Adaboost)
  • Integral Channel Features(ICF)
  • The Fastest Pedestrian Detector in the West(FPDW)
  • Fast Feature Pyramids for Object Detection(ACF)
  • Local Decorrelation for Improved Pedestrian Detection(LDCF)
  • Convolutional Channel Features(CCF)
  • Informed Haar-like Features Improve Pedestrian Detection(InformedHaar)
  • Fast Pedestrian Detection for Mobile Devices(FastCF)
  • Pedestrian detection at 100 Frames Per Second(VeryFast)
  • To Boost or Not to Boost? On the Limits of Boosted Trees for Object Detection(ACF+/LDCF+)
  • Filtered channel features for pedestrian detection(Checkerboard)
  • Pedestrian Detection Inspired by Appearance Constancy and Shape Symmetry(NNNF)
  • Aggregate Channel Features for Multi-view Face Detection(ACFFace)
  • Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning(SpatialPooling+)
  • BAdaCost: Multi-class Boosting with Costs(BAdaCost)
  • Exploring Prior Knowledge for Pedestrian Detection(SCCPriors)
  • A Fast, Modular Scene Understanding System using Context-Aware Object Detection(SC-ACF
  • Ten Years of Pedestrian Detection,What Have We Learned?(Katamari)
  • How Far are We from Solving Pedestrian Detection?
  • What Can Help Pedestrian Detection?
  • Taking a Deeper Look at Pedestrians
  • Semantic Channels for Fast Pedestrian Detection(MRFC+Semantic)
  • Fast Boosting based Detection using Scale Invariant Multimodal Multiresolution Filtered Features
  • Learning Multilayer Channel Features for Pedestrian Detection
  • Fast and Robust Object Detection Using Visual Subcategories
  • Learning to Detect Vehicles by Clustering Appearance Patterns(Subcat)
  • Looking at Pedestrians at Different Scales: A Multiresolution Approach and Evaluations(MR-ACF)
  • Multiresolution models for object detection
  • Face Detection without Bells and Whistles
  • Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework
  • An Exploration of Why and When Pedestrian Detection Fails
  • Discriminative Sub-categorization

Image Stitching

  • Automatic Panoramic Image Stitching Using Invariant Features(IJCV2007)
  • As-Projective-As-Possible Image Stitching with Moving DLT(APAP)
  • Shape-Preserving Half-Projective Warps for Image Stitching(SPHP)
  • Adaptive As-Natural-As-Possible Image Stitching(AANAP)
  • MAGSAC: marginalizing sample consensus
  • MAGSAC++, a fast, reliable and accurate robust estimator
  • An Evaluation of Feature Matchers for Fundamental Matrix Estimation
  • GMS: Grid-based Motion Statistics for Fast, Ultra-robust Feature correspondence
  • Vanishing Point Guided Natural Image Stitching
  • Warping Residual Based Image Stitching for Large Parallax

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.