A list of papers and other resources on computer vision and deep learning.
- A Survey on Deep Learning Techniques for Stereo-based Depth Estimation. arXiv202006
- Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review. arXiv202005
- A Gentle Introduction to Deep Learning for Graphs. arXiv201912 [Note]
- A Comprehensive Survey on Graph Neural Networks. arXiv201912 [Note]
- Research Guide: Model Distillation Techniques for Deep Learning, Derrick Mwiti, 2019.11 [Blog]
- Graph Neural Networks: A Review of Methods and Applications, arXiv2019.7 [Intro-Chinese]
- A Review on Deep Learning in Medical Image Reconstruction, arXiv2019.6
- MNIST-C: A Robustness Benchmark for Computer Vision, arXiv2019.6 [Code&Dataset]
- Going Deep in Medical Image Analysis: Concepts, Methods, Challenges and Future Directions, arXiv2019.2
- Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art. arXiv201704 [Resources]
- [2019TIV] A Survey of Autonomous Driving: Common Practices and Emerging Technologies
- [2014JMLR] Do we need hundreds of classifiers to solve real world classification problems
SegLoss: A collection of loss functions for medical image segmentation
Efficient-Segmentation-Networks
Overview and Summary of 3D Semantic Segmentation [Page]
Unpooling/upsampling/deconvolution [Note]
Some basic points: align_corners
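To make the align_corners point concrete, here is a minimal pure-Python sketch of 1-D linear resizing. It assumes the coordinate conventions commonly used by PyTorch-style interpolation; the function names source_coord and resize_linear_1d are hypothetical and exist only for this illustration.

```python
def source_coord(dst_index, in_size, out_size, align_corners):
    """Map an output sample index to a (fractional) input coordinate."""
    if align_corners:
        # The centers of the first and last samples of input and output coincide.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Samples are treated as unit-width cells; the outer edges coincide instead.
    return (dst_index + 0.5) * in_size / out_size - 0.5


def resize_linear_1d(values, out_size, align_corners):
    """Linearly resample a list of floats to out_size samples."""
    in_size = len(values)
    result = []
    for i in range(out_size):
        x = source_coord(i, in_size, out_size, align_corners)
        x = min(max(x, 0.0), in_size - 1)  # clamp to the valid input range
        left = int(x)
        right = min(left + 1, in_size - 1)
        frac = x - left
        result.append(values[left] * (1 - frac) + values[right] * frac)
    return result


if __name__ == "__main__":
    signal = [0.0, 10.0]
    print(resize_linear_1d(signal, 4, align_corners=True))
    print(resize_linear_1d(signal, 4, align_corners=False))
```

With align_corners=True the four outputs are spaced evenly from the first to the last input value; with align_corners=False the interior samples shift, which is why a segmentation model trained with one setting can lose accuracy when evaluated with the other.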
Review
- A Survey on Instance Segmentation: State of the art, arXiv202007
- Unsupervised Domain Adaptation in Semantic Segmentation: a Review, arXiv202005
- Image Segmentation Using Deep Learning: A Survey. arXiv202001
- Recent progress in semantic image segmentation, Artificial Intelligence Review, 2019
- Review of Deep Learning Algorithms for Image Semantic Segmentation, 2018 [Blog]
arXiv
- Divided We Stand: A Novel Residual Group Attention Mechanism for Medical Image Segmentation, arXiv2019.12
- Hard Pixels Mining: Learning Using Privileged Information for Semantic Segmentation, arXiv2019.11
- Hierarchical Attention Networks for Medical Image Segmentation, arXiv2019.11 [eye line seg]
- Multi-scale guided attention for medical image segmentation, arXiv2019.10 [Code]
- Adaptive Class Weight based Dual Focal Loss for Improved Semantic Segmentation, arXiv2019.10
- ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation, arXiv2019.6
- ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation, arXiv2019.6 [Code]
- FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation, arXiv2019.3 [Proj] [Code] [Note] [JPU: Joint Pyramid Upsampling]
- ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, arXiv2016.6 [Code]
Journal/Proceedings
-
[2019IJCV] AdapNet++: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation [Code]
-
[2019NIPS] Zero-Shot Semantic Segmentation [Code]
-
[2019NIPS] Grid Saliency for Context Explanations of Semantic Segmentation [github]
-
[2019NIPS] Region Mutual Information Loss for Semantic Segmentation
-
[2019NIPS] Improving Semantic Segmentation via Dilated Affinity
-
[2019NIPS] Correlation Maximized Structural Similarity Loss for Semantic Segmentation
-
[2019NIPS] Multi-source Domain Adaptation for Semantic Segmentation
-
[2019ICCV] Boundary-Aware Feature Propagation for Scene Segmentation
-
[2019ICCV] [Adaptive-sampling] Efficient Segmentation: Learning Downsampling Near Semantic Boundaries [github] (Reference: LIP: Local Importance-based Pooling, ICCV2019 [github] [Notes])
-
[2019ICCV] Selectivity or Invariance: Boundary-aware Salient Object Detection [Proj&Code]
-
[2019ICCV] Recurrent U-Net for Resource-Constrained Segmentation
-
[2019ICCV] Gated-SCNN: Gated Shape CNNs for Semantic Segmentation [Code] [Proj]
-
[2019ICCV] Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery
-
[2019ICCV] ACE: Adapting to Changing Environments for Semantic Segmentation
-
[2019ICCV] Asymmetric Non-local Neural Networks for Semantic Segmentation
-
[2019ICCV] DADA: Depth-Aware Domain Adaptation in Semantic Segmentation
-
[2019ICCV] ACFNet: Attentional Class Feature Network for Semantic Segmentation
-
[2019ICCV] [EMANet] Expectation-Maximization Attention Networks for Semantic Segmentation [github]
-
[2019ICCV] CCNet: Criss-Cross Attention for Semantic Segmentation [github]
-
[2019CVPR] ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network [Code]
-
[2019CVPR] Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection
-
[2019CVPR] Beyond Gradient Descent for Regularized Segmentation Losses [Code]
-
[2019CVPR] Co-occurrent Features in Semantic Segmentation
-
[2019CVPR] Context-aware Spatio-recurrent Curvilinear Structure Segmentation [line structure seg]
-
[2019CVPR] Dual attention network for scene segmentation
-
[2019CVPR] Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation.
-
[2019AAAI] Learning Fully Dense Neural Networks for Image Semantic Segmentation
-
[2019MICCAI] ET-Net: A Generic Edge-Attention Guidance Network for Medical Image Segmentation [Code]
-
[2019MICCAI] Attention Guided Network for Retinal Image Segmentation [Code]
-
[2019MICCAIW] CU-Net: Cascaded U-Net with Loss Weighted Sampling for Brain Tumor Segmentation
-
[2018CVPR] [EncNet] Context Encoding for Semantic Segmentation (oral) [Code-Pytorch] [Slides]
-
[2018CVPR] Learning a Discriminative Feature Network for Semantic Segmentation
-
[2018CVPR] DenseASPP for Semantic Segmentation in Street Scenes [Code]
-
[2018CVPR] Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation
-
[2018ECCV] ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
-
[2018ECCV] ICNet for Real-Time Semantic Segmentation on High-Resolution Images [Proj] [Code]
-
[2018ECCV] PSANet: Point-wise Spatial Attention Network for Scene Parsing
-
[2018ECCV] Bisenet: Bilateral segmentation network for real-time semantic segmentation [Code]
-
[2018ECCV] [DeepLabv3+] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [Code]
-
[2018BMVC] Pyramid Attention Network for Semantic Segmentation
-
[2018DLMIA] UNet++: A Nested U-Net Architecture for Medical Image Segmentation [Code]
-
[2018MIDL] Attention U-Net: Learning Where to Look for the Pancreas
-
[2017arXiv] [DeepLabv3] Rethinking Atrous Convolution for Semantic Image Segmentation
-
[2017PAMI] [DeepLabv2] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
-
[2017PAMI] SegNet: A deep convolutional encoder-decoder architecture for image segmentation
-
[2017CVPR] [GCN] Large Kernel Matters-Improve Semantic Segmentation by Global Convolutional Network [Code] [Note]
-
[2017CVPR] [PSPNet] Pyramid Scene Parsing Network
-
[2017CVPR] RefineNet: Multi-path refinement networks for high-resolution semantic segmentation
-
[2017CVPR] [FCIS] Fully convolutional instance-aware semantic segmentation
-
[2017CVPR] [FRRN] Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes [Code]
-
[2017CVPRW] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation [Code]
-
[2017ICRA] AdapNet: Adaptive semantic segmentation in adverse environmental conditions [Code]
-
[2016ICLR] Multi-Scale Context Aggregation by Dilated Convolutions
-
[2016ICLR] ParseNet: Looking Wider to See Better
-
[2016CVPR] Instance-aware semantic segmentation via multi-task network cascades
-
[2016CVPR] Attention to Scale: Scale-Aware Semantic Image Segmentation
-
[2016ECCV] What's the Point: Semantic Segmentation with Point Supervision
-
[2016ECCV] Instance-sensitive fully convolutional networks
-
[2016DLMIA] [UNet+ResNet] The Importance of Skip Connections in Biomedical Image Segmentation
-
[2015ICLR] [DeepLabv1] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
-
[2015ICCV] Conditional random fields as recurrent neural networks
-
[2015ICCV] [DeconvNet] Learning Deconvolution Network for Semantic Segmentation
-
[2015MICCAI] U-Net: Convolutional networks for biomedical image segmentation [Note]
-
[2015CVPR/2017PAMI] [FCN] Fully convolutional networks for semantic segmentation
PanopticSeg
-
Real-Time Panoptic Segmentation from Dense Detections, arXiv2019.12
-
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation, arXiv2019.12
-
PanDA: Panoptic Data Augmentation, arXiv2019.11
-
Learning Instance Occlusion for Panoptic Segmentation, arXiv2019.11
-
Panoptic Edge Detection, arXiv2019.6
-
[2020ICRA] DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing [Code]
-
[2020AAAI] SOGNet: Scene Overlap Graph Network for Panoptic Segmentation
-
[2019CVPR] Panoptic Segmentation
-
[2019CVPR] Attention-guided Unified Network for Panoptic Segmentation
-
[2019CVPR] Panoptic Feature Pyramid Networks (oral) [unofficial code] [detectron2]
-
[2019CVPR] UPSNet: A Unified Panoptic Segmentation Network [Code]
-
[2019CVPR] [OANet] An End-to-end Network for Panoptic Segmentation
-
[2019CVPR] DeeperLab: Single-Shot Image Parser (oral) [project] [code]
-
[2019CVPR] Interactive Full Image Segmentation by Considering All Regions Jointly
-
[2019CVPR] Seamless Scene Segmentation [code]
awesome image-based 3D reconstruction
[Blog] A Survey of 3D Reconstruction Algorithms Based on Monocular Vision
[Blog] A Roundup of the World's Top Labs in 3D Vision and SLAM
-
BigSFM: Reconstructing the World from Internet Photos, summary of Noah Snavely's works [Proj&Code] (Bundler, 1DSfM, sfm-disambig, DISCO, LocalSymmetry, dataset ...)
-
A Survey on Deep Learning Architectures for Image-based Depth Reconstruction, arXiv2019.6
-
[2019PAMI] Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era
-
[2017Robot] Keyframe-based monocular SLAM: design, survey, and future directions, Robotics and Autonomous Systems
- [2017CVPR] Geometric loss functions for camera pose regression with deep learning [Proj-with PoseNet+Modelling]
- [2016ICRA] Modelling Uncertainty in Deep Learning for Camera Relocalization
- [2015ICCV] PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
- [2016ECCV] [LineSfM] Robust and Accurate Line- and/or Point-Based Pose Estimation without Manhattan Assumptions [Code]
- [2020PAMI] SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis [Code]
- [2020ICLR] Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving, arXiv2019.8 [Code]
- [2019NIPS] [SC-SfMLearner] Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video [Proj] [Code]
- [2019ICCV] How do neural networks see depth in single images? [Note]
- [2019ICCV] DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch [Code]
- [2019CVPR] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Code]
- [2019CVPR] DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image
- [2019CVPR] [R-MVSNet] Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference [Code]
- [2019ToG] 3D Ken Burns Effect from a Single Image [Homepage] [Code]
- [2019IROS] SuMa++: Efficient LiDAR-based Semantic SLAM [Code]
- [2019ICCVW] Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
- [2019WACV] SfMLearner++: Learning Monocular Depth & Ego-Motion using Meaningful Geometric Constraints
- [2018CVPR] Automatic 3D Indoor Scene Modeling From Single Panorama
- [2018CVPR] LEGO: Learning Edge with Geometry all at Once by Watching Videos (spotlight) [Code]
- [2018CVPR] [vid2depth] Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints [Proj&Code]
- [2018CVPR] GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Code]
- [2018CVPR] DeepMVS: Learning Multi-View Stereopsis [Proj] [Code]
- [2018ECCV] MVSNet: Depth Inference for Unstructured Multi-view Stereo
- [2017ICCV] SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis [Code]
- [2017CVPR] [SfMLearner] Unsupervised Learning of Depth and Ego-Motion from Video, Oral [Proj] [TF] [Pytorch] [ClassProj]
- [2017CVPR] SGM-Nets: Semi-Global Matching With Neural Networks
- [2016JMLR] [MC-CNN] Stereo matching by training a convolutional neural network to compare image patches [Code]
- [ICCV15/IJCV17] Global, Dense Multiscale Reconstruction for a Billion Points [Proj] [Code]
- [2014ECCV] Let there be color! Large-scale texturing of 3D reconstructions [Code]
-
[2017WACV] Pano2CAD: Room Layout From A Single Panorama Image
-
[2014ECCV] PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding, Oral [Homepage&Code] [PanoBasic]
-
Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping, arXiv2019.12 [Code]
-
Rotation Invariant Point Cloud Classification: Where Local Geometry Meets Global Topology, arXiv2019.11
-
SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving, arXiv2019.9 [Code]
-
Going Deeper with Point Networks, arXiv2019.7 [Code]
-
[2020GRSM] A Review of Point Cloud Semantic Segmentation
-
[2019NIPS] [PVCNN] Point-Voxel CNN for Efficient 3D Deep Learning (Spotlight) [Proj] [Code]
-
[2019IROS] RangeNet++: Fast and Accurate LiDAR Semantic Segmentation [Code]
-
[2019ICCV] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
-
[2019ICCV] Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation
-
[2019ICCV] Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion (oral)
-
[2019CVPR] ClusterNet: Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis
-
[2018NIPS] PointCNN: Convolution On X-Transformed Points [Code]
-
[2018ECCV] Efficient Semantic Scene Completion Network with Spatial Group Convolution [Code]
-
[2017NIPS] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space [Code]
-
[2017CVPR] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [Code]
Tutorial&Reviews
- ICCV2019 Tutorial: Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision, Michael S. Brown [Homepage] [Slides]
- CVPR2016 Tutorial: Understanding the In-Camera Image Processing Pipeline for Computer Vision, Michael S. Brown [Slides]
- NIPS2011 Tutorial: Modeling the Digital Camera Pipeline: From RAW to sRGB and Back, Michael S. Brown [Slides]
RAW
- [2018IJCV] RAW Image Reconstruction Using a Self-contained sRGB–JPEG Image with Small Memory Overhead [Michael S. Brown]
- [2016CVPR] RAW Image Reconstruction using a Self-Contained sRGB-JPEG Image with only 64 KB Overhead
- [2014CVPR] Raw-to-raw: Mapping between image sensor color responses
Super-Resolution
-
[Blog] [Deep Learning Super-Resolution Explained in Simple Terms](https://mp.weixin.qq.com/s/o-I6T8f4AcETJqlDNZs9ug)
-
A Deep Journey into Super-resolution: A survey, arXiv2019.9
-
[2020PAMI] Deep Learning for Image Super-resolution: A Survey
-
[2019IJAC] Deep Learning Based Single Image Super-resolution: A Survey
-
Densely Residual Laplacian Super-resolution, arXiv2019.7 [Code]
-
Lightweight Image Super-Resolution with Adaptive Weighted Learning Network, arXiv2019.4 [Code]
-
[2019SIGG] Handheld Multi-Frame Super-Resolution
-
[2019CVPR] Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels
-
[2019CVPR] Zoom To Learn, Learn To Zoom [ProjPage] [Code]
-
[2019CVPR] Towards Real Scene Super-Resolution with Raw Images [Code]
-
[2019CVPR] 3D Appearance Super-Resolution with Deep Learning [Code]
-
[2019CVPR] Learning Parallax Attention for Stereo Image Super-Resolution [Code]
-
[2019CVPR] Meta-SR: A Magnification-Arbitrary Network for Super-Resolution [github]
-
[2019CVPRW] Hierarchical Back Projection Network for Image Super-Resolution [Code]
-
[2019ICCVW] Edge-Informed Single Image Super-Resolution [Code]
-
[2017CVPRW] Enhanced Deep Residual Networks for Single Image Super-Resolution [Code]
-
[2016PAMI] [SRCNN] Image Super-Resolution Using Deep Convolutional Networks
-
[2016NIPS] Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections
-
[2016CVPR] [ESPCN] Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
-
[2016CVPR] [VDSR] Accurate Image Super-Resolution Using Very Deep Convolutional Networks
-
[2016ECCV] [FSRCNN] Accelerating the Super-Resolution Convolutional Neural Network
-
[2014ECCV] [SRCNN] Learning a Deep Convolutional Network for Image Super-Resolution
Enhancement
- Diving Deeper into Underwater Image Enhancement: A Survey, arXiv2019.7
- [2018CVPR] Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs, [Homepage] [Code]
- [2018CVPR] Classification-Driven Dynamic Image Enhancement
- [2017CVPR] Forget Luminance Conversion and Do Something Better
- [2016CVPR] Two Illuminant Estimation and User Correction Preference
- Low-light Enhancement Repo [github]
- A Summary of Deep-Learning-Based Low-Light Image Enhancement Methods (2017-2019) [Note]
- Learning to see, Antonio Torralba, 2016 [Slides]
- Attention-guided Low-light Image Enhancement, arXiv2019.8
- Low-light Image Enhancement Algorithm Based on Retinex and Generative Adversarial Network, arXiv2019.6
- LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement, arXiv2019.6
- EnlightenGAN: Deep Light Enhancement without Paired Supervision, arXiv2019.6 [Code]
- Kindling the Darkness: A Practical Low-light Image Enhancer, arXiv2019.5
- MSR-net: Low-light Image Enhancement Using Deep Convolutional Network, arXiv2017.11
- [2019TOG] Handheld Mobile Photography in Very Low Light
- [2019ICCV] Learning to See Moving Objects in the Dark
- [2019CVPR] Underexposed Photo Enhancement using Deep Illumination Estimation [Code]
- [2019CVPR] All-Weather Deep Outdoor Lighting Estimation
- [2019MMM] Progressive Retinex: Mutually Reinforced Illumination-Noise Perception Network for Low Light Image Enhancement
- [2018TIP] Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images
- [2018TMM] Naturalness preserved nonuniform illumination estimation for image enhancement based on retinex
- [2018PRL] LightenNet: A Convolutional Neural Network for weakly illuminated image enhancement
- [2018CVPR] Learning to See in the Dark
- [2018BMVC] MBLLEN: Low-light Image/Video Enhancement Using CNNs
- [2018BMVC] Deep Retinex Decomposition for Low-Light Enhancement (Oral) [Proj] [Code]
- [2017TIP] LIME: Low-light image enhancement via illumination map estimation
- [2017PR] LLNet: A deep autoencoder approach to natural low-light image enhancement [Code] [Code2]
- [2017CVPR] Deep Outdoor Illumination Estimation
- [2016TIP] LIME: Low-light Image Enhancement via Illumination Map Estimation
- [2016ECCV] Deep Specialized Network for Illuminant Estimation
Reflection Removal
- [2019CVPR] Single Image Reflection Removal Beyond Linearity
- [2019CVPR] Reflection Removal Using A Dual-Pixel Sensor
- [2013ICCV] Exploiting Reflection Change for Automatic Reflection Removal
Denoising
- Deep Learning on Image Denoising: An overview, arXiv2020.1 [Proj]
- [2020NN] Attention-guided CNN for image denoising [Code]
- [2019CVPR] Toward Convolutional Blind Denoising of Real Photographs
Deblurring
- [SelfDeblur] Neural Blind Deconvolution Using Deep Priors, arXiv2019.8 [Code]
- [2019ICCV] DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better [Code]
- [2018CVPR] DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks [Code]
- [2018CVPR] Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks
Deraining
- Single Image Deraining Rain Removal
- [2019CVPR] Single Image Deraining: A Comprehensive Benchmark Analysis
- [2018ECCV] Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining [Code]
Completion
- Image inpainting: A review, arXiv2019.9
- Consistent Generative Query Networks, arXiv2019.4 [Proj]
- [2019Scirobotics] Emergence of exploratory look-around behaviors through active observation completion [Proj]
- [2019ICCV] An Internal Learning Approach to Video Inpainting [Homepage] [Code] [Note]
- [2019ICCV] StructureFlow: Image Inpainting via Structure-aware Appearance Flow [Code]
- [2018Science] [GQN] Neural scene representation and rendering, DeepMind [Code] [Note]
- [2018CVPR] Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks [Code]
- [2018CVPR] Deep Image Prior [github] [Note]
- [2018Proj] Painting outside the box: image outpainting with GANs, Mark Sabini, Stanford CS230 Project, arXiv2018.8 [Code] [PDF] [Model] [Note]
Image/Video Transfer
-
Style Transfer Scholar: Dongdong Chen, Dmitry Ulyanov
-
[2018TOG] Progressive Color Transfer with Dense Semantic Correspondences ⭐️⭐️⭐️⭐️
-
[2017CVPR] Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis
-
[2016ICML] Texture Networks: Feed-forward Synthesis of Textures and Stylized Images [IN] [Code] [Slides]
-
[2016CVPR] Image Style Transfer Using Convolutional Neural Networks, Gatys [Code]
-
[2016ECCV] Perceptual Losses for Real-Time Style Transfer and Super-Resolution
-
[2015] A neural algorithm of artistic style, Gatys, arXiv2015.9 [Code]
Blending/Fusion
- Deep Image Blending, arXiv201910 [Code]
- [2019MMM] GP-GAN: Towards Realistic High-Resolution Image Blending [Code] [Homepage]
- [2018ECCV] Learning to Blend Photos [Homepage]
- [2018SIGGA] Deep Blending for Free-Viewpoint Image-Based Rendering [Homepage]
PedestrianDetection
-
Deep Learning for Person Re-identification: A Survey and Outlook, arXiv2020.1 [Code]
-
Pedestrian Attribute Recognition: A Survey, arXiv2019.1 [Proj]
-
CrowdHuman: A Benchmark for Detecting Human in a Crowd, arXiv201804 [Proj] [Note]
-
PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes, arXiv2019.9
-
[2020TMM/2019CVPRW] Bag of Tricks and A Strong Baseline for Deep Person Re-identification [Code]
-
[2019ICCV] Mask-Guided Attention Network for Occluded Pedestrian Detection [Code]
-
[2019CVPR] VRSTC: Occlusion-Free Video Person Re-Identification [occlusion]
-
[2018CVPR] Repulsion Loss: Detecting Pedestrians in a Crowd [occlusion]
-
[2016ECCV] Stacked Hourglass Networks for Human Pose Estimation
CrowdCounting
-
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detection, arXiv2019.6 [Code]
-
W-Net: Reinforced U-Net for Density Map Estimation, arXiv2019.3 [Unofficial Code]
-
[2019TIP] HA-CCN: Hierarchical Attention-based Crowd Counting Network
-
[2019ICCV] Bayesian Loss for Crowd Count Estimation with Point Supervision [Code]
-
[2019ICCV] Crowd Counting with Deep Structured Scale Integration Network (oral) [github]
-
[2019ICCV] Learning Spatial Awareness to Improve Crowd Counting (oral)
-
[2019ICCV] Perspective-Guided Convolution Networks for Crowd Counting [Code] [Dataset]
-
[2019ICCV] Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
-
[2019ICCV] Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method
-
[2019ICCV] Counting with Focus for Free [Code]
-
[2019ICCVW] Crowd Counting on Images with Scale Variation and Isolated Clusters
-
[2019CVPR] Learning from Synthetic Data for Crowd Counting in the Wild [Homepage] [Dataset]
-
[2019MMM] Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting
-
[2019ICME] Locality-constrained Spatial Transformer Network for Video Crowd Counting
-
[2019SciAdvance] Number detectors spontaneously emerge in a deep neural network designed for visual object recognition [Note]
-
[2019TII] Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks [Code]
-
[2018MICCAIW] Microscopy Cell Counting with Fully Convolutional Regression Networks [Code]
-
[2010NIPS] Learning to count objects in images [Code]
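GAN Learning Roadmap: A Roundup of Papers, Applications, Courses, and Books [Page]
An Overview of the Most Common GAN Models in Deep Learning: GAN, DCGAN, CGAN, infoGAN, ACGAN, CycleGAN, StackGAN ...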
Training Tricks
-
How to Train a GAN? Tips and tricks to make GANs work [Page]
Starting point: the 17 GAN tricks from "How to Train a GAN" (NIPS2016) by Soumith Chintala, Emily Denton, Martin Arjovsky, and Michael Mathieu.
-
Advances in Generative Adversarial Networks (GANs): A summary of the latest advances in Generative Adversarial Networks [Page] [Note]
-
Keep Calm and train a GAN. Pitfalls and Tips on training Generative Adversarial Networks [Page]
-
Image Augmentations for GAN Training. arXiv202006
Papers
-
[Blog] A Beginner's Guide to Generative Adversarial Networks (GANs), 2019
-
Generative Adversarial Networks: A Survey and Taxonomy, arXiv2020.2 [GANReview]
-
A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications, arXiv202001
-
[2019ACMCS] How Generative Adversarial Networks and Their Variants Work: An Overview
-
StarGAN v2: Diverse Image Synthesis for Multiple Domains. arXiv201912 [Code]
-
This dataset does not exist: training models from generated images, arXiv2019.11
-
Landmark Assisted CycleGAN for Cartoon Face Generation. arXiv201907
-
Maximum Entropy Generators for Energy-Based Models, arXiv2019.5 [Code]
-
[2019NIPS] Few-shot Video-to-Video Synthesis [Code]
-
[2018NIPS] [vid2vid] Video-to-Video Synthesis [Code]
-
[2019CVPR] Semantic Image Synthesis with Spatially-Adaptive Normalization [Proj] [Code]
-
[2019CVPR] [seg2vid] Video Generation from Single Semantic Label Map [Code]
-
[2019BMVC] The Art of Food: Meal Image Synthesis from Ingredients
-
[2018ICLR] Spectral Normalization for Generative Adversarial Networks [Code] [Supp1] [Supp2]
-
[2018CVPR] [pix2pixHD] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
-
[2018CVPR] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (oral) [Code]
-
[2018ECCV] [FE-GAN] Fashion Editing with Multi-scale Attention Normalization [Notes]
-
[2018ECCV] Image Inpainting for Irregular Holes Using Partial Convolutions [Code] [Code2] [used for DeepNude]
-
[2017ICCV] [CycleGAN] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [Proj]
-
[2017CVPR] [Pix2Pix] Image-to-Image Translation with Conditional Adversarial Networks [Demo]
-
[2016ICLR] [DCGAN] Unsupervised representation learning with deep convolutional generative adversarial networks
-
[2016ICML] A Theory of Generative ConvNet [S-C Zhu] [Proj/Code]
-
[2014NIPS] Generative Adversarial Nets
-
[YOWO] You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization arXiv201911 [Code]
-
[2019CVPR] Learning Video Representations from Correspondence Proposals
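Existing deep learning architectures for video typically rely on operations such as 3D convolution, autocorrelation, and non-local modules, which struggle to capture long-range motion and correlations between frames. The paper proposes CPNet, which learns long-range correspondences between the frames of a video to address this limitation of existing methods.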
Video Object Detection
- Object Detection in Video with Spatial-temporal Context Aggregation, arXiv2019.7
- Looking Fast and Slow: Memory-Guided Mobile Video Object Detection, arXiv2019.3 [TF] [PyTorch]
- [2019ICCV] [MGAN] Motion Guided Attention for Video Salient Object Detection
- [2019CVPR] Shifting More Attention to Video Salient Object Detection [Code]
- [2019CVPR] Activity Driven Weakly Supervised Object Detection [Code]
- [2019SysML] AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling
- [2019KDDW] Understanding Video Content: Efficient Hero Detection and Recognition for the Game "Honor of Kings" [[Notes]](https://flashgene.com/archives/28803.html)
- [2018CVPR] Mobile Video Object Detection With Temporally-Aware Feature Maps
Video Object Segmentation
-
[2016CVPR] A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
-
[2019ICCV] RANet: Ranking Attention Network for Fast Video Object Segmentation
-
[2019CVPR] See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks [Code]
-
[2019CVPR] Improving Semantic Segmentation via Video Propagation and Label Relaxation [Code]
-
Optimization for deep learning: theory and algorithms. arXiv201912 [[OptimizationCourse]](Optimization Theory for Deep Learning)
-
Why Adam Beats SGD for Attention Models. arXiv201912
-
Momentum Contrast for Unsupervised Visual Representation Learning, Kaiming He arXiv2019.11
-
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources, Amazon, arXiv2019.5
-
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv2018.4 [Notes]
-
[2019NIPS] Uniform convergence may be unable to explain generalization in deep learning
-
[2019NIPS] Understanding the Role of Momentum in Stochastic Gradient Methods
-
[2019NIPS] Lookahead optimizer: k steps forward, 1 step back [Code] [Pytorch] [TF]
-
[2019ICLR] [AdaBound] Adaptive gradient methods with dynamic bound of learning rate [Pytorch] [TF-example]
AdaBound combines SGD and Adam so that training is as fast as Adam at the start and converges like SGD later. Usage: requires Python 3.6+. Install with pip install adabound, then create the optimizer with: optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1). A TensorFlow version is planned.
-
[2019CVPRW] The Indirect Convolution Algorithm
-
[2019ISCAW] Accelerated CNN Training Through Gradient Approximation
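The AdaBound entry above says the optimizer behaves like Adam early in training and like SGD later. The sketch below illustrates that idea on a single scalar parameter in plain Python. It is a simplified reading of the published update rule (no weight decay, no AMSBound variant, no learning-rate scheduling), not the code of the adabound package; the names adabound_step and minimize_quadratic are hypothetical, and gamma=1e-3 is assumed as the bound-convergence rate.

```python
import math


def adabound_step(param, grad, state, lr=1e-3, final_lr=0.1,
                  beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """Apply one AdaBound-style update to a scalar parameter."""
    state["step"] += 1
    t = state["step"]
    # Adam-style moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias-corrected Adam step size for this parameter.
    bias1 = 1 - beta1 ** t
    bias2 = 1 - beta2 ** t
    step_size = lr * math.sqrt(bias2) / bias1 / (math.sqrt(state["v"]) + eps)
    # Bounds start wide (Adam-like) and tighten around final_lr (SGD-like).
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    step_size = min(max(step_size, lower), upper)
    return param - step_size * state["m"]


def minimize_quadratic(steps=2000):
    """Minimize f(x) = (x - 3)^2 starting from x = 0."""
    x = 0.0
    state = {"step": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        grad = 2 * (x - 3.0)
        x = adabound_step(x, grad, state)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

The clipping is the only difference from a plain Adam step in this sketch: as the step count grows, the lower and upper bounds both approach final_lr, so the per-parameter step size is gradually forced toward a single SGD-like learning rate.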
Fast training for neural networks, You Yang, Jiangmen Talk [Video]
- Student Specialization in Deep ReLU Networks With Finite Width and Input Dimension, arXiv2019.11
- Accelerating CNN Training by Sparsifying Activation Gradients, arXiv2019.8
- Luck Matters: Understanding Training Dynamics of Deep ReLU Networks, arXiv2019.6 [Code]
- Bag of Freebies for Training Object Detection Neural Networks, Amazon, arXiv2019.4 [[Code]](https://github.com/dmlc/gluon-cv)
- Deep Double Descent: Where Bigger Models and More Data Hurt, ICLR2020Review
- [2019ICCV] Rethinking ImageNet Pre-training, FAIR [Notes]
- [2019CVPR] Bag of Tricks for Image Classification with Convolutional Neural Networks, Amazon [Code] [Note]
- [2019CVPR] Accelerating Convolutional Neural Networks via Activation Map Compression
- [2019CVPR] RePr: Improved Training of Convolutional Filters [Note]
- [2019BMVC] Dynamic Neural Network Channel Execution for Efficient Training
- [2018ICPP] Imagenet training in minutes
Activation
-
[Blog] Activation Functions in Deep Learning
Dead ReLU [Notes]
-
[2019CVPR] Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem (oral) [Code]
-
[2018] [GELU] Gaussian Error Linear Units (GELUs). arXiv201811 [Note]
# GELU in GPT-2: def gelu(x): return 0.5*x*(1+tf.tanh(np.sqrt(2/np.pi)*(x+0.044715*tf.pow(x, 3))))
-
[2016ICML] [CReLU] Understanding and improving convolutional neural networks via concatenated rectified linear units
-
[2015ICCV] [PReLU-Net/msra Initialization] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
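The GPT-2 one-liner listed under the GELU entry is the tanh approximation of GELU. As a small, dependency-free sketch, the block below writes that approximation next to the exact form x * Phi(x), where Phi is the standard normal CDF, using only the Python standard library. The function names are hypothetical and chosen for this illustration.

```python
import math


def gelu_exact(x):
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation, the same formula as the GPT-2 one-liner."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(points):
    """Largest absolute difference between the two forms over the points."""
    return max(abs(gelu_exact(p) - gelu_tanh(p)) for p in points)


if __name__ == "__main__":
    grid = [i / 10 for i in range(-50, 51)]
    print(max_abs_gap(grid))
```

On the grid from -5 to 5 the two forms stay close, which is why the cheaper tanh version is often used in place of the erf-based definition.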
Normalization
-
Normalization Scholar: Ping Luo
[Blog] Introduction to Normalization [Page] [Note]
[Blog] Introduction to BN/LN/IN/GN [Page] [Page2]
[Talk] Devils in BatchNorm, Jiangmen Talk, 2019 [Page]
[Blog] An Overview of Normalization Methods in Deep Learning, 2018.11 [Page]
-
Attentive Normalization. [Tianfu Wu] arXiv2019.11 [Code]
-
Network Deconvolution. [an alternative to Batch Normalization]. arXiv2019.9 [Proj]
-
Weight Standardization. arXiv2019.3 [Code]
-
[IN] Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv2017.11 [Code]
-
[LN] Layer Normalization. [Hinton] arXiv2016.7 [Note]
-
[2019NIPS] Understanding and Improving Layer Normalization
-
[2018NIPS] How Does Batch Normalization Help Optimization? [arXiv19v] [Ref]
-
[2018NIPS] [BIN] Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks [Code]
-
[2018ECCV] [GN] Group normalization
-
[2017NIPS] Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
-
[2016NIPS] [WN] Weight normalization: A simple reparameterization to accelerate training of deep neural networks
-
[2015ICML] [BN] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Dropout
-
[2014JMLR] Dropout: a simple way to prevent neural networks from overfitting
-
[2012NIPS] ImageNet Classification with Deep Convolutional Neural Networks
Augmentation
-
[Blog] Research Guide: Data Augmentation for Deep Learning. 201910
[Blog] Data Augmentation: How to use Deep Learning when you have Limited Data. 201805 [Page]
-
[2019JBD] A survey on Image Data Augmentation for Deep Learning. [PDF] [Notes]
-
Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data. arXiv2019.11
-
FMix: Enhancing Mixed Sample Data Augmentation arXiv202006 [Code]
-
GridMask Data Augmentation. arXiv202001 [Code] [Note]
-
Let’s Get Dirty: GAN Based Data Augmentation for Soiling and Adverse Weather Classification in Autonomous Driving. arXiv2019.12
-
PanDA: Panoptic Data Augmentation, arXiv2019.11
-
Faster AutoAugment: Learning augmentation strategies using backpropagation. arXiv201911
-
Automatic Data Augmentation by Learning the Deterministic Policy. arXiv201910
-
Greedy AutoAugment, arXiv2019.8
-
Safe Augmentation: Learning Task-Specific Transformations from Data, arXiv2019.7 [Code]
-
Learning Data Augmentation Strategies for Object Detection. arXiv201906 [Code]
-
[2020ICLR] AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty [Code]
-
[2019NIPS] Implicit Semantic Data Augmentation for Deep Networks
-
[2019NIPS] Fast AutoAugment
-
[2019ICML] Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules [Code] [Examples]
-
[2019ICCV] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features [Code]
-
[2019ICCVW] Occlusions for Effective Data Augmentation in Image Classification
-
[2019ICCVW] Style Augmentation: Data Augmentation via Style Randomization
-
[2019CVPR] AutoAugment: Learning Augmentation Policies from Data [Code]
-
[2018ICLR] Mixup: Beyond empirical risk minimization
-
[2018ACML] RICAP: Random Image Cropping and Patching Data Augmentation for Deep CNNs [Code]
-
[2018ICANN] Further advantages of data augmentation on convolutional neural networks (best paper)
[Blog] From Softmax to AMSoftmax
[Blog] Convolutional Neural Networks Structure
[Blog] A Survey of the Recent Architectures of Deep Convolutional Neural Networks, 2019
[Blog] A Detailed Analysis of Downsampling/Upsampling in CNNs
- A closer look at network resolution for efficient network design. arXiv201909 [Code]
- [2019NIPS] Is Deeper Better only when Shallow is Good? [Code]
- [2015Nature] Deep Learning Review
- [2014BMVC] Return of the Devil in the Details: Delving Deep into Convolutional Nets
Module
-
Pooling:
ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection, arXiv201906
Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation, arXiv201901
[2020AAAI] Revisiting Bilinear Pooling: A Coding Perspective [Note]
[2019ICCV] LIP: Local Importance-based Pooling [Code] [Notes]
[2018ECCV] Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification
[2017CVPR] Low-rank bilinear pooling for fine-grained classification
[2016EMNLP] Multimodal compact bilinear pooling for visual question answering and visual grounding
[2016CVPR] Compact bilinear pooling
[2015ICCV] [bilinear pooling] Bilinear CNN Models for Fine-grained Visual Recognition
[2012ECCV] Semantic segmentation with second-order pooling
-
Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference arXiv201912
-
Rethinking Softmax with Cross-Entropy: Neural Network Classifier as Mutual Information Estimator, arXiv201911
-
Rethinking the Number of Channels for the Convolutional Neural Network, arXiv201909
-
AutoGrow: Automatic Layer Growing in Deep Convolutional Networks, arXiv201909 [Code]
-
Mapped Convolutions. [For 2D/3D/Spherical]. arXiv201906 [Code]
-
Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks. arXiv201905 [Code] [Note]
-
[2019ICCV] ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Code]
-
[2019CVPRW] Convolutions on Spherical Images
-
[2017ICML] Warped Convolutions: Efficient Invariance to Spatial Transformations
-
Attention module
-
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks, arXiv201910 [Code] [Chinese]
-
[2020ICLR] On the Relationship between Self-Attention and Convolutional Layers [Proj] [Code] [Intro]
-
[2019TIP] Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition [Code]
-
[2017CVPR] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
Backbone
- Comb Convolution for Efficient Convolutional Architecture. arXiv201911
- [2019ICML] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Code]
- [2019ICCVW] GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [Code]
- [2018CSVT] [RoR] Residual Networks of Residual Networks: Multilevel Residual Networks
- [2018CVPR] [SENet] Squeeze-and-excitation networks
- [2017ICLR] FractalNet: Ultra-Deep Neural Networks without Residuals
- [2017CVPR] [PyramidNet] Deep Pyramidal Residual Networks
- [2017CVPR] [DenseNet] Densely Connected Convolutional Networks
- [2017CVPR] [ResNeXt] Aggregated Residual Transformations for Deep Neural Networks
- [2017CVPR] Xception: Deep Learning with Depthwise Separable Convolutions
- [2017CVPR] PolyNet: A Pursuit of Structural Diversity in Very Deep Networks [Slides]
- [2017AAAI] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- [2016CVPR] [ResNet] Deep Residual Learning for Image Recognition [Note1] [Note2]
- [2016CVPR] [Inception-v3] Rethinking the Inception Architecture for Computer Vision
- [2016ECCV] Good Practices for Deep Feature Fusion
- [2016ECCV] Deep Networks with Stochastic Depth
- [2016ECCV] [Identity ResNet] Identity Mappings in Deep Residual Networks [Over 1000 Layers]
- [2016ICLRW] ResNet in ResNet: Generalizing Residual Architectures
- [2015NIPS] [STN] Spatial Transformer Networks
- [2015ICML] [BN-Inception /Inception-v2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- [2015CVPR] [GoogLeNet/Inception-v1] Going Deeper with Convolutions
- [2015ICLR] [VGGNet] Very Deep Convolutional Networks for Large-Scale Image Recognition
- [2014ICLR] [NIN] Network in Network
- [2014ECCV] [ZFNet] Visualizing and Understanding Convolutional Networks
- [2014ACMMM] [CaffeNet] Caffe: Convolutional Architecture for Fast Feature Embedding
- [2012NIPS] [AlexNet] ImageNet Classification with Deep Convolutional Neural Networks
- [1998ProcIEEE] [LeNet] Gradient-Based Learning Applied to Document Recognition [LeNet Notes]
Light-weight CNN
-
[Blog] Introduction of light-weight CNN
[Blog] Lightweight convolutional neural network: SqueezeNet, MobileNet, ShuffleNet, Xception
-
SeesawNet: Convolution Neural Network With Uneven Group Convolution. arXiv201912 [Code]
-
HGC: Hierarchical Group Convolution for Highly Efficient Neural Network, arXiv201906
-
[2020CVPR] GhostNet: More Features from Cheap Operations [Code]
-
[2019CVPR] ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network [Code]
[2018ECCV] ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
-
[2019CVPRW] Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
-
[2019BMVC] MixNet: Mixed Depthwise Convolutional Kernels [Code] [Notes]
-
[2018NIPS] ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions [Code]
-
[2018NIPS] Learning Versatile Filters for Efficient Convolutional Neural Networks [Code]
-
[2018BMVC] IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks [Code] [Pytorch]
[2018CVPR] IGCV2: Interleaved Structured Sparse Convolutional Neural Networks
[2017ICCV] [IGCV1] Interleaved Group Convolutions for Deep Neural Networks
-
MobileNet Series:
[Blog] Introduction for MobileNet and Its Variants
[2019ICCV] Searching for MobileNetV3. [Note]
[2018CVPR] MobileNetV2: Inverted Residuals and Linear Bottlenecks. [Note]
[2017] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv201704
-
ShuffleNet Series [Note]
[Code] ShuffleNet Series by Megvii: ShuffleNetV1, V2/V2+/V2.Large/V2.ExLarge, OneShot, DetNAS
[2018ECCV] ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
[2018CVPR] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
[Blog] A Summary of Interpretability Methods for Deep Neural Networks (with TF Code Implementations)
- Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey. arXiv2019.11
- [2019NIPS] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent [Code] [Note]
- [2019NIPS] Weight Agnostic Neural Networks (spotlight). [Proj] [Note]
- [2018AAAI] Interpreting CNN Knowledge via An Explanatory Graph
- [2018Access] Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
-
RetinaFace: Single-stage Dense Face Localisation in the Wild. arXiv201905 [Code-MXNet] [Code-TF]
-
[2019CVPR] Group Sampling for Scale Invariant Face Detection [Note]
-
[2019ICCV] Learning to Paint with Model-based Deep Reinforcement Learning [Code] [Note]
-
[2019ICCV] Fashion++: Minimal Edits for Outfit Improvement (FAIR) [Proj] [Code]
-
[2019ICCV] SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition [Code&Dataset]
-
[2018BMVC] Learning Geo-Temporal Image Features [Proj]
AI+Music
- [2018ISMIR] MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer [Code]
- Music (continued): MuseNet, Bach-AI-Music-Google, Generating Piano Music with Transformer by Google
- On the Measure of Intelligence. arXiv201911 [Intro]
Unsupervised Learning
- A Simple Framework for Contrastive Learning of Visual Representations. arXiv202002
Pose
AI+Application
- MetNet: A Neural Weather Model for Precipitation Forecasting, arXiv202003 [Blog] Intro