This repository contains my paper reading notes on deep learning and machine learning. It is inspired by Denny Britz and Daniel Takeshi.
New Year's resolution for 2020: read at least three papers a week and one high-quality GitHub repo a month!
If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into this list of papers. I did so (see my notes) and it served me well.
Here is a list of trustworthy sources of papers in case I run out of papers to read.
- MMAction2 [268 stars]
- Kalman and Bayesian Filters [8.7k stars] ipynb book
- simple-faster-rcnn-pytorch [2.1k stars] [Notes]
- YOLACT/YOLACT++ [2.1k stars]
- YOLOv3 by Ultralytics [4.7k stars]
- MonoLoco [131 stars]
- A Baseline for 3D Multi-Object Tracking [548 stars]
- ROLO: recurrent YOLO
- PointRend
- Carla data export
- openpilot
- 3D Lane Dataset
- MicroGrad
- OpenVSLAM [2.3k stars]
- ORB SLAM2 and Docker version
- PySLAM v2
I regularly update my blog on Towards Data Science.
- Monocular 3D Lane Line Detection in Autonomous Driving (related paper notes)
- Deep-Learning based Object detection in Crowded Scenes (related paper notes)
- Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving
- Deep Learning in Mapping for Autonomous Driving
- Monocular Dynamic Object SLAM in Autonomous Driving
- Monocular 3D Object Detection in Autonomous Driving — A Review
- Self-supervised Keypoint Learning — A Review
- Single Stage Instance Segmentation — A Review
- Self-paced Multitask Learning — A Review
- Convolutional Neural Networks with Heterogeneous Metadata
- Lifting 2D object detection to 3D in autonomous driving
- Multimodal Regression
- Paper Reading in 2019
- Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles [Notes] ICRA 2011 [traffic light, Sebastian Thrun]
- Towards lifelong feature-based mapping in semi-static environments [Notes] ICRA 2016
- How to Keep HD Maps for Automated Driving Up To Date [Notes] ICRA 2020 [BMW]
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection [Notes] [focal loss]
- Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning [Notes] CVPR 2018 workshop
- Centroid Voting: Object-Aware Centroid Voting for Monocular 3D Object Detection [Notes] IROS 2020 [mono3D, geometry + appearance = distance]
- Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras [Notes] [GM Israel, mono3D]
- DeepPS: Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset TIP 2018 [Parking slot detection, PS2.0 dataset]
- PSDet: Efficient and Universal Parking Slot Detection [Notes] IV 2020 [Zongmu, Parking slot detection]
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [Notes] ASPLOS 2020 [pruning]
- DeFCN: End-to-End Object Detection with Fully Convolutional Network [Notes] [Transformer, DETR]
- OneNet: End-to-End One-Stage Object Detection by Classification Cost [Transformer, DETR]
- Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection [Notes] [NMS]
- SplitNet: Divide and Co-training
- Scaled-YOLOv4: Scaling Cross Stage Partial Network [Notes] [yolo]
- Yolov5 by Ultralytics [Notes] [yolo, spatial2channel]
- PP-YOLO: An Effective and Efficient Implementation of Object Detector [Notes] [yolo, paddle-paddle, baidu]
- VoVNet: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection CVPR 2019 workshop
- Isometric Neural Networks: Non-discriminative data or weak model? On the relative importance of data and model resolution ICCV 2019 workshop [spatial2channel]
- TResNet WACV 2021 [spatial2channel]
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020 [DIOU, NMS]
- RegNet: Designing Network Design Spaces CVPR 2020 [FAIR]
- On Network Design Spaces for Visual Recognition [FAIR]
- Lane Endpoint Detection and Position Accuracy Evaluation for Sensor Fusion-Based Vehicle Localization on Highways Sensors 2018 [lane endpoints]
- Map-Matching-Based Cascade Landmark Detection and Vehicle Localization IEEE Access 2019 [lane endpoints]
- GCNet: End-to-End Learning of Geometry and Context for Deep Stereo Regression ICCV 2017 [disparity estimation, Alex Kendall, cost volume]
- Traffic Control Gesture Recognition for Autonomous Vehicles IROS 2020 [Daimler]
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild ECCV 2020
- OrcVIO: Object residual constrained Visual-Inertial Odometry [dynamic SLAM, very mathematical]
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling ECCV 2020
- DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving ECCV 2020
- Towards Lightweight Lane Detection by Optimizing Spatial Embedding ECCV 2020 workshop [LLD]
- Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection ECCV 2020 workshop [lidar]
- DeepIM: Deep iterative matching for 6d pose estimation ECCV 2018 [pose estimation]
- Monocular Depth Prediction through Continuous 3D Loss IROS 2020
- Multi-Task Learning for Dense Prediction Tasks: A Survey [MTL, Luc Van Gool]
- Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems ITSC 2020 oral [MTL]
- NeurAll: Towards a Unified Model for Visual Perception in Automated Driving ITSC 2019 oral [MTL]
- Locating Objects Without Bounding Boxes CVPR 2019
- Deep Evidential Regression NeurIPS 2020 [one-pass aleatoric/epistemic uncertainty]
- Estimating Drivable Collision-Free Space from Monocular Video WACV 2015 [Drivable space]
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019 [monodepth]
- Differentiable Rendering: A Survey [differentiable rendering, TRI]
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [monodepth, semantics, Naver labs]
- Toward Interactive Self-Annotation For Video Object Bounding Box: Recurrent Self-Learning And Hierarchical Annotation Based Framework WACV 2020
- Towards Good Practice for CNN-Based Monocular Depth Estimation WACV 2020
- Self-Supervised Scene De-occlusion CVPR 2020 oral
- TP-LSD: Tri-Points Based Line Segment Detector
- Data Distillation: Towards Omni-Supervised Learning CVPR 2018 [Kaiming He, FAIR]
- MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer [monodepth, dynamic object, synthetic dataset]
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation [monodepth]
- Towards Lightweight Lane Detection by Optimizing Spatial Embedding ECCV 2020 workshop
- Synthetic-to-Real Domain Adaptation for Lane Detection [GM Israel, LLD]
- PolyLaneNet: Lane Estimation via Deep Polynomial Regression ICPR 2020 [polynomial, LLD]
- 3DSSD: Point-based 3D Single Stage Object Detector CVPR 2020
- Learning Universal Shape Dictionary for Realtime Instance Segmentation
- End-to-End Video Instance Segmentation with Transformers [DETR, transformers]
- Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks CVPR 2020 workshop
- When and Why Test-Time Augmentation Works
- Footprints and Free Space from a Single Color Image CVPR 2020 oral [Parking use, footprint]
- PointPainting: Sequential Fusion for 3D Object Detection [Notes] [nuScenes]
- MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps CVPR 2020 [Unseen moving objects, BEV]
- Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning [BEV, only predict footprint]
- Rethinking Classification and Localization for Object Detection CVPR 2020
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [mono3D]
- BoxInst: High-Performance Instance Segmentation with Box Annotations [Chunhua Shen, Tian Zhi]
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- TSP: Rethinking Transformer-based Set Prediction for Object Detection [Notes] [DETR, transformers, Kris Kitani]
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals [Notes] [DETR, Transformer]
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [transformers]
- Unsupervised Monocular Depth Learning in Dynamic Scenes [Notes] CoRL 2020 [LearnK improved ver, Google]
- MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time [Notes] ICML 2020 [Mono3D, pairwise relationship]
- Argoverse: 3D Tracking and Forecasting with Rich Maps [Notes] CVPR 2019 [HD maps, dataset, CV lidar]
- The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes [Notes] ICRA 2019
- Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection CVPRW 2020 [dataset, Daimler, mono3D]
- NYC3DCars: A Dataset of 3D Vehicles in Geographic Context ICCV 2013
- Towards Fully Autonomous Driving: Systems and Algorithms IV 2011
- Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding [Notes] [mono3D, LID+DepJoint]
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection AAAI 2020 oral [mono3D]
- CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection [Notes] WACV 2021 [early fusion, camera, radar]
- 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation [Notes] NeurIPS 2020 workshop [GM Israel, 3D LLD]
- LSTR: End-to-end Lane Shape Prediction with Transformers [Notes] WACV 2021 [LLD, transformers]
- PIXOR: Real-time 3D Object Detection from Point Clouds [Notes] CVPR 2018 [bird's-eye view]
- HDNET/PIXOR++: Exploiting HD Maps for 3D Object Detection [Notes] CoRL 2018
- CPNDet: Corner Proposal Network for Anchor-free, Two-stage Object Detection ECCV 2020 [anchor free, two stage]
- MVF: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds [Notes] CoRL 2019 [Waymo, VoxelNet 1st author]
- Pillar-based Object Detection for Autonomous Driving [Notes] ECCV 2020
- Training-Time-Friendly Network for Real-Time Object Detection AAAI 2020 [anchor-free, fast training]
- Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies [Review of autonomous stack, Yu Huang]
- Dense Monocular Depth Estimation in Complex Dynamic Scenes CVPR 2016
- Probabilistic Future Prediction for Video Scene Understanding
- AB3D: A Baseline for 3D Multi-Object Tracking IROS 2020 [3D MOT]
- Spatial-Temporal Relation Networks for Multi-Object Tracking ICCV 2019 [MOT, feature location over time]
- Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking ICRA 2018 [MOT, IIT, 3D shape]
- ST-3D: Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking CVPR 2020 [Peiliang Li, author of VINS and S3DOT]
- Augment Your Batch: Improving Generalization Through Instance Repetition CVPR 2020
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020 [MOT]
- Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks ECCV 2020 oral
- Depth Completion via Deep Basis Fitting WACV 2020
- BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation [monodepth, supervised]
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [monodepth, Xiaoming Liu]
- On the Continuity of Rotation Representations in Neural Networks CVPR 2019 [rotational representation]
- VDO-SLAM: A Visual Dynamic Object-aware SLAM System IJRR 2020
- Dynamic SLAM: The Need For Speed
- Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction ECCV 2020
- Traffic Light Mapping and Detection [Notes] ICRA 2011 [traffic light, Google, Chris Urmson]
- Traffic light recognition exploiting map and localization at every stage [Notes] Expert Systems 2017 [traffic light, 鲜于明镐,徐在圭,郑浩奇]
- Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars [Notes] IJCNN 2019 [traffic light, Espirito Santo Brazil]
- TSM: Temporal Shift Module for Efficient Video Understanding [Notes] ICCV 2019 [Song Han, video, object detection]
- Waymo Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset [Notes] CVPR 2020
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [Notes] NeurIPS 2020 [classification as regression]
- A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection NeurIPS 2020 spotlight
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning NeurIPS 2020
- RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd [Notes] CVPR 2018 [crowd detection, Megvii]
- Adaptive NMS: Refining Pedestrian Detection in a Crowd [Notes] CVPR 2019 oral [crowd detection, NMS]
- AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd [Notes] ECCV 2018 [crowd detection]
- CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions [Notes] CVPR 2020 oral [crowd detection, Megvii]
- R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing [Notes] CVPR 2020
- Double Anchor R-CNN for Human Detection in a Crowd [Notes] [head-body bundle]
- Review: AP vs MR
- SKU110K: Precise Detection in Densely Packed Scenes [Notes] CVPR 2019 [crowd detection, no occlusion]
- GossipNet: Learning non-maximum suppression CVPR 2017
- TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation ECCV 2018
- Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels GCPR 2020 [mono3D, Daniel Cremers, TUM]
- CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection [Notes] [mono3D, depth AE pretraining]
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [Notes] ICLR 2021 [Jifeng Dai, DETR]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Notes] ICLR 2021
- BYOL: Bootstrap your own latent: A new approach to self-supervised Learning [self-supervised]
- SDFLabel: Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors [Notes] CVPR 2020 oral [TRI, differentiable rendering]
- DensePose: Dense Human Pose Estimation In The Wild [Notes] CVPR 2018 oral [FAIR]
- NOCS: Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation CVPR 2019
- monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [Notes] ECCV 2020 [TRI, mono3D]
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [Notes] ECCV 2020 [BEV-Net, Utoronto, Sanja Fidler]
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting ECCV 2020 [Uber ATG, Rachel Urtasun]
- FISHING Net: Future Inference of Semantic Heatmaps In Grids [Notes] CVPRW 2020 [BEV-Net, Mapping, Zoox]
- VPN: Cross-view Semantic Segmentation for Sensing Surroundings [Notes] RAL 2020 [Bolei Zhou, BEV-Net]
- VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks [Notes] ICRA 2019 [BEV-Net]
- Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [Notes] ITSC 2020 [BEV-Net]
- Learning to Look around Objects for Top-View Representations of Outdoor Scenes [Notes] ECCV 2018 [BEV-Net, UCSD, Manmohan Chandraker]
- A Parametric Top-View Representation of Complex Road Scenes CVPR 2019 [BEV-Net, UCSD, Manmohan Chandraker]
- FTM: Understanding Road Layout from Videos as a Whole CVPR 2020 [BEV-Net, UCSD, Manmohan Chandraker]
- KM3D-Net: Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [Notes] [RTM3D, Peixuan Li]
- InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving [Notes] IROS 2020 [motion segmentation]
- MPV-Nets: Monocular Plan View Networks for Autonomous Driving [Notes] IROS 2019 [BEV-Net]
- Class-Balanced Loss Based on Effective Number of Samples [Notes] CVPR 2019 [Focal loss authors]
- Geometric Pretraining for Monocular Depth Estimation [Notes] ICRA 2020
- Robust Traffic Light and Arrow Detection Using Digital Map with Spatial Prior Information for Automated Driving [Notes] Sensors 2020 [traffic light, 金沢]
- Feature-metric Loss for Self-supervised Learning of Depth and Egomotion [Notes] ECCV 2020 [feature-metric, local minima, monodepth]
- Depth-VO-Feat: Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction CVPR 2018 [feature-metric, monodepth]
- MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge [Notes] CVPR 2019 [monodepth, local minima, cheap stereo GT]
- SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance [Notes] ECCV 2020 [Moving objects]
- Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding ECCV 2018 [dynamic objects, rigid and dynamic motion]
- Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding TPAMI 2018
- CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation [Notes] CVPR 2019
- ObjMotionNet: Self-supervised Object Motion and Depth Estimation from Video [Notes] CVPRW 2020 [object motion prediction, velocity prediction]
- Instance-wise Depth and Motion Learning from Monocular Videos
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation
- Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
- DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency ECCV 2018
- LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments [mapping]
- Road-SLAM: Road Marking based SLAM with Lane-level Accuracy [Notes] [HD mapping]
- AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot [Notes] IROS 2020 [Huawei, HD mapping, Tong Qin, VINS author, autonomous valet parking]
- AVP-SLAM-Late-Fusion: Mapping and Localization using Semantic Road Marking with Centimeter-level Accuracy in Indoor Parking Lots [Notes] ITSC 2019
- Lane markings-based relocalization on highway ITSC 2019
- DeepRoadMapper: Extracting Road Topology from Aerial Images [Notes] ICCV 2017 [Uber ATG, NOT HD maps]
- RoadTracer: Automatic Extraction of Road Networks from Aerial Images CVPR 2018 [NOT HD maps]
- PolyMapper: Topological Map Extraction From Overhead Images [Notes] ICCV 2019 [mapping, polygon, NOT HD maps]
- HRAN: Hierarchical Recurrent Attention Networks for Structured Online Maps [Notes] CVPR 2018 [HD mapping, highway, polyline loss]
- Deep Structured Crosswalk: End-to-End Deep Structured Models for Drawing Crosswalks [Notes] ECCV 2018
- DeepBoundaryExtractor: Convolutional Recurrent Network for Road Boundary Extraction [Notes] CVPR 2019 [HD mapping, boundary, polyline loss]
- DAGMapper: Learning to Map by Discovering Lane Topology [Notes] ICCV 2019 [HD mapping, highway, forks and merges, polyline loss]
- Sparse-HD-Maps: Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization [Notes] IROS 2019 oral [Uber ATG, metadata, mapping, localization]
- Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks IEEE TGRS 2018
- Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs Sensors 2020 [Tsinghua, 3D HD maps]
- PatchNet: Rethinking Pseudo-LiDAR Representation [Notes] ECCV 2020 [SenseTime, Wanli Ouyang]
- D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection [Notes] CVPR 2020 [mono3D]
- MfS: Learning Stereo from Single Images [Notes] ECCV 2020 [mono for stereo, learn stereo matching with mono]
- BorderDet: Border Feature for Dense Object Detection ECCV 2020 oral [Megvii]
- Scale-Aware Trident Networks for Object Detection ICCV 2019 [different heads for different scales]
- Learning Depth from Monocular Videos using Direct Methods
- Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018 [Google]
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images ECCV 2020
- NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
- Supervising the new with the old: learning SFM from SFM [Notes] ECCV 2018
- Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera CVPR 2019 [multi-frame monodepth]
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [multi-frame monodepth, RNN]
- Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth [multi-frame monodepth, RNN]
- Exploiting temporal consistency for real-time video depth estimation ICCV 2019 [multi-frame monodepth, RNN, indoor]
- SfM-Net: Learning of Structure and Motion from Video [dynamic object, SfM]
- MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection [Notes] IV 2018 [mono3D, Daimler]
- BS3D: Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB Images [Notes] IV 2019 [mono3D, Daimler]
- 3D-GCK: Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometrically Constrained Keypoints in Real-Time [Notes] IV 2020 [mono3D, Daimler]
- UR3D: Distance-Normalized Unified Representation for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- DA-3Det: Monocular 3D Object Detection via Feature Domain Adaptation [Notes] ECCV 2020 [mono3D]
- RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- CenterTrack: Tracking Objects as Points [Notes] ECCV 2020 spotlight [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection]
- CenterPoint: Center-based 3D Object Detection and Tracking [Notes] [lidar based 3D MOD, CenterNet]
- Tracktor: Tracking without bells and whistles [Notes] ICCV 2019 [Tracktor/Tracktor++, Laura Leal-Taixe@TUM]
- FairMOT: A Simple Baseline for Multi-Object Tracking [Notes]
- DeepMOT: A Differentiable Framework for Training Multiple Object Trackers [Notes] CVPR 2020 [trainable Hungarian, Laura Leal-Taixe@TUM]
- MPNTracker: Learning a Neural Solver for Multiple Object Tracking CVPR 2020 oral [trainable Hungarian, Laura Leal-Taixe@TUM]
- nuScenes: A multimodal dataset for autonomous driving [Notes] CVPR 2020 [dataset, point cloud, radar]
- CBGS: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection [Notes] CVPRW 2019 [Megvii, lidar, WAD challenge winner]
- AFDet: Anchor Free One Stage 3D Object Detection and Competition solution [Notes] CVPRW 2020 [Horizon Robotics, lidar, Waymo challenge winner]
- Review of MOT and SOT [Notes]
- CrowdHuman: A Benchmark for Detecting Human in a Crowd [Notes] [megvii, pedestrian, dataset]
- WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild [Notes] TMM 2019 [dataset, pedestrian]
- Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection [Notes] IV 2016 [dataset, cyclist detection]
- Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB Camera [Notes] IV 2019 [Extension to KITTI]
- PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [Notes] ECCV 2020 oral [MOTS]
- PointTrack++ for Effective Online Multi-Object Tracking and Segmentation [Notes] CVPR 2020 workshop [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]
- SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth [Notes] ICCV 2019 [one-stage, instance segmentation]
- BA-Net: Dense Bundle Adjustment Networks [Notes] ICLR 2019 [Bundle adjustment, multi-frame monodepth, feature-metric]
- DeepSFM: Structure From Motion Via Deep Bundle Adjustment ECCV 2020 oral [multi-frame monodepth, indoor scene]
- Consistent Video Depth Estimation [Notes] SIGGRAPH 2020 [multi-frame monodepth, online finetune]
- DeepV2D: Video to Depth with Differentiable Structure from Motion [Notes] ICLR 2020 [multi-frame monodepth, Jia Deng]
- GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Notes] CVPR 2018 [residual optical flow, monodepth, rigid and dynamic motion]
- GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera [Notes] ICCV 2019 [online finetune, rigid and dynamic motion]
- Depth Hints: Self-Supervised Monocular Depth Hints [Notes] ICCV 2019 [monodepth, local minima, cheap stereo GT]
- MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation [Notes] CVPR 2020 [depth uncertainty]
- Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment [Notes] [Bundle adjustment, xmotors.ai, multi-frame monodepth]
- Kinematic 3D Object Detection in Monocular Video [Notes] ECCV 2020 [multi-frame mono3D, Xiaoming Liu]
- VelocityNet: Camera-based vehicle velocity estimation from monocular video [Notes] CVPR 2017 workshop [monocular velocity estimation, CVPR 2017 challenge winner]
- Vehicle Centric VelocityNet: End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera [Notes] [monocular velocity estimation, monocular distance, SOTA]
- LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain [Notes] IROS 2018 [lidar, mapping]
- PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction [Notes] ICCV 2019
- JAAD: Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior ICCV 2017
- Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs BMVC 2019
- Is the Pedestrian going to Cross? Answering by 2D Pose Estimation IV 2018
- Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation ITSC 2019 [skeleton, pedestrian, cyclist intention]
- Attentive Single-Tasking of Multiple Tasks CVPR 2019
- DETR: End-to-End Object Detection with Transformers [Notes] ECCV 2020 oral [FAIR]
- Transformer: Attention Is All You Need [Notes] NIPS 2017
- SpeedNet: Learning the Speediness in Videos [Notes] CVPR 2020 oral
- MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships [Notes] CVPR 2020 [Mono3D, pairwise relationship]
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [Notes] CVPRW 2020 [Mono3D, Zongmu]
- Vehicle Re-ID for Surround-view Camera System [Notes] CVPRW 2020 [tireline, vehicle ReID, Zongmu]
- End-to-End Lane Marker Detection via Row-wise Classification [Notes] [Qualcomm Korea, LLD as cls]
- Reliable multilane detection and classification by utilizing CNN as a regression network ECCV 2018 [LLD as reg]
- SUPER: A Novel Lane Detection System [Notes]
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation ICCV 2019
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation BMVC 2015
- StixelNetV2: Real-time category-based and general obstacle detection for autonomous driving [Notes] ICCV 2017 [DS]
- Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [Notes] CVPR 2016 [channel-to-pixel]
- Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints [mono3D]
- Self-Mono-SF: Self-Supervised Monocular Scene Flow Estimation [Notes] CVPR 2020 oral [scene-flow, Stereo input]
- MEBOW: Monocular Estimation of Body Orientation In the Wild [Notes] CVPR 2020
- VG-NMS: Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes [Notes] NeurIPS 2019 workshop [Crowded scene, NMS, Daimler]
- WYSIWYG: What You See is What You Get: Exploiting Visibility for 3D Object Detection [Notes] CVPR 2020 oral [occupancy grid]
- Real-Time Panoptic Segmentation From Dense Detections [Notes] CVPR 2020 oral [bbox + semantic segmentation = panoptic segmentation, Toyota]
- Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving [Notes] CVPRW 2020 [efficient annotation]
- SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving [Notes] CVPR 2020 oral [Waymo, auto data generation, surfel]
- LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World [Notes] CVPR 2020 oral [Uber ATG, auto data generation, surfel]
- SuMa++: Efficient LiDAR-based Semantic SLAM IROS 2019 [semantic segmentation, lidar, SLAM]
- PyrOccNet: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [Notes] CVPR 2020 oral [BEV-Net, OFT]
- MonoLayout: Amodal scene layout from a single image [Notes] WACV 2020 [BEV-Net]
- BEV-Seg: Bird’s Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [Notes] CVPR 2020 workshop [BEV-Net, Mapping]
- A Geometric Approach to Obtain a Bird's Eye View from an Image ICCVW 2019 [mapping, geometry, Andrew Zisserman]
- FrozenDepth: Learning the Depths of Moving People by Watching Frozen People [Notes] CVPR 2019 oral
- ORB-SLAM: a Versatile and Accurate Monocular SLAM System TRO 2015
- ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras TRO 2016
- CubeSLAM: Monocular 3D Object SLAM [Notes] TRO 2019 [dynamic SLAM, orb slam + mono3D]
- ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [Notes] CVPR 2020 [general dynamic SLAM]
- S3DOT: Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving [Notes] ECCV 2018 [Peiliang Li]
- Multi-object Monocular SLAM for Dynamic Environments [Notes] IV 2020 [monolayout authors]
- PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume [Notes] CVPR 2018 oral [Optical flow]
- LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation CVPR 2018 [Optical flow]
- FlowNet: Learning Optical Flow With Convolutional Networks ICCV 2015 [Optical flow]
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks CVPR 2017 [Optical flow]
- ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network CVPR 2019 [semantic segmentation, lightweight]
- Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes ICCV 2019 [depth uncertainty]
- Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems [Notes] ICRA 2019 [Honda]
- PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation [Notes] CVPR 2020 oral [Scale aware depth]
- PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [Notes] ICLR 2020 [TRI, infinite-depth problem]
- TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [Notes] CVPR 2020 [Scale aware]
- Understanding the Limitations of CNN-based Absolute Camera Pose Regression [Notes] CVPR 2019 [Drawbacks of PoseNet, MapNet, Laura Leal-Taixe@TUM]
- To Learn or Not to Learn: Visual Localization from Essential Matrices [Notes] ICRA 2020 [SIFT + 5 pt solver >> others for VO, Laura Leal-Taixe@TUM]
- DF-VO: Visual Odometry Revisited: What Should Be Learnt? [Notes] ICRA 2020 [Depth and Flow for accurate VO]
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [Notes] CVPR 2020 oral [Daniel Cremers, TUM, depth uncertainty]
- Network Slimming: Learning Efficient Convolutional Networks through Network Slimming [Notes] ICCV 2017
- BatchNorm Pruning: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [Notes] ICLR 2018
- Direct Sparse Odometry PAMI 2018
- Train in Germany, Test in The USA: Making 3D Object Detectors Generalize [Notes] CVPR 2020
- PseudoLidarV3: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [Notes] CVPR 2020
- ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [Notes] CVPR 2020 oral
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020
- Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [Journal version]
- YOLOv4: Optimal Speed and Accuracy of Object Detection [Notes]
- CBN: Cross-Iteration Batch Normalization [Notes]
- Stitcher: Feedback-driven Data Provider for Object Detection [Notes]
- SKNet: Selective Kernel Networks [Notes] CVPR 2019
- CBAM: Convolutional Block Attention Module [Notes] ECCV 2018
- EfficientDet: Scalable and Efficient Object Detection CVPR 2020
- ResNeSt: Split-Attention Networks [Notes]
- ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst [Notes] RSS 2019 [Waymo]
- IntentNet: Learning to Predict Intention from Raw Sensor Data [Notes] CoRL 2018 [Uber ATG, perception and prediction, Lidar+Map]
- RoR: Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions [Notes] CVPR 2019 [Zoox]
- MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction [Notes] CoRL 2019 [Waymo, authors from RoR and ChauffeurNet]
- NMP: End-to-end Interpretable Neural Motion Planner [Notes] CVPR 2019 oral [Uber ATG]
- Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks [Notes] ICRA 2019 [Multimodal, Uber ATG Pittsburgh]
- Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving WACV 2020 [Uber ATG Pittsburgh]
- Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles IROS 2019 Oral [Uber ATG, behavioral planning, motion planning]
- TensorMask: A Foundation for Dense Object Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [Notes] CVPR 2020 oral
- Mask Encoding for Single Shot Instance Segmentation [Notes] CVPR 2020 oral [single-stage instance seg, Chunhua Shen]
- PolarMask: Single Shot Instance Segmentation with Polar Representation [Notes] CVPR 2020 oral [single-stage instance seg]
- SOLO: Segmenting Objects by Locations [Notes] ECCV 2020 [single-stage instance seg, Chunhua Shen]
- SOLOv2: Dynamic, Faster and Stronger [Notes] [single-stage instance seg, Chunhua Shen]
- CondInst: Conditional Convolutions for Instance Segmentation [Notes] ECCV 2020 oral [single-stage instance seg, Chunhua Shen]
- CenterMask: Single Shot Instance Segmentation With Point Representation [Notes] CVPR 2020
- VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition [Notes] ICCV 2017
- Which Tasks Should Be Learned Together in Multi-task Learning? [Notes] [Stanford, MTL] ICML 2020
- Multi-Task Learning as Multi-Objective Optimization NeurIPS 2018
- Taskonomy: Disentangling Task Transfer Learning [Notes] CVPR 2018
- Rethinking ImageNet Pre-training [Notes] ICCV 2019 [Kaiming He]
- UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor [Notes] [superpoint]
- KP2D: Neural Outlier Rejection for Self-Supervised Keypoint Learning [Notes] ICLR 2020 (pointNet)
- KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation [Notes] CoRL 2020 [Toyota, superpoint]
- NG-RANSAC: Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses [Notes] ICCV 2019 [pointNet]
- Learning to Find Good Correspondences [Notes] CVPR 2018 Oral (pointNet)
- RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving [Notes] [Huawei, Mono3D]
- DSP: Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [Notes] AAAI 2020 (SenseTime, Mono3D)
- Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks (LLD, LSTM)
- LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach [Notes] IV 2018 (LaneNet)
- 3D-LaneNet: End-to-End 3D Multiple Lane Detection [Notes] ICCV 2019
- Semi-Local 3D Lane Detection and Uncertainty Estimation [Notes] [GM Israel, 3D LLD]
- Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection [Notes] ECCV 2020 [Apollo, 3D LLD]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [Egocentric prediction]
- It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection ECCV 2018 [pedestrian]
- Associative Embedding: End-to-End Learning for Joint Detection and Grouping [Notes] NIPS 2017
- Pixels to Graphs by Associative Embedding [Notes] NIPS 2017
- Social LSTM: Human Trajectory Prediction in Crowded Spaces [Notes] CVPR 2017
- Online Video Object Detection using Association LSTM [Notes] [single stage, recurrent]
- SuperPoint: Self-Supervised Interest Point Detection and Description [Notes] CVPR 2018 (channel-to-pixel, deep SLAM, Magic Leap)
- PointRend: Image Segmentation as Rendering [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- Multigrid: A Multigrid Method for Efficiently Training Video Models [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- GhostNet: More Features from Cheap Operations [Notes] CVPR 2020
- FixRes: Fixing the train-test resolution discrepancy [Notes] NIPS 2019 [FAIR]
- MoVi-3D: Towards Generalization Across Depth for Monocular 3D Object Detection [Notes] ECCV 2020 [Virtual Cam, viewport, Mapillary/Facebook, Mono3D]
- Amodal Completion and Size Constancy in Natural Scenes [Notes] ICCV 2015 (Amodal completion)
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning [Notes] CVPR 2020 Oral [FAIR, Kaiming He]
- Double Descent: Reconciling modern machine learning practice and the bias-variance trade-off [Notes] PNAS 2019
- Deep Double Descent: Where Bigger Models and More Data Hurt [Notes]
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- The ApolloScape Open Dataset for Autonomous Driving and its Application CVPR 2018 (dataset)
- ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving [Notes] CVPR 2019
- Part-level Car Parsing and Reconstruction from a Single Street View [Notes] [Baidu]
- 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images [Notes] CVPR 2019
- RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving [Notes] ECCV 2020 spotlight
- DORN: Deep Ordinal Regression Network for Monocular Depth Estimation [Notes] CVPR 2018 [monodepth, supervised]
- D&T: Detect to Track and Track to Detect [Notes] ICCV 2017 (from Feichtenhofer)
- CRF-Net: A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection [Notes] SDF 2019 (radar detection)
- RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-based Obstacle Detection in Challenging Environments [Notes] PSIVT 2019
- RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles [Notes] ICIP 2019
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [Notes] ISCAS 2016
- Recurrent SSD: Recurrent Multi-frame Single Shot Detector for Video Object Detection [Notes] BMVC 2018 (Mitsubishi)
- Recurrent RetinaNet: A Video Object Detection Model Based on Focal Loss [Notes] ICONIP 2018 (single stage, recurrent)
- Actions as Moving Points [Notes] [not suitable for online]
- The PREVENTION dataset: a novel benchmark for PREdiction of VEhicles iNTentIONs [Notes] ITSC 2019 [dataset, cut-in]
- Semi-Automatic High-Accuracy Labelling Tool for Multi-Modal Long-Range Sensor Dataset [Notes] IV 2018
- Astyx dataset: Automotive Radar Dataset for Deep Learning Based 3D Object Detection [Notes] EuRAD 2019 (Astyx)
- Astyx camera radar: Deep Learning Based 3D Object Detection for Automotive Radar and Camera [Notes] EuRAD 2019 (Astyx)
- How Do Neural Networks See Depth in Single Images? [Notes] ICCV 2019
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019 (depth completion)
- DC: Depth Coefficients for Depth Completion [Notes] CVPR 2019 [Xiaoming Liu, Multimodal]
- Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation [Notes] ICRA 2017
- VO-Monodepth: Enhancing self-supervised monocular depth estimation with traditional visual odometry [Notes] 3DV 2019 (sparse to dense)
- Probabilistic Object Detection: Definition and Evaluation [Notes]
- The Fishyscapes Benchmark: Measuring Blind Spots in Semantic Segmentation [Notes] ICCV 2019
- On Calibration of Modern Neural Networks [Notes] ICML 2017 (Weinberger)
- Extreme clicking for efficient object annotation [Notes] ICCV 2017
- Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems [Notes] NeurIPS 2019 (radar)
- Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector [Notes] IV 2019
- C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion [Notes] ICCV 2019
- YOLACT: Real-time Instance Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- YOLACT++: Better Real-time Instance Segmentation [single-stage instance seg]
- Review of Image and Feature Descriptors
- Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors [Notes] ICCV 2019
- GPP: Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road [Notes] IV 2020 [UCSD, Trivedi, mono 3DOD]
- MVRA: Multi-View Reprojection Architecture for Orientation Estimation [Notes] ICCV 2019
- YOLOv3: An Incremental Improvement
- Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving [Notes] ICCV 2019 (Detection with Uncertainty)
- Bayesian YOLOv3: Uncertainty Estimation in One-Stage Object Detection [Notes] [DriveU]
- Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [Notes] ITSC 2018 (DriveU)
- Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [Notes] IV 2019 (DriveU)
- Can We Trust You? On Calibration of a Probabilistic Object Detector for Autonomous Driving [Notes] IROS 2019 (DriveU)
- LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [Notes] CVPR 2019 (uncertainty)
- LaserNet KL: Learning an Uncertainty-Aware Object Detector for Autonomous Driving [Notes] [LaserNet with KL divergence]
- IoUNet: Acquisition of Localization Confidence for Accurate Object Detection [Notes] ECCV 2018
- gIoU: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression [Notes] CVPR 2019
- The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks CVPR 2018 [IoU as loss]
- KL Loss: Bounding Box Regression with Uncertainty for Accurate Object Detection [Notes] CVPR 2019
- CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth [Notes] CVPR 2019
- BayesOD: A Bayesian Approach for Uncertainty Estimation in Deep Object Detectors [Notes]
- TW-SMNet: Deep Multitask Learning of Tele-Wide Stereo Matching [Notes] ICIP 2019
- Accurate Uncertainties for Deep Learning Using Calibrated Regression [Notes] ICML 2018
- Calibrating Uncertainties in Object Localization Task [Notes] NIPS 2018
- SMWA: On the Over-Smoothing Problem of CNN Based Disparity Estimation [Notes] ICCV 2019 [Multimodal, depth estimation]
- Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image [Notes] ICRA 2018 (depth completion)
- Review of monocular object detection
- Review of 2D 3D constraints in Mono 3DOD
- MonoGRNet 2: Monocular 3D Object Detection via Geometric Reasoning on Keypoints [Notes] [estimates depth from keypoints]
- Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [Notes] CVPR 2017
- SS3D: Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss [Notes] [regress distance from images, centernet like]
- GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving [Notes] CVPR 2019
- M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [Notes] ICCV 2019 oral [3D anchors, cyclists, Xiaoming Liu]
- TLNet: Triangulation Learning Network: from Monocular to Stereo 3D Object Detection [Notes] CVPR 2019
- A Survey on 3D Object Detection Methods for Autonomous Driving Applications [Notes] TITS 2019 [Review]
- BEV-IPM: Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image [Notes] IV 2019
- ForeSeE: Task-Aware Monocular Depth Estimation for 3D Object Detection [Notes] AAAI 2020 oral [successor to pseudo-lidar, mono 3DOD SOTA]
- Obj-dist: Learning Object-specific Distance from a Monocular Image [Notes] ICCV 2019 (xmotors.ai + NYU) [monocular distance]
- DisNet: A novel method for distance estimation from monocular camera [Notes] IROS 2018 [monocular distance]
- BirdGAN: Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles [Notes] IROS 2019
- Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints [Notes] ICIP 2019
- 3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare [Notes] CVPR 2018
- Deep Optics for Monocular Depth Estimation and 3D Object Detection [Notes] ICCV 2019
- MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation [Notes] ICCV 2019
- Joint Monocular 3D Vehicle Detection and Tracking [Notes] ICCV 2019 (Berkeley DeepDrive)
- CasGeo: 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results [Notes]
- Slimmable Neural Networks [Notes] ICLR 2019
- Universally Slimmable Networks and Improved Training Techniques [Notes] ICCV 2019
- AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
- Once for All: Train One Network and Specialize it for Efficient Deployment
- DOTA: A Large-scale Dataset for Object Detection in Aerial Images [Notes] CVPR 2018 (rotated bbox)
- RoiTransformer: Learning RoI Transformer for Oriented Object Detection in Aerial Images [Notes] CVPR 2019 (rotated bbox)
- RRPN: Arbitrary-Oriented Scene Text Detection via Rotation Proposals TMM 2018
- R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection (rotated bbox)
- TI white paper: Webinar: mmWave Radar for Automotive and Industrial applications [Notes] [TI, radar]
- Federated Learning: Strategies for Improving Communication Efficiency [Notes] NIPS 2016
- sort: Simple Online and Realtime Tracking [Notes] ICIP 2016
- deep-sort: Simple Online and Realtime Tracking with a Deep Association Metric [Notes]
- MT-CNN: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks [Notes] SPL 2016 (real time, facial landmark)
- RetinaFace: Single-stage Dense Face Localisation in the Wild [Notes] CVPR 2020 [joint object and landmark detection]
- SC-SfM-Learner: Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video [Notes] NIPS 2019
- SiamMask: Fast Online Object Tracking and Segmentation: A Unifying Approach CVPR 2019 (tracking, segmentation, label propagation)
- Review of Kálmán Filter (from Tim Babb, Pixar Animation) [Notes]
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [Notes] NIPS 2016
- Guided backprop: Striving for Simplicity: The All Convolutional Net [Notes] ICLR 2015
- Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks [Notes] CVPR 2019
- Boxy Vehicle Detection in Large Images [Notes] ICCV 2019
- FQNet: Deep Fitting Degree Scoring Network for Monocular 3D Object Detection [Notes] CVPR 2019 [Mono 3DOD, Jiwen Lu]
- Mono3D: Monocular 3D Object Detection for Autonomous Driving [Notes] CVPR 2016
- MonoDIS: Disentangling Monocular 3D Object Detection [Notes] ICCV 2019
- Pseudo lidar-e2e: Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud [Notes] ICCV 2019 (pseudo-lidar with 2d and 3d consistency loss, better than PL and worse than PL++, SOTA for pure mono3D)
- MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization [Notes] AAAI 2019 (SOTA of Mono3DOD, MLF < MonoGRNet < Pseudo-lidar)
- MLF: Multi-Level Fusion based 3D Object Detection from Monocular Images [Notes] CVPR 2018 (precursor to pseudo-lidar)
- ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape [Notes] CVPR 2019
- AM3D: Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving [Notes] ICCV 2019 [similar to pseudo-lidar, color-enhanced]
- Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors [Notes] (from Stefano Soatto) AAAI 2019
- Deep Metadata Fusion for Traffic Light to Lane Assignment [Notes] IEEE RA-L 2019 (traffic lights association)
- Automatic Traffic Light to Ego Vehicle Lane Association at Complex Intersections ITSC 2019 (traffic lights association)
- Distant Vehicle Detection Using Radar and Vision [Notes] ICRA 2019 [radar, vision, radar tracklets fusion]
- Distance Estimation of Monocular Based on Vehicle Pose Information [Notes]
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics [Notes] CVPR 2018 (Alex Kendall)
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks [Notes] ICML 2018 (multitask)
- DTP: Dynamic Task Prioritization for Multitask Learning [Notes] ECCV 2018 [multitask, Stanford]
- Will this car change the lane? - Turn signal recognition in the frequency domain [Notes] IV 2014
- Complex-YOLO: Real-time 3D Object Detection on Point Clouds [Notes] (BEV detection only)
- Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds CVPR 2019 (sensor fusion and tracking)
- An intriguing failing of convolutional neural networks and the CoordConv solution [Notes] NIPS 2018
- Deep Parametric Continuous Convolutional Neural Networks [Notes] CVPR 2018 (@Uber, sensor fusion)
- ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection [Notes] ECCV 2018 [Uber ATG, sensor fusion, BEV]
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [Notes] CVPR 2018 oral [lidar only, perception and prediction]
- LearnK: Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras [Notes] ICCV 2019 [monocular depth estimation, intrinsic estimation, SOTA]
- monodepth: Unsupervised Monocular Depth Estimation with Left-Right Consistency [Notes] CVPR 2017 oral (monocular depth estimation, stereo for training)
- Struct2depth: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos [Notes] AAAI 2019 [monocular depth estimation, estimating movement of dynamic object, infinite depth problem, online finetune]
- Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency [Notes] AAAI 2018 (monocular depth estimation, static assumption, surface normal)
- LEGO: Learning Edge with Geometry all at Once by Watching Videos [Notes] CVPR 2018 spotlight (monocular depth estimation, static assumption, surface normal)
- Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network [Notes] (radar, RD map, OD, Arxiv 201902)
- A study on Radar Target Detection Based on Deep Neural Networks [Notes] (radar, RD map, OD)
- 2D Car Detection in Radar Data with PointNets [Notes] (from Ulm Univ, radar, point cloud, OD, Arxiv 201904)
- Learning Confidence for Out-of-Distribution Detection in Neural Networks [Notes] (budget to cheat)
- A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification [Notes] ICRA 2017 (Bosch, traffic lights)
- How hard can it be? Estimating the difficulty of visual search in an image [Notes] CVPR 2016
- Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [Notes] (review from Bosch)
- Review of monocular 3d object detection (blog from Zhihu)
- Deep3dBox: 3D Bounding Box Estimation Using Deep Learning and Geometry [Notes] CVPR 2017 [Zoox]
- MonoPSR: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction [Notes] CVPR 2019
- OFT: Orthographic Feature Transform for Monocular 3D Object Detection [Notes] BMVC 2019 [Convert camera to BEV, Alex Kendall]
- MixMatch: A Holistic Approach to Semi-Supervised Learning [Notes]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Notes] ICML 2019
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? [Notes] NIPS 2017
- Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding [Notes] BMVC 2017
- TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents [Notes] AAAI 2019 oral
- Deep Depth Completion of a Single RGB-D Image [Notes] CVPR 2018 (indoor)
- DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image [Notes] CVPR 2019 (outdoor)
- SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video [Notes] CVPR 2017
- Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation [Notes] ICCV 2019 [Niantic]
- DeepSignals: Predicting Intent of Drivers Through Visual Signals [Notes] ICRA 2019 (@Uber, turn signal detection)
- FCOS: Fully Convolutional One-Stage Object Detection [Notes] ICCV 2019 [Chunhua Shen]
- Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving [Notes] ICLR 2020
- MMF: Multi-Task Multi-Sensor Fusion for 3D Object Detection [Notes] CVPR 2019 (@Uber, sensor fusion)
- CenterNet: Objects as points (from ExtremeNet authors) [Notes]
- CenterNet: Object Detection with Keypoint Triplets [Notes]
- Object Detection based on Region Decomposition and Assembly [Notes] AAAI 2019
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [Notes] ICLR 2019
- M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [Notes] AAAI 2019
- Deep Radar Detector [Notes] RadarCon 2019
- Semantic Segmentation on Radar Point Clouds [Notes] (from Daimler AG) FUSION 2018
- Pruning Filters for Efficient ConvNets [Notes] ICLR 2017
- Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks [Notes] NIPS 2018 talk
- LeGR: Filter Pruning via Learned Global Ranking [Notes] CVPR 2020 oral
- NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection [Notes] CVPR 2019
- AutoAugment: Learning Augmentation Policies from Data [Notes] CVPR 2019
- Path Aggregation Network for Instance Segmentation [Notes] CVPR 2018
- Channel Pruning for Accelerating Very Deep Neural Networks ICCV 2017 (Face++, Yihui He) [Notes]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices ECCV 2018 (Song Han, Yihui He)
- MobileNetV3: Searching for MobileNetV3 [Notes]
- MnasNet: Platform-Aware Neural Architecture Search for Mobile [Notes] CVPR 2019
- Rethinking the Value of Network Pruning ICLR 2019
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (MobileNets v2) [Notes] CVPR 2018
- A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms [Notes] ITSC 2013
- MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving [Notes]
- Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction (Very nice illustration of 1 and 2 stage object detection)
- Light-Head R-CNN: In Defense of Two-Stage Object Detector [Notes] (from Megvii)
- CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection [Notes] CVPR 2019 [center and scale prediction, anchor-free, near SOTA pedestrian]
- Review of Anchor-free methods (Zhihu blog): Object Detection: the Anchor-Free Era; Anchor-free Deep Learning Object Detection Methods; My Slides on CSP
- DenseBox: Unifying Landmark Localization with End to End Object Detection
- CornerNet: Detecting Objects as Paired Keypoints [Notes] ECCV 2018
- ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points [Notes] CVPR 2019
- FSAF: Feature Selective Anchor-Free Module for Single-Shot Object Detection [Notes] CVPR 2019
- FoveaBox: Beyond Anchor-based Object Detector (anchor-free) [Notes]
- Bag of Freebies for Training Object Detection Neural Networks [Notes]
- mixup: Beyond Empirical Risk Minimization [Notes] ICLR 2018
- Multi-view Convolutional Neural Networks for 3D Shape Recognition (MVCNN) [Notes] ICCV 2015
- 3D ShapeNets: A Deep Representation for Volumetric Shapes [Notes] CVPR 2015
- Volumetric and Multi-View CNNs for Object Classification on 3D Data [Notes] CVPR 2016
- Group Normalization [Notes] ECCV 2018
- Spatial Transformer Networks [Notes] NIPS 2015
- Frustum PointNets for 3D Object Detection from RGB-D Data (F-PointNet) [Notes] CVPR 2018
- Dynamic Graph CNN for Learning on Point Clouds [Notes]
- PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud (SOTA for 3D object detection) [Notes] CVPR 2019
- Multi-View 3D Object Detection Network for Autonomous Driving (MV3D) [Notes] CVPR 2017 (Baidu, sensor fusion, BV proposal)
- Joint 3D Proposal Generation and Object Detection from View Aggregation (AVOD) [Notes] IROS 2018 (sensor fusion, multiview proposal)
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [Notes]
- Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Notes] CVPR 2019
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection CVPR 2018 (Apple, first end-to-end point cloud encoding to grid)
- SECOND: Sparsely Embedded Convolutional Detection Sensors 2018 (builds on VoxelNet)
- PointPillars: Fast Encoders for Object Detection from Point Clouds [Notes] CVPR 2019 (builds on SECOND)
- Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [Notes] CVPR 2012
- Vision meets Robotics: The KITTI Dataset [Notes] IJRR 2013
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) [Notes] Video CVPR 2017
- Initialization Strategies of Spatio-Temporal Convolutional Neural Networks [Notes] Video
- Detect-and-Track: Efficient Pose Estimation in Videos [Notes] ICCV 2017 Video
- Deep Learning Based Rib Centerline Extraction and Labeling [Notes] MI MICCAI 2018
- SlowFast Networks for Video Recognition [Notes] ICCV 2019 Oral
- Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) [Notes] CVPR 2017
- Beyond the pixel plane: sensing and learning in 3D (blog, Chinese version)
- VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition (VoxNet) [Notes]
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation CVPR 2017 [Notes]
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space NIPS 2017 [Notes]
- Review of Geometric deep learning: Frontiers of Geometric Deep Learning (from Zhihu) (Up to CVPR 2018)
- DQN: Human-level control through deep reinforcement learning (Nature DQN paper) [Notes] DRL
- Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection [Notes] MI
- Panoptic Segmentation [Notes] PanSeg
- Panoptic Feature Pyramid Networks [Notes] PanSeg
- Attention-guided Unified Network for Panoptic Segmentation [Notes] PanSeg
- Bag of Tricks for Image Classification with Convolutional Neural Networks [Notes] CLS
- Deep Reinforcement Learning for Vessel Centerline Tracing in Multi-modality 3D Volumes [Notes] DRL MI
- Deep Reinforcement Learning for Flappy Bird [Notes] DRL
- Long-Term Feature Banks for Detailed Video Understanding [Notes] Video
- Non-local Neural Networks [Notes] Video CVPR 2018
- Mask R-CNN
- Cascade R-CNN: Delving into High Quality Object Detection
- Focal Loss for Dense Object Detection (RetinaNet) [Notes]
- Squeeze-and-Excitation Networks (SENet)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- Deformable Convolutional Networks ICCV 2017 [build on R-FCN]
- Learning Region Features for Object Detection
- Learning notes on Deep Learning
- List of Papers on Machine Learning
- Notes of Literature Review on CNN in CV: these are the notes for all the papers in the recommended list here
- Notes of Literature Review (Others)
- Notes on how to set up DL/ML environment
- Useful setup notes
Here is the list of papers waiting to be read.
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness ICLR 2019
- Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (BagNet) blog ICLR 2019
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
- Understanding deep learning requires rethinking generalization
- Gradient Reversal: Unsupervised Domain Adaptation by Backpropagation ICML 2015
- Mask Scoring R-CNN CVPR 2019
- Training Region-based Object Detectors with Online Hard Example Mining
- Gliding vertex on the horizontal bounding box for multi-oriented object detection
- ONCE: Incremental Few-Shot Object Detection CVPR 2020
- Domain Adaptive Faster R-CNN for Object Detection in the Wild CVPR 2018
- Foggy Cityscapes: Semantic Foggy Scene Understanding with Synthetic Data IJCV 2018
- Foggy Cityscapes ECCV: Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding ECCV 2018
- Dropout Sampling for Robust Object Detection in Open-Set Conditions ICRA 2018 (Niko Sünderhauf)
- Hybrid Task Cascade for Instance Segmentation CVPR 2019 (cascaded mask RCNN)
- Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection ICRA 2019 (Niko Sünderhauf)
- A Unified Panoptic Segmentation Network CVPR 2019 PanSeg
- Model Vulnerability to Distributional Shifts over Image Transformation Sets (CVPR workshop) tl;dr
- Automatic adaptation of object detectors to new domains using self-training CVPR 2019 (find corner case and boost)
- Missing Labels in Object Detection CVPR 2019
- Circular Object Detection in Polar Coordinates for 2D LIDAR Data CCPR 2016
- LFFD: A Light and Fast Face Detector for Edge Devices [Lightweight, face detection, car detection]
- UnitBox: An Advanced Object Detection Network ACM MM 2016 [Ln IoU loss, Thomas Huang]
- Learning Spatiotemporal Features with 3D Convolutional Networks (C3D) Video ICCV 2015
- AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
- Spatiotemporal Residual Networks for Video Action Recognition (decouple spatiotemporal) NIPS 2016
- Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks (P3D, decouple spatiotemporal) ICCV 2017
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (decouple spatiotemporal) CVPR 2018
- Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification (decouple spatiotemporal) ECCV 2018
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? CVPR 2018
- AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation ICCV 2019
- One-Shot Video Object Segmentation CVPR 2017
- Looking Fast and Slow: Memory-Guided Mobile Video Object Detection CVPR 2018
- Towards High Performance Video Object Detection [Notes] CVPR 2018
- Towards High Performance Video Object Detection for Mobiles [Notes]
- Temporally Distributed Networks for Fast Video Semantic Segmentation CVPR 2020 [efficient video segmentation]
- Memory Enhanced Global-Local Aggregation for Video Object Detection CVPR 2020 [efficient video object detection]
- Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation IJCAI 2018 oral [video skeleton]
- RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving NeurIPS 2019 workshop
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description CVPR 2015 oral
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition ECCV 2016
- TRN: Temporal Relational Reasoning in Videos ECCV 2018
- X3D: Expanding Architectures for Efficient Video Recognition CVPR 2020 oral [FAIR]
- Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians CVPR 2020 oral [pedestrian, video]
- Flow-guided feature aggregation for video object detection ICCV 2017 [video, object detection]
- 3D human pose estimation in video with temporal convolutions and semi-supervised training CVPR 2019 [mono3D pose estimation from video]
- OmegaNet: Distilled Semantics for Comprehensive Scene Understanding from Videos CVPR 2020
- Object Detection in Videos with Tubelet Proposal Networks CVPR 2017 [video object detection]
- T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos [video object detection]
- Flow-Guided Feature Aggregation for Video Object Detection ICCV 2017 [Jifeng Dai]
- Efficient Deep Learning Inference based on Model Compression (Model Compression)
- Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks
- CBAM: Convolutional Block Attention Module
- Playing Atari with Deep Reinforcement Learning NIPS 2013
- Multi-Scale Deep Reinforcement Learning for Real-Time 3D-Landmark Detection in CT Scan
- An Artificial Agent for Robust Image Registration
- 3D-CNN: 3D Convolutional Neural Networks for Landing Zone Detection from LiDAR
- Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
- Orientation-boosted Voxel Nets for 3D Object Recognition (ORION) BMVC 2017
- GIFT: A Real-time and Scalable 3D Shape Search Engine CVPR 2016
- 3D Shape Segmentation with Projective Convolutional Networks (ShapePFCN) CVPR 2017
- Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
- Open3D: A Modern Library for 3D Data Processing
- Multimodal Deep Learning for Robust RGB-D Object Recognition IROS 2015
- FlowNet3D: Learning Scene Flow in 3D Point Clouds CVPR 2019
- Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling CVPR 2018 (Neighbors Do Help: Deeply Exploiting Local Structures of Point Clouds)
- PU-Net: Point Cloud Upsampling Network CVPR 2018
- Recurrent Slice Networks for 3D Segmentation of Point Clouds CVPR 2018
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing CVPR 2018
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering NIPS 2016
- Semi-Supervised Classification with Graph Convolutional Networks ICLR 2017
- Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks NIPS 2017
- Graph Attention Networks ICLR 2018
- 3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection (3D SSD)
- Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models ICCV 2017
- Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis CVPR 2017
- IPOD: Intensive Point-based Object Detector for Point Cloud
- Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images CVPR 2017
- 2D-Driven 3D Object Detection in RGB-D Images
- Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection [classify occluded object]
- PSMNet: Pyramid Stereo Matching Network CVPR 2018
- Stereo R-CNN based 3D Object Detection for Autonomous Driving CVPR 2019
- Deep Rigid Instance Scene Flow CVPR 2019
- Upgrading Optical Flow to 3D Scene Flow through Optical Expansion CVPR 2020
- Learning Multi-Object Tracking and Segmentation from Automatic Annotations CVPR 2020 [automatic MOTS annotation]
- Traffic-Sign Detection and Classification in the Wild CVPR 2016 [Tsinghua, Tencent, traffic signs]
- A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection IEEE CRV 2018 [UToronto]
- Detecting Traffic Lights by Single Shot Detection ITSC 2018
- DeepTLR: A single Deep Convolutional Network for Detection and Classification of Traffic Lights IV 2016
- Evaluating State-of-the-art Object Detector on Challenging Traffic Light Data CVPR 2017 workshop
- Traffic light recognition in varying illumination using deep learning and saliency map ITSC 2014 [traffic light]
- Traffic light recognition using high-definition map features RAS 2019
- Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives TITS 2015
- The DriveU Traffic Light Dataset: Introduction and Comparison with Existing Datasets ICRA 2018
- The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset
- Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives (traffic light survey, UCSD LISA)
- Review of Graph Spectrum Theory (WIP)
- 3D Deep Learning Tutorial at CVPR 2017 [Notes] - (WIP)
- A Survey on Neural Architecture Search
- Network pruning tutorial (blog)
- GNN tutorial at CVPR 2019
- One Thousand and One Hours: Self-driving Motion Prediction Dataset
- PANDA: A Gigapixel-level Human-centric Video Dataset CVPR 2020
- SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences ICCV 2019
- Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation 3DV 2018
- Depth Map Prediction from a Single Image using a Multi-Scale Deep Network NIPS 2014 (Eigen et al)
- Learning Depth from Monocular Videos using Direct Methods CVPR 2018 (monocular depth estimation)
- Virtual-Normal: Enforcing geometric constraints of virtual normal for depth prediction [Notes] ICCV 2019 (better generation of PL)
- Spatial Correspondence with Generative Adversarial Network: Learning Depth from Monocular Videos ICCV 2019
- Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM ICCV 2019
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019
- Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation ICCV 2019 workshop [indoor]
- Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation ECCV 2020 [indoor depth]
- Disambiguating Monocular Depth Estimation with a Single Transient ECCV 2020 [additional laser sensor, indoor depth]
- Guiding Monocular Depth Estimation Using Depth-Attention Volume ECCV 2020 [indoor depth]
- Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets ECCV 2020 [indoor depth]
- CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss ECCV 2020 [indoor depth]
- PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation (pointnet alternative, backbone)
- Vehicle Detection from 3D Lidar Using Fully Convolutional Network (VeloFCN) RSS 2016
- KPConv: Flexible and Deformable Convolution for Point Clouds (from the authors of PointNet)
- PointCNN: Convolution On X-Transformed Points NIPS 2018
- L3-Net: Towards Learning based LiDAR Localization for Autonomous Driving CVPR 2019
- RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement (sensor fusion, 3D mono proposal, refined in point cloud)
- DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map CVPR 2018
- Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection IROS 2019
- PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing
- Gated2Depth: Real-time Dense Lidar from Gated Images ICCV 2019 oral
- A Multi-Sensor Fusion System for Moving Object Detection and Tracking in Urban Driving Environments ICRA 2014
- PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation CVPR 2018 [sensor fusion, Zoox]
- Deep Hough Voting for 3D Object Detection in Point Clouds ICCV 2019 [Charles Qi]
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection CVPR 2020 [Waymo challenge 2nd place]
- Depth Sensing Beyond LiDAR Range CVPR 2020 [wide baseline stereo with trifocal]
- Probabilistic Semantic Mapping for Urban Autonomous Driving Applications IROS 2020 [lidar mapping]
- RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds CVPR 2020 oral [lidar segmentation]
- PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation CVPR 2020 [lidar segmentation]
- OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression CVPR 2020 oral [lidar compression]
- MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models NeurIPS 2020 oral [lidar compression]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [on-board bbox prediction]
- Unsupervised Traffic Accident Detection in First-Person Videos IROS 2019 (Honda)
- NEMO: Future Object Localization Using Noisy Ego Priors (Honda)
- Robust Aleatoric Modeling for Future Vehicle Localization (perspective)
- Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments WACV 2020 (perspective bbox, pedestrian)
- Using panoramic videos for multi-person localization and tracking in a 3D panoramic coordinate
- End-to-end Lane Detection through Differentiable Least-Squares Fitting ICCV 2019
- Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit TITS 2019 [object-like proposals]
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers [3D LLD]
- Ultra Fast Structure-aware Deep Lane Detection ECCV 2020 [lane detection]
- A Novel Approach for Detecting Road Based on Two-Stream Fusion Fully Convolutional Network (convert camera to BEV)
- FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020
- Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art (latest update in Dec 2019)
- Simultaneous Identification and Tracking of Multiple People Using Video and IMUs CVPR 2019
- Detect-and-Track: Efficient Pose Estimation in Videos
- TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis
- Video Action Transformer Network CVPR 2019 oral
- Online Real-time Multiple Spatiotemporal Action Localisation and Prediction ICCV 2017
- Multi-Object Tracking: A Collection of Recent Papers and Open-Source Code
- GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning CVPR 2020 oral [3DMOT, CMU, Kris Kitani]
- Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking ECCV 2020 spotlight [MOT, Tencent]
- Towards Real-Time Multi-Object Tracking ECCV 2020 [MOT]
- PifPaf: Composite Fields for Human Pose Estimation CVPR 2019
- Probabilistic Face Embeddings ICCV 2019
- Data Uncertainty Learning in Face Recognition CVPR 2020
- Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos CVPR 2020 oral [VGG, self-supervised, interpretable, discriminator]
- Revisiting Small Batch Training for Deep Neural Networks
- ICML 2019 workshop: Adaptive and Multitask Learning: Algorithms & Systems
- Adaptive Scheduling for Multi-Task Learning NIPS 2018 (NMT)
- Polar Transformer Networks ICLR 2018
- Measuring Calibration in Deep Learning CVPR 2019
- Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation ICCV 2019 (epistemic uncertainty)
- Making Convolutional Networks Shift-Invariant Again ICML 2019
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ICCV 2019
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty NeurIPS 2019
- Understanding deep learning requires rethinking generalization ICLR 2017 [ICLR best paper]
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks ICLR 2017 (NLL score as anomaly score)
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination CVPR 2018 spotlight (Stella Yu)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks TIT 2018
- The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning ICML 2018
- Designing Network Design Spaces CVPR 2020
- Moco2: Improved Baselines with Momentum Contrastive Learning
- SGD on Neural Networks Learns Functions of Increasing Complexity NeurIPS 2019 (SGD learns a linear classifier first)
- Pay attention to the activations: a modular attention mechanism for fine-grained image recognition
- A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images BMVC 2018 (multi-bin, what's new?)
- In-Place Activated BatchNorm for Memory-Optimized Training of DNNs CVPR 2018 (optimized BatchNorm + ReLU)
- FCNN: Fourier Convolutional Neural Networks (FFT as CNN)
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- Xception: Deep Learning with Depthwise Separable Convolutions (Xception)
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (uncertainty)
- Learning to Drive from Simulation without Real World Labels ICRA 2019 (domain adaptation, sim2real)
- Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks CVPR 2020 oral
- Switchable Whitening for Deep Representation Learning ICCV 2019 [domain adaptation]
- Visual Chirality CVPR 2020 oral [best paper nominee]
- Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data CVPR 2020
- Self-training with Noisy Student improves ImageNet classification CVPR 2020 [distillation]
- Keep it Simple: Image Statistics Matching for Domain Adaptation CVPRW 2020 [Domain adaptation for 2D mod bbox]
- Epipolar Transformers CVPR 2020 [Yihui He]
- Scalable Uncertainty for Computer Vision With Functional Variational Inference CVPR 2020 [epistemic uncertainty with one fwd pass]
- 3DOP: 3D Object Proposals for Accurate Object Class Detection NIPS 2015
- DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
- Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery ECCV 2018 (Monocular 3D object detection and depth estimation)
- Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-aware Representation CVPR 2019 [unified conditional decoder]
- DDP: Dense Depth Posterior from Single Image and Sparse Range CVPR 2019
- Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes IJCV 2018 (data augmentation with AR, Toyota)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data IITS
- Towards Scene Understanding with Detailed 3D Object Representations IJCV 2014 (keypoint, 3D bbox annotation)
- Deep Cuboid Detection: Beyond 2D Bounding Boxes (Magic Leap)
- Viewpoints and Keypoints (Malik)
- Lifting Object Detection Datasets into 3D (PASCAL)
- 3D Object Class Detection in the Wild (keypoint based)
- Fast Single Shot Detection and Pose Estimation 3DV 2016 (SSD + pose, Wei Liu)
- Virtual KITTI 2
- Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing CVPR 2017
- Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views ICCV 2015 Oral
- Real-Time Seamless Single Shot 6D Object Pose Prediction CVPR 2018
- Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching NIPS 2018 [disparity estimation]
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019
- Learning Depth with Convolutional Spatial Propagation Network (Baidu, depth from SPN) ECCV 2018
- Just Go with the Flow: Self-Supervised Scene Flow Estimation CVPR 2020 oral [Scene flow, Lidar]
- Self-Supervised Deep Visual Odometry with Online Adaptation CVPR 2020 oral [DF-VO, TrianFlow, meta-learning]
- Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume CVPR 2020
- Online Depth Learning against Forgetting in Monocular Videos CVPR 2020 [monodepth, online learning]
- SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation CVPR 2020 [monodepth, semantic]
- Inferring Distributions Over Depth from a Single Image TRO [Depth confidence, stitching them together]
- Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths CVPR 2020
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [Xiaoming Liu, multimodal, depth bleeding]
- Classification of Objects in Polarimetric Radar Images Using CNNs at 77 GHz (Radar, polar)
- CNNs for Interference Mitigation and Denoising in Automotive Radar Using Real-World Data NeurIPS 2019 (radar)
- Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation ICCV 2019 (radar)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects ECCV 2020 [Uber ATG]
- Depth Estimation from Monocular Images and Sparse Radar Data IROS 2020 [Camera + Radar for monodepth, nuscenes]
- RPR: Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles IROS 2020 [radar proposal refinement]
- PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization [Notes] ICCV 2015
- PoseNet2: Modelling Uncertainty in Deep Learning for Camera Relocalization ICRA 2016
- PoseNet3: Geometric Loss Functions for Camera Pose Regression with Deep Learning CVPR 2017
- EssNet: Convolutional neural network architecture for geometric matching CVPR 2017
- NC-EssNet: Neighbourhood Consensus Networks NeurIPS 2018
- Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task CVPR 2020 oral [Eric Brachmann, ngransac]
- Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018
- DynSLAM: Robust Dense Mapping for Large-Scale Dynamic Environments ICRA 2018 [dynamic SLAM, Andreas Geiger]
- GCNv2: Efficient Correspondence Prediction for Real-Time SLAM LRA 2019 [Superpoint + orb slam]
- Real-time Scalable Dense Surfel Mapping ICRA 2019 [dense reconstruction, monodepth]
- Dynamic SLAM: The Need For Speed
- GSLAM: A General SLAM Framework and Benchmark ICCV 2019
- Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar CVPR 2020 [Daimler]
- Radar+RGB Attentive Fusion for Robust Object Detection in Autonomous Vehicles ICIP 2020
- Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor Sensors 2020 [radar, camera, early fusion]
- A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence
- Monocular Depth Estimation Based On Deep Learning: An Overview
- Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining CVPR 2019
- Learn to Combine Modalities in Multimodal Deep Learning (sensor fusion, general DL)
- Safe Trajectory Generation For Complex Urban Environments Using Spatio-temporal Semantic Corridor LRA 2019 [Motion planning]
- DAgger: Driving Policy Transfer via Modularity and Abstraction CoRL 2018 [DAgger, Imitation Learning]
- Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching ICRA 2020 [Motion planning]
- Baidu Apollo EM Motion Planner
- Calibration of Heterogeneous Sensor Systems
- Intro: Sensor Fusion for ADAS (Data Fusion in Autonomous Driving) (from Zhihu) (Up to CVPR 2018)
- YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving CVPR 2019 (Real Time, Low Power)
- Deep Fusion of Heterogeneous Sensor Modalities for the Advancements of ADAS to Autonomous Vehicles
- Temporal Coherence for Active Learning in Videos ICCVW 2019 [active learning, temporal coherence]
- R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving RTSS 2020 [perception system design]
- Learning Lane Graph Representations for Motion Forecasting ECCV 2020 [Uber ATG]
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation CVPR 2020 [Waymo]
- CoverNet: Multimodal Behavior Prediction using Trajectory Sets CVPR 2020 [prediction, nuScenes]
- PnPNet: End-to-End Perception and Prediction with Tracking in the Loop CVPR 2020 [Uber ATG]
- DSDNet: Deep Structured self-Driving Network ECCV 2020 [Uber ATG]
- Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation ITSC 2018 [UToronto, autolabeling]
- Learning Multi-Object Tracking and Segmentation From Automatic Annotations CVPR 2020 [Autolabeling]
- Canonical Surface Mapping via Geometric Cycle Consistency ICCV 2019
- A Collection of Articles on Ad Recommendation Systems
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction [Notes] (dimension reduction, better than t-SNE)
- Review Notes of Classical Key Points and Descriptors
- CRF
- Visual SLAM and Visual Odometry
- ORB SLAM
- Bundle Adjustment
- 3D vision
- SLAM/VIO Study Summary
- Design Patterns