- PointLLM: Empowering Large Language Models to Understand Point Clouds [Paper] [Homepage] [Github]
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following [Paper] [Demo] [Github]
- 3D-LLM: Injecting the 3D World into Large Language Models (NeurIPS 2023 Spotlight, 10TB object data) [Paper] [Homepage] [Github]
- LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning [Paper] [Homepage] [Github]
- An Embodied Generalist Agent in 3D World [Paper] [Homepage] [Github]
- M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts [Paper] [Homepage]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI [Paper] [Homepage]
- ODIN: A Single Model for 2D and 3D Perception [Paper] [Homepage]
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [Paper] [Github]
- ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding [Paper] [Github]
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding [Paper] [Github] [Homepage]
- CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [Paper] [Github]
- CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition [Paper] [Github]
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training [Paper] [Github]
- Uni3D: Exploring Unified 3D Representation at Scale [Paper] [Github]
- MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation [Paper] [Github]
- OmniObject3D (CVPR 2023 Award Candidate): real-scanned 3D objects (6K), 190 classes [Paper] [Homepage]
- Objaverse-XL: 3D objects (10M+) [Paper] [Homepage] [Dataset]
- Cap3D: 3D-text pairs (660K) [Paper] [Download]
- ULIP - Objaverse Triplets: 3D point clouds (800K) - images (10M) - language (100M) triplets [Download]
- ULIP - ShapeNet Triplets: 3D point clouds (52.5K) - images (3M) - language (30M) triplets [Download]
- ScanRefer: 3D object localization in RGB-D scans using natural language
- SQA3D: 650 scenes, 6.8K situations, 20.4K descriptions, and 33.4K diverse reasoning questions for these situations [Paper] [Homepage]