- PointLLM: Empowering Large Language Models to Understand Point Clouds [Paper] [Homepage] [Github]
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following [Paper] [Demo] [Github]
- 3D-LLM: Injecting the 3D World into Large Language Models (NeurIPS 2023 Spotlight, 10TB object data) [Paper] [Homepage] [Github]
- LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning [Paper] [Homepage] [Github]
- An Embodied Generalist Agent in 3D World [Paper] [Homepage] [Github]
- M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts [Paper] [Homepage]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI [Paper] [Homepage]
- ODIN: A Single Model for 2D and 3D Perception [Paper] [Homepage]
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [Paper] [Github]
- ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding [Paper] [Github]
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding [Paper] [Github] [Homepage]
- CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [Paper] [Github]
- CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition [Paper] [Github]
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training [Paper] [Github]
- Uni3D: Exploring Unified 3D Representation at Scale [Paper] [Github]
- MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation [Paper] [Github]
- OmniObject3D (CVPR 2023 Award Candidate): real-scanned 3D objects (6K), 190 classes [Paper] [Homepage]
- Objaverse-XL: 3D objects (10M+) [Paper] [Homepage] [Dataset]
- Cap3D: 3D-text pairs (660K) [Paper] [Download]
- ULIP - Objaverse Triplets: 3D point clouds (800K) - images (10M) - language (100M) triplets [Download]
- ULIP - ShapeNet Triplets: 3D point clouds (52.5K) - images (3M) - language (30M) triplets [Download]
- ScanRefer: 3D object localization in RGB-D scans using natural language
- SQA3D: 650 scenes, 6.8K situations, 20.4K descriptions, and 33.4K diverse reasoning questions for these situations [Paper] [Homepage]