Welcome to the official repository for ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter.
- To-Do List
- Setup
- Assets
- Running the Simulation
- Running the Real-World Code
- Potential Issues of Installation
- Citation
To-Do List:
- Simulation Code Cleanup (without VLP)
- Real-World Code Cleanup (without VLP)
- Write a Complete README
- Add Additional Documentation
Setup:
- Operating System: Ubuntu 23.04
- Dependencies:
  - PyTorch: 1.13.1
  - Torchvision: 0.14.1
  - CUDA: 11.8
  - PyBullet (simulation environment)
- Hardware: 2x NVIDIA RTX 3090 (for the complete version)
- Minimum Requirements:
  - Simulation: a single NVIDIA RTX 3090 with ~13GB of GPU memory.
  - Real-World Execution: an NVIDIA RTX 3090 with ~9.38GB of GPU memory (LangSAM).
- Recommended Setup:
  - Two NVIDIA RTX 3090 GPUs for best performance when running VLPart.
- Create and Activate the Conda Environment:
  conda create -n thinkgrasp python=3.8
  conda activate thinkgrasp
- Install PyTorch and Torchvision (a quick GPU sanity check is sketched after these setup steps):
  pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
- Allow Deprecated Scikit-learn:
  export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
- Install Additional Requirements:
  pip install -r requirements.txt
  pip install -r langsam.txt
- Develop Mode Installation:
  python setup.py develop
- Install PointNet2 and KNN:
  cd models/graspnet/pointnet2
  python setup.py install
  cd ../knn
  python setup.py install
  cd ../../..
- Install CUDA 11.8:
  Download the CUDA installer and run:
  sudo bash cuda_11.8.0_520.61.05_linux.run
  Add the following lines to your ~/.bashrc file:
  export CUDA_HOME=/usr/local/cuda-11.8
  export PATH=$CUDA_HOME/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
  Refresh the shell:
  source ~/.bashrc
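After these steps, a quick sanity check confirms that the installed PyTorch build can see the GPU. This is a minimal sketch (not part of the repository); run it inside the thinkgrasp environment:

import torch

# Report the installed PyTorch build and the CUDA version it was compiled against.
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)

# The simulation and real-world code both need at least one visible GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))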
Assets:
Download the processed object models from:
Place the downloaded files in the assets folder. Ensure the structure is as follows (a quick layout check is sketched after the tree):
ThinkGrasp
└── assets
├── simplified_objects
├── unseen_objects_40
└── unseen_objects
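A short script like the following can verify that the expected folders are present; it is a minimal sketch that assumes you run it from the repository root:

from pathlib import Path

# Expected subfolders of assets/, matching the layout above.
expected = ["simplified_objects", "unseen_objects_40", "unseen_objects"]

assets = Path("assets")
for name in expected:
    status = "ok" if (assets / name).is_dir() else "MISSING"
    print(f"assets/{name}: {status}")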
Running the Simulation:
- Log in to WandB:
  wandb login
- Set Your OpenAI API Key:
  export OPENAI_API_KEY="sk-xxxxx"
- Start the Simulation:
  pip install protobuf==3.20.1
  python simulation_main.py
- Change Testing Data:
  Update the dataset directory in simulation_main.py by modifying line 238:
  parser.add_argument('--testing_case_dir', action='store', type=str, default='heavy_unseen/')
  Because this is an argparse flag, it should also be possible to override it at launch with --testing_case_dir instead of editing the file (a pre-flight check is sketched after these steps).
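Before launching the simulation, a short pre-flight check can catch a missing API key or dataset directory early. This is a minimal sketch; the relative path to the testing cases is an assumption based on the default above:

import os
from pathlib import Path

# simulation_main.py queries the OpenAI API, so the key must be exported first.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

# Default value of --testing_case_dir in simulation_main.py.
testing_case_dir = Path("heavy_unseen/")
print("Testing case directory found:", testing_case_dir.is_dir())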
Running the Real-World Code:
- Install Flask and start the real-world server:
  pip install flask
  python realarm.py
- Flask Configuration: The Flask application is configured to run on:
  app.run(host='0.0.0.0', port=5000)
  This allows the app to be accessed from any network interface on port 5000.
- API Endpoint: The Flask application provides the following endpoint:
  POST http://localhost:5000/grasp_pose
  Payload Format:
  {
    "image_path": "/path/to/rgb/image.png",
    "depth_path": "/path/to/depth/image.png",
    "text_path": "/path/to/goal_text.txt"
  }
- image_path: The path to the RGB image captured by the real-world camera connected to your robotic setup.
- depth_path: The path to the depth image from the same real-world camera.
- text_path: A text file containing the goal or task description.
You can test the API using various tools:
- Open Postman and create a new POST request.
- Set the URL to http://localhost:5000/grasp_pose.
- In the "Body" tab, select "raw" and set the type to JSON.
- Provide the JSON payload, ensuring the paths point to the images captured by your real-world camera:
  {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
  }
- Click "Send" to test the endpoint.
Alternatively, use curl in the terminal:
curl -X POST http://localhost:5000/grasp_pose \
-H "Content-Type: application/json" \
-d '{
"image_path": "/home/freax/camera_outputs/rgb_image.png",
"depth_path": "/home/freax/camera_outputs/depth_image.png",
"text_path": "/home/freax/goal_texts/task_goal.txt"
}'
Use Python's requests library:
import requests
url = "http://localhost:5000/grasp_pose"
payload = {
"image_path": "/home/freax/camera_outputs/rgb_image.png",
"depth_path": "/home/freax/camera_outputs/depth_image.png",
"text_path": "/home/freax/goal_texts/task_goal.txt"
}
response = requests.post(url, json=payload)
print(response.json())
- Ensure that the real-world camera is correctly configured and outputs the RGB and depth images to the specified paths (/home/freax/camera_outputs/ in the example).
- If testing on a remote server, replace localhost with the server's IP address in your requests.
- Verify that all files are accessible and correctly formatted for processing by the application (a small pre-check script is sketched below).
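For example, a small pre-check along these lines confirms the input files exist before sending the request; it reuses the example paths above and is a minimal sketch rather than part of the repository:

import os
import requests

payload = {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt",
}

# Make sure every input file exists before calling the endpoint.
missing = [p for p in payload.values() if not os.path.isfile(p)]
if missing:
    raise FileNotFoundError(f"Missing input files: {missing}")

response = requests.post("http://localhost:5000/grasp_pose", json=payload)
response.raise_for_status()
print(response.json())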
Potential Issues of Installation:
- Cause: Deprecated usage of numpy.float.
- Solution: Update the problematic lines in the file (e.g., transforms3d/quaternions.py):
  _MAX_FLOAT = np.maximum_sctype(np.float64)
  _FLOAT_EPS = np.finfo(np.float64).eps
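To locate the installed copy of transforms3d before editing it, a one-liner like this works (assuming the package imports in the active environment):

import transforms3d

# Prints the install location; quaternions.py sits in the same directory.
print(transforms3d.__file__)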
Error:
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn' rather than 'sklearn' for pip commands.
Solution:
Allow deprecated scikit-learn compatibility by exporting the following environment variable:
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
Error:
RuntimeError: CUDA error: no kernel image is available for execution on the device.
Solution:
Ensure the installed PyTorch version matches your CUDA version. For CUDA 11.8, use:
pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
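If the error persists, it usually means the installed wheel was not built for your GPU's compute capability. A minimal sketch for inspecting this:

import torch

# CUDA architectures this PyTorch build was compiled for (e.g., 'sm_86').
print("Compiled arch list:", torch.cuda.get_arch_list())

# Compute capability of the local GPU (8.6 for an RTX 3090).
if torch.cuda.is_available():
    print("GPU capability:", torch.cuda.get_device_capability(0))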
If you still encounter errors, install the following dependencies:
- Install Python development tools:
  sudo apt-get install python3-dev
- Install GCC and G++ compilers via Conda:
  conda install gxx_linux-64
  conda install gcc_linux-64
- Install Ray and GroundingDINO:
  pip install ray
  pip install https://github.com/IDEA-Research/GroundingDINO/archive/refs/tags/v0.1.0-alpha2.tar.gz
- Clone and install GroundingDINO:
  cd langsam
  git clone https://github.com/IDEA-Research/GroundingDINO.git
  cd GroundingDINO
  pip install -e .
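A quick import check confirms the extra packages are usable; the module names ray and groundingdino are assumptions based on the installs above:

# Both imports should succeed without errors once the installs finish.
import ray
import groundingdino

print("ray:", ray.__version__)
print("groundingdino imported from:", groundingdino.__file__)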
Install CUDA 11.8 using the downloaded installer:
sudo bash cuda_11.8.0_520.61.05_linux.run
Add the following lines to your ~/.bashrc file:
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
Refresh the shell:
source ~/.bashrc
If you plan to use Vision-Language Processing (VLP):
- Install additional requirements:
  pip install -r vlp_requirements.txt
- Download the required .pth files:
  cd VLP
  wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
  wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
- Place the downloaded files in the appropriate directory (som/downloaddata).
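To confirm the checkpoints ended up where the code expects them, a small check like the following can be used; the som/downloaddata path follows the step above, and the exact layout is an assumption:

from pathlib import Path

# Checkpoint files downloaded in the previous step.
checkpoints = ["swinbase_part_0a0000.pth", "sam_vit_h_4b8939.pth"]

target = Path("som/downloaddata")
for name in checkpoints:
    path = target / name
    size_mb = path.stat().st_size / 1e6 if path.is_file() else 0
    print(f"{path}: {'found' if path.is_file() else 'MISSING'} ({size_mb:.0f} MB)")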
If you want to compare with VLG, download the repository from the VLG GitHub page and replace the test data and assets accordingly.
If you find this work useful, please consider citing:
@misc{qian2024thinkgrasp,
title={ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter},
author={Yaoyao Qian and Xupeng Zhu and Ondrej Biza and Shuo Jiang and Linfeng Zhao and Haojie Huang and Yu Qi and Robert Platt},
year={2024},
eprint={2407.11298},
archivePrefix={arXiv},
primaryClass={cs.RO}
}