[CoRL 2024] ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

Welcome to the official repository for ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter.

arXiv: https://arxiv.org/abs/2407.11298

To-Do List

  • Simulation Code Cleanup (without VLP)
  • Real-World Code Cleanup (without VLP)
  • Write a Complete README
  • Add Additional Documentation

Setup

Installation Requirements

  • Operating System: Ubuntu 23.04
  • Dependencies:
    • PyTorch: 1.13.1
    • Torchvision: 0.14.1
    • CUDA: 11.8
    • Pybullet (simulation environment)
  • Hardware: Two NVIDIA RTX 3090 GPUs (for the complete version)
    • Minimum Requirements:
      • Simulation: NVIDIA RTX 3090 (single GPU) with ~13GB GPU memory.
      • Real-World Execution: NVIDIA RTX 3090 with ~9.38GB GPU memory (LangSAM).
    • Recommended Setup:
      • Two NVIDIA RTX 3090 GPUs for best performance when running VLPart.

Installation Steps

  1. Create and Activate the Conda Environment:

    conda create -n thinkgrasp python=3.8
    conda activate thinkgrasp
  2. Install PyTorch and Torchvision:

    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  3. Allow Deprecated Scikit-learn:

    export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
  4. Install Additional Requirements:

    pip install -r requirements.txt
    pip install -r langsam.txt
  5. Install in Development Mode:

    python setup.py develop
  6. Install PointNet2:

    cd models/graspnet/pointnet2
    python setup.py install
    cd ../knn
    python setup.py install
    cd ../../..
  7. Install CUDA 11.8:
    Download the CUDA installer and run:

    sudo bash cuda_11.8.0_520.61.05_linux.run

    Add the following lines to your ~/.bashrc file:

    export CUDA_HOME=/usr/local/cuda-11.8
    export PATH=$CUDA_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

    Refresh the shell:

    source ~/.bashrc
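
To confirm the installation, you can run a quick sanity check that PyTorch sees the GPU (a minimal sketch using standard PyTorch calls; adjust to your setup):

    # Optional sanity check after installation.
    import torch

    print("torch version:", torch.__version__)           # expected 1.13.1+cu117
    print("CUDA available:", torch.cuda.is_available())  # should be True on an RTX 3090
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))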

Assets

Download the processed object models from:

Place the downloaded files in the assets folder. Ensure the structure is as follows:

ThinkGrasp
└── assets
    ├── simplified_objects
    ├── unseen_objects_40
    └── unseen_objects
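
Before running, you can verify the layout with a small check like the following (an illustrative snippet based on the structure above):

    # Optional check that the asset folders are in place.
    from pathlib import Path

    for d in ["simplified_objects", "unseen_objects_40", "unseen_objects"]:
        path = Path("assets") / d
        print(f"{path}: {'found' if path.is_dir() else 'MISSING'}")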

Running the Simulation

  1. Log in to WandB:

    wandb login
  2. Set Your OpenAI API Key:

    export OPENAI_API_KEY="sk-xxxxx"
  3. Start the Simulation:

    pip install protobuf==3.20.1
    python simulation_main.py
  4. Change Testing Data:
    Update the dataset directory in simulation_main.py by modifying line 238:

    parser.add_argument('--testing_case_dir', action='store', type=str, default='heavy_unseen/')
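
Before launching, you can optionally confirm the environment matches the steps above (an illustrative pre-flight check; it only inspects settings mentioned in this section):

    # Optional pre-flight check before running simulation_main.py.
    import os
    import importlib.metadata

    assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY first"
    print("protobuf:", importlib.metadata.version("protobuf"))  # expected 3.20.1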

Running the Real-World Code

   pip install flask
   python realarm.py

Flask Application Notes:

  1. Flask Configuration: The Flask application is configured to run on:

    app.run(host='0.0.0.0', port=5000)

    This allows the app to be accessed from any network interface on port 5000.

  2. API Endpoint: The Flask application provides the following endpoint:

    POST http://localhost:5000/grasp_pose
    

    Payload Format:

    {
        "image_path": "/path/to/rgb/image.png",
        "depth_path": "/path/to/depth/image.png",
        "text_path": "/path/to/goal_text.txt"
    }
    • image_path: The path to the RGB image captured by the real-world camera connected to your robotic setup.
    • depth_path: The path to the depth image from the same real-world camera.
    • text_path: A text file containing the goal or task description.
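
Conceptually, the handler behind this endpoint reads the three paths from the JSON body and returns a grasp pose. The sketch below only illustrates that flow; the compute_grasp_pose stub and the response fields are placeholders, not the actual realarm.py implementation:

    # Illustrative sketch of a /grasp_pose handler; not the actual realarm.py code.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def compute_grasp_pose(image_path, depth_path, goal_text):
        # Placeholder stub for the ThinkGrasp pipeline (segmentation + grasp generation).
        return {"position": [0.0, 0.0, 0.0], "orientation": [0.0, 0.0, 0.0, 1.0]}

    @app.route("/grasp_pose", methods=["POST"])
    def grasp_pose():
        data = request.get_json()
        with open(data["text_path"]) as f:
            goal_text = f.read().strip()  # natural-language task description
        pose = compute_grasp_pose(data["image_path"], data["depth_path"], goal_text)
        return jsonify(pose)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)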

Testing the API:

You can test the API using various tools:

Postman:
  1. Open Postman and create a new POST request.
  2. Set the URL to http://localhost:5000/grasp_pose.
  3. In the "Body" tab, select "raw" and set the type to JSON.
  4. Provide the JSON payload, ensuring the paths point to the images captured by your real-world camera:
    {
        "image_path": "/home/freax/camera_outputs/rgb_image.png",
        "depth_path": "/home/freax/camera_outputs/depth_image.png",
        "text_path": "/home/freax/goal_texts/task_goal.txt"
    }
  5. Click "Send" to test the endpoint.
Curl:

Alternatively, use curl in the terminal:

curl -X POST http://localhost:5000/grasp_pose \
-H "Content-Type: application/json" \
-d '{
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
}'
Python Script:

Use Python's requests library:

import requests

url = "http://localhost:5000/grasp_pose"
payload = {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
}
response = requests.post(url, json=payload)
print(response.json())

Notes:

  • Ensure that the real-world camera is correctly configured and outputs the RGB and depth images to the specified paths (/home/freax/camera_outputs/ in the example).
  • If testing on a remote server, replace localhost with the server's IP address in your requests.
  • Verify that all files are accessible and correctly formatted for processing by the application.

Potential Installation Issues

1. AttributeError: module 'numpy' has no attribute 'float'

  • Cause: Deprecated usage of numpy.float.
  • Solution:
    Update the problematic lines in the file (e.g., transforms3d/quaternions.py):
    _MAX_FLOAT = np.maximum_sctype(np.float64)
    _FLOAT_EPS = np.finfo(np.float64).eps
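    To locate the file that needs patching, you can print its path (assuming transforms3d is installed in the active environment):
    # Print the path of the module containing the deprecated np.float usage.
    import transforms3d.quaternions
    print(transforms3d.quaternions.__file__)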

2. graspnetAPI Installation Issue

Error:

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn' rather than 'sklearn' for pip commands.

Solution:
Allow deprecated scikit-learn compatibility by exporting the following environment variable:

export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True

3. CUDA Compatibility Issue

Error:

RuntimeError: CUDA error: no kernel image is available for execution on the device.

Solution:
Ensure the installed PyTorch version matches your CUDA version. For CUDA 11.8, use:

pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

4. Additional Dependencies

If you still encounter errors, install the following dependencies:

  1. Install Python development tools:

    sudo apt-get install python3-dev
  2. Install GCC and G++ compilers via Conda:

    conda install gxx_linux-64
    conda install gcc_linux-64
  3. Install Ray and GroundingDINO:

    pip install ray
    pip install https://github.com/IDEA-Research/GroundingDINO/archive/refs/tags/v0.1.0-alpha2.tar.gz
  4. Clone and install GroundingDINO:

    cd langsam
    git clone https://github.com/IDEA-Research/GroundingDINO.git
    cd GroundingDINO
    pip install -e .

5. CUDA Installation

Install CUDA 11.8 using the downloaded installer:

sudo bash cuda_11.8.0_520.61.05_linux.run

Add the following lines to your ~/.bashrc file:

export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Refresh the shell:

source ~/.bashrc

6. Vision-Language Processing (VLP) Setup

If you plan to use Vision-Language Processing (VLP):

  1. Install additional requirements:

    pip install -r vlp_requirements.txt
  2. Download the required .pth files:

    cd VLP
    wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
  3. Place the downloaded files in the appropriate directory (som/downloaddata).

Comparison with Vision-Language Grasping (VLG)

If you want to compare with VLG, download the repository from VLG GitHub and replace the test data and assets.


Citation

If you find this work useful, please consider citing:

@misc{qian2024thinkgrasp,
  title={ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter},
  author={Yaoyao Qian and Xupeng Zhu and Ondrej Biza and Shuo Jiang and Linfeng Zhao and Haojie Huang and Yu Qi and Robert Platt},
  year={2024},
  eprint={2407.11298},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
