VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

This repository is the official implementation of VideoLifter, an efficient SfM-free framework for lifting videos into 3D using hierarchical stereo alignment.

Get Started

Installation

Clone VideoLifter and download the pre-trained MASt3R model.

git clone --recursive https://github.com/VITA-Group/VideoLifter.git
cd VideoLifter
mkdir -p submodules/mast3r/checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P submodules/mast3r/checkpoints/
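
A failed or interrupted download can leave the checkpoint missing or empty, which tends to surface only later as a model-loading error. The sketch below checks that the file exists and is non-empty before continuing. The function name `check_checkpoint` is hypothetical and not part of the repository; the default path is simply the one used in the commands above.

```shell
# Hypothetical helper (not part of VideoLifter): verify that a checkpoint
# file exists and is non-empty. The default path mirrors the wget command above.
check_checkpoint() {
    local ckpt="${1:-submodules/mast3r/checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth}"
    if [ -s "$ckpt" ]; then
        echo "checkpoint OK: $ckpt"
    else
        echo "checkpoint missing or empty: $ckpt" >&2
        return 1
    fi
}
```

Running `check_checkpoint` from the repository root returns a non-zero status if the download needs to be repeated.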

Create the environment.

conda create -n videolifter python=3.10 cmake=3.14.0 -y
conda activate videolifter
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install submodules/simple-knn
pip install submodules/diff-gaussian-rasterization-confidence

Optionally (but highly recommended), compile the CUDA kernels for RoPE (as in CroCo v2).

# DUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for a faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace

Data preparation

DATAROOT is ./data by default. First create the data folder with mkdir data.

Tanks and Temples

Download the data preprocessed by Nope-NeRF as shown below, and extract it so that the scenes are in the ./data/Tanks folder.

```
wget https://www.robots.ox.ac.uk/~wenjing/Tanks.zip
```

CO3D

We follow CF-3DGS and select the same 10 scenes from the CO3D dataset. Download our preprocessed data and place it in the ./data/co3d folder.
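
Before training, it can help to confirm that both dataset folders are where this README expects them. The following is a minimal sketch, assuming the layout described above (./data/Tanks and ./data/co3d); `check_datasets` is a hypothetical helper, not a script shipped with the repository.

```shell
# Hypothetical helper (not part of VideoLifter): report whether the dataset
# folders named in this README exist under the given data root.
check_datasets() {
    local dataroot="${1:-./data}"
    local missing=0
    local name
    for name in Tanks co3d; do
        if [ -d "$dataroot/$name" ]; then
            echo "found:   $dataroot/$name"
        else
            echo "missing: $dataroot/$name"
            missing=1
        fi
    done
    # Non-zero status if at least one folder is absent.
    return "$missing"
}
```

For example, `check_datasets ./data` prints one line per dataset and returns a non-zero status if either folder is absent.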

Usage

For training and evaluation on Tanks and Temples, run:

  bash scripts/train_tt.sh

For CO3D, run:

  bash scripts/train_co3d.sh
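
To run both benchmarks unattended, the two scripts above can be chained with one log file per run. This is only a sketch: `run_scripts` and the `logs` directory are hypothetical names, and the wrapper assumes nothing about what the scripts do beyond their exit status.

```shell
# Hypothetical wrapper (not part of VideoLifter): run each script in turn,
# write its output to <log_dir>/<script name>.log, and stop at the first failure.
run_scripts() {
    local log_dir="$1"
    shift
    mkdir -p "$log_dir"
    local script log
    for script in "$@"; do
        log="$log_dir/$(basename "$script" .sh).log"
        echo "Running $script (log: $log)"
        if ! bash "$script" > "$log" 2>&1; then
            echo "FAILED: $script" >&2
            return 1
        fi
    done
}
```

For example, `run_scripts logs scripts/train_tt.sh scripts/train_co3d.sh` runs Tanks and Temples first, then CO3D, and skips CO3D if the first script fails.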

Acknowledgement

This work builds on many excellent research works and open-source projects. Many thanks to all the authors for sharing!

Gaussian-Splatting
DUSt3R
MASt3R

Citation

If you find our work useful in your research, please consider giving a star ⭐ and citing the following paper 📝.

@misc{cong2025videolifter,
      title={VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment}, 
      author={Wenyan Cong and Kevin Wang and Jiahui Lei and Colton Stearns and Yuanhao Cai and Dilin Wang and Rakesh Ranjan and Matt Feiszli and Leonidas Guibas and Zhangyang Wang and Weiyao Wang and Zhiwen Fan},
      year={2025},
      eprint={2501.01949},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.01949}, 
}
