# Pseudo-Generalized Dynamic View Synthesis from a Video (ICLR 2024)

Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M. Susskind, and Alexander G. Schwing.
This code has been tested on Ubuntu 20.04 with CUDA 11.8 on an NVIDIA A100-SXM4-80GB GPU (driver 470.82.01).
We recommend using conda for virtual environment management and libmamba for faster dependency resolution.
```bash
# set up libmamba
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba

# create the virtual environment
conda env create -f envs/pgdvs.yaml
conda activate pgdvs
conda install pytorch3d=0.7.4 -c pytorch3d -y
```
[Optional] Run the following if you want to install JAX:
```bash
conda activate pgdvs
pip install -r envs/requirements_jax.txt --verbose
```
To check that JAX is installed correctly, run the following. NOTE: the leading `import torch` is important, since it ensures that JAX finds the cuDNN installed by conda.
```bash
conda activate pgdvs
python -c "import torch; from jax import random; key = random.PRNGKey(0); x = random.normal(key, (10,)); print(x)"
```
```bash
# this environment variable is used throughout the commands below
cd /path/to/this/repo
export PGDVS_ROOT=$PWD
```
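Later commands rely on `PGDVS_ROOT` being exported. As a minimal sketch (a hypothetical helper, not part of this repo), a script can resolve the variable and fail with a clear message when it was never set:

```python
import os


def get_pgdvs_root() -> str:
    """Return the repository root from the PGDVS_ROOT environment variable.

    Raises a descriptive error if the variable was never exported.
    """
    root = os.environ.get("PGDVS_ROOT")
    if root is None:
        raise RuntimeError(
            "PGDVS_ROOT is not set; run `export PGDVS_ROOT=$PWD` "
            "from the repository root first."
        )
    return root
```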
Since we use several third-party pretrained models, we provide two ways to download them:
- download directly from the official repositories;
- download from our copy, for reproducing the results in the paper in case the official repositories' checkpoints are modified in the future.
```bash
FLAG_ORIGINAL=1  # set to 0 if you want to download from our copy
bash ${PGDVS_ROOT}/scripts/download_ckpts.sh ${PGDVS_ROOT}/ckpts ${FLAG_ORIGINAL}
```
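After the script finishes, it can help to sanity-check that the checkpoint directory is non-empty before launching anything. A small sketch (the exact file layout under `ckpts` is not documented here, so this only checks for presence of files):

```python
from pathlib import Path


def check_ckpts(ckpt_dir: str) -> list:
    """Return the files found under the checkpoint directory.

    Raises if the directory is missing or empty, which usually means
    the download script failed partway through.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {root}")
    files = sorted(
        str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()
    )
    if not files:
        raise RuntimeError(f"no checkpoint files under {root}")
    return files
```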
We use DAVIS as an example to illustrate how to render novel views from in-the-wild monocular videos. Please see IN_THE_WILD.md for details.
Please see BENCHMARK_NVIDIA.md and BENCHMARK_iPhone.md for details on reproducing the paper's results on NVIDIA Dynamic Scenes and DyCheck's iPhone dataset.
Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M Susskind, and Alexander G. Schwing. Pseudo-Generalized Dynamic View Synthesis from a Video. ICLR 2024.
```bibtex
@inproceedings{Zhao2024PGDVS,
  title={{Pseudo-Generalized Dynamic View Synthesis from a Video}},
  author={Xiaoming Zhao and Alex Colburn and Fangchang Ma and Miguel Angel Bautista and Joshua M. Susskind and Alexander G. Schwing},
  booktitle={ICLR},
  year={2024},
}
```
This sample code is released under the LICENSE terms.
Our project would not be possible without the following projects:
- GNT (commit `7b63996cb807dbb5c95ab6898e8093996588e73a`)
- RAFT (commit `3fa0bb0a9c633ea0a9bb8a79c576b6785d4e6a02`)
- OneFormer (commit `56799ef9e02968af4c7793b30deabcbeec29ffc0`)
- segment-anything (commit `6fdee8f2727f4506cfbbe553e23b895e27956588`)
- ZoeDepth (commit `edb6daf45458569e24f50250ef1ed08c015f17a7`)
- TAPIR (commit `4ac6b2acd0aed36c0762f4247de9e8630340e2e0`)
- CoTracker (commit `0a0596b277545625054cb041f00419bcd3693ea5`)
- casualSAM (we use our modified version)
- dynamic-video-depth (we use our modified version)
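The exact commits above matter for reproducibility. A hypothetical helper (not part of this repo) that builds the `git` commands for pinning a dependency to one of the listed commits, without executing anything:

```python
def pin_commands(url: str, commit: str, dest: str) -> list:
    """Build the git commands that clone a repo and check out a fixed commit."""
    return [
        ["git", "clone", url, dest],
        ["git", "-C", dest, "checkout", commit],
    ]


# Example: pin RAFT to the commit listed above
# (the upstream URL is an assumption, verify before use).
cmds = pin_commands(
    "https://github.com/princeton-vl/RAFT.git",
    "3fa0bb0a9c633ea0a9bb8a79c576b6785d4e6a02",
    "third_party/RAFT",
)
```

The commands can then be run with `subprocess.run(cmd, check=True)` in order.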