Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu
⭐ If this work is helpful to you, please help star this repo. Thanks! 🤗
- 📄 [2024.07.08] Paper preprint released!
- 💾 [2024.12.02] Codebase and model checkpoints are now available.
- 🏁 [2025.01.16] Training code for the KITTI dataset has been released.
- 🏆 [2025.06.26] Our paper has been accepted to ICCV 2025!
Clone this repo with submodules:

git clone --recurse-submodules https://github.com/LabShuHangGU/PerLDiff.git
The code is tested with PyTorch 1.12.0 and CUDA 11.3 on V100 servers. To set up the Python environment, follow the steps below:
conda create -n perldiff python=3.8 -y
conda activate perldiff
pip install albumentations==0.4.3 opencv-python pudb==2019.2 imageio==2.9.0 imageio-ffmpeg==0.4.2
pip install pytorch-lightning==1.4.2 omegaconf==2.1.1 test-tube>=0.7.5 streamlit>=0.73.1 einops==0.3.0 torch-fidelity==0.3.0 timm
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install protobuf torchmetrics==0.6.0 transformers==4.19.2 kornia==0.5.8 ftfy regex tqdm
# Install CLIP from the bundled submodule (alternatively: pip install git+https://github.com/openai/CLIP.git)
cd ./CLIP
pip install .
cd ../
pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0 scikit-image==0.18.0 ipdb gradio
# use "-i https://mirrors.aliyun.com/pypi/simple/" for pip install will be faster
We prepare the nuScenes dataset following the instructions in BEVFormer. Specifically, follow these steps:

- Download the nuScenes dataset from the official website and place it in the ./DATA/ directory. You should have the following directory structure:
DATA/nuscenes
├── maps
├── samples
├── v1.0-test
└── v1.0-trainval
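Optionally, you can verify that the devkit (installed above) can index the dataset from this location before moving on. This is only a quick check; the sample count in the comment assumes the full v1.0-trainval split:

```python
# Build the nuScenes index from DATA/nuscenes; this fails loudly if files are missing or misplaced.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="DATA/nuscenes", verbose=True)
print(len(nusc.sample), "keyframe samples")  # 34149 for the full v1.0-trainval split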
There are two options to prepare the samples_road_map:
Option 1: Use the provided script (time-consuming, not recommended)

- Run the following Python script to download and prepare the road map:
python scripts/get_nusc_road_map.py
Option 2: Download from Hugging Face (recommended)

- Alternatively, you can download the samples_road_map from Hugging Face here. After downloading the samples_road_map.tar.gz file, extract it using the following command:

tar -xzf samples_road_map.tar.gz
Finally, you should have these files:
DATA/nuscenes
├── maps
├── samples
├── samples_road_map
├── v1.0-test
└── v1.0-trainval
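As a quick sanity check, you can compare the number of camera keyframe images with the number of extracted road-map files. This is only a rough sketch; it assumes samples_road_map provides one road map per camera keyframe, so adjust if the extracted layout differs:

```python
# Count camera keyframe images and extracted road-map files; the two totals should match
# if samples_road_map holds one road map per camera keyframe (an assumption, see above).
from pathlib import Path

root = Path("DATA/nuscenes")
n_cam_imgs = sum(1 for p in (root / "samples").rglob("*.jpg"))
n_roadmaps = sum(1 for p in (root / "samples_road_map").rglob("*") if p.is_file())
print("camera keyframe images:", n_cam_imgs)
print("road map files:        ", n_roadmaps)
```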
Before training, download the provided pretrained checkpoints from Hugging Face. You should have the following checkpoints:
PerLDiff/
├── openai
├── DATA/
│   ├── nuscenes
│   ├── convnext_tiny_1k_224_ema.pth
│   └── sd-v1-4.ckpt
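If you prefer scripting the download, a sketch using huggingface_hub is shown below. The repo id is a placeholder, not the real one; use the actual Hugging Face repository linked above, and note the file names are simply taken from the tree above:

```python
# Hypothetical download sketch -- "your-org/PerLDiff" is a placeholder repo id.
from huggingface_hub import hf_hub_download

for fname in ["convnext_tiny_1k_224_ema.pth", "sd-v1-4.ckpt"]:
    hf_hub_download(repo_id="your-org/PerLDiff", filename=fname, local_dir="DATA")
```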
A training script for reference is provided in bash_run_train.sh.
export TOKENIZERS_PARALLELISM=false
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
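# 8 GPUs x --batch_size=2 gives an effective batch size of 16, matching the "bs2x8" tag in --name.
# --save_every_iters and --total_iters control the checkpoint interval and the total training iterations.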
OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=8 main.py \
--training \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_train_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000
Before testing, download the provided PerLDiff checkpoint from Hugging Face. You should have the following checkpoints:
PerLDiff/
├── openai
├── DATA/
│   ├── nuscenes
│   ├── convnext_tiny_1k_224_ema.pth
│   ├── perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth
│   └── sd-v1-4.ckpt
A testing script for reference is provided in bash_run_test.sh.
export TOKENIZERS_PARALLELISM=false
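# Validation loads the trained PerLDiff weights from --val_ckpt_name.
# --step (DDIM sampling steps) and --guidance_scale_c match the "ddim50w5" naming used for the generated images below.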
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--validation \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth
If you want to try the model in a Gradio demo, you can run the script:
bash bash_run_gradio.sh
Before evaluating FID, you should generate the validation set images using bash_run_gen.sh.
export TOKENIZERS_PARALLELISM=false
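# Generation mode writes the rendered validation images under --gen_path;
# the FID steps below read from val_ddim50w5_256x384_perldiff_bs2x8/samples.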
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--generation \
--yaml_file=configs/nusc_text_with_path.yaml \
--batch_size=4 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth \
--gen_path=val_ddim50w5_256x384_perldiff_bs2x8
We provide two methods for measuring FID:
Option 1: Using clean_fid
- The FID calculated by this method tends to be higher. First, you need to process the real nuScenes validation images and save them at 256x384 resolution:
python scripts/get_nusc_real_img.py
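For reference, a rough sketch of what this step amounts to is shown below; the provided script is authoritative, and the exact filenames, interpolation, and the output layout samples_real_256x384/samples are assumptions based on the FID command further down:

```python
# Sketch: resize the real validation-split camera keyframes to 384x256 (W x H) and collect
# them under samples_real_256x384/samples for FID comparison. Details are assumptions; prefer
# the provided scripts/get_nusc_real_img.py.
from pathlib import Path
from PIL import Image
from nuscenes.nuscenes import NuScenes
from nuscenes.utils import splits

CAMS = ["CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
        "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT"]

nusc = NuScenes(version="v1.0-trainval", dataroot="DATA/nuscenes", verbose=False)
val_scenes = set(splits.val)
out_dir = Path("samples_real_256x384/samples")
out_dir.mkdir(parents=True, exist_ok=True)

for sample in nusc.sample:
    if nusc.get("scene", sample["scene_token"])["name"] not in val_scenes:
        continue  # keep only the official validation split
    for cam in CAMS:
        sd = nusc.get("sample_data", sample["data"][cam])
        img = Image.open(Path("DATA/nuscenes") / sd["filename"])
        img.resize((384, 256), Image.BICUBIC).save(out_dir / Path(sd["filename"]).name)
```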
Then, calculate the FID:
pip install clean-fid
python FID/cleanfid_test_fid.py val_ddim50w5_256x384_perldiff_bs2x8/samples samples_real_256x384/samples
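Equivalently, the comparison can be run directly from the clean-fid Python API; a minimal sketch, with the directory arguments mirroring the command above:

```python
# Compute FID between the generated images and the resized real validation images with clean-fid.
from cleanfid import fid

score = fid.compute_fid("val_ddim50w5_256x384_perldiff_bs2x8/samples",
                        "samples_real_256x384/samples")
print(f"FID: {score:.2f}")
```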
Option 2: Using the method provided by MagicDrive
- This method requires modifications to the MagicDrive code:
  - Copy the generated data val_ddim50w5_256x384_perldiff_bs2x8/ to MagicDrive/data/nuscenes
  - Copy FID/configs_256x384 to the MagicDrive working directory as MagicDrive/configs_256x384
  - Copy FID/fid_score_384.py to MagicDrive/tools/fid_score_384.py
- Then, run FID/fid_test.sh
@article{zhang2024perldiff,
title={PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models},
author={Zhang, Jinhua and Sheng, Hualian and Cai, Sijia and Deng, Bing and Liang, Qiao and Li, Wen and Fu, Ying and Ye, Jieping and Gu, Shuhang},
journal={arXiv preprint arXiv:2407.06109},
year={2024}
}
https://github.com/gligen/GLIGEN/
https://github.com/fundamentalvision/BEVFormer
https://github.com/cure-lab/MagicDrive/
https://github.com/mit-han-lab/bevfusion
https://github.com/bradyz/cross_view_transformers
If you have any questions, feel free to contact me by email ([email protected]).