
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models (ICCV 2025)

Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu


⭐ If this work is helpful to you, please star this repo. Thanks! 🤗

News

  • 📄 [2024.07.08] Paper preprint released!
  • 💾 [2024.12.02] Codebase and model checkpoints are now available.
  • 🏁 [2025.01.16] Training code for the KITTI dataset has been released.
  • 🏆 [2025.06.26] Our paper has been accepted to ICCV 2025!

Setup

Installation

Clone this repo with submodules:

git clone --recurse-submodules https://github.com/LabShuHangGU/PerLDiff.git

The code is tested with PyTorch 1.12.0 and CUDA 11.3 on V100 servers. To set up the Python environment:

conda create -n perldiff python=3.8 -y

conda activate perldiff

pip install albumentations==0.4.3 opencv-python pudb==2019.2 imageio==2.9.0 imageio-ffmpeg==0.4.2 

pip install pytorch-lightning==1.4.2 omegaconf==2.1.1 test-tube>=0.7.5 streamlit>=0.73.1 einops==0.3.0 torch-fidelity==0.3.0 timm

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

pip install protobuf torchmetrics==0.6.0 transformers==4.19.2 kornia==0.5.8 ftfy regex tqdm

# Install CLIP from the bundled copy (or directly: pip install git+https://github.com/openai/CLIP.git)
cd ./CLIP
pip install .
cd ../
pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0 scikit-image==0.18.0 ipdb gradio

# Tip: appending "-i https://mirrors.aliyun.com/pypi/simple/" to the pip commands above uses a mirror that is faster in mainland China
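
To verify the environment, run a quick check (a minimal sketch; the expected values follow from the versions pinned above):

# Confirm the CUDA build of PyTorch installed above is active.
import torch

print(torch.__version__)          # expected: 1.12.0+cu113
print(torch.cuda.is_available())  # expected: True on a CUDA 11.3 machine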

Datasets

We prepare the nuScenes dataset as in BEVFormer. Specifically, follow these steps:

1. Download the nuScenes Dataset

  • Download the nuScenes dataset from the official website and place it in the ./DATA/ directory.

    You should have the following directory structure:

        DATA/nuscenes
        ├── maps
        ├── samples
        ├── v1.0-test
        └── v1.0-trainval
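
    To verify the download, you can index the dataset with the nuscenes-devkit installed earlier (a minimal sketch; instantiation raises an error if the v1.0-trainval metadata tables are missing):

    # Sanity-check the nuScenes layout by loading the trainval metadata.
    from nuscenes.nuscenes import NuScenes

    nusc = NuScenes(version='v1.0-trainval', dataroot='./DATA/nuscenes', verbose=True)
    print(len(nusc.sample))  # number of annotated samples in the trainval split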

2. Prepare samples_road_map

There are two options to prepare the samples_road_map:

Option 1: Use the provided script (time-consuming, not recommended)

  • Run the following Python script to download and prepare the road map:

    python scripts/get_nusc_road_map.py
    

Option 2: Download from Hugging Face (recommended)

  • Alternatively, you can download samples_road_map from Hugging Face.

    After downloading the samples_road_map.tar.gz file, extract it using the following command:

    tar -xzf samples_road_map.tar.gz

Finally, you should have these files:

        DATA/nuscenes
        ├── maps
        ├── samples
        ├── samples_road_map
        ├── v1.0-test
        └── v1.0-trainval
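
As a quick consistency check, the extracted road maps should line up with the camera samples. The sketch below assumes samples_road_map mirrors the per-camera subfolders of samples:

# Assumption: samples_road_map mirrors the CAM_* subfolders of samples/.
from pathlib import Path

cam = "CAM_FRONT"
n_img = len(list(Path(f"DATA/nuscenes/samples/{cam}").iterdir()))
n_map = len(list(Path(f"DATA/nuscenes/samples_road_map/{cam}").iterdir()))
print(n_img, n_map)  # the counts should match if extraction completed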

Training

Before training, download the provided pretrained checkpoints from Hugging Face. You should end up with the following layout:

PerLDiff/
├── openai/
└── DATA/
    ├── nuscenes/
    ├── convnext_tiny_1k_224_ema.pth
    └── sd-v1-4.ckpt

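A small pre-flight check (a sketch; the paths mirror the layout above) catches a missing download before an 8-GPU job is launched:

# Verify that the assets listed above exist before starting training.
from pathlib import Path

required = ["openai", "DATA/nuscenes",
            "DATA/convnext_tiny_1k_224_ema.pth", "DATA/sd-v1-4.ckpt"]
for p in required:
    assert Path(p).exists(), f"missing: {p}"
print("all training assets found")
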
A training script for reference is provided in bash_run_train.sh.

export TOKENIZERS_PARALLELISM=false
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" 
OMP_NUM_THREADS=16 torchrun \
            --nproc_per_node=8 main.py \
            --training \
            --yaml_file=configs/nusc_text.yaml   \
            --batch_size=2 \
            --name=nusc_train_256x384_perldiff_bs2x8 \
            --guidance_scale_c=5 \
            --step=50 \
            --official_ckpt_name=sd-v1-4.ckpt \
            --total_iters=60000 \
            --save_every_iters=6000
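
Note that the run name encodes the batch configuration: presumably --batch_size=2 per process across --nproc_per_node=8 GPUs is the "bs2x8" in the name, i.e. a global batch of 16:

# Global batch size implied by the reference script ("bs2x8" in the run name).
per_gpu_batch, num_gpus = 2, 8   # --batch_size, --nproc_per_node
print(per_gpu_batch * num_gpus)  # 16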

Evaluation and Visualization

Before testing, download the provided PerLDiff checkpoint from Hugging Face. You should have the following layout:

PerLDiff/
├── openai/
└── DATA/
    ├── nuscenes/
    ├── convnext_tiny_1k_224_ema.pth
    ├── perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth
    └── sd-v1-4.ckpt

A testing script for reference is provided in bash_run_test.sh.

export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
            --nproc_per_node=2 main.py \
            --validation \
            --yaml_file=configs/nusc_text.yaml   \
            --batch_size=2 \
            --name=nusc_test_256x384_perldiff_bs2x8 \
            --guidance_scale_c=5 \
            --step=50 \
            --official_ckpt_name=sd-v1-4.ckpt \
            --total_iters=60000 \
            --save_every_iters=6000 \
            --val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth

If you want to try the model in a Gradio demo, run:

bash bash_run_gradio.sh

Test FID

Before testing FID, generate images for the validation set using bash_run_gen.sh:

export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
            --nproc_per_node=2 main.py \
            --generation \
            --yaml_file=configs/nusc_text_with_path.yaml   \
            --batch_size=4 \
            --name=nusc_test_256x384_perldiff_bs2x8 \
            --guidance_scale_c=5 \
            --step=50 \
            --official_ckpt_name=sd-v1-4.ckpt \
            --total_iters=60000 \
            --save_every_iters=6000 \
            --val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth \
            --gen_path=val_ddim50w5_256x384_perldiff_bs2x8

We provide two methods for measuring FID:

Option 1: Using clean_fid

  • The FID calculated by this method tends to be higher. First, process the real nuScenes validation dataset and save it as 256x384 images:

    python scripts/get_nusc_real_img.py

    Then, calculate the FID:

    pip install clean-fid
    python FID/cleanfid_test_fid.py val_ddim50w5_256x384_perldiff_bs2x8/samples samples_real_256x384/samples
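
    Equivalently, you can call the clean-fid API directly (a minimal sketch of what FID/cleanfid_test_fid.py presumably wraps; the two arguments are the generated and real image folders from above):

    # Compute FID between generated and real 256x384 validation images.
    from cleanfid import fid

    score = fid.compute_fid("val_ddim50w5_256x384_perldiff_bs2x8/samples",
                            "samples_real_256x384/samples")
    print(f"FID: {score:.2f}")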

Option 2: Using the method provided by MagicDrive

  • This method requires modifications to the MagicDrive code:

    • Copy the generated data val_ddim50w5_256x384_perldiff_bs2x8/ to MagicDrive/data/nuscenes
    • Copy FID/configs_256x384 to the working directory MagicDrive/configs_256x384
    • Copy FID/fid_score_384.py to MagicDrive/tools/fid_score_384.py
  • Then, run FID/fid_test.sh

Citation

@article{zhang2024perldiff,
  title={PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models},
  author={Zhang, Jinhua and Sheng, Hualian and Cai, Sijia and Deng, Bing and Liang, Qiao and Li, Wen and Fu, Ying and Ye, Jieping and Gu, Shuhang},
  journal={arXiv preprint arXiv:2407.06109},
  year={2024}
}

Related Repositories

https://github.com/gligen/GLIGEN/

https://github.com/fundamentalvision/BEVFormer

https://github.com/cure-lab/MagicDrive/

https://github.com/mit-han-lab/bevfusion

https://github.com/bradyz/cross_view_transformers

Contact

If you have any questions, feel free to contact me via email ([email protected]).
