Personalized Preference Fine-tuning of Diffusion Models

This is the public implementation of Personalized Preference Fine-tuning of Diffusion Models (PPD), accepted to CVPR 2025. PPD is a framework for personalized text-to-image generation. This codebase is adapted from the LLaVA-NeXT repository.

Note: This codebase includes only the VLM component (Stage 1 in Figure 1 of the paper). The diffusion fine-tuning component (Stage 2) was developed during an industry internship and is not open-sourced.

📁 Project Structure

personalized-t2i/
├── user_classification/        # User preference classification models
│   └── user_classifier.py     # Neural network classifier for user identification
├── llava_embeddings/          # LLaVA-based embedding generation
│   ├── pick_a_pick_user_emb.py    # Generate user embeddings from preference pairs
│   └── pick_a_pick_user_cond.py   # Conditional user embedding generation
├── eval/                      # Evaluation frameworks
│   ├── eval_winrate_gpt4o.py    # GPT-4o-based win rate evaluation
│   ├── eval_winrate_gpt4o_userdesc.py  # User description evaluation
│   └── eval_winrate.py        # Standard win rate evaluation
├── scripts/                   # Shell scripts for automation
│   ├── run_user_classify.sh   # Run user classification training
│   ├── gen_emb.sh            # Generate embeddings
│   ├── eval_winrate_gpt4o.sh    # Run GPT-4o evaluations
│   ├── eval_winrate.sh       # Run standard evaluations
│   └── eval_winrate_gpt4o_userdesc.sh
├── requirements.txt           # Python dependencies
└── LICENSE                   # Apache 2.0 License

🛠️ Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • 16GB+ RAM

Setup

  1. Clone the repository:

    git clone <your-repo-url>
    cd personalized-t2i
  2. Create and activate a virtual environment:

    conda create -n personalized-t2i python=3.10 -y
    conda activate personalized-t2i
  3. Install dependencies:

    pip install --upgrade pip
    pip install -e ".[train]"

    Or install from requirements.txt:

    pip install -r requirements.txt

🎯 Key Components

1. User Classification (user_classification/)

The user classifier is a deep neural network that learns to identify users from their image-preference patterns.

Usage:

./scripts/run_user_classify.sh
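As a rough illustration of the idea (not the repository's actual architecture), a classifier of this kind maps embeddings of a user's preferred and rejected images to a distribution over user IDs. The layer sizes, input format, and use of a single hidden layer below are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class UserClassifier:
    """Tiny MLP sketch: concatenated (preferred, rejected) image embeddings
    -> probability distribution over user IDs. Dimensions are illustrative."""

    def __init__(self, emb_dim=512, hidden=256, num_users=10, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.02, (2 * emb_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.02, (hidden, num_users))
        self.b2 = np.zeros(num_users)

    def forward(self, pref_emb, rej_emb):
        x = np.concatenate([pref_emb, rej_emb], axis=-1)  # (batch, 2*emb_dim)
        h = np.maximum(0.0, x @ self.W1 + self.b1)        # ReLU hidden layer
        return softmax(h @ self.W2 + self.b2)             # (batch, num_users)

clf = UserClassifier()
probs = clf.forward(np.zeros((4, 512)), np.zeros((4, 512)))
print(probs.shape)  # (4, 10)
```

Training such a model with cross-entropy on (preference pair, user ID) examples is the part `run_user_classify.sh` automates.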

2. LLaVA Embeddings (llava_embeddings/)

Generates multimodal user-preference embeddings using LLaVA-NeXT models.

Usage:

./scripts/gen_emb.sh
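The few-shot conditioning behind these scripts amounts to packing a user's preference history into a VLM prompt. The sketch below is purely illustrative: the field names (`caption`, `chosen`) and the `<image_*>` placeholder tokens are hypothetical stand-ins, not the repository's actual prompt format.

```python
def build_user_context(preference_pairs, num_shots=4):
    """Turn a user's (caption, choice) history into an in-context prompt.

    Each pair is a dict with a text caption and which of the two images
    (0 or 1) the user preferred; image tokens stand in for actual images.
    """
    lines = ["This user compared pairs of images generated for the same caption."]
    for pair in preference_pairs[:num_shots]:
        lines.append(
            f"Caption: {pair['caption']} | "
            f"preferred: <image_{pair['chosen']}> rejected: <image_{1 - pair['chosen']}>"
        )
    lines.append("Summarize this user's visual preferences.")
    return "\n".join(lines)

history = [
    {"caption": "a castle at sunset", "chosen": 0},
    {"caption": "a cat in a spacesuit", "chosen": 1},
]
prompt = build_user_context(history, num_shots=4)
print(prompt)
```

The `--num_shots` flag in the embedding script controls how many such pairs condition the model.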

3. Evaluation Framework (eval/)

Evaluation suite for measuring personalization performance.

Usage:

# Standard win rate evaluation
./scripts/eval_winrate.sh

# GPT-4o-based evaluation
./scripts/eval_winrate_gpt4o.sh

# User description evaluation
./scripts/eval_winrate_gpt4o_userdesc.sh
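At its core, a win-rate evaluation aggregates pairwise judgments (personalized output vs. baseline) into a single score. A minimal standalone sketch, counting ties as half a win, which is one common convention and not necessarily what the repository's scripts do:

```python
def win_rate(judgments):
    """Fraction of pairwise comparisons won by the personalized model.

    `judgments` holds one entry per comparison: "win", "loss", or "tie".
    Ties count as half a win here; the repository's scripts may
    handle them differently.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    wins = sum(j == "win" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)

print(win_rate(["win", "win", "loss", "tie"]))  # 0.625
```

In the GPT-4o variants, each "win"/"loss"/"tie" label comes from querying the judge model with both images and the user's preference context.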

📊 Datasets

This project works with the Pick-a-Pic dataset, which contains user preference annotations for image pairs.
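Per-user personalization requires bucketing the dataset's preference annotations by annotator. The record layout below (`user_id`, `caption`, `label`) mimics the general shape of Pick-a-Pic preference data but is not its exact schema:

```python
from collections import defaultdict

# Illustrative records only; field names do not match Pick-a-Pic's actual columns.
records = [
    {"user_id": 1, "caption": "a red fox", "label": 0},
    {"user_id": 1, "caption": "a red fox, oil painting", "label": 1},
    {"user_id": 2, "caption": "city skyline at night", "label": 0},
]

def group_by_user(rows):
    """Bucket preference annotations per user for few-shot conditioning."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)
    return dict(by_user)

groups = group_by_user(records)
print({uid: len(rows) for uid, rows in groups.items()})  # {1: 2, 2: 1}
```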

🎨 Usage Examples

Training a User Classifier

python user_classification/user_classifier.py \
    --dataset_name "your_dataset" \
    --batch_size 32 \
    --epochs 10 \
    --learning_rate 1e-3 \
    --wandb_project "personalized_t2i"

Generating User Embeddings

python llava_embeddings/pick_a_pick_user_emb.py \
    --num_shots 4 \
    --pretrained "lmms-lab/llava-onevision-qwen2-7b-ov-chat" \
    --output_dir "./data"

Running Evaluations

python eval/eval_winrate_gpt4o.py \
    --dataset_name "your_test_dataset" \
    --model_name "gpt-4o-mini" \
    --include_cot

📝 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

🙏 Acknowledgments

  • LLaVA-NeXT Team: This project is adapted from the excellent LLaVA-NeXT repository
  • Pick-a-Pic Dataset: Thanks to the creators of the Pick-a-Pic preference dataset
  • Hugging Face: For providing the model hosting and dataset infrastructure
  • OpenAI: For GPT models used in evaluation

📚 Citation

If you use this work in your research, please cite:

@InProceedings{dang2025personalized,
    author    = {Dang, Meihua and Singh, Anikait and Zhou, Linqi and Ermon, Stefano and Song, Jiaming},
    title     = {Personalized Preference Fine-tuning of Diffusion Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {8020-8030}
}

For questions or support, please open an issue on GitHub or contact the maintainers.
