This is the public implementation of Personalized Preference Fine-tuning of Diffusion Models (PPD), accepted to CVPR 2025. PPD is a framework for personalized text-to-image generation. This project is adapted from the LLaVA-NeXT repository.
Note: This codebase includes only the VLM component (Stage 1 in Figure 1 of the paper). The diffusion fine-tuning component (Stage 2) was developed during an industry internship and is not open-sourced.
personalized-t2i/
├── user_classification/ # User preference classification models
│ └── user_classifier.py # Neural network classifier for user identification
├── llava_embeddings/ # LLaVA-based embedding generation
│ ├── pick_a_pick_user_emb.py # Generate user embeddings from preference pairs
│ └── pick_a_pick_user_cond.py # Conditional user embedding generation
├── eval/ # Evaluation frameworks
│ ├── eval_winrate_gpt4o.py # GPT-4o-based win rate evaluation
│ ├── eval_winrate_gpt4o_userdesc.py # User description evaluation
│ └── eval_winrate.py # Standard win rate evaluation
├── scripts/ # Shell scripts for automation
│ ├── run_user_classify.sh # Run user classification training
│ ├── gen_emb.sh # Generate embeddings
│ ├── eval_winrate_gpt4o.sh # Run GPT-4o evaluations
│ ├── eval_winrate.sh # Run standard evaluations
│ └── eval_winrate_gpt4o_userdesc.sh # Run user description evaluations
├── requirements.txt # Python dependencies
└── LICENSE # Apache 2.0 License
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- Clone the repository:

  git clone <your-repo-url>
  cd personalized-t2i
- Create and activate a virtual environment:

  conda create -n personalized-t2i python=3.10 -y
  conda activate personalized-t2i
- Install dependencies:

  pip install --upgrade pip
  pip install -e ".[train]"

  Or install from requirements.txt:

  pip install -r requirements.txt
The user classifier is a deep neural network that learns to identify users from their image preference patterns.
Usage:
./scripts/run_user_classify.sh
Generates multimodal embeddings with LLaVA-NeXT models to capture individual user preferences.
Usage:
./scripts/gen_emb.sh
Evaluation suite for measuring personalization performance.
Usage:
# Standard win rate evaluation
./scripts/eval_winrate.sh
# GPT-4o-based evaluation
./scripts/eval_winrate_gpt4o.sh
# User description evaluation
./scripts/eval_winrate_gpt4o_userdesc.sh
This project works with the Pick-a-Pic dataset, which contains user preference annotations for image pairs.
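To make the data layout concrete, here is a hedged sketch of what a single preference record looks like and how the preferred image can be read off it. The field names (`user_id`, `caption`, `image_0_uid`, `image_1_uid`, `label_0`, `label_1`) are assumptions based on the Pick-a-Pic release; check the dataset card for the authoritative schema.

```python
# Hypothetical illustration of a Pick-a-Pic preference record
# (field names are assumptions; see the dataset card for the real schema).
record = {
    "user_id": 42,
    "caption": "a watercolor fox in a snowy forest",
    "image_0_uid": "abc123",
    "image_1_uid": "def456",
    "label_0": 1.0,  # 1.0 if image_0 was preferred, else 0.0
    "label_1": 0.0,
}

def preferred_image(rec):
    """Return the uid of the image the user preferred (None on a tie)."""
    if rec["label_0"] == rec["label_1"]:
        return None
    return rec["image_0_uid"] if rec["label_0"] > rec["label_1"] else rec["image_1_uid"]

print(preferred_image(record))  # abc123
```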
python user_classification/user_classifier.py \
--dataset_name "your_dataset" \
--batch_size 32 \
--epochs 10 \
--learning_rate 1e-3 \
--wandb_project "personalized_t2i"
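As a rough intuition for what this training does: each example is a feature vector derived from a (preferred, rejected) image pair, and the model learns to map it to a user id. The sketch below substitutes a nearest-centroid classifier for the actual deep network purely to illustrate the interface; all names and the toy data are hypothetical.

```python
import numpy as np

# Nearest-centroid stand-in for the user classifier (illustrative only;
# the real script in user_classification/user_classifier.py trains a
# neural network on preference-pair features).
def fit_centroids(features, user_ids):
    """features: (N, D) pair features; user_ids: (N,) integer labels."""
    return {u: features[user_ids == u].mean(axis=0) for u in np.unique(user_ids)}

def classify(feature, centroids):
    """Predict the user whose centroid is nearest to `feature`."""
    return min(centroids, key=lambda u: np.linalg.norm(feature - centroids[u]))

# Toy data: user 0's pair features cluster near 0, user 1's near 1.
features = np.vstack([np.zeros((5, 4)), np.ones((5, 4))])
user_ids = np.array([0] * 5 + [1] * 5)
centroids = fit_centroids(features, user_ids)
print(classify(np.full(4, 0.9), centroids))  # 1
```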
python llava_embeddings/pick_a_pick_user_emb.py \
--num_shots 4 \
--pretrained "lmms-lab/llava-onevision-qwen2-7b-ov-chat" \
--output_dir "./data"
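The few-shot idea behind `--num_shots` can be sketched as follows: a user embedding is aggregated from a handful of preference examples, e.g. the mean difference between preferred- and rejected-image features. In the real pipeline those features come from the LLaVA-OneVision model above; the arrays and the aggregation rule here are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: aggregate num_shots preference examples into one user
# embedding (mean of preferred-minus-rejected feature differences).
def user_embedding(preferred, rejected):
    """preferred/rejected: (num_shots, D) image feature arrays."""
    return (preferred - rejected).mean(axis=0)

# Toy stand-ins for model-extracted features (num_shots=2, D=2).
pref = np.array([[1.0, 0.0], [0.8, 0.2]])
rej = np.array([[0.0, 1.0], [0.2, 0.8]])
print(user_embedding(pref, rej))  # [ 0.8 -0.8]
```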
python eval/eval_winrate_gpt4o.py \
--dataset_name "your_test_dataset" \
--model_name "gpt-4o-mini" \
--include_cot
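For reference, a win rate is computed from per-prompt pairwise judgments: the judge (here GPT-4o) picks a winner for each comparison, and the win rate is the fraction of comparisons the personalized model wins. The tie-counts-as-half convention below is one common choice, not necessarily the one the evaluation scripts use.

```python
# Minimal sketch of win-rate aggregation from pairwise judgments
# (tie handling is an assumption; the eval scripts may differ).
def win_rate(judgments):
    """judgments: list of 'win' | 'loss' | 'tie' for the personalized model."""
    score = sum(1.0 if j == "win" else 0.5 if j == "tie" else 0.0 for j in judgments)
    return score / len(judgments)

print(win_rate(["win", "win", "tie", "loss"]))  # 0.625
```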
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- LLaVA-NeXT Team: this project is adapted from the excellent LLaVA-NeXT repository
- Pick-a-Pic Dataset: Thanks to the creators of the Pick-a-Pic preference dataset
- Hugging Face: For providing the model hosting and dataset infrastructure
- OpenAI: For GPT models used in evaluation
If you use this work in your research, please cite:
@InProceedings{dang2025personalized,
author = {Dang, Meihua and Singh, Anikait and Zhou, Linqi and Ermon, Stefano and Song, Jiaming},
title = {Personalized Preference Fine-tuning of Diffusion Models},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {8020-8030}
}
- LLaVA-NeXT: Open Large Multimodal Models
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
- Visual Instruction Tuning
For questions or support, please open an issue on GitHub or contact the maintainers.