This is the public implementation of Personalized Preference Fine-tuning of Diffusion Models (PPD), accepted to CVPR 2025. PPD is a framework for personalized text-to-image generation. This project is adapted from the LLaVA-NeXT repository.
Note: This codebase includes only the VLM component (Stage 1 in Figure 1 of the paper). The diffusion fine-tuning component (Stage 2) was developed during an industry internship and is not open-sourced.
personalized-t2i/
├── user_classification/ # User preference classification models
│ └── user_classifier.py # Neural network classifier for user identification
├── llava_embeddings/ # LLaVA-based embedding generation
│ ├── pick_a_pick_user_emb.py # Generate user embeddings from preference pairs
│ └── pick_a_pick_user_cond.py # Conditional user embedding generation
├── eval/ # Evaluation frameworks
│ ├── eval_winrate_gpt4o.py # GPT-4o-based win rate evaluation
│ ├── eval_winrate_gpt4o_userdesc.py # User description evaluation
│ └── eval_winrate.py # Standard win rate evaluation
├── scripts/ # Shell scripts for automation
│ ├── run_user_classify.sh # Run user classification training
│ ├── gen_emb.sh # Generate embeddings
│ ├── eval_winrate_gpt4o.sh # Run GPT-4o evaluations
│ ├── eval_winrate.sh # Run standard evaluations
│ └── eval_winrate_gpt4o_userdesc.sh # Run user description evaluations
├── requirements.txt # Python dependencies
└── LICENSE # Apache 2.0 License
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- Clone the repository:

  git clone <your-repo-url>
  cd personalized-t2i
- Create and activate a virtual environment:

  conda create -n personalized-t2i python=3.10 -y
  conda activate personalized-t2i
- Install dependencies:

  pip install --upgrade pip
  pip install -e ".[train]"

  Or install from requirements.txt:

  pip install -r requirements.txt
The user classifier is a deep neural network that learns to identify users from their image preference patterns.
Usage:
./scripts/run_user_classify.sh
Generates multimodal embeddings with LLaVA-NeXT models to capture individual user preferences.
Usage:
./scripts/gen_emb.sh
Evaluation suite for measuring personalization performance.
Usage:
# Standard win rate evaluation
./scripts/eval_winrate.sh
# GPT-4o-based evaluation
./scripts/eval_winrate_gpt4o.sh
# User description evaluation
./scripts/eval_winrate_gpt4o_userdesc.sh
This project works with the Pick-a-Pic dataset, which contains user preference annotations for image pairs.
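To make the data layout concrete, here is a hedged sketch of what a single preference record looks like and how the preferred image can be read off it. The field names (`user_id`, `caption`, `image_0_uid`, `image_1_uid`, `label_0`, `label_1`) are assumptions based on the Pick-a-Pic release; check the dataset card for the authoritative schema.

```python
# Hypothetical illustration of a Pick-a-Pic preference record
# (field names are assumptions; see the dataset card for the real schema).
record = {
    "user_id": 42,
    "caption": "a watercolor fox in a snowy forest",
    "image_0_uid": "abc123",
    "image_1_uid": "def456",
    "label_0": 1.0,  # 1.0 if image_0 was preferred, else 0.0
    "label_1": 0.0,
}

def preferred_image(rec):
    """Return the uid of the image the user preferred (None on a tie)."""
    if rec["label_0"] == rec["label_1"]:
        return None
    return rec["image_0_uid"] if rec["label_0"] > rec["label_1"] else rec["image_1_uid"]

print(preferred_image(record))  # abc123
```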
python user_classification/user_classifier.py \
--dataset_name "your_dataset" \
--batch_size 32 \
--epochs 10 \
--learning_rate 1e-3 \
--wandb_project "personalized_t2i"
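As a rough intuition for what this training does: each example is a feature vector derived from a (preferred, rejected) image pair, and the model learns to map it to a user id. The sketch below substitutes a nearest-centroid classifier for the actual deep network purely to illustrate the interface; all names and the toy data are hypothetical.

```python
import numpy as np

# Nearest-centroid stand-in for the user classifier (illustrative only;
# the real script in user_classification/user_classifier.py trains a
# neural network on preference-pair features).
def fit_centroids(features, user_ids):
    """features: (N, D) pair features; user_ids: (N,) integer labels."""
    return {u: features[user_ids == u].mean(axis=0) for u in np.unique(user_ids)}

def classify(feature, centroids):
    """Predict the user whose centroid is nearest to `feature`."""
    return min(centroids, key=lambda u: np.linalg.norm(feature - centroids[u]))

# Toy data: user 0's pair features cluster near 0, user 1's near 1.
features = np.vstack([np.zeros((5, 4)), np.ones((5, 4))])
user_ids = np.array([0] * 5 + [1] * 5)
centroids = fit_centroids(features, user_ids)
print(classify(np.full(4, 0.9), centroids))  # 1
```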
python llava_embeddings/pick_a_pick_user_emb.py \
--num_shots 4 \
--pretrained "lmms-lab/llava-onevision-qwen2-7b-ov-chat" \
--output_dir "./data"
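The few-shot idea behind `--num_shots` can be sketched as follows: a user embedding is aggregated from a handful of preference examples, e.g. the mean difference between preferred- and rejected-image features. In the real pipeline those features come from the LLaVA-OneVision model above; the arrays and the aggregation rule here are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: aggregate num_shots preference examples into one user
# embedding (mean of preferred-minus-rejected feature differences).
def user_embedding(preferred, rejected):
    """preferred/rejected: (num_shots, D) image feature arrays."""
    return (preferred - rejected).mean(axis=0)

# Toy stand-ins for model-extracted features (num_shots=2, D=2).
pref = np.array([[1.0, 0.0], [0.8, 0.2]])
rej = np.array([[0.0, 1.0], [0.2, 0.8]])
print(user_embedding(pref, rej))  # [ 0.8 -0.8]
```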
python eval/eval_winrate_gpt4o.py \
--dataset_name "your_test_dataset" \
--model_name "gpt-4o-mini" \
--include_cot
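For reference, a win rate is computed from per-prompt pairwise judgments: the judge (here GPT-4o) picks a winner for each comparison, and the win rate is the fraction of comparisons the personalized model wins. The tie-counts-as-half convention below is one common choice, not necessarily the one the evaluation scripts use.

```python
# Minimal sketch of win-rate aggregation from pairwise judgments
# (tie handling is an assumption; the eval scripts may differ).
def win_rate(judgments):
    """judgments: list of 'win' | 'loss' | 'tie' for the personalized model."""
    score = sum(1.0 if j == "win" else 0.5 if j == "tie" else 0.0 for j in judgments)
    return score / len(judgments)

print(win_rate(["win", "win", "tie", "loss"]))  # 0.625
```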
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- LLaVA-NeXT Team: this project is adapted from the excellent LLaVA-NeXT repository
- Pick-a-Pic Dataset: Thanks to the creators of the Pick-a-Pic preference dataset
- Hugging Face: For providing the model hosting and dataset infrastructure
- OpenAI: For GPT models used in evaluation
If you use this work in your research, please cite:
@InProceedings{dang2025personalized,
author = {Dang, Meihua and Singh, Anikait and Zhou, Linqi and Ermon, Stefano and Song, Jiaming},
title = {Personalized Preference Fine-tuning of Diffusion Models},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {8020-8030}
}
- LLaVA-NeXT: Open Large Multimodal Models
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
- Visual Instruction Tuning
For questions or support, please open an issue on GitHub or contact the maintainers.