
MokA

🚀 Quick Start

🛠️ Requirements and Installation

Basic Dependencies:

  • Python == 3.9
  • PyTorch == 2.1.0
  • transformers == 4.37.2
  • deepspeed == 0.12.6
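
After installation, a quick version check (a minimal sketch; run it in your environment) confirms the pinned dependencies are in place:

import torch, transformers, deepspeed

# Expected: 2.1.0, 4.37.2, 0.12.6 respectively.
print(torch.__version__, transformers.__version__, deepspeed.__version__)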

🥑 Pre-trained weights used:

Multi-modal Encoder Weights:

LLM Weights:

🌴 Prepare datasets

In this repo, we take the audio-visual-text case as an example and pre-train based on the llama2-7b-chat-hf model.

  • Download the image and video pretraining dataset from Video-LLaVA;
  • Download the audio pretraining dataset from AudioCaps;
  • The fine-tuning dataset is MUSIC-AVQA; prepare the corresponding data and annotations here.

Set the path of the pretraining dataset in:

dataset/pretrain_dataset.py

Set the path of the fine-tuning dataset in:

dataset/unified_dataset.py
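
For reference, a minimal sketch of what these path settings might look like; the variable names below are hypothetical, so match them to what the two files actually define:

# dataset/pretrain_dataset.py -- hypothetical names; check the actual file
PRETRAIN_IMAGE_DATA = "/path/to/videollava_pretrain/images"
PRETRAIN_VIDEO_DATA = "/path/to/videollava_pretrain/videos"
PRETRAIN_AUDIO_DATA = "/path/to/audiocaps"

# dataset/unified_dataset.py -- hypothetical names
FINETUNE_DATA_ROOT = "/path/to/MUSIC-AVQA"
FINETUNE_ANNOTATIONS = "/path/to/MUSIC-AVQA/annotations.json"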

🔑 Training

Replace the necessary paths for google-bert-base-uncased, clip-vit-large-patch14, and BEATs in:

models/multimodal_encoder.py
models/unified_arch.py
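
As a rough illustration, loading these encoders from local paths typically looks like the following; the class choices and paths here are assumptions, so mirror what the two files above actually use:

import torch
from transformers import BertModel, CLIPVisionModel

# Placeholder local paths; point these at the downloaded weights.
bert = BertModel.from_pretrained("/path/to/google-bert-base-uncased")
clip_vision = CLIPVisionModel.from_pretrained("/path/to/clip-vit-large-patch14")

# BEATs is not packaged in transformers; its released checkpoint is a
# torch checkpoint loaded from disk (see the BEATs repo for the model class).
beats_ckpt = torch.load("/path/to/BEATs_iter3_plus_AS2M.pt", map_location="cpu")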

🔥 Stage 1: pre-train projectors

Pre-training the visual projector takes about 24h using 20 A100 40G GPUs:

sh scripts/pretrain/pretrain_visual.sh

Pre-training the audio projector takes about 1h using 16 A100 40G GPUs:

sh scripts/pretrain/pretrain_audio.sh

We also release our pre-trained projectors for llama2-7b-chat-hf: download the audio projector checkpoint and the visual projector checkpoint.
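
Before fine-tuning, you can sanity-check a downloaded projector checkpoint; the sketch below assumes it is a standard PyTorch state dict, and the filename is a placeholder:

import torch

state = torch.load("/path/to/visual_projector.bin", map_location="cpu")

# List parameter names and shapes to confirm the checkpoint loaded correctly.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))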

🔥 Stage 2: fine-tuning

Set the paths of the pre-trained projectors at lines 134-135 of:

scripts/finetune/finetune.py
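
Those two lines are simple path assignments; the sketch below uses hypothetical variable names, so match them to what the script actually defines:

# Hypothetical stand-ins for lines 134-135 of scripts/finetune/finetune.py.
audio_projector_path = "/path/to/audio_projector.bin"
visual_projector_path = "/path/to/visual_projector.bin"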

Here we take MUSIC-AVQA as an example; fine-tuning takes about 5-6h using 16 A100 40G GPUs:

sh scripts/finetune/ft.sh

🤖 Inference

Here we take MUSIC-AVQA as an example; run:

sh scripts/finetune/infer.sh

🤖 Evaluation

Here we take MUSIC-AVQA as an example; run:

python evaluation.py
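
evaluation.py presumably compares predictions against ground-truth answers; a minimal sketch of that kind of accuracy computation, assuming a hypothetical JSON results format with "pred" and "answer" fields:

import json

# Hypothetical file and field names; the real evaluation.py may differ.
with open("results/music_avqa_predictions.json") as f:
    records = json.load(f)

correct = sum(r["pred"].strip().lower() == r["answer"].strip().lower() for r in records)
print(f"Accuracy: {correct / len(records):.4f} ({correct}/{len(records)})")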

📃 BibTeX

@article{wei2025moka,
  title={MokA: Multimodal Low-Rank Adaptation for MLLMs},
  author={Wei, Yake and Miao, Yu and Zhou, Dongzhan and Hu, Di},
  journal={arXiv preprint arXiv:2506.05191},
  year={2025}
}
