Experts in Mixture-of-Experts (MoE) LLMs have been shown to specialize in different aspects (e.g., domains and tasks). However, these specializations are often suppressed by the load-balancing constraint. To better elicit specialized experts, we introduce RouterLens, a lightweight tool for identifying them, and demonstrate its effectiveness on experts specialized in leveraging context (i.e., context-faithful experts). Building on this, we propose Context-faithful Expert Fine-Tuning (CEFT), a parameter-efficient tuning approach that achieves performance comparable to full fine-tuning while requiring significantly fewer trainable parameters.
- 📍 TL;DR
- 🗺️ Table of Contents
- 🎯 Quick Start
- 📋 Quantitative Results
- ⚙️ Internal Working of Context-faithful Experts
- ©️ License
- 🔖 Citation
Build RouterLens from source and install dependencies:
❯ git clone https://github.com/bigai-nlco/RouterLens.git
❯ cd RouterLens
❯ conda env create -f environment.yml
❯ conda activate routerlens
Run router tuning with:
❯ ./run_router_tuning.sh
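
Conceptually, router tuning freezes all model weights and updates only the per-layer MoE routers (gating networks). Below is a minimal PyTorch sketch of that setup, assuming a Hugging Face OLMoE-style checkpoint whose router weights live under `mlp.gate`; the model ID and parameter names are illustrative and may differ from what the script actually configures.

```python
import torch
from transformers import AutoModelForCausalLM

# Load an MoE checkpoint (the paper's figures use OLMoE-1B-7B).
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924", torch_dtype=torch.bfloat16
)

# Freeze everything except the per-layer routers (gating networks).
# "mlp.gate." matches OLMoE's router parameter names; other MoE models
# may use "router" or "block_sparse_moe.gate"; adjust the filter as needed.
for name, param in model.named_parameters():
    param.requires_grad = "mlp.gate." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Router-only trainable parameters: {trainable:,}")
```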
Count the activation frequency of experts and identify the top-activated ones as context-faithful experts with:
❯ ./run_exp_act_count.sh
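
Conceptually, this step runs the router-tuned model over a context-dependent dataset, tallies how often each expert is routed to at each layer, and keeps the most frequently activated ones. The rough sketch below assumes a Transformers MoE model that returns `router_logits` when called with `output_router_logits=True`; the function names and the top-k of 8 are illustrative assumptions, not the repo's actual code.

```python
import torch
from collections import Counter

@torch.no_grad()
def count_expert_activations(model, tokenizer, texts, top_k=8):
    """Tally how often each expert is among the routed top-k, per layer."""
    counts = {}  # layer index -> Counter over expert indices
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        out = model(**inputs, output_router_logits=True)
        for layer, logits in enumerate(out.router_logits):
            # logits: (num_tokens, num_experts); take the routed top-k per token
            chosen = logits.topk(top_k, dim=-1).indices.flatten().tolist()
            counts.setdefault(layer, Counter()).update(chosen)
    return counts

def top_activated(counts, num_experts_to_keep=8):
    """Per layer, keep the most frequently activated experts as candidates."""
    return {layer: [e for e, _ in c.most_common(num_experts_to_keep)]
            for layer, c in counts.items()}
```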
Run the context-faithful expert tuning with:
❯ ./run_ceft_tuning.sh
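
CEFT then updates only the parameters of the identified context-faithful experts and keeps everything else frozen, which is where the savings in trainable parameters over full fine-tuning come from. A minimal sketch, again assuming OLMoE-style module paths (`model.model.layers[l].mlp.experts[e]`); the `context_faithful` mapping is the output of the counting step above.

```python
def freeze_all_but_context_faithful(model, context_faithful):
    """context_faithful: {layer index: [expert indices]} from the counting step."""
    for param in model.parameters():
        param.requires_grad = False
    for layer, experts in context_faithful.items():
        for e in experts:
            # OLMoE-style path; other MoE models may name expert modules differently.
            for param in model.model.layers[layer].mlp.experts[e].parameters():
                param.requires_grad = True
```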
Figure 1: Router tuning can significantly improve the performance of MoE on context-dependent tasks, indicating the presence of experts specialized in context utilization.
Figure 2: Masking the top-activated experts from the router-tuned (RT) model (i.e., context-faithful experts, CE) significantly degrades performance on context-dependent tasks.
Figure 3: CEFT can achieve performance comparable to full fine-tuning (FFT) while requiring significantly fewer trainable parameters.
Figure 1: Layer-wise attention gain on context and answer (CAG and AAG) for the router-tuned model over the untuned model on the NQ-Swap test set.
Figure 2: Attention gain from context-faithful experts in OLMoE-1B-7B on an NQ-Swap example. At Layer 6 (left) and Layer 12 (right), i.e., a mid-level and a deeper layer, the router-tuned model progressively increases attention to the context and answer tokens (i.e., "1964"), illustrating a "think twice" mechanism. Notably, the base model fails on this example, while the router-tuned model provides the correct answer.
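
Here, CAG and AAG refer to how much more attention the router-tuned model places on the context tokens and on the answer tokens, respectively, than its untuned counterpart at each layer. The sketch below shows one way such an attention-mass gain could be measured; it is an illustrative reading of the metric, not necessarily the paper's exact definition, and assumes the checkpoint supports `output_attentions=True`.

```python
import torch

@torch.no_grad()
def span_attention_mass(model, input_ids, span, layer):
    """Attention mass the final position puts on the token span [start, end)."""
    out = model(input_ids, output_attentions=True)
    attn = out.attentions[layer]      # (batch, heads, query_len, key_len)
    from_last = attn[:, :, -1, :]     # attention from the last (generating) position
    return from_last[..., span[0]:span[1]].sum(dim=-1).mean().item()

# Attention gain at a layer = router-tuned mass minus base-model mass on the
# same context span (CAG) or answer span (AAG).
```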
Figure 3: Answer Probability Gain (APG) of the router-tuned models over their untuned counterparts on the NQ-Swap test set.
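
APG contrasts the probability each model assigns to the gold answer given the (counterfactual) context. The sketch below illustrates that comparison via teacher-forced answer scoring; it is an illustrative reading of the metric, not necessarily the paper's exact formulation.

```python
import torch

@torch.no_grad()
def answer_probability(model, tokenizer, prompt, answer):
    """Mean probability the model assigns to the answer tokens, teacher-forced."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    answer_ids = tokenizer(answer, add_special_tokens=False,
                           return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, answer_ids], dim=-1)
    logits = model(input_ids).logits
    # The position predicting each answer token is one step earlier.
    start = prompt_ids.shape[-1] - 1
    probs = logits[:, start:start + answer_ids.shape[-1]].softmax(dim=-1)
    token_probs = probs.gather(-1, answer_ids.unsqueeze(-1)).squeeze(-1)
    return token_probs.mean().item()

# APG (as sketched here): router-tuned answer probability minus the base model's.
# apg = answer_probability(rt_model, tok, prompt, answer) \
#     - answer_probability(base_model, tok, prompt, answer)
```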
RouterLens is licensed under the MIT License. You are free to use, modify, and distribute this project under the terms of the MIT license.
@article{bai2025routerlens,
  title={Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs},
  author={Jun Bai and Minghao Tong and Yang Liu and Zixia Jia and Zilong Zheng},
  year={2025},
  eprint={2508.19594},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.19594},
}