
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation [EMNLP 2024 Findings]

¹ESAT-PSI, KU Leuven, ²Peking University, ³Nanyang Technological University, ⁴Fudan University

Abstract

This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task, aiming to diagnose current models on various instruction categories at a finer-grained level. The framework is structured around the task's context-free grammar (CFG), which serves as the basis for the problem decomposition and the core premise of the instruction category design. We propose a semi-automatic method for CFG construction with the help of Large Language Models (LLMs), then derive and generate data spanning five principal instruction categories (i.e., direction change, landmark recognition, region recognition, vertical movement, and numerical comprehension). Our analysis of different models reveals notable performance discrepancies and recurrent issues. The stagnation of numerical comprehension, heavy selection biases over directional concepts, and other interesting findings inform the development of future language-guided navigation systems. A brief introduction to the project is available here.

Environment Setup

To evaluate on R2R-style datasets, you need to set up the Matterport3DSimulator. Please follow the instructions in its official repository for installation and configuration.
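For orientation, a rough sketch of a native build follows; the clone URL and CMake flag match the simulator's public instructions, but defer to its README for the authoritative, up-to-date steps and dependencies.

# Sketch only — consult the Matterport3DSimulator README for the full steps.
git clone --recursive https://github.com/peteranderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..   # off-screen EGL rendering, per the simulator docs
make -j$(nproc)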

NavNuances Dataset

Download NavNuances data v1 from the link.
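If you are unsure where to put the files, a hypothetical layout is sketched below; the archive name and target directory are assumptions, so place the splits wherever your VLN codebase loads R2R-style annotations from.

mkdir -p data/NavNuances                      # target directory assumed
unzip navnuances_v1.zip -d data/NavNuances    # archive name assumed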

A. Trajectory Prediction

We follow the R2R naming convention. To obtain trajectory predictions on the NavNuances splits, simply add the split names to the validation code of standard VLN methods trained on R2R.

We provide an example of setting up the DUET model to generate predictions for the NavNuances dataset. You can check the details in baselines/VLN-DUET.
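As a hypothetical illustration only (the script name below is an assumption — see baselines/VLN-DUET for the actual entry point):

cd baselines/VLN-DUET
# add the NavNuances split names to the validation splits in the inference
# code, then run the model's standard R2R validation script, e.g.:
sh run_validation.sh   # script name assumed; writes R2R-format submission files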

B. Evaluation

The evaluator definitions are provided in the evaluation/evaluators directory. After generating submission files in the standard R2R format for all NavNuances splits, set the directories in evaluation/run_eval_template.sh, then run:

cd evaluation
sh run_eval_template.sh

This will generate the evaluation results.
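The directories to set are along the following lines; the variable names here are assumptions, so check the template itself.

# inside evaluation/run_eval_template.sh (variable names assumed):
PRED_DIR=/path/to/your/submission/files   # R2R-format predictions per split
OUT_DIR=/path/to/eval_results             # where the evaluators write results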

NavGPT4v Model

To predict using the NavGPT4v model, follow these steps:

  1. Link the Matterport3D scans v1/scans to baselines/navgpt4v/data/v1/scans (sketched below).
  2. Place all the evaluation splits into baselines/navgpt4v/data/R2R/annotations.
  3. Set your OPENAI_API_KEY and OPENAI_ORGANIZATION environment variables (sketched below).
  4. Add the evaluation splits at line 17 of baselines/navgpt4v/NavGPT4v.py.
  5. Run the trajectory prediction script with the following commands:
cd baselines/navgpt4v
DEBUG=1 sh run_pred.sh  # dumps intermediate observations, as detailed at line 339 of navgpt4v/LLMs/openai4v.py
sh run_pred.sh          # no intermediate logging
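A minimal sketch of steps 1 and 3 (the scan source path and credential values are placeholders):

# step 1: symlink the Matterport3D scans (source path is a placeholder)
ln -s /path/to/Matterport3D/v1/scans baselines/navgpt4v/data/v1/scans
# step 3: export OpenAI credentials (placeholder values)
export OPENAI_API_KEY=sk-...
export OPENAI_ORGANIZATION=org-...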

Citation

If you use NavNuances in your research, please cite it with the following BibTeX:

@inproceedings{wang2024navigating,
  title={Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation},
  author={Wang, Zehao and Wu, Minye and Cao, Yixin and Ma, Yubo and Chen, Meiqi and Tuytelaars, Tinne},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  year={2024},
  publisher={Association for Computational Linguistics},
}
