This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task, aiming to diagnose current models across various instruction categories at a fine-grained level. The framework is structured around the context-free grammar (CFG) of the task, which serves as the basis for the problem decomposition and as the core premise for designing the instruction categories. We propose a semi-automatic method for CFG construction with the help of Large Language Models (LLMs), then derive and generate data spanning five principal instruction categories: direction change, landmark recognition, region recognition, vertical movement, and numerical comprehension. Our analysis of different models reveals notable performance discrepancies and recurrent issues; the stagnation of numerical comprehension, heavy selective biases over directional concepts, and other findings can inform the development of future language-guided navigation systems. A brief introduction to the project is available here.
To evaluate on R2R-style datasets, you need to set up the Matterport3DSimulator. Please follow the installation and configuration instructions in its official repository.
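For reference, obtaining the simulator usually starts from a recursive clone; the build itself (Docker image or native CMake build) should follow the official README, so the lines below are only a sketch:

```
# Clone with submodules (the simulator pulls in pybind11 as a submodule)
git clone --recursive https://github.com/peteanderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
# Build per the official instructions (e.g. the provided Dockerfile, or cmake + make)
```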
Download the NavNuances data v1 from the link.
We follow the R2R naming convention, so to obtain trajectory predictions on the NavNuances splits you only need to add the split names to the validation code of standard VLN methods trained on R2R.
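In practice this usually amounts to placing the downloaded annotation files next to the method's existing R2R splits and listing the NavNuances split names wherever the validation splits are enumerated. The paths below are placeholders rather than the layout of any particular codebase:

```
# Placeholder paths: put the NavNuances annotation files alongside the R2R splits
cp /path/to/NavNuances/v1/*.json /path/to/method/datasets/R2R/annotations/
# Then add the NavNuances split names to the validation split list in the method's
# validation code or config, exactly as you would for an extra R2R validation split.
```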
We provide an example of setting up the DUET model to generate predictions for the NavNuances dataset. You can check the details in baselines/VLN-DUET.
The evaluator definitions are provided in the evaluation/evaluators directory. After generating the submission file in the standard R2R format for all NavNuances splits, modify the directories in evaluation/run_eval_template.sh. Then run:
```
cd evaluation
sh run_eval_template.sh
```

This will generate the evaluation results.
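For context, each submission file follows the standard R2R leaderboard format: a JSON list of entries, each with an instr_id and a trajectory given as [viewpoint_id, heading, elevation] triples. Modifying the template generally just means pointing a couple of path variables at your own files; the variable names below are purely illustrative (check run_eval_template.sh for its actual ones):

```
# Illustrative sketch only -- the real variable names live in evaluation/run_eval_template.sh
SUBMISSION_DIR=/path/to/your/predictions   # R2R-format prediction files for the NavNuances splits
NAVNUANCES_DIR=/path/to/NavNuances/v1      # downloaded NavNuances annotations
```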
To predict using the NavGPT4v model, follow these steps:
- Link the Matterport3D scans v1/scans to baselines/navgpt4v/data/v1/scans.
- Place all the evaluation splits into baselines/navgpt4v/data/R2R/annotations.
- Set your OPENAI_API_KEY and OPENAI_ORGANIZATION environment variables (a sketch of this and the scan-linking step is included in the command block below).
- Add the evaluation splits at line 17 of baselines/navgpt4v/NavGPT4v.py.
- Run the trajectory prediction script with the following commands:

```
cd baselines/navgpt4v
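# Sketch of the setup steps above -- the scan source path is a placeholder for your
# local Matterport3D download; credentials are your own OpenAI values.
mkdir -p data/v1
ln -s /path/to/matterport3d/v1/scans data/v1/scans
export OPENAI_API_KEY=<your_key>
export OPENAI_ORGANIZATION=<your_org>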
DEBUG=1 sh run_pred.sh # dump intermediate observations as detailed at line 339 of navgpt4v/LLMs/openai4v.py
sh run_pred.sh # no intermediate log
```

If you're using NavNuances in your research, please cite using the following BibTeX:
@inproceedings{wang2024navigating,
title={Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation},
author={Wang, Zehao and Wu, Minye and Cao, Yixin and Ma, Yubo and Chen, Meiqi and Tuytelaars, Tinne},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
year={2024},
publisher={Association for Computational Linguistics},
}