We introduce HieraNav, a hierarchical open-vocabulary goal navigation task, and LangMap, a human-verified benchmark providing region labels, discriminative descriptions, and navigation tasks across object, room, region, and instance levels.
LangMap covers all HM3D-Sem validation scenes, more than 400 object categories, and single-goal and mixed-level multi-goal navigation sequences. Each region and instance target is annotated with concise and detailed descriptions, enabling evaluation under different instruction styles.
We also introduce PlaNaVid, an RGB-only navigation baseline that uses a bounded, diverse memory and does not rely on depth, 3D maps, oracle paths, or object masks.
Object, room, region, instance, and mixed-level multi-goal navigation.
Human-verified semantic labels and discriminative language annotations.
All HM3D-Sem validation scenes and over 400 open-vocabulary object categories.
Object: find any object of the target category, such as "Find the armchair."
Room: find an object of the target category in a room of the given type, such as "Find the armchair in the bedroom."
Region: find an object of the target category in a specific room instance identified by its description, such as "Find the armchair in the bedroom that has a geometric rug."
Instance: find a specific object instance from its description, such as "Find the black leather armchair."
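The four goal levels above can be sketched as a small data structure. This is only an illustrative sketch, not the LangMap annotation format: the names `GoalLevel`, `Goal`, `room_type`, `region_detail`, and `instance_description` are hypothetical, and the instruction templates simply mirror the example sentences in the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GoalLevel(Enum):
    """Hypothetical labels for the four goal levels described above."""
    OBJECT = "object"
    ROOM = "room"
    REGION = "region"
    INSTANCE = "instance"


@dataclass(frozen=True)
class Goal:
    """One navigation goal (assumed field names, not the benchmark schema)."""
    level: GoalLevel
    category: str                                # e.g. "armchair"
    room_type: Optional[str] = None              # used by ROOM and REGION goals
    region_detail: Optional[str] = None          # used by REGION goals
    instance_description: Optional[str] = None   # used by INSTANCE goals

    def instruction(self) -> str:
        """Render the goal as a sentence in the style of the examples above."""
        if self.level is GoalLevel.OBJECT:
            return f"Find the {self.category}."
        if self.level is GoalLevel.ROOM:
            return f"Find the {self.category} in the {self.room_type}."
        if self.level is GoalLevel.REGION:
            return (f"Find the {self.category} in the {self.room_type} "
                    f"that {self.region_detail}.")
        return f"Find the {self.instance_description}."


# A mixed-level multi-goal sequence is then just an ordered list of goals.
episode = [
    Goal(GoalLevel.OBJECT, "armchair"),
    Goal(GoalLevel.ROOM, "armchair", room_type="bedroom"),
    Goal(GoalLevel.REGION, "armchair", room_type="bedroom",
         region_detail="has a geometric rug"),
    Goal(GoalLevel.INSTANCE, "armchair",
         instance_description="black leather armchair"),
]
instructions = [goal.instruction() for goal in episode]
```

Each level adds one more constraint on top of the object category, which is why a single record type with optional fields is enough for this sketch; a real loader would follow whatever schema the benchmark release defines.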
Qualitative example of PlaNaVid on multi-goal navigation.
@article{miao2026langmap,
  title   = {LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation},
  author  = {Bo Miao and Weijia Liu and Jun Luo and Lachlan Shinnick and Jian Liu and Thomas Hamilton-Smith and Yuhe Yang and Zijie Wu and Vanja Videnovic and Feras Dayoub and {Anton van den} Hengel},
  journal = {arXiv preprint arXiv:2602.02220},
  year    = {2026}
}