Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Introduction
Dependencies
Preparation
Usage
Citation

News

2025-05-22: We release UAV-Flow, the first real-world benchmark for language-conditioned UAV imitation learning. (project page: https://prince687028.github.io/UAV-Flow)
2025-01-25: Paper, project page, code, data, envs and models are all released.

Introduction

This work presents _TOWARDS REALISTIC UAV VISION-LANGUAGE NAVIGATION: PLATFORM, BENCHMARK, AND METHODOLOGY_. We introduce a UAV simulation platform, an assistant-guided realistic UAV VLN benchmark, and an MLLM-based method to address the challenges in realistic UAV vision-language navigation.

Dependencies

Create `llamauav` environment

conda create -n llamauav python=3.10 -y
conda activate llamauav
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
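
After installation, it can help to confirm that the pinned versions are the ones actually present in the `llamauav` environment. The sketch below is not part of the repository; the helper name `check_versions` is hypothetical, and it assumes only the three version pins from the pip command above. It uses the standard-library `importlib.metadata`, so it runs even if the packages are missing.

```python
from importlib import metadata

# Versions pinned by the pip command above.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {package: (installed_or_None, matches)} for each pinned package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA wheels may report versions such as "2.0.1+cu118", so compare
        # only the part before the local version suffix.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        print(f"{name}: installed={installed} ok={ok}")
```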

Install LLaMA-UAV model

Follow the LLaMA-UAV instructions to install the LLM dependencies.

Install other dependencies listed in the requirements file

pip install -r requirement.txt

Additionally, to ensure compatibility with the AirSim Python API, apply the fix mentioned in the AirSim issue.

Preparation

Data

To construct the dataset, follow the instructions in the Dataset Section.

Model

GroundingDINO

Download the GroundingDINO checkpoint groundingdino_swint_ogc.pth and place the file in the directory src/model_wrapper/utils/GroundingDINO/.
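
A quick way to confirm the checkpoint landed in the expected place is sketched below. The helper `find_checkpoint` is hypothetical and not part of the repository; it assumes only the file name and directory stated above, and it checks that the file exists and is non-empty without attempting to load it.

```python
from pathlib import Path

# File name and directory taken from the instructions above.
CHECKPOINT_NAME = "groundingdino_swint_ogc.pth"
CHECKPOINT_DIR = Path("src/model_wrapper/utils/GroundingDINO")


def find_checkpoint(repo_root="."):
    """Return the checkpoint path if the file exists and is non-empty, else None."""
    path = Path(repo_root) / CHECKPOINT_DIR / CHECKPOINT_NAME
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


if __name__ == "__main__":
    found = find_checkpoint()
    print(found if found else "checkpoint not found")
```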

LLaMA-UAV

To set up the model, refer to the detailed Model Setup.

Simulator environments

Download the simulator environments for various maps from here.

The environments directory is organized as follows:

├── carla_town_envs
│   ├── Town01
│   ├── Town02
│   ├── Town03
│   ├── ...
├── closeloop_envs
│   ├── Engine
│   ├── ModularEuropean
│   ├── ModularEuropean.sh
│   ├── ModularPark
│   ├── ModularPark.sh
│   ├── ...
├── extra_envs
│   ├── BrushifyUrban
│   ├── BrushifyCountryRoads
│   ├── ...
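
To check that a download matches this layout, the following sketch lists the sub-directories found under each top-level group. The function name `summarize_envs` is hypothetical; only the three group folder names come from the tree above, and the set of maps inside each group is not assumed.

```python
from pathlib import Path

# Top-level folders shown in the tree above.
EXPECTED_GROUPS = ("carla_town_envs", "closeloop_envs", "extra_envs")


def summarize_envs(root_path):
    """Map each expected group to its sorted sub-directory names.

    A group that is missing from root_path maps to None.
    """
    root = Path(root_path)
    summary = {}
    for group in EXPECTED_GROUPS:
        group_dir = root / group
        if not group_dir.is_dir():
            summary[group] = None
            continue
        # Only directories are listed; launcher files such as *.sh are skipped.
        summary[group] = sorted(p.name for p in group_dir.iterdir() if p.is_dir())
    return summary
```

Calling `summarize_envs("/path/to/your/envs")` returns a dictionary that can be compared against the tree by eye.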

Usage

Set up the simulator env server

Before running the simulations, ensure the AirSim environment server is properly configured.

In AirVLNSimulatorServerTool.py, update env_exec_path_dict so that each environment executable path is given relative to root_path.
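
The idea of paths relative to root_path can be illustrated with a small check that joins each relative path onto the root and reports the ones that do not exist. This is a sketch under assumptions: the dictionary below is a hypothetical stand-in, and the real keys and value layout of env_exec_path_dict are whatever AirVLNSimulatorServerTool.py defines. The helper `missing_executables` is likewise not part of the repository.

```python
from pathlib import Path

# Hypothetical entries for illustration only. The real keys and value layout of
# env_exec_path_dict are defined in AirVLNSimulatorServerTool.py.
example_env_exec_paths = {
    "ModularPark": "closeloop_envs/ModularPark.sh",
    "ModularEuropean": "closeloop_envs/ModularEuropean.sh",
}


def missing_executables(root_path, rel_paths):
    """Return the sorted names whose path, joined onto root_path, does not exist."""
    root = Path(root_path)
    return sorted(
        name for name, rel in rel_paths.items() if not (root / rel).exists()
    )
```

An empty result means every listed path resolves under the chosen root.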

cd airsim_plugin
python AirVLNSimulatorServerTool.py --port 30000 --root_path /path/to/your/envs
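
Before starting a run, one can check that something is accepting connections on the chosen port. The sketch below assumes the server listens for TCP connections on the port passed via `--port` (30000 in the command above) and that it runs on the local machine; both are assumptions, not details confirmed by the source. The helper `server_is_listening` is hypothetical.

```python
import socket


def server_is_listening(host="127.0.0.1", port=30000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_listening())
```

A successful connection only shows that the port is open; it does not verify that the simulator environments were loaded correctly.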

Run closed-loop simulation

Once the simulator server is running, you can run the DAgger or evaluation scripts.

# Dagger NYC
bash scripts/dagger_NYC.sh
# Eval
bash scripts/eval.sh
bash scripts/metrics.sh
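
If evaluation and metrics are run back to back, a small wrapper can stop the sequence as soon as one script fails, so that metrics are not computed on an incomplete run. This is a sketch, not a script shipped with the repository: `run_scripts` is a hypothetical helper, and it assumes it is launched from the repository root so that the relative script paths above resolve.

```python
import subprocess

# Script order taken from the commands above: evaluate first, then compute metrics.
EVAL_SCRIPTS = ["scripts/eval.sh", "scripts/metrics.sh"]


def run_scripts(scripts=EVAL_SCRIPTS, runner=subprocess.run):
    """Run each script with bash in order, stopping at the first failure."""
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, which aborts the remaining scripts.
        runner(["bash", script], check=True)
    return list(scripts)
```

The `runner` parameter exists only so the function can be exercised without launching the simulator.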

Paper

If you find this project useful, please consider citing our paper:

@misc{wang2024realisticuavvisionlanguagenavigation,
      title={Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology},
      author={Xiangyu Wang and Donglin Yang and Ziqin Wang and Hohin Kwan and Jinyu Chen and Wenjun Wu and Hongsheng Li and Yue Liao and Si Liu},
      year={2024},
      eprint={2410.07087},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.07087},
}

Acknowledgement

This repository is partly based on the AirVLN and LLaMA-VID repositories.
