News
- 2025-05-22: We release UAV-Flow, the first real-world benchmark for language-conditioned UAV imitation learning. (project page: https://prince687028.github.io/UAV-Flow)
- 2025-01-25: Paper, project page, code, data, envs and models are all released.
Introduction
This work presents _TOWARDS REALISTIC UAV VISION-LANGUAGE NAVIGATION: PLATFORM, BENCHMARK, AND METHODOLOGY_. We introduce a UAV simulation platform, an assistant-guided realistic UAV VLN benchmark, and an MLLM-based method to address the challenges in realistic UAV vision-language navigation.
Dependencies
Create llamauav environment
conda create -n llamauav python=3.10 -y
conda activate llamauav
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Install LLaMA-UAV model
You can follow LLaMA-UAV to install the LLM dependencies.
Install other dependencies listed in the requirements file
pip install -r requirement.txt
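As an optional sanity check after installation, the versions pinned in the pip command above can be compared with what is actually installed. The following is a minimal sketch and not part of the repository: the helper name `check_pins` is hypothetical, and it assumes that a local build suffix such as `+cu118` should be ignored when comparing versions.

```python
from importlib import metadata

# Pins copied from the pip install command above.
PINS = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_pins(pins):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Assumption: a local build suffix (e.g. "2.0.1+cu118") is ignored.
        base = installed.split("+", 1)[0]
        if base != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("all pinned packages match")
```

Running the script inside the llamauav environment lists any package whose installed version differs from the pin.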
Additionally, to ensure compatibility with the AirSim Python API, apply the fix mentioned in the AirSim issue.
Preparation
Data
To construct the dataset, follow the instructions in the Dataset Section.
Model
GroundingDINO
Download the GroundingDINO model from the link groundingdino_swint_ogc.pth, and place the file in the directory src/model_wrapper/utils/GroundingDINO/.
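A quick way to confirm that the checkpoint landed in the expected place is sketched below. This is only an illustration: the helper `find_checkpoint` is hypothetical and not part of the repository, and it assumes the script is run from the repository root unless another root is passed in.

```python
from pathlib import Path

# Directory and file name as given in the text above.
CKPT_DIR = Path("src/model_wrapper/utils/GroundingDINO")
CKPT_NAME = "groundingdino_swint_ogc.pth"


def find_checkpoint(repo_root="."):
    """Return the checkpoint path if the file exists under repo_root, else None."""
    path = Path(repo_root) / CKPT_DIR / CKPT_NAME
    return path if path.is_file() else None


if __name__ == "__main__":
    found = find_checkpoint()
    print(found if found else f"missing: {CKPT_DIR / CKPT_NAME}")
```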
LLaMA-UAV
To set up the model, refer to the detailed Model Setup.
Simulator environments
Download the simulator environments for various maps from here.
The environments directory is organized as follows:
├── carla_town_envs
│ ├── Town01
│ ├── Town02
│ ├── Town03
│ ├── ...
├── closeloop_envs
│ ├── Engine
│ ├── ModularEuropean
│ ├── ModularEuropean.sh
│ ├── ModularPark
│ ├── ModularPark.sh
│ ├── ...
├── extra_envs
│ ├── BrushifyUrban
│ ├── BrushifyCountryRoads
│ ├── ...
Usage
- Set up the simulator env server
Update the env executable paths env_exec_path_dict relative to root_path in AirVLNSimulatorServerTool.py.
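To illustrate what "relative to root_path" means, the sketch below joins a set of relative executable paths onto a root directory and reports which ones exist. It is an assumption-laden example, not the code in AirVLNSimulatorServerTool.py: the flat name-to-path shape of `example_env_exec_path_dict`, its keys, and the helper `resolve_env_paths` are all hypothetical, and the real dictionary may be structured differently. Only the relative paths are taken from the directory tree above.

```python
from pathlib import Path

# Hypothetical entries: the keys and the flat name -> relative-path shape are
# assumptions for illustration; the relative paths come from the tree above.
example_env_exec_path_dict = {
    "ModularEuropean": "closeloop_envs/ModularEuropean.sh",
    "ModularPark": "closeloop_envs/ModularPark.sh",
}


def resolve_env_paths(root_path, rel_paths):
    """Join each relative path onto root_path and record whether the file exists."""
    root = Path(root_path)
    resolved = {}
    for name, rel in rel_paths.items():
        full = root / rel
        resolved[name] = (full, full.is_file())
    return resolved


if __name__ == "__main__":
    # Same placeholder root as in the server command.
    results = resolve_env_paths("/path/to/your/envs", example_env_exec_path_dict)
    for name, (path, ok) in results.items():
        print(f"{name}: {path} [{'found' if ok else 'missing'}]")
```

A check like this can catch a wrong root_path before the server is started.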
cd airsim_plugin
python AirVLNSimulatorServerTool.py --port 30000 --root_path /path/to/your/envs
- Run the closed-loop simulation
# Dagger NYC
bash scripts/dagger_NYC.sh
Eval
bash scripts/eval.sh
bash scripts/metrics.sh
Paper
If you find this project useful, please consider citing the paper:
@misc{wang2024realisticuavvisionlanguagenavigation,
title={Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology},
author={Xiangyu Wang and Donglin Yang and Ziqin Wang and Hohin Kwan and Jinyu Chen and Wenjun Wu and Hongsheng Li and Yue Liao and Si Liu},
year={2024},
eprint={2410.07087},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.07087},
}
Acknowledgement
This repository is partly based on the AirVLN and LLaMA-VID repositories.