ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning
Official codebase for the paper "ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning".
🏆 IJCAI 2025 Travel Planning Challenge (TPC@IJCAI)
We are proud to announce that ChinaTravel has been selected as the official benchmark for the Travel Planning Challenge (TPC) @ IJCAI 2025!
Official Competition Website: https://chinatravel-competition.github.io/IJCAI2025/
Participants are invited to develop novel agents that can tackle real-world travel planning scenarios under complex constraints. This competition will showcase state-of-the-art approaches in language agent research.
📝 ChangeLog
2025.09
- Uploaded the champion solution of the TPC@IJCAI2025 DSL track. Thanks to @evergreenee for their contribution.
2025.06
- Fixed error collection in the commonsense evaluation code.
- Fixed the pure-neuro agent's pipeline.
- Fixed `load_datasets` from Hugging Face.
- Updated exception handling in syntax verification.
2025.05
- Updated logs for the latest version.
- Provided the evaluation code for the TPC.
2025.04
- Added a local data loader. Users can now load custom queries locally. When a non-default `splits_name` value (e.g., "abc") is specified for `run_exp.py`, the system automatically loads the corresponding file from `evaluation/default_splits/abc.txt`, where the TXT file lists the target query filenames.
- Detailed constraint classification. See the Evaluation README for details.
- Introduced the LLM-modulo baseline.
- Added support for local LLM inference with Qwen3-8B/4B.
🚀 Quick Start
⚙️ Setup
- Create a conda environment and install dependencies:
```bash
conda create -n chinatravel python=3.9
conda activate chinatravel
pip install -r requirements.txt
```
- Download the database and unzip it to the `chinatravel/environment/` directory.
- Download the open-source LLMs (optional).
```bash
bash download_llm.sh
```
- Download the tokenizers.
```bash
wget https://cdn.deepseek.com/api-docs/deepseek_v3_tokenizer.zip -P chinatravel/local_llm/
unzip chinatravel/local_llm/deepseek_v3_tokenizer.zip -d chinatravel/local_llm/
```
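To verify the tokenizer files are in place, you can try loading them with `transformers`. This is only a quick sanity check and assumes the archive unzips to `chinatravel/local_llm/deepseek_v3_tokenizer/`; adjust the path if your layout differs:
```python
# Quick sanity check (assumed path; adjust if the archive unzips elsewhere).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "chinatravel/local_llm/deepseek_v3_tokenizer", trust_remote_code=True
)
print(len(tokenizer.encode("你好，南京！")))  # token count for a sample query
```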
▶️ Running
We support deepseek (the official DeepSeek API), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with Qwen (Qwen3-8B), Llama, Mistral (Mistral-7B-Instruct-v0.3), etc.
```bash
export OPENAI_API_KEY=""
python run_exp.py --splits easy --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits medium --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits human --agent LLMNeSy --llm Qwen3-8B --oracle_translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek
python run_exp.py --splits human --agent LLMNeSy --llm Qwen3-8B
python run_exp.py --splits human --agent LLM-modulo --llm deepseek --refine_steps 10 --oracle_translation
python run_exp.py --splits human --agent LLM-modulo --llm Qwen3-8B --refine_steps 10 --oracle_translation
```
Note:
- The `--oracle_translation` flag enables access to annotated ground truth, including:
  - `hard_logic_py`: executable verification DSL code
  - `hard_logic_nl`: the corresponding natural-language constraint descriptions (in Chinese)
- Example annotation structure (a sketch of how such constraints can be executed follows these notes):
```json
{
"hard_logic_py": [
"
total_cost=0
for activity in allactivities(plan):
total_cost+=activity_cost(activity)
total_cost += innercity_transport_cost(activity_transports(activity))
result=(total_cost<=1000)
",
"
innercity_transport_set=set()
for activity in allactivities(plan):
if activity_transports(activity)!=[]:
innercity_transport_set.add(innercity_transport_type(activity_transports(activity)))
result=(innercity_transport_set<={'taxi'})
"
],
"hard_logic_nl": ["总预算为1800元", "市内交通选择taxi"],
}
```
- The LLM-modulo method requires `--oracle_translation` mode for its symbolic refinement process.
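For reference, each `hard_logic_py` snippet is plain Python that reads a variable `plan` and sets a boolean `result`, using helper functions supplied by the ChinaTravel evaluation environment (e.g., `allactivities`, `activity_cost`). The following is a minimal sketch of that execution pattern, with hypothetical stub helpers and a toy plan standing in for the real evaluation code:
```python
# Minimal sketch of executing a hard_logic_py constraint.
# The stub helpers and toy plan below are hypothetical stand-ins for the
# benchmark's real evaluation environment.

def allactivities(plan):
    return [act for day in plan["itinerary"] for act in day["activities"]]

def activity_cost(activity):
    return activity.get("cost", 0)

def activity_transports(activity):
    return activity.get("transports", [])

def innercity_transport_cost(transports):
    return sum(t.get("cost", 0) for t in transports)

toy_plan = {
    "itinerary": [
        {"activities": [
            {"cost": 300, "transports": [{"mode": "taxi", "cost": 20}]},
            {"cost": 500, "transports": []},
        ]}
    ]
}

constraint = """
total_cost = 0
for activity in allactivities(plan):
    total_cost += activity_cost(activity)
    total_cost += innercity_transport_cost(activity_transports(activity))
result = (total_cost <= 1000)
"""

namespace = {
    "plan": toy_plan,
    "allactivities": allactivities,
    "activity_cost": activity_cost,
    "activity_transports": activity_transports,
    "innercity_transport_cost": innercity_transport_cost,
}
exec(constraint, namespace)
print(namespace["result"])  # True: 300 + 20 + 500 = 820 <= 1000
```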
📊 Evaluation
```bash
python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
python eval_exp.py --splits human --method LLM-modulo_deepseek_10steps_oracletranslation
python eval_exp.py --splits human --method LLM-modulo_Qwen3-8B_10steps_oracletranslation
```
In TPC@IJCAI2025, the evaluation code is provided in the `eval_tpc.py` file. You can run the evaluation code as follows:
```bash
python eval_tpc.py --splits tpc_phase1 --method YOUR_METHOD_NAME
```
📚 Docs
Detailed documentation, including the constraint classification, is provided in the repository (see the Evaluation README).
🛠️ Advanced Development
1. Develop Your Own Agent Algorithm
To develop your own agent algorithm, inherit the `BaseAgent` class from `chinatravel/agent/base.py` and add the logic for your algorithm to the `init_agent` function in `chinatravel/agent/load_model.py`. We provide an empty agent example named `TPCAgent`.
Steps:
- Inherit the `BaseAgent` class: create a new Python file in the `chinatravel/agent` directory and define your own agent class, inheriting from `BaseAgent`.
```python
# chinatravel/agent/your_agent.py
from .base import BaseAgent

class YourAgent(BaseAgent):
    def __init__(self, kwargs):
        super().__init__(kwargs)
        # Initialization logic

    def act(self, observation):
        # Implement the decision-making logic of the agent
        pass
```
- Add code to the `init_agent` function: open the `chinatravel/agent/load_model.py` file and add support for your new agent in the `init_agent` function.
```python
# chinatravel/agent/load_model.py
def init_agent(kwargs):
    # ... existing code ...
    elif kwargs["method"] == "YourMethodName":
        agent = YourAgent(kwargs)
    # ... existing code ...
    return agent
```
2. Develop Your Own Local LLM
To develop your own local large language model (LLM) backend, inherit the `AbstractLLM` class from `chinatravel/agent/llms.py` and add the corresponding local LLM inference code in `llms.py`. We provide an empty LLM example named `TPCLLM`.
Steps:
- Inherit the `AbstractLLM` class: define your own LLM class in the `chinatravel/agent/llms.py` file, inheriting from `AbstractLLM`.
```python
class YourLLM(AbstractLLM):
    def __init__(self):
        super().__init__()
        # Initialization logic
        self.name = "YourLLMName"

    def _get_response(self, messages, one_line, json_mode):
        # Implement the response logic of the LLM
        response = "Your LLM response"
        if json_mode:
            # Handle JSON mode
            pass
        elif one_line:
            # Handle one-line mode
            response = response.split("\n")[0]
        return response
```
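As a concrete illustration of a local backend, the sketch below wraps a Hugging Face chat model inside the same `_get_response` interface. The model path, generation settings, and chat-template usage are assumptions for illustration, not the repository's actual loading code in `chinatravel/local_llm`:
```python
# Hypothetical local-inference backend, meant to live in
# chinatravel/agent/llms.py where AbstractLLM is defined.
# Model path and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class YourLocalLLM(AbstractLLM):
    def __init__(self, model_dir="Qwen/Qwen3-8B"):
        super().__init__()
        self.name = "YourLocalLLM"
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_dir, torch_dtype=torch.bfloat16, device_map="auto"
        )

    def _get_response(self, messages, one_line, json_mode):
        # Render the chat messages with the model's chat template.
        prompt = self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=1024)
        # Keep only the newly generated tokens.
        response = self.tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        # json_mode handling (e.g., extracting a JSON object) is omitted here.
        if one_line:
            response = response.split("\n")[0]
        return response
```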
- Add code to the `init_llm` function: open the `chinatravel/agent/load_model.py` file and add support for your new LLM in the `init_llm` function.
```python
# chinatravel/agent/load_model.py
def init_llm(kwargs):
    # ... existing code ...
    elif llm_name == "YourLLMName":
        llm = YourLLM()
    # ... existing code ...
    return llm
```
3. Run Your Code Using Experiment Scripts
After completing the above development, you can use the experiment scripts to run your code.
Example of running:
```bash
python run_tpc.py --splits easy --agent TPCAgent --llm TPCLLM
python run_exp.py --splits easy --agent YourMethodName --llm YourLLMName
```
The results will be saved in the `results/YourMethodName_YourLLMName_xxx` directory, e.g., `results/TPCAgent_TPCLLM`.
✉️ Contact
If you have any problems, please contact Jie-Jing Shao, Bo-Wen Zhang, or Xiao-Wen Yang.
📌 Citation
If our paper or related resources prove valuable to your research, we kindly ask for a citation.
```bibtex
@misc{shao2024chinatravelrealworldbenchmarklanguage,
      title={ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning},
      author={Jie-Jing Shao and Xiao-Wen Yang and Bo-Wen Zhang and Baizhi Chen and Wen-Da Wei and Guohao Cai and Zhenhua Dong and Lan-Zhe Guo and Yu-feng Li},
      year={2024},
      eprint={2412.13682},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.13682},
}
```