
Tong Qin's team at GIFT publishes paper at IROS, a flagship robotics conference: AVP Scene Graph: Hierarchical Visual Language Mapping and Navigation for Autonomous Valet Parking

Published: 2025-09-21

The Innovation Center of Intelligent Connected Electric Vehicles at Shanghai Jiao Tong University, under the guidance of Associate Professor Tong Qin, has published a research paper titled "AVP Scene Graph: Hierarchical Visual Language Mapping and Navigation for Autonomous Valet Parking" at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), a leading international conference on robotics. The paper primarily explores a visual-language scene graph designed for Autonomous Valet Parking (AVP) tasks, which effectively enhances the flexibility of AVP functions. The first author is Xiangru Mou, a 2024 Ph.D. candidate at GIFT, and the corresponding author is Associate Professor Tong Qin.

Paper cover

Research Background

Autonomous Valet Parking has become a key application scenario for autonomous driving: with a single smartphone command, drivers can have the vehicle cruise and park on its own and later summon it back. The function relies on vehicle-mounted sensors that collect data the first time the vehicle enters a parking lot and use it to build a high-precision parking map in advance. When the vehicle enters the same parking lot again, it can load the existing map, select a target parking space, and activate AVP to drive fully autonomously, safely, and precisely from the entrance to the parking space.

Current Research

Existing parking maps face three major challenges:

1. Limited information dimensions: the maps contain only pre-defined categories (e.g., lane lines, arrows) and fail to capture other useful environmental information (e.g., signs, guide markers);

2. Poor readability and extensibility, which hinders intuitive interaction between the driver and the system;

3. Complex structure, which makes reading and updating the map inefficient.

There is therefore a pressing need for an AVP map that is well structured, information-rich, efficient to query, and suited to human-machine interaction.

Research Results

To address these problems, the paper constructs a multi-layer scene graph for AVP tasks. First, a visual language model extracts open-world semantics, which are embedded into the nodes of a vector map. The map is then abstracted into scenes through bottom-up feature fusion, yielding a clear and informative multi-layer scene graph.
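The bottom-up fusion idea can be illustrated with a minimal sketch. It is not the paper's implementation: the `Node` class, the layer names ("object", "zone", "floor"), the two-dimensional embeddings, and the use of a simple mean as the fusion rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical multi-layer scene graph."""
    name: str
    layer: str  # assumed layer names: "object", "zone", "floor"
    embedding: list[float] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def fuse_bottom_up(node: Node) -> list[float]:
    """Set each parent's embedding to the mean of its children's embeddings.

    Leaf nodes keep the embedding they were given (in a real system this
    would come from a visual language model; here it is hand-written).
    """
    if not node.children:
        return node.embedding
    child_vecs = [fuse_bottom_up(child) for child in node.children]
    dim = len(child_vecs[0])
    node.embedding = [
        sum(vec[i] for vec in child_vecs) / len(child_vecs) for i in range(dim)
    ]
    return node.embedding


# Toy example: two leaf objects grouped into a zone, which sits on a floor.
sign = Node("elevator hall sign", "object", [1.0, 0.0])
slot = Node("parking slot", "object", [0.0, 1.0])
zone = Node("zone A", "zone", children=[sign, slot])
floor = Node("B2", "floor", children=[zone])
fuse_bottom_up(floor)
```

After the call, every upper-layer node carries a summary of the semantics beneath it, which is what allows a query to be matched at a coarse layer before descending.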

In addition, the paper proposes a top-down navigation method. First, a large model handles human-machine interaction, extracting the navigation goal from the driver's instruction. The target is then located in a top-down manner, taking advantage of the efficient retrieval that the graph structure allows. Finally, a path planning algorithm generates the optimal route to complete Autonomous Valet Parking.
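The three steps can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the method from the paper: `SCENE`, `ROADS`, and all function names are hypothetical, a keyword match stands in for the large model, and breadth-first search stands in for the path planning algorithm.

```python
from collections import deque

# Hypothetical hierarchy: floor -> zone -> landmarks.
SCENE = {
    "B1": {"zone A": ["garage exit"], "zone B": ["charging station"]},
    "B2": {"zone C": ["elevator hall"], "zone D": ["stairwell"]},
}

# Hypothetical drivable connections between zones.
ROADS = {
    "entrance": ["zone A"],
    "zone A": ["entrance", "zone B", "zone C"],
    "zone B": ["zone A"],
    "zone C": ["zone A", "zone D"],
    "zone D": ["zone C"],
}


def extract_goal(instruction):
    """Stand-in for the large model: return the first known landmark named."""
    text = instruction.lower()
    for zones in SCENE.values():
        for landmarks in zones.values():
            for landmark in landmarks:
                if landmark in text:
                    return landmark
    return None


def locate_top_down(goal):
    """Descend floor -> zone -> landmark and return (floor, zone) of the goal."""
    for floor, zones in SCENE.items():
        for zone, landmarks in zones.items():
            if goal in landmarks:
                return floor, zone
    return None


def plan_route(start, target):
    """Breadth-first search: a route with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in ROADS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


goal = extract_goal("Park the car next to the elevator hall on B2 floor")
location = locate_top_down(goal)
route = plan_route("entrance", location[1])
```

Here `route` is a list of zone names from the entrance to the zone containing the goal. A real system would presumably compare embeddings at each layer and prune branches rather than scan every landmark, and would plan over lane-level geometry rather than zone names.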

Image 1

Method Framework Diagram

As shown in the figure below, traditional maps suffer from restricted semantics and limited human-machine interaction, whereas the proposed method can extract navigation goals from human instructions such as "Navigate me to the garage exit" or "Park the car next to the elevator hall on B2 floor", achieving flexible and accurate navigation that matches drivers' expectations.

Image 2


Author Profile

牟相如

Xiangru Mou

2024 Ph.D. candidate at the Global Institute of Future Technology, SJTU. Research interests: automatic parking mapping, end-to-end autonomous driving.

陈丰毅

Fengyi Chen

2024 Ph.D. candidate at the Global Institute of Future Technology, SJTU. Research interests: robot imitation learning, reinforcement learning.

陈思源

Siyuan Chen

2023 Master's student at the School of Automation and Intelligent Sensing, SJTU. Research interests: autonomous driving planning and control, V2X systems.

秦通

Tong Qin

Associate Professor at the Global Institute of Future Technology, SJTU. Prof. Qin earned his Ph.D. from the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology and previously worked at Huawei's Intelligent Automotive Solution BU, where he was selected for Huawei's "Genius Youth" program. As a perception and SLAM expert at Huawei, he contributed to the development of the Huawei ADS autonomous driving system, delivering leading-edge solutions that have been commercially deployed in multiple vehicle models. In recent years, he has published more than ten papers as first or corresponding author in top-tier robotics journals and conferences such as TRO, JFR, RAL, and ICRA. He received the IROS 2018 Best Student Paper Award and a TRO Best Paper Nomination Award. Research interests: autonomous driving perception, mapping, and localization; end-to-end AI large models; mobile robot SLAM.