GIFT Associate Professor Songan Zhang's Team Publishes Paper on Traffic Regulation Knowledge-Driven Reward Design for Reinforcement Learning in Autonomous Driving
Associate Professor Songan Zhang at the Global Institute of Future Technology (GIFT), Shanghai Jiao Tong University, has published a paper titled "ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving" in the journal IEEE Robotics and Automation Letters (RA-L). The paper addresses the problem that reward functions in reinforcement learning (RL) for autonomous driving often fail to capture accident responsibility or to reflect regulatory constraints. The researchers propose a novel method, ROAD, which integrates vision-language models, a traffic regulation knowledge graph, and a mechanism for determining accident responsibility. By incorporating the responsibility-allocation principles of traffic regulations into the RL training process, the method enables the agent not only to improve its task success rate but also to reduce the risk of collisions for which the ego vehicle is primarily responsible. This offers a new direction for enhancing the safety, compliance, and social adaptability of autonomous driving decision-making models. The first author of the paper is Yongming Chen, a master's student at GIFT, and the corresponding author is Associate Professor Songan Zhang.

Research Background
Decision-making is a key challenge for intelligent connected vehicles. In recent years, imitation learning and reinforcement learning have been widely adopted for learning decision-making and planning policies in autonomous driving. Imitation learning replicates human driving behaviors, but it often generalizes poorly to long-tail scenarios, complex interactions, and out-of-distribution environments. In contrast, reinforcement learning enables agents to learn driving policies through trial-and-error interaction with the environment. It can continuously refine decision-making in simulation and shows strong potential in complex traffic scenarios.
However, the efficacy of RL depends heavily on the reward function. Traditional RL for autonomous driving typically incorporates factors such as collisions, boundary violations, goal arrival, speed, and lane-keeping into its reward function. Yet such rewards are usually designed by hand and lack fine-grained modeling of accident responsibility. In complex traffic scenarios, a collision is not always caused entirely by the ego vehicle: some collisions result from the ego vehicle violating yielding rules, while others are caused by the abnormal behavior of other traffic participants. Applying the same penalty to all collisions conflates two different situations: "active violation by the ego vehicle" and "passive exposure to risk".
A uniform collision penalty can lead to two problems. If the penalty is too severe, the agent may learn an overly conservative "frozen robot" policy that avoids any interaction in order to prevent collisions. Conversely, if the penalty is too mild, the agent may treat collisions as an acceptable stochastic cost in exchange for progress rewards. Therefore, enabling RL reward functions to distinguish degrees of accident responsibility and to reflect traffic regulations is essential to improving the safety and compliance of autonomous driving policies.
Current Research
Existing studies have attempted to leverage large language models (LLMs) and vision-language models (VLMs) to provide richer semantic feedback for RL. For instance, VLMs can understand traffic scene images and generate interpretations or evaluations of driving behaviors based on text prompts. This opens new possibilities for reward design: models can assess whether an ego vehicle complies with traffic rules or shows risky behavior based on both image and language, thereby helping construct more semantic and context-aware reward signals.
However, VLMs still suffer from hallucination and unstable reasoning, which is especially problematic in safety-critical scenarios. In autonomous driving, if the model misinterprets right-of-way relationships, misjudges accident scenarios, or produces plausible but inaccurate responsibility determinations without regulatory grounding, it may deliver erroneous reward signals to the RL agent. It is therefore essential to introduce structured, retrievable, and interpretable knowledge of traffic regulations.
Furthermore, most existing RL reward functions still focus on whether a collision occurred rather than on who was responsible for it. Although such designs reduce crash rates to some extent, they struggle to guide agents toward behaviors that comply with traffic rules and social driving norms. For example, at unsignalized intersections and roundabouts, a driving policy needs not only to avoid collisions but also to understand yielding rules, right-of-way priorities, and responsibility attribution. Hence, reward function design for RL in autonomous driving must evolve from "outcome-based penalty" toward "responsibility awareness."
Research Results
To address these issues, the paper proposes the ROAD (Responsibility-Oriented Reward Design for Autonomous Driving) framework, which integrates knowledge of traffic regulations, vision-language model reasoning, and reinforcement learning reward design. The core idea is that a collision involving the ego vehicle should not incur a uniform penalty: instead, the system dynamically adjusts the penalty according to the ego vehicle's degree of responsibility in the accident, so that the reward function reflects the responsibility-allocation principles embedded in traffic regulations.

Figure 1: Overall framework of ROAD
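The responsibility-weighted penalty described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the weight table, the base penalty, the scale parameter, and the function names are all hypothetical, chosen only to show how a collision penalty might shrink as the ego vehicle's share of responsibility decreases.

```python
from enum import Enum
from typing import Optional


class Responsibility(Enum):
    """Responsibility types named in the article."""
    PRIMARY = "primary"
    SHARED = "shared"
    SECONDARY = "secondary"


# Hypothetical weights: illustrative values only, not taken from the paper.
RESPONSIBILITY_WEIGHT = {
    Responsibility.PRIMARY: 1.0,
    Responsibility.SHARED: 0.5,
    Responsibility.SECONDARY: 0.2,
}


def collision_penalty(responsibility: Responsibility,
                      base_penalty: float = 10.0,
                      scale: float = 1.0) -> float:
    """Return a negative reward that grows with the ego vehicle's responsibility.

    base_penalty and scale are assumed parameters; scale plays the role of
    a penalty scaling factor applied on top of the responsibility weight.
    """
    return -scale * base_penalty * RESPONSIBILITY_WEIGHT[responsibility]


def step_reward(progress_reward: float,
                collided: bool,
                responsibility: Optional[Responsibility] = None) -> float:
    """Combine a task reward with a responsibility-weighted collision penalty."""
    if not collided:
        return progress_reward
    if responsibility is None:
        raise ValueError("a collision needs a responsibility judgment")
    return progress_reward + collision_penalty(responsibility)


if __name__ == "__main__":
    for level in Responsibility:
        print(level.value, step_reward(1.0, True, level))
```

With these placeholder weights, a collision for which the ego vehicle is primarily responsible is penalized five times as heavily as one in which it bears only secondary responsibility, whereas a uniform penalty would treat the two cases identically.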
The paper first constructs a Traffic Regulation Knowledge Graph (TRKG). This graph extracts key nodes (such as driving scenarios, applicable standards, and responsibility determination criteria) from traffic regulation texts and organizes them systematically. For example, in typical scenarios such as intersections and roundabouts, TRKG provides a rule-based foundation for yielding, right-of-way priority, and responsibility attribution. The researchers employ LLMs to extract regulatory triples, forming an interpretable knowledge structure that supports responsibility reasoning.

Figure 2: TRKG stored in a Neo4j database
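As a rough illustration of what regulatory triples look like, the sketch below stores a few (head, relation, tail) triples in plain Python and looks them up by scenario. The triples, relation names, and the retrieve function are invented for illustration; they are not quoted from any regulation, from the paper, or from its Neo4j schema.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    head: str      # scenario node
    relation: str  # edge type
    tail: str      # rule text or responsibility criterion


# Hypothetical triples: placeholders, not actual regulatory content.
TRIPLES: List[Triple] = [
    Triple("roundabout_entry", "APPLIES_RULE",
           "entering vehicle yields to circulating traffic"),
    Triple("roundabout_entry", "RESPONSIBILITY_IF_VIOLATED", "primary"),
    Triple("unsignalized_intersection", "APPLIES_RULE",
           "vehicle without right of way yields"),
    Triple("unsignalized_intersection", "RESPONSIBILITY_IF_VIOLATED", "primary"),
]


def retrieve(scenario: str,
             triples: List[Triple] = TRIPLES) -> Dict[str, List[str]]:
    """Group the tails of every triple whose head matches the scenario."""
    result: Dict[str, List[str]] = {}
    for triple in triples:
        if triple.head == scenario:
            result.setdefault(triple.relation, []).append(triple.tail)
    return result


if __name__ == "__main__":
    print(retrieve("roundabout_entry"))
```

A graph database would replace the list scan with an indexed query, but the shape of the data, scenario nodes linked to rules and responsibility criteria, is the same.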
Second, the paper combines TRKG with a vision‑language model to establish a pipeline for determining accident responsibility. The system first uses the VLM to identify traffic scenes and ego vehicle interaction information from accident images, then retrieves relevant regulatory provisions and responsibility criteria from TRKG, and finally reasons about the ego vehicle's responsibility type (primary, shared, or secondary). This retrieval-augmented mechanism aligns responsibility judgments with traffic regulations and reduces hallucinations.
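The three stages of this pipeline (scene understanding, rule retrieval, responsibility reasoning) can be sketched as a function that composes three pluggable steps. Everything below is an assumption made for illustration: the function names, the SceneDescription fields, and the stub implementations stand in for a real VLM and a real knowledge graph query, neither of which is called here.

```python
from dataclasses import dataclass
from typing import Callable, List

VALID_LABELS = {"primary", "shared", "secondary"}


@dataclass
class SceneDescription:
    scenario: str      # e.g. "roundabout"
    ego_behavior: str  # e.g. "entered without yielding"


def judge_responsibility(
    image_path: str,
    describe_scene: Callable[[str], SceneDescription],
    retrieve_rules: Callable[[str], List[str]],
    reason: Callable[[SceneDescription, List[str]], str],
) -> str:
    """Run the three stages in order and validate the final label."""
    scene = describe_scene(image_path)      # stage 1: VLM reads the image
    rules = retrieve_rules(scene.scenario)  # stage 2: look up regulations
    label = reason(scene, rules)            # stage 3: grounded judgment
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected responsibility label: {label!r}")
    return label


# Stub stages (hypothetical): a real system would call a VLM and a graph store.
def fake_vlm(image_path: str) -> SceneDescription:
    return SceneDescription("roundabout", "entered without yielding")


def fake_retriever(scenario: str) -> List[str]:
    rules = {"roundabout": ["entering vehicle yields to circulating traffic"]}
    return rules.get(scenario, [])


def fake_reasoner(scene: SceneDescription, rules: List[str]) -> str:
    # Toy logic: a yielding violation backed by a retrieved rule is "primary".
    if rules and "without yielding" in scene.ego_behavior:
        return "primary"
    return "secondary"


if __name__ == "__main__":
    print(judge_responsibility("crash.png", fake_vlm, fake_retriever, fake_reasoner))
```

Passing the stages in as callables keeps the retrieval step explicit, so the reasoning stage only ever sees rules that were actually retrieved, which is the point of grounding the judgment in the knowledge graph.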
Experimental results show that, compared with the baseline policy, ROAD achieves a task success rate of 73.2% at intersections (+8.2 percentage points) and 54.0% at roundabouts (+11.2 percentage points). Meanwhile, the proportion of collisions for which the ego vehicle bears primary responsibility decreases by 13.5 percentage points at intersections and 5.7 percentage points at roundabouts. These results demonstrate that responsibility-oriented rewards not only improve task performance but also effectively reduce collisions for which the ego vehicle is primarily responsible.
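For readers who want the success-rate gains in relative terms, the short calculation below derives the baseline success rates implied by the figures above. The derived numbers are not reported in this article; they follow only from reading each gain as the absolute difference between ROAD and the baseline in the same scenario.

```python
# Reported figures: ROAD success rate (%) and gain in percentage points.
RESULTS = {
    "intersection": {"success_rate": 73.2, "gain_pp": 8.2},
    "roundabout": {"success_rate": 54.0, "gain_pp": 11.2},
}


def implied_baseline(success_rate: float, gain_pp: float) -> float:
    """Baseline success rate implied by an absolute gain in percentage points."""
    return round(success_rate - gain_pp, 1)


def relative_gain(success_rate: float, gain_pp: float) -> float:
    """Gain expressed as a percentage of the implied baseline."""
    baseline = success_rate - gain_pp
    return round(100.0 * gain_pp / baseline, 1)


if __name__ == "__main__":
    for name, r in RESULTS.items():
        print(name,
              implied_baseline(r["success_rate"], r["gain_pp"]),
              relative_gain(r["success_rate"], r["gain_pp"]))
```

Under that reading, the implied baselines are 65.0% at intersections and 42.8% at roundabouts, so the relative improvement is larger at roundabouts even though the absolute success rate there is lower.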
Furthermore, the results of ablation studies and sensitivity analyses indicate that introducing TRKG‑RAG significantly improves judgment accuracy relative to a baseline using only GPT‑4o, demonstrating that structured traffic regulation knowledge enhances the reliability of responsibility reasoning. Analysis of the penalty scaling factor also reveals that simply increasing collision penalties is not equivalent to responsibility‑oriented rewards; overly severe penalties may lead to overly conservative behavior, whereas a reasonable responsibility weighting mechanism achieves a better balance between safety and travel efficiency.
In conclusion, the main contributions of ROAD are as follows:
First, it proposes a traffic regulation knowledge graph for determining responsibility in autonomous driving, mitigating hallucinations in vision-language models during traffic responsibility reasoning.
Second, it introduces accident responsibility allocation into RL reward functions, shifting reward design from uniform collision penalties to responsibility‑aware penalties.
Third, it demonstrates, in complex intersection and roundabout scenarios, that the method simultaneously improves success rates and reduces collisions for which the ego vehicle bears primary responsibility.
This work provides a new technical pathway toward safer, more compliant, and more interpretable RL-based autonomous driving systems.
Paper link:
https://ieeexplore.ieee.org/document/11457333
Author Profiles

Yongming Chen
Master's student at the Global Institute of Future Technology, SJTU. Research interests: reinforcement learning for autonomous driving, knowledge graphs, large language models, and multimodal intelligent systems.

Songan Zhang
Tenure-Track Associate Professor at the Global Institute of Future Technology, SJTU, and a member of the Innovation Center of Intelligent Connected Electric Vehicles. Prof. Zhang received her B.S. and M.S. degrees in automotive engineering from Tsinghua University in 2013 and 2016, respectively, and her Ph.D. degree in mechanical engineering from the University of Michigan, USA, in 2021, under the supervision of Prof. Huei Peng, Director of Mcity. Upon graduation, she joined Ford Motor Company as a Researcher and concurrently served as the Committee Chair for the Robotics Proposal Review Panel of the Ford-University of Michigan Joint Program. She has published over 30 papers in journals and conferences, including T-ITS, T-IV, CVPR, and ICCV. Research areas: decision-making and control algorithms for intelligent vehicles and robotics, reinforcement learning, meta-reinforcement learning, industrial embodied AI, and AI-assisted aircraft engine design.


