Humans may be one of the biggest obstacles to the operation of fully autonomous vehicles on urban streets. If a robot is to guide a vehicle safely through downtown Boston, it must be able to predict what drivers, pedestrians and cyclists nearby will do next.
However, behavior prediction is a hard problem. Current AI solutions are either too simplistic (they may assume that pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot simply leaves the car parked), or able to predict the next moves of only one road user at a time (roads usually carry many users at once).
MIT researchers have devised a deceptively simple solution to this complex challenge. They break the multi-agent behavior prediction problem into smaller pieces and solve each piece separately, so a computer can handle this complex task in real time.
Their behavior prediction framework first guesses the relationship between two road users (which car, cyclist or pedestrian has the right of way, and which will yield) and then uses that relationship to predict the future trajectories of multiple road users.
Measured against real traffic flow in a huge dataset compiled by Waymo, an autonomous driving company, these estimated trajectories were more accurate than those of other machine learning models. The MIT technique even outperformed Waymo's recently released model. Moreover, because the researchers break the problem into simpler parts, their technique uses less memory.
"This is a very intuitive idea, but it has not been fully explored before, and it works quite well. The simplicity is definitely a plus. We are comparing our model with other state-of-the-art models in the field, including the one from Waymo, a leading company in this area, and our model achieves top performance on this challenging benchmark. This has a lot of potential for the future," said Huang Xin, co-lead author of the study, a graduate student in the Department of Aeronautics and Astronautics and a research assistant in the lab of Brian Williams, a professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Huang Xin and Williams wrote the paper with three researchers from Tsinghua University in China: co-lead author Sun Qiao, Gu Junru, and senior author Zhao Xing. The study will be presented at the Conference on Computer Vision and Pattern Recognition.
Multiple small models
The researchers' machine learning method, called M2i, takes two inputs: the past trajectories of the cars, cyclists and pedestrians interacting in a traffic setting (such as a four-way intersection), and a map with street locations, lane configurations, and so on.
Using this information, a relationship predictor infers which of the two road users has the right of way first, and classifies one agent as the passer and the other as the yielder. Then a prediction model called the marginal predictor guesses the trajectory of the passing agent, since this agent behaves independently.
A second prediction model, called the conditional predictor, then guesses what the yielding agent will do based on the actions of the passing agent. The system predicts several different trajectories for the yielder and the passer, computes the probability of each one individually, and then selects the six joint results with the highest likelihood of occurring.
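The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the function name `top_joint_predictions`, the callable `yielder_samples_given`, and the assumption that a joint probability is the product of the passer's probability and the yielder's conditional probability are all hypothetical choices made for this example.

```python
def top_joint_predictions(passer_samples, yielder_samples_given, k=6):
    """Rank joint (passer, yielder) trajectory pairs and keep the k most likely.

    passer_samples: list of (trajectory, probability) pairs, standing in for
        the output of a marginal predictor.
    yielder_samples_given: hypothetical callable that maps a passer trajectory
        to a list of (trajectory, probability) pairs, standing in for a
        conditional predictor.
    """
    joint = []
    for passer_traj, passer_prob in passer_samples:
        for yielder_traj, yielder_prob in yielder_samples_given(passer_traj):
            # Assumed factorization:
            # P(passer, yielder) = P(passer) * P(yielder | passer)
            joint.append((passer_prob * yielder_prob, passer_traj, yielder_traj))
    # Sort on the probability only, so trajectories are never compared.
    joint.sort(key=lambda item: item[0], reverse=True)
    return joint[:k]


# Toy usage: string labels stand in for real trajectories,
# which would be sequences of (x, y) positions.
passer = [("go", 0.7), ("slow", 0.3)]

def conditional(passer_traj):
    if passer_traj == "go":
        return [("wait", 0.9), ("cut_in", 0.1)]
    return [("wait", 0.4), ("cut_in", 0.6)]

best = top_joint_predictions(passer, conditional, k=6)
```

With the toy inputs there are only four joint candidates, so all four are returned in descending order of probability; with more samples per agent, only the six most likely pairs would be kept.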
M2i outputs a prediction of how these road users will move through traffic over the next 8 seconds. In one example, their method had a vehicle slow down so that a pedestrian could cross the road, then speed up once the pedestrian had cleared the intersection. In another example, a vehicle waited for several cars to pass before turning from a side street onto a busy main road.
Although this initial study focuses on interactions between two road users, M2i could infer the relationships among many road users and then guess their trajectories by linking multiple marginal and conditional predictors.
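One way to picture the linking of predictors for many road users is as an ordering problem: an agent that yields to no one can be handled by a marginal predictor, and every other agent by a conditional predictor once the agents it yields to have been predicted. The sketch below is an assumption about how such an ordering could be computed, not a description of the researchers' implementation; `prediction_order` and the agent names are hypothetical, and it uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def prediction_order(relations):
    """Order agents so that each one comes after the agents it yields to.

    relations: iterable of (passer, yielder) pairs, standing in for the
        output of a relationship predictor.
    Returns a list of (agent, predictor_kind) pairs.
    """
    influencers = {}
    for passer, yielder in relations:
        influencers.setdefault(passer, set())
        influencers.setdefault(yielder, set()).add(passer)
    # TopologicalSorter takes a mapping of node -> predecessors and yields
    # predecessors first; it raises graphlib.CycleError on circular relations.
    order = TopologicalSorter(influencers).static_order()
    return [
        (agent, "conditional" if influencers[agent] else "marginal")
        for agent in order
    ]


plan = prediction_order([("car_a", "cyclist"), ("cyclist", "pedestrian")])
```

Here `car_a` yields to no one and gets the marginal predictor, while the cyclist and the pedestrian are each predicted conditionally on the agent ahead of them in the chain.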
Real-world driving test
The researchers trained the models using the Waymo Open Motion Dataset, which contains millions of real traffic scenes involving vehicles, pedestrians and cyclists, recorded by lidar (light detection and ranging) sensors and cameras mounted on the company's autonomous vehicles. They focused specifically on cases with multiple agents.
To determine accuracy, they compared each method's six prediction samples (weighted by their confidence levels) with the actual trajectories followed by the cars, cyclists and pedestrians in a scene. Their method was the most accurate. It also outperformed the baseline models on a metric known as overlap rate: if two trajectories overlap, that indicates a collision. M2i had the lowest overlap rate.
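To make these two kinds of measurement concrete, here is a simplified sketch. It is not the benchmark's official scoring: it ignores the confidence weighting mentioned above, scores a set of samples by the average distance of its best sample from the true trajectory, and treats two trajectories as overlapping when the agents come within a fixed radius at the same time step. The function names and the 1-meter default radius are assumptions made for illustration.

```python
import math


def average_displacement(pred, truth):
    """Mean distance between predicted and true positions at each time step."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def min_average_displacement(samples, truth):
    """Score a set of predicted samples by its closest match to the truth."""
    return min(average_displacement(sample, truth) for sample in samples)


def trajectories_overlap(traj_a, traj_b, radius=1.0):
    """True if two agents come within `radius` at any shared time step.

    The radius is an assumed stand-in for comparing real vehicle footprints.
    """
    return any(math.dist(a, b) < radius for a, b in zip(traj_a, traj_b))


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],   # consistently 1 unit off
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],   # off only at the last step
]
score = min_average_displacement(samples, truth)

collides = trajectories_overlap([(0.0, 0.0), (1.0, 0.0)], [(5.0, 0.0), (1.5, 0.0)])
```

An overlap rate would then be the fraction of evaluated scenes in which the predicted trajectories of two agents overlap in this sense.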
"Instead of just building a more complex model to solve this problem, we took an approach that is more like how a human thinks when reasoning about interactions with others. Humans do not reason about all of the hundreds of possible combinations of future behaviors. We make decisions quite quickly," Huang Xin said.
Another advantage of M2i is that, because it breaks the problem into smaller pieces, it is easier for a user to understand the model's decisions. In the long run, Huang Xin said, this could help users place more trust in autonomous vehicles.
However, the framework cannot account for cases in which two agents mutually influence each other, for example when two cars each edge forward at a four-way stop because the drivers are not sure who should yield.
The researchers plan to address this limitation in future work. They also want to use their method to simulate realistic interactions between road users, which could be used to verify planning algorithms for autonomous vehicles or to create large amounts of synthetic driving data to improve model performance.
"Predicting the future trajectories of multiple interacting road users is underexplored and challenging for fully autonomous driving in complex scenes. M2i provides a very promising prediction method, and its relationship predictor can distinguish between agents to be predicted marginally or conditionally, which greatly simplifies the problem," Masayoshi Tomizuka, a distinguished professor in the Department of Mechanical Engineering at the University of California, Berkeley, and Wei Zhan, an assistant professional researcher, wrote in an email. "The prediction model can capture the inherent relationships and interactions of road users to achieve state-of-the-art performance." The two were not involved in the study.