This self-driving startup is using generative AI to predict traffic
Waabi says its new model can anticipate how pedestrians, trucks, and bicyclists move using lidar data.
Self-driving company Waabi is using a generative AI model to help predict the movement of vehicles, it announced today.
The new system, called Copilot4D, was trained on troves of data from lidar sensors, which use light to sense how far away objects are. If you prompt the model with a situation, like a driver recklessly merging onto a highway at high speed, it predicts how the surrounding vehicles will move, then generates a lidar representation of 5 to 10 seconds into the future (showing a pileup, perhaps). Today’s announcement concerns the initial version of Copilot4D, but Waabi CEO Raquel Urtasun says a more advanced and interpretable version is deployed in Waabi’s testing fleet of autonomous trucks in Texas, where it helps the driving software decide how to react.
While autonomous driving has long relied on machine learning to plan routes and detect objects, some companies and researchers are now betting that generative AI — models that take in data of their surroundings and generate predictions — will help bring autonomy to the next stage. Wayve, a Waabi competitor, released a comparable model last year that is trained on the video that its vehicles collect.
Waabi’s model works in a similar way to image or video generators like OpenAI’s DALL-E and Sora. It takes point clouds of lidar data, which visualize a 3D map of the car’s surroundings, and breaks them into chunks, similar to how image generators break photos into pixels. Based on its training data, Copilot4D then predicts how all points of lidar data will move. Doing this continuously allows it to generate predictions 5 to 10 seconds into the future.
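That two-step pipeline (discretize the point cloud into chunks, then roll predictions forward frame by frame) can be sketched in a few lines. Everything here is illustrative: the grid size, extent, time step, and the toy "drift" model are hypothetical stand-ins, not Waabi's actual architecture or parameters.

```python
import numpy as np

def voxelize(points, grid=(32, 32, 8), extent=50.0):
    """Discretize a lidar point cloud (N x 3 array, in meters) into a
    binary occupancy grid -- a stand-in for the tokenization step that
    breaks the scene into chunks. Grid shape and extent are made up."""
    occ = np.zeros(grid, dtype=np.int8)
    # Map coordinates in [-extent, extent) to integer voxel indices.
    idx = ((points + extent) / (2 * extent) * np.array(grid)).astype(int)
    valid = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    occ[tuple(idx[valid].T)] = 1
    return occ

def rollout(model, frame, steps, dt=0.1):
    """Autoregressive rollout: feed each predicted frame back into the
    model to reach a horizon of steps * dt seconds. `model` is any
    callable mapping one occupancy grid to the next."""
    frames = [frame]
    for _ in range(steps):
        frames.append(model(frames[-1]))
    return frames

# Toy "predictor": the whole scene drifts one voxel along x per step.
# A real model would be a learned network, not a fixed shift.
drift = lambda occ: np.roll(occ, 1, axis=0)
cloud = np.random.uniform(-50, 50, size=(1000, 3))
future = rollout(drift, voxelize(cloud), steps=50)  # 50 * 0.1 s = 5 s
```

The key structural point is the feedback loop in `rollout`: each generated frame becomes the input for the next prediction, which is also why errors compound and predictions degrade at longer horizons.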
Waabi is one of a handful of autonomous driving companies, including competitors Wayve and Ghost, that describe their approach as “AI-first.” To Urtasun, that means designing a system that learns from data, rather than one that must be taught reactions to specific situations. These companies are betting that their methods will require fewer hours of road-testing self-driving cars, a charged topic following an October 2023 accident in which a Cruise robotaxi dragged a pedestrian in San Francisco.
Waabi differs from its competitors in building a generative model for lidar rather than for cameras.
“If you want to be a Level 4 player, lidar is a must,” says Urtasun, referring to the automation level where the car does not require the attention of a human to drive safely. Cameras do a good job of showing what the car is seeing, but they’re not as adept at measuring distances or understanding the geometry of the car’s surroundings, she says.
Though Waabi’s model can generate videos showing what a car will see through its lidar sensors, those videos will not be used as training data in the simulator the company uses to build and test its driving model. That’s to ensure any hallucinations arising from Copilot4D do not get baked into the simulator.
The underlying technology is not new, says Bernard Adam Lange, a PhD student at Stanford who has built and researched similar models, but it’s the first time he’s seen a generative lidar model leave the confines of a research lab and be scaled up for commercial use. A model like this would generally help make the “brain” of any autonomous vehicle able to reason more quickly and accurately, he says.
“It is the scale that is transformative,” he says. “The hope is that these models can be utilized in downstream tasks” like detecting objects and predicting where people or things might move next.
Copilot4D can only estimate so far into the future, and motion prediction models in general degrade the farther they’re asked to project forward. Urtasun says that the model only needs to imagine what happens 5 to 10 seconds ahead for the majority of driving decisions, though the benchmark tests highlighted by Waabi are based on 3-second predictions. Chris Gerdes, co-director of Stanford’s Center for Automotive Research, says this metric will be key in determining how useful the model is at making decisions.
“If the 5-second predictions are solid but the 10-second predictions are just barely usable, there are a number of situations where this would not be sufficient on the road,” he says.
The new model resurfaces a question rippling through the world of generative AI: whether to make models open source. Releasing Copilot4D would let academic researchers, who struggle with access to large data sets, peek under the hood at how it’s made, independently evaluate safety, and potentially advance the field. It would also do the same for Waabi’s competitors. Waabi has published a paper detailing the creation of the model but has not released the code, and Urtasun is unsure if they will.
“We want academia to also have a say in the future of self-driving,” she says, adding that open-source models are more trusted. “But we also need to be a bit careful as we develop our technology so that we don’t unveil everything to our competitors.”