This robot can tidy a room without any help
A new system helps robots navigate homes they’ve never seen before with a little help from open-source AI models.
Robots are good at certain tasks. They’re great at picking up and moving objects, for example, and they’re even getting better at cooking.
But while robots may easily complete tasks like these in a laboratory, getting them to work in an unfamiliar environment where there’s little data available is a real challenge.
Now, a new system called OK-Robot could let robots pick up and move objects in settings they haven’t encountered before. It’s an approach that might help plug the gap between rapidly improving AI models and actual robot capabilities, because it doesn’t require any additional costly, complex training.
To develop the system, researchers from New York University and Meta tested Stretch, a commercially available robot made by Hello Robot that consists of a wheeled unit, a tall pole, and a retractable arm, in a total of 10 rooms in five homes.
While in a room with the robot, a researcher would scan their surroundings using Record3D, an iPhone app that uses the phone’s lidar system to take a 3D video to share with the robot.
The OK-Robot system then ran an open-source AI object detection model over the video’s frames. This, in combination with other open-source models, helped the robot identify objects in the room, such as a toy dragon, a tube of toothpaste, and a pack of playing cards, as well as locations including a chair, a table, and a trash can.
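The article doesn’t name the specific detector OK-Robot uses, but the general idea of querying video frames with an open-vocabulary detection model is easy to sketch. The following is a minimal illustration only, assuming the open-source OWL-ViT model from Hugging Face’s transformers library; the video file name, the query phrases, and the frame sampling rate are all hypothetical:

```python
# A minimal sketch of open-vocabulary detection over video frames.
# Assumptions (not from the article): the OWL-ViT model, the query
# phrases, the sampling rate, and the video file name are illustrative.
import cv2
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Free-text queries for the objects and locations to find in the room.
queries = [["a toy dragon", "a tube of toothpaste", "a chair", "a trash can"]]

cap = cv2.VideoCapture("room_scan.mp4")  # hypothetical video exported from the phone scan
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:  # sample roughly one frame per second of 30 fps video
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = processor(text=queries, images=image, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        sizes = torch.tensor([image.size[::-1]])  # (height, width)
        detections = processor.post_process_object_detection(
            outputs, threshold=0.2, target_sizes=sizes
        )[0]
        for score, label, box in zip(
            detections["scores"], detections["labels"], detections["boxes"]
        ):
            print(frame_idx, queries[0][int(label)], round(score.item(), 2), box.tolist())
    frame_idx += 1
cap.release()
```

A real system would also have to tie each detection back to a 3D position using the scan’s depth data so the robot can navigate to it; this sketch covers only the detection step.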
The team then instructed the robot to pick up a specific item and move it to a new location. The robot’s pincer arm did this successfully in 58.5% of cases; the success rate rose to 82% in rooms that were less cluttered. (Their research has not yet been peer reviewed.)
The recent AI boom has led to enormous leaps in language and computer vision capabilities, allowing robotics researchers access to open-source AI models and tools that didn’t exist even three years ago, says Matthias Minderer, a senior computer vision research scientist at Google DeepMind, who was not involved in the project.
“I would say it’s quite unusual to be completely reliant on off-the-shelf models, and that it’s quite impressive to make them work,” he says.
“We’ve seen a revolution in machine learning that has made it possible to create models that work not just in laboratories, but in the open world,” he adds. “Seeing that this actually works in a real physical environment is very useful information.”
Because the researchers’ system used models that weren’t fine-tuned for this particular project, the robot simply stopped in its tracks when it couldn’t find the object it was instructed to look for, rather than trying to work out a solution. That significant limitation is one reason the robot was more likely to succeed in tidier environments: fewer objects meant fewer chances for confusion, and a clearer space for navigation.
Using ready-made open-source models was both a blessing and a curse, says Lerrel Pinto, an assistant professor of computer science at New York University, who co-led the project.
“On the positive side, you don’t have to give the robot any additional training data in the environment, it just works,” he says. “On the con side, it can only pick an object up and drop it somewhere else. You can’t ask it to open a drawer, because it only knows how to do those two things.”
Combining OK-Robot with voice recognition models could allow researchers to deliver instructions simply by speaking to the robot, making it easier for them to experiment with readily available datasets, says Mahi Shafiullah, a PhD student at New York University who co-led the research.
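That extension is straightforward to prototype with existing open-source tools. As an illustration only (the article names no specific speech model, and OK-Robot’s command interface isn’t described), here is a minimal sketch assuming OpenAI’s open-source Whisper model and a made-up command format:

```python
# A hedged sketch: turning a spoken instruction into a pick-and-place command.
# Whisper is assumed as the speech model; the audio file and the command
# grammar are illustrative, not OK-Robot's actual interface.
import re
import whisper

model = whisper.load_model("base")
text = model.transcribe("command.wav")["text"].lower()  # hypothetical recording
# e.g. "pick up the toy dragon and put it in the trash can"
match = re.search(
    r"pick up (?:the )?(.+?) and (?:put|move) it (?:in|to|on) (?:the )?(.+)", text
)
if match:
    target_object, destination = match.group(1), match.group(2)
    print(f"object: {target_object!r} -> destination: {destination!r}")
else:
    print("could not parse command:", text)
```

The parsed object and destination strings could then be fed to the same open-vocabulary models as text queries, keeping the whole pipeline free of task-specific training.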
“There is a very pervasive feeling in the [robotics] community that homes are difficult, robots are difficult, and combining homes and robots is just completely impossible,” he says. “I think once people start believing home robots are possible, a lot more work will start happening in this space.”