The dark secret behind those cute AI-generated animal images
Google Brain has revealed its own image-making AI, called Imagen. But don't expect to see anything that isn't wholesome.
Another month, another flood of weird and wonderful images generated by an artificial intelligence. In April, OpenAI showed off its new picture-making neural network, DALL-E 2, which could produce remarkable high-res images of almost anything it was asked to. It outstripped the original DALL-E in almost every way.
Now, just a few weeks later, Google Brain has revealed its own image-making AI, called Imagen. And it performs even better than DALL-E 2: it scores higher on a standard measure for rating the quality of computer-generated images, and the pictures it produced were preferred by a group of human judges.
“We’re living through the AI space race!” one Twitter user commented. “The stock image industry is officially toast,” tweeted another.
Many of Imagen’s images are indeed jaw-dropping. At a glance, some of its outdoor scenes could have been lifted from the pages of National Geographic. Marketing teams could use Imagen to produce billboard-ready advertisements with just a few clicks.
But as OpenAI did with DALL-E, Google is going all in on cuteness. Both firms promote their tools with pictures of anthropomorphic animals doing adorable things: a fuzzy panda dressed as a chef making dough, a corgi sitting in a house made of sushi, a teddy bear swimming the 400-meter butterfly at the Olympics—and it goes on.
There’s a technical, as well as PR, reason for this. Mixing concepts like “fuzzy panda” and “making dough” forces the neural network to learn how to manipulate those concepts in a way that makes sense. But the cuteness hides a darker side to these tools, one that the public doesn’t get to see because it would reveal the ugly truth about how they are created.
Most of the images that OpenAI and Google make public are cherry-picked. We only see cute images that match their prompts with uncanny accuracy—that’s to be expected. But we also see no images that contain hateful stereotypes, racism, or misogyny. There is no violent, sexist imagery. There is no panda porn. And from what we know about how these tools are built—there should be.
It’s no secret that large models, such as DALL-E 2 and Imagen, trained on vast numbers of documents and images taken from the web, absorb the worst aspects of that data as well as the best. OpenAI and Google explicitly acknowledge this.
Scroll down the Imagen website—past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses—to the section on societal impact and you get this: “While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized [the] LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
It's the same kind of acknowledgement that OpenAI made when it revealed GPT-3 in 2020: “internet-trained models have internet-scale biases.” And as Mike Cook, who researches AI creativity at Queen Mary University of London, has pointed out, similar caveats appear in the ethics statements that accompanied Google’s large language model PaLM and OpenAI’s DALL-E 2. In short, these firms know that their models are capable of producing awful content, and they have no idea how to fix that.
For now, the solution is to keep them caged up. OpenAI is making DALL-E 2 available only to a handful of trusted users; Google has no plans to release Imagen.
That would be fine if these were simply proprietary tools. But these firms are pushing the boundaries of what AI can do, and their work shapes the kind of AI that all of us live with. They are creating new marvels, but also new horrors, and moving on with a shrug. When Google’s in-house ethics team raised problems with large language models in 2020, it sparked a fight that ended with two of its leading researchers being fired.
Large language models and image-making AIs have the potential to be world-changing technologies, but only if their toxicity is tamed. This will require a lot more research. Some small steps are being taken to open these kinds of neural network up for widespread study. A few weeks ago Meta released a large language model to researchers, warts and all. And Hugging Face is set to release its open-source version of GPT-3 in the next couple of months.
For now, enjoy the teddies.