These new tools could make AI vision systems less biased

Two new papers from Sony and Meta describe novel methods to make bias detection fairer.

Melissa Heikkilä

September 25, 2023

Stephanie Arnett/MITTR | Getty

Computer vision systems are everywhere. They help classify and tag images on social media feeds, detect objects and faces in pictures and videos, and highlight relevant elements of an image. However, they are riddled with biases, and they’re less accurate when the images show Black or brown people and women. And there’s another problem: the current ways researchers find biases in these systems are themselves biased, sorting people into broad categories that don’t properly account for the complexity that exists among human beings.

Two new papers by researchers at Sony and Meta propose ways to measure biases in computer vision systems so as to more fully capture the rich diversity of humanity. Both papers will be presented at the computer vision conference ICCV in October. Developers could use these tools to check the diversity of their data sets, helping lead to better, more diverse training data for AI. The tools could also be used to measure diversity in the human images produced by generative AI.

Traditionally, skin-tone bias in computer vision is measured using the Fitzpatrick scale, which runs from light to dark. The scale was originally developed to measure tanning of white skin but has since been adopted widely as a tool to determine ethnicity, says William Thong, an AI ethics researcher at Sony. It is used to measure bias in computer systems by, for example, comparing how accurate AI models are for people with light and dark skin.
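To make that kind of comparison concrete, here is a minimal sketch in Python. It assumes each test image has already been assigned to a coarse skin-tone group and that a model's prediction can be checked against a known label. The function names and the toy records are hypothetical and made up for illustration; they do not come from either paper.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return the share of correct predictions for each group.

    Each record is a (group, predicted_label, true_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(records):
    """Difference between the best-served and worst-served groups."""
    scores = accuracy_by_group(records)
    return max(scores.values()) - min(scores.values())

# Made-up toy data: two groups, four test images each.
toy_records = [
    ("light", "person", "person"),
    ("light", "person", "person"),
    ("light", "person", "person"),
    ("light", "person", "person"),
    ("dark", "person", "person"),
    ("dark", "none", "person"),
    ("dark", "person", "person"),
    ("dark", "none", "person"),
]

print(accuracy_by_group(toy_records))
print(accuracy_gap(toy_records))
```

A gap near zero would suggest the model serves both groups about equally well on this test set. A large gap flags a disparity, though it says nothing about people who do not fit neatly into either group.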

But describing people’s skin with a one-dimensional scale is misleading, says Alice Xiang, the global head of AI ethics at Sony. By classifying people into groups based on this coarse scale, researchers are missing out on biases that affect, for example, Asian people, who are underrepresented in Western AI data sets and can fall into both light-skinned and dark-skinned categories. The scale also doesn’t take into account the fact that people’s skin tones change. For example, Asian skin becomes darker and more yellow with age while white skin becomes darker and redder, the researchers point out.

Thong and Xiang’s team developed a tool—shared exclusively with MIT Technology Review—that expands the skin-tone scale into two dimensions, measuring both skin color (from light to dark) and skin hue (from red to yellow). Sony is making the tool freely available online.
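The article does not say how the two dimensions are computed. One standard way to express them, assumed here for illustration and not presented as Sony's tool, is the CIELAB color space: the L* value captures lightness, and the hue angle (the arctangent of b* over a*) captures where a color sits between red and yellow. The sketch below converts an 8-bit sRGB color to CIELAB using the standard D65 formulas; the function names and the sample colors are hypothetical.

```python
import math

def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIELAB (D65 white point)."""
    def to_linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Normalize by the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def skin_color_2d(r, g, b):
    """Return (lightness, hue angle in degrees) for an sRGB color.

    Lightness runs from 0 (black) to 100 (white). For skin-like
    colors the hue angle falls between 0 (red) and 90 (yellow).
    """
    lightness, a_star, b_star = srgb_to_lab(r, g, b)
    hue_degrees = math.degrees(math.atan2(b_star, a_star))
    return lightness, hue_degrees

# Two made-up sample colors, not taken from any data set.
lighter = skin_color_2d(224, 172, 105)
darker = skin_color_2d(141, 85, 36)
print(lighter, darker)
```

With two numbers per person instead of one, two people with the same lightness can still be told apart by hue, which is the distinction a single light-to-dark scale cannot make.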

Thong says he was inspired by the Brazilian artist Angélica Dass, whose work shows that people who come from similar backgrounds can have a huge variety of skin tones. But representing the full range of skin tones is not a novel idea. The cosmetics industry has been using the same technique for years.

“For anyone who has had to select a foundation shade … you know the importance of not just whether someone’s skin tone is light or dark, but also whether it’s warm toned or cool toned,” says Xiang.

Sony’s work on skin hue “offers an insight into a missing component that people have been overlooking,” says Guha Balakrishnan, an assistant professor at Rice University, who has studied biases in computer vision models.

Measuring bias

Right now, there is no one standard way for researchers to measure bias in computer vision, which makes it harder to compare systems against each other.

To make bias evaluations more streamlined, Meta has developed a new way to measure fairness in computer vision models, called Fairness in Computer Vision Evaluation (FACET), which can be used across a range of common tasks such as classification, detection, and segmentation. Laura Gustafson, an AI researcher at Meta, says FACET is the first fairness evaluation to include many different computer vision tasks, and that it incorporates a broader range of fairness metrics than other bias tools.

To create FACET, Meta put together a freely available data set of 32,000 human images and hired annotators from across the world to label them. The annotators were asked to label the people in the images with 13 different visual attributes, such as perceived age, skin tone, gender representation, and hair color and texture. The researchers also asked the annotators to label people based on what they were doing or what their profession seemed to be, such as hairdresser, skateboarder, student, musician, or gymnast. This, the researchers say, adds nuance and accuracy to bias evaluation.

Meta then used FACET to evaluate how state-of-the-art vision models performed on different groups of people; the results pointed to big disparities. For example, the models were better at detecting people with lighter skin, even if they had dreadlocks or coily hair.

Because people around the world bring their own biases to the way they evaluate images of other people, Meta’s efforts to recruit geographically diverse annotators are positive, says Angelina Wang, a PhD researcher at Princeton, who has studied bias in computer vision models.

The fact that Meta has made its data freely available online will also help researchers. Annotating data is very expensive, so it’s only really accessible to big tech companies at a large scale. “This is a welcome addition,” says Balakrishnan.

But Wang warns it’s wise to be realistic about how much impact these systems can have. They will likely lead to small improvements rather than transformations in AI.

“I think we’re still far from nearing something that actually captures how humans represent themselves, and likely we will never reach it,” she says.
