Bias in AI is pervasive. From dermatological models that discriminate against patients with dark skin to exam-scoring algorithms that disadvantage public school students, one needn’t look far for examples of prejudice encoded in AI systems. But how do these biases arise in the first place? Researchers at Columbia University sought to uncover this by tasking 400 AI engineers with creating algorithms that made over 8.2 million predictions about 20,000 people. In a coauthored study accepted to the NeurIPS 2020 machine learning conference, the researchers conclude that biased predictions are mostly caused by imbalanced data but that the demographics of engineers also play a role.
“Across a wide variety of theoretical models of behavior, biased predictions are responsible for demographic segregation and outcome disparities in settings including labor markets, criminal justice, and advertising,” the researchers wrote. “Research and public discourse on this topic have grown enormously in the past five years along with a growth in programs to introduce ethics into technical training. However, few studies have attempted to evaluate, audit, or learn from these interventions or connect them back to theory.”
The researchers recruited 80% of the engineers they evaluated through a bootcamp that taught AI techniques at a computer science graduate or advanced undergraduate degree level. The remaining 20% were freelance machine learning contractors who’d worked in the industry an average of about four years. All 400 were given the same assignment: Develop an algorithm to predict math performance from job applications and apply it to 20,000 people who don’t appear in a training dataset.
For the purposes of the study, the engineers were divided into groups in which certain engineers were given data featuring realistic (i.e., biased) sample selection problems while others received data featuring no sample selection problems. A third group was provided the same training data as the first group in addition to a non-technical reminder about the possibility of algorithmic bias, and a fourth was given this data and reminder as well as a simple whitepaper about sample selection correction methods in machine learning.
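To make the "realistic sample selection problem" concrete, here is a minimal, hypothetical sketch (the setup, variable names, and numbers are illustrative assumptions, not the study's actual data): if one group only enters the training sample when its members score well, a model fitted on that sample will systematically over-predict that group's performance on the full population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: two groups with the same underlying skill
# distribution, but group B only enters the training sample selectively.
n = 10_000
group = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B
skill = rng.normal(0.0, 1.0, size=n)
score = skill + rng.normal(0.0, 0.5, size=n)  # observed math score

# Biased sampling: group B members appear in the training data only if
# they scored well, so the sample overstates group B's average score.
selected = (group == 0) | (score > 0.5)
train_score, train_group = score[selected], group[selected]

# A naive "predictor" that outputs each group's training-set mean score.
pred_a = train_score[train_group == 0].mean()
pred_b = train_score[train_group == 1].mean()

# Evaluated against the full population, predictions for group B are inflated.
err_a = pred_a - score[group == 0].mean()
err_b = pred_b - score[group == 1].mean()
print(f"group A error: {err_a:+.2f}, group B error: {err_b:+.2f}")
```

Running this, the group A error stays near zero while the group B error is large and positive, even though the two groups are identical in the population: the bias comes entirely from how the training sample was selected.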
Unsurprisingly, the researchers found that the algorithms developed by engineers with better training data exhibited less bias. Moreover, this subset of engineers spent more hours working on their algorithms, suggesting that the marginal benefit of development became higher with higher-quality data.
But training data, or the lack thereof, wasn’t the only source of bias, according to the researchers. As alluded to earlier, they also found that prediction errors were correlated within demographic groups, particularly gender and ethnicity. The prediction errors of any two white male programmers were more likely to be correlated with each other, and, in contrast with female programmers, white males had a tendency to double down on their errors. No such effect was observed among female and male engineers of East Asian, Indian, Black, and Latinx descent.
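What "correlated prediction errors within a group" means can be sketched with simulated data (again, purely illustrative assumptions — the group sizes, error structure, and coefficients below are invented, not drawn from the paper): if programmers in one group share a common error component, the pairwise correlation of their errors is high within that group and near zero across groups.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 20 programmers each predict the same 500 people's
# scores. Programmers in "group 1" share a common error component,
# mimicking the within-group error correlation the study reports.
n_people, n_prog = 500, 20
prog_group = np.array([0] * 10 + [1] * 10)
truth = rng.normal(size=n_people)
shared_bias = rng.normal(size=n_people)   # error pattern shared by group 1

preds = np.empty((n_prog, n_people))
for i in range(n_prog):
    noise = rng.normal(size=n_people)     # idiosyncratic error
    common = 0.8 * shared_bias if prog_group[i] == 1 else 0.0
    preds[i] = truth + common + noise

errors = preds - truth
corr = np.corrcoef(errors)  # pairwise correlation of programmers' errors

# Average error correlation within group 1 vs. across the two groups.
within = [corr[i, j] for i in range(10, 20) for j in range(10, 20) if i < j]
across = [corr[i, j] for i in range(10) for j in range(10, 20)]
print(f"within-group: {np.mean(within):.2f}, across-group: {np.mean(across):.2f}")
```

The within-group average comes out well above zero while the across-group average hovers near zero: even with identical training data, engineers who share an error pattern will make the same mistakes about the same people.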
This gender disparity might be explained by studies showing that women in computer science are socialized to feel they must achieve perfection. A survey conducted by Supporting Women in Information Technology, based in Vancouver, found that women who pursue a computer science degree say they’re less confident than their male counterparts when using a computer. A more recent study published by Gallup and Google reveals that American girls in seventh through 12th grades express less confidence than boys in their ability to learn computer science skills.
Bias could be mitigated somewhat by the reminders, the researchers say, but results for the technical guidance intervention were mixed. Programmers who understood the whitepaper successfully reduced bias, but most didn’t follow its advice, resulting in algorithms more biased than those of programmers who received only the reminders.
The researchers caution that their paper isn’t the final word on sources of algorithmic bias and that their subject pool, while slightly more diverse than the U.S. software engineering population, consisted mostly of male (71%), East Asian (52%), and white (28%) engineers. Engineers recruited from the bootcamp were also less experienced overall, and only 31% had been employed by “a household-name company” at the time of the study’s publication.
However, the coauthors believe that their work could serve as an important stepping stone toward identifying and addressing the causes of AI bias in the wild. “Questions about algorithmic bias are often framed as theoretical computer science problems. However, productionized algorithms are developed by humans, working inside organizations who are subject to training, persuasion, culture, incentives, and implementation frictions,” they wrote. “An empirical, field experimental approach is also useful for evaluating practical policy solutions.”
Of course, it’s difficult — if not impossible — to completely rid algorithms of bias. Facial recognition models fail to recognize Black, Middle Eastern, and Latinx people more often than those with lighter skin. AI researchers from MIT, Intel, and Canadian AI initiative CIFAR have found high levels of bias from some of the most popular pretrained models. And algorithms developed by Facebook have proven to be 50% more likely to disable the accounts of Black users compared with white users.
That aside, however, the new paper adds to the growing chorus of voices calling for greater attention to the issue of bias in AI.