However, there’s potential for improvement to ensure fairer evaluations.
Last year, University of Washington graduate student Kate Glazko noticed recruiters using OpenAI’s ChatGPT and other AI tools to summarize resumes and rank candidates while she was searching for research internships. Automated screening has long been used in hiring, but Glazko, a doctoral student in UW’s Paul G. Allen School of Computer Science & Engineering, studies how generative AI can amplify real-world biases, such as those against disabled individuals. She questioned how such a system might rank resumes suggesting a disability.
In a new study, UW researchers discovered that ChatGPT consistently ranked resumes with disability-related honors and credentials, like the “Tom Wilson Disability Leadership Award,” lower than those without such mentions. When asked to explain these rankings, the AI displayed biased perceptions of disabled people, such as stating a resume with an autism leadership award had “less emphasis on leadership roles,” implying the stereotype that autistic individuals aren’t good leaders.
However, when the researchers customized the AI with instructions to avoid ableism, the tool reduced bias for all but one of the disabilities tested. Five of the six implied disabilities — deafness, blindness, cerebral palsy, autism, and the general term “disability” — showed improved rankings, though only three ranked higher than resumes without disability mentions. The findings were presented on June 5 at the 2024 ACM Conference on Fairness, Accountability, and Transparency in Rio de Janeiro.
“AI-driven resume ranking is becoming more common, yet there’s limited research on its safety and effectiveness,” said Glazko, the study’s lead author. “Disabled job seekers often wonder if they should include disability credentials on their resumes, even when humans review them.”
The researchers used a public CV of one of the study’s authors and created six enhanced versions, each implying a different disability by including four disability-related credentials: a scholarship, an award, a DEI panel seat, and membership in a student organization. They then used ChatGPT’s GPT-4 model to rank these enhanced CVs against the original for a real “student researcher” job listing at a major U.S.-based software company, running each comparison 10 times. In 60 trials, the enhanced CVs ranked first only 25% of the time.
“In a fair system, the enhanced resume should rank first every time,” said senior author Jennifer Mankoff, a UW professor in the Allen School. “I can’t imagine a job where someone recognized for their leadership skills shouldn’t rank higher than someone with the same background but without those recognitions.”
When asked to explain the rankings, GPT-4’s responses showed explicit and implicit ableism, like claiming a resume mentioning depression had “additional focus on DEI and personal challenges,” detracting from the core technical aspects of the role.
“Some of GPT’s descriptions stereotyped resumes based on disability and incorrectly suggested that involvement with DEI or disability was a negative,” Glazko said. “For example, it hallucinated the concept of ‘challenges’ into the depression resume comparison, even though ‘challenges’ weren’t mentioned. Stereotypes were evident.”
The researchers explored whether the system could be trained to be less biased by using the GPTs Editor tool to provide written instructions promoting disability justice and DEI principles. In the revised experiment, the customized system ranked the enhanced CVs higher than the control 37 out of 60 times. However, improvements were inconsistent: the autism CV ranked first only three times out of 10, and the depression CV only twice, unchanged from the original results.
“Users must be aware of the system’s biases when employing AI for real-world tasks,” Glazko said. “Otherwise, recruiters using ChatGPT won’t know that, even with instructions, bias can persist.”
The researchers highlight ongoing efforts by organizations like ourability.com and inclusively.com to improve outcomes for disabled job seekers facing biases, with or without AI. They stress the need for more research to document and mitigate AI biases, including testing other systems like Google’s Gemini and Meta’s Llama, considering more disabilities, exploring bias intersections with other attributes such as gender and race, and investigating whether further customization can more consistently reduce biases across disabilities.
“It’s crucial to study and document these biases,” Mankoff said. “We’ve learned much and hope to contribute to broader discussions on ensuring technology is implemented equitably and fairly, not only regarding disability but also other minoritized identities.”
Additional co-authors include Yusuf Mohammed, a UW undergraduate; Venkatesh Potluri, a UW doctoral student; and Ben Kosa, who completed the research as a UW undergraduate and is now a doctoral student at the University of Wisconsin–Madison. This research was funded by the National Science Foundation, donors to UW’s Center for Research and Education on Accessible Technology and Experiences (CREATE), and Microsoft.