By Janus Boye
In the past year, Marli Mesibov has done significant research into chatbots, conversational interfaces, and voice UI. Her findings show racism and bias in many areas of automated content. The reason is clear: an algorithm is only as strong as the strategy, taxonomy, and user testing that create it.
For AI to be delightful, it must first be unbiased. We can work towards that through the people we hire to develop our products and the ways in which we test and create the initial content.
In a recent member call, Marli reviewed concrete tactics for developing unbiased AI. Marli is Content Strategy Lead at Verily Life Sciences in Cambridge, MA and also a past Boye conference speaker.
The conversation started with voice user interfaces.
The problem with voice recognition
On her opening slide introducing herself, Marli actually put this as an extra bullet:
Super frustrated with my Google Assistant because it doesn’t listen to me
As she said, this is a common experience for women: when the Google device doesn’t respond to them, a male voice will often have a much higher success rate.
That’s because speech recognition, even here in 2022, performs worse for women than it does for men. The AI that powers the content strategy is biased: it is optimised for a male version of the so-called accentless accent used by radio announcers and others considered to speak flawlessly, but not for the accents so many others actually use.
Marli cited a Harvard Business Review article which covered this back in 2019. In the article titled Voice Recognition Still Has Significant Race and Gender Biases, the author opens with:
“As with facial recognition, web searches, and even soap dispensers, speech recognition is another form of AI that performs worse for women and non-white people.”
To illustrate the problem using numbers, Marli shared these 3 interesting figures for voice user interfaces:
95% overall success: The Design in Tech report (PDF) from 2017 shows a 95% success rate in voice UI correctly responding to voices. In the five years since, no one has reported a higher success rate.
53% for Scottish English: That rate drops to 53% when we move from American English to Scottish English.
13% lower for women: Most voice UI was tested with men, not women. Women’s voices have a 13% lower success rate.
Unfortunately, this is not specific to voice UI! There’s bias across all channels.
Bias across all types of AI and UX
To convey the scale of the problem, Marli also shared how content bias unfortunately exists across all types of AI and UX, including:
AI Assistants: AI assistants like Alexa and Siri need to respond to voice commands. If the AI assistant isn’t tested with a variety of accents and vocal pitches, it won’t work for everyone.
Chatbots: Chatbots are written with branching logic that responds to key words or phrases. The words and phrases are based on the creation team’s assumptions about what the end-user needs (see the sketch after this list).
Data-based products: We are actively building products that take our assumptions and bake them into the branching logic and machine learning we set up.
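Marli’s point about branching logic is easy to see in code. Below is a minimal, hypothetical sketch (the intents and keyword lists are invented, not taken from her talk) of how a keyword-driven chatbot routes messages: any phrasing the team did not anticipate falls straight through to a fallback, no matter how clearly the user expressed their need.

```python
# A minimal sketch of keyword-based branching logic in a chatbot.
# The intents and keyword lists below are invented for illustration;
# in a real product they come from the team's assumptions about how
# end-users phrase things, which is exactly where bias creeps in.

INTENT_KEYWORDS = {
    "check_balance": ["balance", "how much money"],
    "report_fraud": ["fraud", "stolen card", "unauthorized charge"],
}

FALLBACK = "fallback"


def route(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return FALLBACK


# A phrasing the team anticipated is routed correctly...
print(route("There is an unauthorized charge on my card"))    # report_fraud
# ...while the same need, phrased differently, falls through to the fallback.
print(route("Someone keeps taking money out of my account"))  # fallback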
To illustrate the problem, Marli shared the example of Perspective, a free API that uses machine learning to identify "toxic" comments, making it easier to host better conversations online. To quote Marli:
APIs like Perspective initially failed, as they took broad social data, but didn’t comb it to remove the bias. The mistake: assuming data is neutral
The popular tweet from Jessamyn West back in 2017 showed how it went wrong.
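To make the “assuming data is neutral” mistake concrete, here is a deliberately tiny, invented example (not Perspective’s actual model or data): a toxicity score learned from raw word counts in unexamined comments. Because an identity term only appears alongside abuse in the training data, a harmless self-description inherits a high score.

```python
# Toy illustration of the "assuming data is neutral" mistake.
# The labelled comments below are invented; the point is that an identity
# term which only appears in toxic examples ends up with a high weight,
# so a neutral sentence that mentions it gets flagged.

from collections import Counter

labelled_comments = [
    ("you are an idiot", 1),                   # toxic
    ("gay people are ruining everything", 1),  # toxic, targets an identity
    ("what a lovely day", 0),                  # fine
    ("i enjoyed the article", 0),              # fine
]

toxic_counts, clean_counts = Counter(), Counter()
for text, label in labelled_comments:
    for word in text.split():
        (toxic_counts if label else clean_counts)[word] += 1


def toxicity_score(sentence: str) -> float:
    """Average per-word toxicity, learned purely from the raw counts."""
    scores = [
        toxic_counts[w] / (toxic_counts[w] + clean_counts[w])
        for w in sentence.lower().split()
        if toxic_counts[w] + clean_counts[w] > 0
    ]
    return sum(scores) / len(scores) if scores else 0.0


# "gay" only ever co-occurred with abuse in the (biased) data, so a
# harmless self-description scores far higher than a comparable sentence.
print(toxicity_score("I am a gay woman"))  # ~0.33, pulled up entirely by "gay"
print(toxicity_score("I am a man"))        # 0.0
```

The failure here is not in the scoring code but in the data it was handed, which is exactly the mistake Marli describes: broad social data carries society’s biases with it unless someone deliberately combs them out.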
Testing is key, but as Marli pointed out, the real problem is the lack of diversity in the teams building and testing the systems:
9/10ths are cisgender men
9/10ths are able-bodied, white, and speak with an American accent
These 9/10ths will have similar assumptions about phrasing and AI needs.
Finally, there’s also a common misunderstanding about ‘edge cases’: women are told to speak with a lower voice to get AI systems to hear them, and apps are built for the 80%, with everything else dismissed as an “edge case”.
Let’s move on to Marli’s advice on what you can do.
What can you do?
Using the famous Gandhi quote - “Be the change you want to see in the world” - Marli shared good advice on what we can do in our own projects to make content strategy less biased.
First, she advised making the industry representative:
Hire diverse teams
Mentor people who don’t look or sound like you
Ask yourself why underrepresented groups are underrepresented
She also challenged us to look at our assumptions:
Create for “edge” cases
Focus on behaviors over demographics
Make your assumptions explicit, not implicit
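One way to act on “make your assumptions explicit” is to write down the phrasings you expect, and the ones you heard in research but did not expect, as shared test data. The sketch below is hypothetical: the phrasings are invented, and it reuses the keyword router from the earlier chatbot sketch (assumed here to be importable as chatbot.route).

```python
# Making phrasing assumptions explicit: each expected wording becomes a
# reviewable test case, so gaps are visible to the whole team instead of
# living in one developer's head. Phrasings are invented; `route` is the
# hypothetical keyword router from the earlier sketch.
import pytest

from chatbot import route  # hypothetical module containing the earlier sketch

FRAUD_PHRASINGS = [
    "There is an unauthorized charge on my card",     # phrasing the team assumed
    "Someone keeps taking money out of my account",   # phrasing heard in user research
    "My card got skimmed at the cash machine",        # regional vocabulary
]


@pytest.mark.parametrize("message", FRAUD_PHRASINGS)
def test_fraud_need_is_recognised(message):
    assert route(message) == "report_fraud"
```

With the keyword lists from the earlier sketch, the last two cases fail, and that is the point: the failing tests surface exactly which assumptions the bot still encodes and who it still doesn’t understand.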
And then the final one, and one of my personal favorites: Keep learning
Read up on how to be anti-racist and anti-biased
Share your findings with coworkers
Avoid defensiveness in favor of being better
Learn more about bias in content strategy
During the call, Marli recommended Design for Real Life, a 2016 book by Eric A. Meyer and Sara Wachter-Boettcher which focuses on identifying stress cases and designing with compassion, so that you'll create experiences that support more of your users, more of the time.
Marli also shared this excellent guide: 5 Steps to Take as an Antiracist Data Scientist.
Finally, you can also download the slides (PDF).