Empowering African Languages: The WAXAL Initiative Explained

The African Meridian Newsroom · Accra, Ghana · 1 July 2026

For years, the developers building Africa’s voice-enabled technology have run into the same wall: the artificial intelligence models powering virtual assistants, transcription tools and voice search were trained almost entirely on Chinese, English and other European languages, leaving the continent’s roughly 2,000 indigenous languages with almost no digital footprint to build from.

Google and a coalition of African universities are trying to close that gap. Earlier this year the group launched WAXAL, an open speech dataset covering 21 Sub-Saharan African languages, among them Hausa, Yoruba, Igbo, Swahili, Akan, Ewe, Fulani and Lingala. It is built from more than 1,250 hours of transcribed speech and over 20 hours of studio-quality recordings for synthetic voice work. Unlike earlier datasets collected remotely, WAXAL was assembled over three years by African institutions themselves, led by Makerere University in Uganda, the University of Ghana and Digital Umuganda in Rwanda, with more than 7,000 volunteers at the University of Ghana alone contributing their voices.

Google Research Africa’s head, Aisha Walcott-Bryant, has described WAXAL as scientific infrastructure rather than simply a dataset, saying it gives African researchers and entrepreneurs the tools to build technology “on their own terms, in their own languages,” with the potential to eventually reach more than 100 million people. Researchers at Makerere and the University of Ghana say the dataset has already begun powering student- and faculty-led projects in agriculture, education and health technology.

But not everyone is convinced that data alone will save Africa’s languages. At this year’s AfroCuration gathering at the Kwame Nkrumah University of Science and Technology in Kumasi, linguists and cultural advocates working in Twi, Kusaal, Dagbani, Ewe, Mooré and Gurene warned that digital tools cannot substitute for language transmission in the home and classroom.

“Whether we like it or not, AI has come to stay,” said linguist Professor Kofi Agyekum. “The world is going AI. We cannot be left behind.” But he cautioned against over-reliance on the technology: “AI is good, but it should not be at the expense of our own culture. How well can AI interpret Gurene culture, interpret Ewe customs and institutions, or interpret Akan symbols? We must receive AI with some caution so that we do not erode our culture.”

Other participants pointed to more basic gaps: many African languages still have little or no presence on platforms like Wikipedia. “AfroCuration is an event that bridges the gap between our culture and the knowledge about our culture out there,” said the Global Open Initiative Foundation’s Abdulfatai Mustapha, whose organisation has helped build Dagaare, Kusaal and Mooré Wikipedia editions from scratch.

For KNUST senior lecturer Dr Victoria Ogunnike Faleke, the stakes go beyond documentation. “Language is our identity, and identity lost is cultural loss,” she said. “Where there is no language, there is no society.”

Taken together, the two efforts — one building the raw data AI needs to speak African languages, the other insisting that no algorithm can replace a language passed from parent to child — capture a debate playing out across the continent: how to use powerful new tools without losing the cultures those tools are meant to serve.
