A roadmap to help AI technologies speak African languages
From text-generating ChatGPT to voice-activated Siri, artificial intelligence-powered tools are designed to aid our everyday life — as long as you speak a language they support. These technologies are out of reach for billions of people who don’t use English, French, Spanish or other mainstream languages, but researchers in Africa are looking to change that. In a study published August 11 in the journal Patterns, scientists draw a roadmap to develop better AI-driven tools for African languages.
“It doesn’t make sense to me that there are limited AI tools for African languages,” says first author and AI researcher Kathleen Siminyu of the Masakhane Research Foundation, a grassroots network of African scientists who aim to spur accessible AI tools for those who speak African languages. “Inclusion and representation in the advancement of language technology is not a patch you put at the end — it’s something you think about up front.”
Many of these tools rely on a field of AI called natural language processing, a technology that enables computers to understand human languages. Computers can master a language through training, where they pick up on patterns in speech and text data. However, they fail when data in a particular language is scarce, as is the case for many African languages. To fill the gap, the research team first identified key players involved in developing African language tools and explored their experience, motivations, focus areas, and challenges. These people include writers and editors who create and curate content, as well as linguists, software engineers, and entrepreneurs who are crucial in establishing the infrastructure for language tools.
Interviews with the key players revealed four central themes to consider in designing African language tools:
- First, bearing the impact of colonization, Africa is a multilingual society where African languages are central to people’s cultural identities and are key to societal participation in education, politics, the economy, and more.
- Second, there is a need to support African content creation. This includes building basic tools such as dictionaries, spell checkers, and keyboards for African languages, and removing financial and administrative barriers to translating government communications into multiple national languages, including African languages.
- Third, the creation of African language technologies will benefit from collaboration between linguistics and computer science. There should also be a focus on creating human-centered tools that help individuals unlock greater potential.
- Fourth, developers should be mindful of communities and ethical practices during the collection, curation, and use of data.
“There’s a growing number of organizations working in this space, and this study allows us to coordinate efforts in building impactful language tools,” says Siminyu. “The findings highlight and articulate what the priorities are, in terms of time and financial investments.”
Next, the team plans to expand the study and include more participants to understand the communities that AI language technologies may impact. They will also address barriers that may hinder people’s access to the technology. The team hopes their study could serve as a roadmap to help develop a wide range of language tools, from translation services to misinformation-catching content moderators. The findings may also pave the way to preserve indigenous African languages.
“I would love for us to live in a world where Africans can have as good quality of life and access to information and opportunities as somebody fluent in English, French, Mandarin, or other languages,” says Siminyu.
UNESCO and the Knowledge for All Foundation provided funding and administrative support for this study.
Patterns, Siminyu et al. “Consultative engagement of stakeholders toward a roadmap for African language technologies.” https://www.cell.com/patterns/fulltext/S2666-3899(23)00189-7
Related editorial:
Patterns, Wang "Different natural languages, equal importance" https://cell.com/patterns/fulltext/S2666-3899(23)00190-3
Patterns (@Patterns_CP), published by Cell Press, is a data science journal publishing original research focusing on solutions to the cross-disciplinary problems that all researchers face when dealing with data, as well as articles about datasets, software code, algorithms, infrastructures, etc., with permanent links to these research outputs. Visit: https://www.cell.com/patterns. To receive Cell Press media alerts, please contact press@cell.com.
JOURNAL: Patterns
METHOD OF RESEARCH: Survey
SUBJECT OF RESEARCH: People
ARTICLE TITLE: Consultative Engagement of Stakeholders Towards a Roadmap for African Language Technologies
ARTICLE PUBLICATION DATE: 11-Aug-2023
Turning ChatGPT into a ‘chemistry assistant’
Developing new materials requires significant time and labor, but some chemists are now hopeful that artificial intelligence (AI) could one day shoulder much of this burden. In a new study in the Journal of the American Chemical Society, a team prompted a popular AI model, ChatGPT, to perform one particularly time-consuming task: searching scientific literature. With that data, they built a second tool, a model to predict experimental results.
Reports from previous studies offer a vast trove of information that chemists need, but finding and parsing the most relevant details can be laborious. For example, those interested in designing highly porous, crystalline metal-organic frameworks (MOFs) — which have potential applications in areas such as clean energy — must sort through hundreds of scientific papers describing a variety of experimental conditions. Researchers have previously attempted to coax AI to take over this task; however, the language processing models they used required significant technical expertise, and applying them to new topics meant changing the program. Omar Yaghi and colleagues wanted to see if the next generation of language models, which includes ChatGPT, could offer a more accessible, flexible way to extract information.
To analyze text from scientific papers, the team gave ChatGPT prompts, or instructions, guiding it through three processes intended to identify and summarize the experimental information the manuscripts contained. The researchers carefully constructed these prompts to minimize the model’s tendency to make up responses, a phenomenon known as hallucination, and to ensure the best responses possible.
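As a rough illustration of what such a prompt might look like, the Python sketch below assembles an instruction that confines the model to the supplied text and gives it an explicit way to say a detail is missing. The wording, the function name `build_extraction_prompt`, and the field names are assumptions made for this example; they are not the prompts the researchers used.

```python
from typing import Sequence


def build_extraction_prompt(paragraph: str, fields: Sequence[str]) -> str:
    """Assemble a hypothetical extraction prompt (not the study's actual wording)."""
    # One bullet per field the model is asked to report.
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract synthesis conditions from the text between the <paper> tags.\n"
        "Use only that text. Do not guess.\n"
        # Giving the model an explicit 'not found' answer is one common way
        # to discourage made-up values.
        'If a field is not stated in the text, answer "N/A".\n'
        "Fields to report:\n"
        f"{field_list}\n"
        f"<paper>\n{paragraph}\n</paper>"
    )


# Illustrative input; the sentence and field names are invented for this sketch.
example_prompt = build_extraction_prompt(
    "The mixture was heated at 120 C for 24 h.",
    ["temperature", "reaction time", "solvent"],
)
print(example_prompt)
```

Changing the topic would then mean editing the instruction text and the field list rather than rewriting a program.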
When tested on 228 papers describing MOF syntheses, this system extracted more than 26,000 factors relevant for making roughly 800 of these compounds. With these data, the team trained a separate AI model to predict the crystalline state of MOFs based on these conditions. And finally, to make the data more user friendly, they built a chatbot to answer questions about it. The team notes that, unlike previous AI-based efforts, this one does not require expertise in coding. What’s more, scientists can shift its focus simply by adjusting the narrative language in the prompts. This new system, which they dub the “ChatGPT Chemistry Assistant,” could also be useful in other fields of chemistry, according to the researchers.
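The release does not say which kind of model the team trained. As a stand-in, the sketch below uses a one-nearest-neighbor rule in Python to show the general idea of predicting an outcome from extracted synthesis conditions. The rows, the two features, and the labels are invented toy values for illustration only, not data from the study.

```python
import math

# Hypothetical, made-up rows: (temperature in C, time in hours) -> outcome label.
TOY_ROWS = [
    ((85.0, 24.0), "single crystal"),
    ((100.0, 48.0), "single crystal"),
    ((120.0, 12.0), "polycrystalline"),
    ((150.0, 6.0), "polycrystalline"),
]


def predict_outcome(conditions, rows=TOY_ROWS):
    """Return the label of the closest toy row (one-nearest-neighbor).

    Features are compared with plain Euclidean distance and are not rescaled,
    which a real model would need to handle.
    """
    nearest = min(rows, key=lambda row: math.dist(row[0], conditions))
    return nearest[1]


print(predict_outcome((90.0, 30.0)))
print(predict_outcome((140.0, 8.0)))
```

A practical version would draw on the thousands of extracted parameters rather than two hand-picked features and would be evaluated on papers held out from training.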
The authors acknowledge funding from the National Institutes of Health, the Kavli ENSI Graduate Student Fellowship and the Bakar Institute of Digital Materials for the Planet.
The American Chemical Society (ACS) is a nonprofit organization chartered by the U.S. Congress. ACS’ mission is to advance the broader chemistry enterprise and its practitioners for the benefit of Earth and all its people. The Society is a global leader in promoting excellence in science education and providing access to chemistry-related information and research through its multiple research solutions, peer-reviewed journals, scientific conferences, eBooks and weekly news periodical Chemical & Engineering News. ACS journals are among the most cited, most trusted and most read within the scientific literature; however, ACS itself does not conduct chemical research. As a leader in scientific information solutions, its CAS division partners with global innovators to accelerate breakthroughs by curating, connecting and analyzing the world’s scientific knowledge. ACS’ main offices are in Washington, D.C., and Columbus, Ohio.
To automatically receive news releases from the American Chemical Society, contact newsroom@acs.org.
JOURNAL: Journal of the American Chemical Society
ARTICLE TITLE: ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis