Friday, February 06, 2026

 

New AI model enables native speakers and foreign learners to read undiacritized Arabic texts with greater fluency






University of Sharjah

Diacritics with noise incorporation 

image: 

The effectiveness of noise incorporation, shown by comparing the performance of AraBERT-Enhanced-Noisy with AraBERT-Enhanced on various examples. The noisy model correctly diacritized sentences with common spelling errors, distinguished between valid and nonsense words, and accurately handled transliterated words.


Credit: Information Processing & Management (2026). DOI: https://doi.org/10.1016/j.ipm.2025.104345




By Ifath Arwah, University of Sharjah

Reading an Arabic newspaper, a book, or academic prose fluently, whether digital or in print, remains challenging for many native speakers, let alone learners of Arabic as a foreign language.

The difficulty largely stems from the nature of Arabic writing, which relies heavily on consonants. Without diacritics, which mark short vowels, it becomes extremely hard to achieve accurate pronunciation, proper contextual understanding, and clear meaning.

Now, scientists at the University of Sharjah report that they have developed a new machine-learning system designed to overcome these challenges.

The system mainly targets problems that existing programs face with undiacritized Arabic script, that is, writing that lacks the vowel marks needed to pronounce words correctly. Linguists refer to the process of restoring those marks as diacritization.

The presence of diacritics in Arabic is vital not only for how a word is pronounced, but also for semantics. A single word can have multiple, entirely different meanings, depending on how it is articulated.

“Diacritization in Arabic is crucial for correct pronunciation, for differentiating words, and for improving text readability. Diacritics, which represent short vowels, are placed above or below letters. Without them, Arabic becomes challenging for non-native speakers, language learners, and even many native speakers,” the researchers explain in their study published in the journal Information Processing & Management.

The study proposes “a framework for developing robust, context-aware Arabic diacritization models. The methodology included dataset enhancement, noise injection, context-aware training, and the development of SukounBERT.v2 using a diverse corpus,” they note.

New leap in Arabic diacritization research

Linguists employ eight diacritics in Arabic orthography to produce distinct vocalizations of the same word to clarify its meaning and context. Classical Arabic texts typically go without diacritical marks, and the same is true for most standard Arabic materials as well as scripts representing the language’s diverse dialects.

While recent years have seen considerable advances in Arabic diacritization research, “existing models struggle to generalize across the diverse forms of Arabic and perform poorly in noisy, error-prone environments,” the authors note. Their work aims to remove current impediments by allowing existing AI models to furnish accurate vowel marks that support fluent, unambiguous reading.

According to the researchers, “These limitations may be tied to problems in training data and, more critically, to insufficient contextual understanding. To address these gaps, we present SukounBERT.v2, a BERT-based Arabic diacritization system that is built using a multi-phase approach.”

SukounBERT is an AI-driven model designed to restore diacritics to Arabic writing. The authors’ newly introduced SukounBERT.v2 builds on earlier models. It is specifically constructed to address earlier versions’ shortcomings, such as poor generalization across different Arabic varieties and reduced performance in noisy or error-prone environments.

“We refine the Arabic Diacritization (AD) dataset by correcting spelling mistakes, introducing a line-splitting mechanism, and by injecting various forms of noise into the dataset, such as spelling errors, transliterated non-Arabic words, and nonsense tokens,” the authors note.

They add, “Furthermore, we develop a context-aware training dataset that incorporates explicit diacritic markings and the diacritic naming of classical grammar treatises.”

The Sukoun Corpus and diacritization research

The authors’ method draws on the Sukoun Corpus, a large-scale, diverse dataset comprising over 5.2 million lines and 71 million tokens from a variety of Arabic written sources, including dictionaries, poetry, and purpose-crafted contextual sentences.

They further augment their corpus with a token-level mapping dictionary that enables minimal or micro-diacritization without sacrificing accuracy. “This is a previously unreported feature in Arabic diacritization research. Trained on this enriched dataset, SukounBERT.v2 delivers state-of-the-art performance with over 55% relative reduction in Diacritic Error Rate (DER) and Word Error Rate (WER) compared to leading models.”
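To make the two metrics concrete, here is a minimal Python sketch of how DER, WER, and a relative reduction can be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, every letter is counted (including letters that carry no mark), and published evaluations often differ on details such as whether word-final case endings are scored.

```python
# Arabic diacritic code points U+064B..U+0652: tanween forms, short vowels,
# shadda, and sukun.
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def split_marks(word):
    """Split a word into [base_letter, marks] pairs, one pair per letter."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                pairs[-1][1] += ch
        else:
            pairs.append([ch, ""])
    return pairs


def error_rates(reference, predicted):
    """Return (DER, WER) for two whitespace-tokenized, diacritized strings.

    Assumption: both strings have the same base letters and differ only
    in their diacritics.
    """
    ref_words, pred_words = reference.split(), predicted.split()
    if not ref_words or len(ref_words) != len(pred_words):
        raise ValueError("texts must be non-empty with equal word counts")
    letters = wrong_letters = wrong_words = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs, pred_pairs = split_marks(ref_word), split_marks(pred_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError("base letters differ; only diacritics may change")
        # Sort marks so shadda + vowel compares equal in either typing order.
        wrong = sum(sorted(r) != sorted(p)
                    for (_, r), (_, p) in zip(ref_pairs, pred_pairs))
        letters += len(ref_pairs)
        wrong_letters += wrong
        wrong_words += 1 if wrong else 0
    if letters == 0:
        raise ValueError("reference contains no base letters")
    return wrong_letters / letters, wrong_words / len(ref_words)


def relative_reduction(baseline_rate, new_rate):
    """Fraction by which new_rate improves on baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate
```

As a purely hypothetical illustration of the arithmetic, a baseline error rate of 0.10 that falls to 0.045 corresponds to a relative reduction of 0.55, or 55%. These numbers are invented for the example and are not taken from the study.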

According to the authors, their approach benefits both native speakers and learners of Arabic as a foreign language by reducing perceptual noise and avoiding “garden path” effects, in which misleading linguistic cues momentarily lead readers to a false interpretation.

The approach does not recommend restoring every diacritic, since full diacritization places a mark on nearly every letter of a word. Instead, it adopts the strategy of “minimal” rather than “full” diacritization, offering native speakers and learners of Arabic “essential phonetic cues that enhance word recognition and comprehension, bridging the gap between structured textbook language and authentic, largely unvowelized texts found in newspapers, literature, and everyday media.”

By striking a balance between semantic precision and cognitive efficiency, “minimal diacritization aligns with modern publishing practices and accommodates diverse reader profiles.” As the authors emphasize, this makes it “an optimal strategy for enhancing real-world reading performance across proficiency levels.”

Revolutionizing modern Arabic diacritization

Research on automating Arabic diacritization has gained momentum as the language's user base grows: Arabic has more than 400 million native speakers, and over 100 million people worldwide are learning or using it as a second or foreign language. Moreover, manual diacritization remains both complex and time-consuming. Although linguists have historically depended on limited but useful rule-based systems to navigate the intricacies of Arabic, that method is no longer practical given the massive proliferation of digital texts.

The authors point out that SukounBERT.v2 relies heavily on contextual clues to resolve ambiguities in meaning and pronunciation. A plethora of research shows that the presence of diacritics greatly enhances reading and comprehension skills, enabling readers to access a precise semantic representation of words that are otherwise difficult to infer from undiacritized script.

Describing SukounBERT.v2 as a “state-of-the-art” model, the authors report that it outperforms existing open-source models by a substantial margin. They note that “the implementation of minimal diacritization using a token-level mapping dictionary enhanced the system’s practicality by providing accurate yet readable output with only essential diacritics.”
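The idea of a token-level mapping dictionary can be sketched in a few lines of Python. This is only an illustration of the general technique, not the authors' implementation: the dictionary entries, the function names, and the fallback of leaving unknown tokens fully diacritized are all assumptions made for the example.

```python
# Arabic diacritic code points U+064B..U+0652.
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def strip_diacritics(text):
    """Remove every Arabic diacritic, leaving only the base letters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


# Hypothetical entries, invented for illustration: each fully diacritized
# token maps to a form that keeps only the marks needed to disambiguate it.
MINIMAL_FORMS = {
    # kutiba ("was written"): keep only the damma that signals the passive
    "\u0643\u064F\u062A\u0650\u0628\u064E": "\u0643\u064F\u062A\u0628",
    # kataba ("he wrote"): the default reading, so no marks are kept
    "\u0643\u064E\u062A\u064E\u0628\u064E": "\u0643\u062A\u0628",
}


def check_mapping(mapping):
    """Verify that each entry changes diacritics only, never base letters."""
    for full, minimal in mapping.items():
        if strip_diacritics(full) != strip_diacritics(minimal):
            raise ValueError(f"base letters differ for entry {full!r}")


def minimally_diacritize(full_text, mapping=MINIMAL_FORMS):
    """Replace each known token with its minimal form.

    Assumption: tokens missing from the mapping are left fully diacritized.
    """
    return " ".join(mapping.get(tok, tok) for tok in full_text.split())
```

In this sketch the model would first produce fully diacritized text, and the lookup would then thin the marks token by token. Running check_mapping on the dictionary guards against entries that accidentally alter the spelling.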

Unlike earlier AI-driven models that primarily emphasize accuracy, SukounBERT.v2 “introduces a more comprehensive strategy that enhances robustness, context awareness, and adaptability.”

One of the model’s most notable innovations is its minimal diacritization approach, “which optimally balances readability and phonetic accuracy, ensuring that only essential diacritics are retained without compromising meaning. Moreover, the inclusion of context-aware training data allows the model to infer grammatical roles more effectively, resolving structural ambiguities in Arabic text.”

Despite these advancements, the authors acknowledge limitations, notably the scarcity of diacritized Modern Standard Arabic (MSA) datasets, which continues to impede the progress of research in the field.

They conclude that addressing this gap will require “the development of large-scale, open-source MSA datasets to enhance model performance across different Arabic varieties. Furthermore, while SukounBERT.v2 achieves high accuracy, its lack of interpretability remains a challenge, limiting transparency in decision-making.”

 

UCLA study reveals dual forces driving SARS-CoV-2 Evolution: Immune pressure and viral fitness




New research clarifies the complex role of neutralizing antibodies in shaping disease outcomes and the evolution of SARS-CoV-2




Immunity & Inflammation

Neutralizing antibodies interacting with SARS-CoV-2 spike proteins 

image: 

Conceptual illustration representing interactions between the immune system and SARS-CoV-2. The image reflects broader processes involved in antibody responses and viral adaptation during infection.


Credit: National Institutes of Health (NIH) from Openverse | Image source link: https://openverse.org/image/d64c06e0-8828-49d5-bb21-f4f76692a61f?q=Coronavirus&p=26





While widespread vaccination and infection have established population-level immunity, the relationship between antibody-driven immunity, viral evolution, and disease severity has remained unclear. On January 27, 2026, Prof. Genhong Cheng at the University of California, Los Angeles (UCLA), published a brief report titled “Impacts of neutralizing antibody responses on SARS-CoV-2 evolution and its associated disease progression” in Immunity & Inflammation. This study addresses the gap through a single-center, retrospective longitudinal cohort.

Antibody Levels Paint a Complex Clinical Picture

The researchers analyzed serum samples from individuals infected during the initial pandemic wave (pre-vaccine) and during and after the Omicron wave. They categorized the first-wave patients by serum virus neutralization titer into lower (S25), middle (S50), and upper (S75) antibody-level groups.

The analysis uncovered a subtle relationship between antibody levels and disease. Patients with lower neutralizing antibody levels experienced symptoms for a longer duration and required more PCR tests, suggesting a prolonged battle to clear the virus. Surprisingly, none of the S25 patients developed severe disease, whereas some in the middle and higher groups required pharmacological intervention or respiratory support. The study also noted that higher antibody levels were negatively correlated with baseline health status, indicating that healthier individuals may mount a more robust initial response. These results underscore that “neutralizing antibodies are a key, but not sole, determinant of COVID-19 clinical outcomes,” the authors pointed out.

The Evolutionary Clue: Escaping Immunity vs. Gaining Entry

A core finding of the research illuminates the driving force of viral evolution. By testing serum from patients infected at different times against a panel of historical variants, the team observed a persistent decline in neutralization potency against newer strains. Antibodies generated from early infections showed a remarkable reduction, or even complete loss, of effectiveness against newer variants such as Omicron sub-lineages. In contrast, antibodies elicited by newer infections retained some neutralizing capacity against contemporaneous viruses, indicating the immune system adapts, but lags behind.

This pattern confirms that population-wide antibody responses exert a powerful selective pressure, driving the virus to mutate and evade detection. However, escape alone is insufficient for a variant to become dominant. The researchers highlighted the case of XBB.1.5. “While it possessed a similar ability to evade antibodies as its predecessor XBB.1, a single mutation (S486P) in its spike protein significantly increased its affinity for the human ACE2 receptor,” they wrote. This enhanced binding capability provided the critical fitness advantage that propelled XBB.1.5 to global dominance. The study thus establishes that successful variants must evolve under a dual constraint: reducing vulnerability to neutralization while maintaining or improving their efficiency in infecting cells.

Implications for Public Health and Surveillance

This long-term cohort study offers several important implications. It provides a molecular epidemiological explanation for the seemingly contradictory clinical observation of mild-but-prolonged illness, linking it directly to lower antibody efficacy. Furthermore, it definitively shows that pre-existing population immunity is a primary driver of viral evolution.

“These findings emphasize that our immune history actively shapes the virus's future,” the authors noted. “Monitoring must therefore account for both immune escape potential and changes in receptor binding, as these combined traits define the next successful variant.”

This understanding reinforces the need for vigilant genomic surveillance that tracks these dual characteristics. It also provides a data-driven foundation for designing improved vaccination strategies, potentially focusing on antigens that elicit broad protection against evolving viral fitness landscapes.

 


 

Reference

DOI: https://doi.org/10.1007/s44466-025-00020-2

 

About Immunity & Inflammation

Immunity & Inflammation is a newly launched open-access journal co-published by the Chinese Society for Immunology and Springer Nature under the leadership of Editors-in-Chief Prof. Xuetao Cao and Prof. Jules A. Hoffmann. Immunity & Inflammation aims to publish major scientific questions and cutting-edge advances that explore groundbreaking discoveries and insights across the spectrum of immunity and inflammation, from basic science to translational and clinical research.
Website: https://link.springer.com/journal/44466

 

About Authors

Prof. Genhong Cheng from UCLA

Prof. Cheng is a Distinguished Professor in the Department of Microbiology, Immunology & Molecular Genetics at UCLA and a specially appointed professor at the Guangzhou Laboratory. He is a fellow of the American Association for the Advancement of Science (AAAS) and a fellow of the American Academy of Microbiology. He has received numerous awards, including the Stohlman Award from the Leukemia & Lymphoma Society. His interdisciplinary research spans infection, immunity, cancer, and metabolism.

 

Dr. Lulan Wang from UCLA

Dr. Wang is a postdoctoral researcher at UCLA. He is a recipient of training grants from the National Institutes of Health and of the Amazon AWS DDI Award. His research focuses on epidemiology, vaccinology, and artificial intelligence applications.

 

Funding information

This project was supported by research funds from the U.S. National Institutes of Health (R01AI158154-01).

Neuroticism may be linked with more frequent sexual fantasies




Personality trait study also reports less frequent fantasizing among people who are more conscientious or agreeable




PLOS

Associations between big five personality traits, facets, and sexual fantasies 

image: 

Neuroticism may be linked with more frequent sexual fantasies.


Credit: Mohamed_hassan, Pixabay, CC0 (https://creativecommons.org/publicdomain/zero/1.0/)




People with a relatively neurotic personality report having more frequent sexual fantasies, while people who are relatively conscientious or agreeable report less frequent fantasizing. Emily Cannoot of Michigan State University, U.S., and colleagues present these findings from their new 5,225-person study in the open-access journal PLOS One on February 4, 2026.

Prior research suggests that sexual fantasies are common and might benefit people’s happiness and relationships. A deeper understanding of links between people’s personality characteristics, how often they have sexual fantasies, and what they tend to fantasize about could help inform efforts by clinicians and mental health professionals to improve sexual wellbeing. However, few studies have explored potential links between personality traits and sexual fantasies.

Cannoot and colleagues analyzed data from 5,225 adults in the U.S. who completed two standardized questionnaires. The first captured overall frequency of fantasizing as well as frequency of fantasizing about certain themes, which fell into four broad categories: exploratory (including the theme “participating in an orgy”), intimate (including “making love outdoors in a romantic setting”), impersonal (including “watching others have sex”), or sadomasochistic (including “being forced to do something”).

The second questionnaire captured the widely accepted Big Five personality traits: extraversion, agreeableness, conscientiousness, neuroticism, and open-mindedness. It also assessed subcomponents of the Big Five; for instance, depression or anxiety as facets of neuroticism, and compassion or respectfulness as facets of agreeableness.

Statistical analysis of the data showed that people who scored high in conscientiousness and agreeableness reported less frequent sexual fantasizing across all four categories. Taking a closer look, those results were primarily driven by respectfulness and responsibility. Meanwhile, people with a high neuroticism score, in particular those with more depressive personalities, reported more frequent sexual fantasies. No significant associations were seen between extraversion or open-mindedness and frequency of sexual fantasies. These findings held true across the four different categories of fantasies.

Future research could expand on these findings, such as by including participants from other countries or examining whether people’s personalities and sexual fantasizing habits co-develop over time.

The authors say: “One implication of the current work is that individual differences in personality might be useful in predicting variation in sexual fantasy frequencies, although they are not wholly redundant with each other (and some associations are relatively small or modest). Knowing these associations further advances the predictive power of personality while showing that variation in sexual fantasies is common.”

 

 

In your coverage, please use this URL to provide access to the freely available article in PLOS One: https://plos.io/463regT

Citation: Cannoot E, Moors AC, Chopik WJ (2026) Associations between big five personality traits, facets, and sexual fantasies. PLoS One 21(2): e0329745. https://doi.org/10.1371/journal.pone.0329745

Author countries: U.S.

Funding: The author(s) received no specific funding for this work.