LA REVUE GAUCHE - Left Comment

Tuesday, September 06, 2022

Researchers propose new and more effective model for automatic speech recognition

Consumer-focused new framework improves automatic speech recognition by including context information, an improved solution for video conferencing and live interviews

Peer-Reviewed Publication

TSINGHUA UNIVERSITY PRESS

Integrating pre-training for Automatic Speech Recognition models — IMAGE: THE PHONETIC-SEMANTIC PRE-TRAINING (PSP) FRAMEWORK USES “NOISE-AWARE CURRICULUM” LEARNING TO EFFECTIVELY IMPROVE THE PERFORMANCE OF ASR IN NOISY ENVIRONMENTS, INTEGRATING WARM-UP, SELF-SUPERVISED LEARNING, AND FINE-TUNING.
CREDIT: CAAI ARTIFICIAL INTELLIGENCE RESEARCH, TSINGHUA UNIVERSITY PRESS

Popular voice assistants like Siri and Amazon Alexa have introduced automatic speech recognition (ASR) to the wider public. Though decades in the making, ASR models struggle with consistency and reliability, especially in noisy environments. Chinese researchers developed a framework that effectively improves the performance of ASR for the chaos of everyday acoustic environments.

Researchers from the Hong Kong University of Science and Technology and WeBank proposed a new framework, phonetic-semantic pre-training (PSP), and demonstrated the robustness of their new model on highly noisy synthetic speech datasets.

Their study was published in CAAI Artificial Intelligence Research on Aug. 28.

“Robustness is a long-standing challenge for ASR,” said Xueyang Wu from the Hong Kong University of Science and Technology Department of Computer Science and Engineering. “We want to increase the robustness of the Chinese ASR system with a low cost.”

ASR uses machine-learning and other artificial intelligence techniques to automatically translate speech into text for uses like voice-activated systems and transcription software. But new consumer-focused applications increasingly call for voice recognition to work better — handle more languages and accents, and perform more reliably in real-life situations like video conferencing and live interviews.

Traditionally, training the acoustic and language models that comprise ASR requires large amounts of noise-specific data, which can be time- and cost-prohibitive.

The acoustic model (AM) turns spoken words into sequences of “phones,” the basic sounds of speech. The language model (LM) decodes phones into natural-language sentences, usually with a two-step process: a fast but relatively weak LM generates a set of sentence candidates, and a powerful but computationally expensive LM selects the best sentence from the candidates.

“Traditional learning models are not robust against noisy acoustic model outputs, especially for Chinese polyphonic words with identical pronunciation,” Wu said. “If the first pass of the learning model decoding is incorrect, it is extremely hard for the second pass to make it up.”
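The two-pass decoding described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the researchers' implementation: the function names, the `n_best` parameter, the phone symbols, and all scores are hypothetical. It shows why an error in the first pass is hard to undo: the second pass can only choose among the candidates the first pass kept.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (sentence, first-pass score); hypothetical type


def two_pass_decode(
    phones: List[str],
    first_pass: Callable[[List[str]], List[Candidate]],
    rescore: Callable[[str], float],
    n_best: int = 2,
) -> str:
    """Pick a sentence using a cheap first pass and an expensive second pass."""
    # Pass 1: the fast LM proposes candidates; keep only the n_best by score.
    shortlist = sorted(first_pass(phones), key=lambda c: c[1], reverse=True)[:n_best]
    # Pass 2: the strong LM chooses the best sentence, but only from the shortlist.
    return max(shortlist, key=lambda c: rescore(c[0]))[0]


# Toy stand-ins for the two language models (all values are made up).
def toy_first_pass(phones: List[str]) -> List[Candidate]:
    return [("I scream", 0.5), ("ice cream", 0.4), ("eyes cream", 0.1)]


TOY_STRONG_LM = {"ice cream": 0.9, "I scream": 0.3, "eyes cream": 0.05}


def toy_rescore(sentence: str) -> float:
    return TOY_STRONG_LM.get(sentence, 0.0)


phones = ["AY", "S", "K", "R", "IY", "M"]  # hypothetical phone sequence
best = two_pass_decode(phones, toy_first_pass, toy_rescore, n_best=2)
narrow = two_pass_decode(phones, toy_first_pass, toy_rescore, n_best=1)
print(best)    # second pass recovers the sentence it prefers
print(narrow)  # first pass pruned it, so the second pass cannot recover
```

With `n_best=1` the first pass discards the sentence the stronger model would have preferred, which is the failure mode Wu describes.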

The newly proposed PSP framework makes it easier to recover misclassified words. By pre-training a model that translates the AM outputs directly into sentences, using the full context information, researchers can help the LM efficiently recover from the noisy outputs of the AM.

The PSP framework allows the model to improve through a pre-training regime called noise-aware curriculum, which introduces new skills gradually, starting with easy tasks and moving on to more complex ones.

“The most crucial part of our proposed method, Noise-aware Curriculum Learning, simulates the mechanism of how human beings recognize a sentence from noisy speech,” Wu said.

Warm-up is the first stage, in which researchers pre-train a phone-to-word transducer on clean phone sequences generated from unlabeled text data alone, which cuts back on annotation time. This stage “warms up” the model, initializing the basic parameters to map phone sequences to words.

In the second stage, self-supervised learning, the transducer learns from more complex data generated by self-supervised training techniques and functions. Finally, the resultant phone-to-word transducer is fine-tuned with real-world speech data.
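The easy-to-hard progression of the stages above can be sketched in Python. This is an illustration only: the release does not specify how noise is generated or scheduled, so the linear ramp, the `max_rate` value, the random-substitution corruption, and the phone vocabulary below are all assumptions, not the authors' method.

```python
import random
from typing import List, Sequence


def noise_rate(step: int, total_steps: int, max_rate: float = 0.3) -> float:
    """Assumed schedule: ramp the corruption rate linearly from 0 to max_rate."""
    return max_rate * min(step / total_steps, 1.0)


def corrupt(phones: Sequence[str], vocab: Sequence[str], rate: float,
            rng: random.Random) -> List[str]:
    """Replace each phone with a random vocabulary phone with probability `rate`.

    A replacement may happen to equal the original phone.
    """
    return [rng.choice(vocab) if rng.random() < rate else p for p in phones]


vocab = ["AY", "S", "K", "R", "IY", "M"]   # hypothetical phone inventory
clean = ["AY", "S", "K", "R", "IY", "M"]   # clean phone sequence (warm-up data)
rng = random.Random(0)

for step in (0, 50, 100):
    rate = noise_rate(step, total_steps=100)
    noisy = corrupt(clean, vocab, rate, rng)
    print(step, round(rate, 2), noisy)
```

At step 0 the transducer would see clean sequences, as in the warm-up stage; later steps expose it to progressively noisier inputs before fine-tuning on real speech data.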

The researchers experimentally demonstrated the effectiveness of their framework on two real-life datasets collected from industrial scenarios and on synthetic noise. Results showed that the PSP framework effectively improves the traditional ASR pipeline, achieving relative character error rate reductions of 28.63% for the first dataset and 26.38% for the second.
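For readers unfamiliar with the metric, the following Python sketch shows how a character error rate (CER) and a relative reduction are typically computed. The baseline and improved CER values in the example are hypothetical; the release reports only the relative reductions, not the underlying error rates.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that the improved system removes."""
    return (baseline - improved) / baseline


# Hypothetical numbers: a baseline CER of 20% falling to 15%
# is a 5-point absolute drop but a 25% relative reduction.
print(relative_reduction(0.20, 0.15))
```

A relative reduction of 28.63% therefore means the error rate fell by about 29% of its baseline value, not by 28.63 percentage points.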

As next steps, the researchers will investigate more effective PSP pre-training methods with larger unpaired datasets, seeking to maximize the effectiveness of pre-training for noise-robust LMs.

Other contributors include Rongzhong Lian, Di Jiang, Yuanfeng Song, Weiwei Zhao, Qian Xu, and Qiang Yang from WeBank Co. Ltd. Qian Xu and Qiang Yang are also affiliated with The Hong Kong University of Science and Technology.

CAAI Artificial Intelligence Research is a new journal jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University. This is the first paper published in the journal.

About CAAI Artificial Intelligence Research

CAAI Artificial Intelligence Research is a peer-reviewed journal jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University. The journal aims to reflect state-of-the-art achievements in the field of artificial intelligence and its applications, including knowledge intelligence, perceptual intelligence, machine learning, behavioral intelligence, brain and cognition, and AI chips and applications. Original research and review articles from all over the world are welcome for rigorous peer review and professional publishing support.

About SciOpen

SciOpen is a professional open access resource for discovery of scientific and technical content published by the Tsinghua University Press and its publishing partners, providing the scholarly publishing community with innovative technology and market-leading capabilities. SciOpen provides end-to-end services across manuscript submission, peer review, content hosting, analytics, and identity management, as well as expert advice to support each journal’s development, with options across functions such as journal layout, production services, editorial services, marketing and promotions, and online functionality. By digitalizing the publishing process, SciOpen widens the reach, deepens the impact, and accelerates the exchange of ideas.

DOI

10.26599/AIR.2022.9150001

ARTICLE TITLE

A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition

EP-WXT pathfinder catches first wide-field snapshots of X-ray universe

Reports and Proceedings

CHINESE ACADEMY OF SCIENCES HEADQUARTERS

EP-WXT Pathfinder targets a region of the Galactic center at the core of the Milky Way — IMAGE: EP-WXT PATHFINDER TARGETS A REGION OF THE GALACTIC CENTER AT THE CORE OF THE MILKY WAY. INSET SHOWS THE 800-SECOND TIME-LAPSE PHOTOGRAPH FROM THE OBSERVATION.
CREDIT: CAS/ESA/GAIA/DPAC

EP-WXT Pathfinder, the experimental version of a module that will eventually be part of the wide-field X-ray telescope (WXT) aboard the astronomical satellite Einstein Probe (EP), released its first results Aug. 27 from an earlier test flight. These include an 800-second X-ray time-lapse photograph of a region of the Galactic center, a dense area at the core of our home galaxy, the Milky Way.

These mark the first wide-field X-ray snapshots of our universe available to the public so far, captured by the first truly wide-field X-ray focusing imaging telescope ever flown in space.

The results were reported by scientists from the Chinese Academy of Sciences (CAS) at the Second China Space Science Assembly held in Taiyuan, China.

In the 60 years since X-ray signals from the depths of the universe were first detected, no wide-field X-ray focusing telescope had been available for X-ray surveys and monitoring until Pathfinder.

The Pathfinder was sent into orbit to verify the module's in-orbit performance. The experimental journey is meant to pave the way for the future in-orbit science operation of EP as it makes observations in the soft X-ray waveband.

EP will explore open questions in time-domain astrophysics through observation of transients. The mission is sponsored by CAS in cooperation with the European Space Agency (ESA) and the Max Planck Institute for Extraterrestrial Physics and is expected to fly by the end of 2023.

The WXT test module covers a field of view of up to 340 square degrees (18.6°×18.6°), which makes it the first truly wide-field X-ray focusing imaging telescope. X-ray imaging by bending light rays (focusing) is notoriously difficult due to the high energy of X-ray photons, and it is even more difficult to obtain clear images from a wide field of view. Thanks to a state-of-the-art technology called lobster-eye micropore optics, the test module boasts a field of view at least 100 times those of other focusing X-ray imagers. The complete WXT to fly aboard EP will be composed of 12 such identical modules, covering a field of view of up to 3,600 square degrees.

During the test flight, Pathfinder conducted a total of four days of in-orbit experimental observations and obtained authentic X-ray spectra and images based on real measurements.

The key components of Pathfinder include the X-ray imaging mirror assembly, which features an array of 36 micropore lobster-eye plates and a focal-plane detector composed of four sets of large-format imaging sensors.

Even though these results are still preliminary and extensive data processing must be done, the test flight demonstrates that even a one-shot observation can cover X-ray sources from all directions within the observed patch of sky, including stellar-mass black holes and neutron stars. The observation also captured the brightening of X-rays from a binary system containing a neutron star. The data from these observations provide information about how X-ray radiation from such celestial bodies changes over time, as well as the X-ray spectra of these celestial bodies. The images and spectra resulting from the test observations are highly consistent with simulations.

The instrument also targeted a number of other X-ray sources, including the Large Magellanic Cloud (LMC), one of our neighboring galaxies. The results demonstrate that even a one-shot observation can cover the whole of this galaxy, detecting multiple X-ray sources, including black holes, neutron stars and supernova remnants. The instrument's clear imaging of a distant quasar, 3C 382, at a distance of 810 million light-years, reveals its capacity to detect relatively faint X-ray sources. In its future observations, the imager is expected to effectively monitor the X-ray variability of celestial bodies and discover new transient sources.

According to Dr. YUAN Weimin, principal investigator (PI) of the EP mission and researcher at the National Astronomical Observatories of the Chinese Academy of Sciences (NAOC), initial results show that "the instrument operates smoothly" and meets EP WXT module requirements. "It's exciting to see the decade-long effort bearing its first fruit," he smiled.

Other researchers involved with the EP mission were also satisfied.

Dr. ZHANG Chen, PI of the WXT mirror assembly, said the results promise "abundant, high-quality data" after the probe is launched.

Prof. Paul O'Brien, ESA-appointed scientist for the EP mission and researcher at the University of Leicester, said the results are "really impressive."

"We have been waiting for a true wide-field, soft X-ray imager for many decades, so it is wonderful to see the WXT test module in flight on EP-WXT Pathfinder," said Prof. Richard Willingale, Prof. O'Brien's colleague at the University of Leicester.

Preliminary X-ray “time-lapse photograph” (right) in the 0.5–4 keV band, from a 700-second one-shot observation of the Large Magellanic Cloud (LMC), our neighbor galaxy, compared with the DSS optical image of the LMC.

CREDIT

CAS/DSS

CAPTION

X-ray image of the Cygnus Loop nebula (2.5-degree diameter) obtained with several observations totaling 2,400 seconds.

CREDIT

CAS

Climate anxiety an important driver for climate action – new study

Peer-Reviewed Publication

UNIVERSITY OF BATH

The first-ever detailed study of climate anxiety among the UK adult population suggests that whilst rates are currently low, people’s fears about the future of the planet might be an important trigger for action when it comes to adapting our high-carbon lifestyles to become more environmentally friendly.

Interest in climate or eco-anxiety – characterised by the American Psychological Association as the chronic fear of environmental doom that comes from observing the impacts of climate change – has risen over recent years. A high-profile University of Bath study in 2021 found it to be particularly prevalent among young people right across the world.

This latest study, led by a team from the Centre for Climate Change and Social Transformations, also based at the University of Bath, sought the views of 1,338 UK adults over two time points (in 2020 and 2022) to delve deeper into the prevalence of climate anxiety, factors that predict it, and whether it could predict individual behavioural changes and climate action.

Despite over three-quarters of the UK public saying they are worried about climate change, only 4.6% of the public reported experiencing climate anxiety in 2022 (only fractionally higher than in 2020, when 4% reported this). Younger people and those with higher generalised anxiety were more likely to experience eco-anxiety.

However, climate anxiety was not always a negative; for many it could be a motivating force for taking action to reduce emissions. This included saving energy, buying second-hand, borrowing, renting, or repurposing items. Lifestyle changes such as cutting down on red meat were not related to climate anxiety, despite being highly effective for reducing emissions.

Significantly, the study found that media exposure – for example TV images of raging storms or heatwaves – rather than direct, personal experiences of climate impacts predicted climate anxiety. The authors say there are important implications of these findings for organisations responsible for communicating climate change.

The study, published in the Journal of Environmental Psychology, coincides with a new briefing paper from the Centre for Climate Change & Social Transformations focused on UK public preferences for low-carbon lifestyles. Its analysis suggests that lifestyle changes (for example, reducing car use or eating less meat) are increasingly seen as both feasible and desirable.

Environmental psychologist at the University of Bath, Professor Lorraine Whitmarsh MBE, led the study. She explained: “With increasing media coverage of climate impacts, such as droughts and fires in the UK and devastating flooding in Pakistan, climate anxiety may well increase. Our findings suggest this can spur some people to take action to help tackle the issue – but we also know there are barriers to behaviour change that need to be addressed through more government action.”

In the paper, the authors emphasise the importance of the media as a motivating force for the lifestyle changes required as we decarbonise. They suggest that the media and public discourse about climate anxiety have the power to create a positive vision for a greener, cleaner future which is significantly less dependent on fossil fuels.

Lois Player, co-author of the study, also from the Department of Psychology at the University of Bath, explained: “Our results suggest that the media could play an important role in creating positive pro-environmental behaviour change, but only if they carefully communicate the reality of climate change without inducing a sense of hopelessness.”

JOURNAL

Journal of Environmental Psychology

DOI

10.1016/j.jenvp.2022.101866

METHOD OF RESEARCH

Survey

SUBJECT OF RESEARCH

People