Faye (Jiaqi) Feng
Thesis Title (PhD)
EXPERT PERFORMERS’ MULTISENSORY INTEGRATION DURING SPEECH-IN-SPEECH RECOGNITION AND ADAPTATION TO ‘NOISE’ IN ADVERSE CONDITIONS
“Can you hear me okay?” is a classic question at the start of video calls, asked to confirm that the meeting participants can hear the talker clearly. Noise, in various forms, is part of our everyday lives. When the audio is disturbed by background noise or overlapping speech, whether in a Zoom meeting or in face-to-face conversation, a thumbs up, a nod, or eye contact from the talker ‘communicates’ by conveying informative cues to listeners. Visual cues ‘speak’ without words.
In a noisy listening environment, hearers draw on informative cues available in the visual channel, or from multiple sources, to perceive obscured speech sounds. The rationale is that hearers adapt their perception to multimodal information in the presence of noise, engaging in multisensory integration (Stein & Meredith, 1993). This makes it easier for hearers to perceive speech against noise (e.g., Begau et al., 2021; Crosse et al., 2016).
However, the benefit of multisensory integration in speech perception plateaus at a certain level. Specifically, the degree of audiovisual enhancement depends on the level difference between the noise and the target speech (e.g., van de Rijt et al., 2019). There has been little research on whether, and to what extent, multisensory integration can improve individual hearers’ speech perception against a variety of noise types or ‘noisy’ interference, such as when listeners of different language proficiencies must perceive two target speech streams at the same time.

The proposed research is thus framed within theories of multisensory integration to ‘unravel the unknown’ by investigating an expert group of language users, interpreters, in an extremely challenging listening condition: simultaneous interpreting. Simultaneous interpreters speak while listening about 70% of the time (Chernov, 1994) in order to complete a demanding linguistic task that requires a high level of language proficiency and speech-processing skill. A systematic empirical study of how expert language users may take advantage of audiovisual integration to adapt to such an overwhelming listening condition will offer valuable implications for language learners, and for hearers more broadly, on how to use informative visual cues to adapt strategically to an intricate, ‘noisy’ world.