SURVEY ON DEEP LEARNING FOR AUDIO-VISUAL SPEECH RECOGNITION
Abstract
Visual speech, the visual domain of speech, has attracted growing attention because of its many applications in fields such as public safety, healthcare, military defense, and entertainment. As a powerful AI technique, deep learning has greatly advanced visual speech learning. Recently, automatic audio-visual speech recognition (AVSR) systems have demonstrated remarkable performance, particularly in restricted-vocabulary tasks, where they significantly outperform human speech recognition, especially in acoustically noisy environments. Research and development of automatic speech recognition systems based on the joint processing of audio and visual data is ongoing worldwide. This paper compares various deep learning approaches to audio-visual speech recognition.
Author
RADHIKA SREEDHARAN