Generating Personalized Synthetic Speech for Progressive Dysarthria Using Severity-Appropriate Adaptation Strategies for Neural Text-to-Speech and Voice Conversion

PROJECT SUMMARY

More than 2 million Americans have a complex communication disorder that impairs their ability to talk. The loss of speech is among the most debilitating effects of neurological diseases such as amyotrophic lateral sclerosis (ALS), in which 95% of patients will progressively lose their ability to speak, leaving them trapped in a state of isolation. Communication devices with electronic voice output allow patients to augment or replace verbal communication as their speech deteriorates. The text (alphabet, messages) available on these devices is accessed directly using functioning body parts (fingers, head, eyes), and the selected text is converted to speech through text-to-speech (TTS) technology. The electronic TTS voices available on current devices offer limited options in terms of age, sex, and/or dialect, which diminishes the sense of genuine discourse because neither the user nor their communication partner can relate to the device voice. Voice is an integral part of a person’s identity; without a voice that captures this identity, users tend to withdraw from interactions, greatly reducing their quality of life and leading to low acceptance of the technology. Personalized TTS voice options are a critical need for the ALS population, enabling them to communicate freely in the face of major life changes.

The long-term goal of this research is software-based, high-performance personalized speech synthesis that can be used on mobile platforms and commercial speech devices by people with communication disorders. Our short-term goal is to investigate innovative methods that leverage state-of-the-art, end-to-end neural TTS to generate intelligible, natural, and personalized synthetic speech for people who already exhibit speech loss from ALS. Neural TTS has significantly outperformed previous generations of TTS technology and has lowered the barrier to developing high-quality TTS systems. While it is clearly desirable to use neural TTS, the need for large quantities of high-quality speech data precludes training such a system directly for those with ALS.

We address this problem through two specific aims in this exploratory project: (i) adapt neural TTS output by using voice conversion to personalize TTS voice options for ALS, and (ii) adapt neural TTS input features and network parameters to personalize TTS voice options for ALS. Our methods for both aims will preserve TTS speech intelligibility and naturalness while enhancing voice similarity, using only modest amounts of speech data from persons with ALS. Our adapted neural TTS system is expected to generate personalized synthetic speech that captures the voice characteristics of individual ALS users while retaining the intelligibility and naturalness needed to promote communication and listening comfort.

The project goals align with NIH-NIDCD’s priority area related to “Advancing Research in Novel Augmentative and Alternative Communication (AAC) Approaches”. The project outcomes are expected to provide a significant number of people who have communication disorders of varying etiologies (ALS, stroke, trauma) with personalized vocal expression and social identity.
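
To give a purely illustrative sense of the speaker-adaptation idea behind aim (ii), the sketch below fine-tunes only the speaker-related and decoder parameters of a hypothetical pretrained neural TTS acoustic model on a small set of target-speaker recordings, leaving the rest of the network frozen. This is not the project's actual implementation; the class and function names (PretrainedTTS, AdaptationDataset, adapt) are placeholders assumed for this example.

```python
# Illustrative sketch only: low-resource speaker adaptation of a pretrained
# neural TTS acoustic model (aim ii). PretrainedTTS and AdaptationDataset are
# hypothetical placeholders, not the project's actual codebase.
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class PretrainedTTS(nn.Module):
    """Stand-in for a pretrained end-to-end neural TTS acoustic model with a
    learnable speaker embedding (encoder-decoder, mel-spectrogram output)."""

    def __init__(self, n_symbols=148, d_model=256, n_mels=80):
        super().__init__()
        self.text_encoder = nn.Embedding(n_symbols, d_model)
        self.speaker_embedding = nn.Embedding(1, d_model)  # single target speaker
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, text_ids):
        # Add the speaker code to every encoded symbol, then decode to mel frames.
        h = self.text_encoder(text_ids) + self.speaker_embedding.weight
        out, _ = self.decoder(h)
        return self.mel_head(out)


class AdaptationDataset(Dataset):
    """Hypothetical wrapper around a few minutes of target-speaker recordings,
    yielding (phoneme_ids, mel_spectrogram) tensor pairs of matching length."""

    def __init__(self, pairs):
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]


def adapt(model, dataset, epochs=50, lr=1e-4):
    # Freeze the whole model, then unfreeze only speaker-related and decoder
    # parameters so the limited adaptation data personalizes the voice without
    # overwriting the intelligibility learned from the large pretraining corpus.
    for p in model.parameters():
        p.requires_grad_(False)
    for module in (model.speaker_embedding, model.decoder, model.mel_head):
        for p in module.parameters():
            p.requires_grad_(True)

    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    optim = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr
    )
    loss_fn = nn.L1Loss()  # common mel-spectrogram reconstruction loss

    for _ in range(epochs):
        for text_ids, mel_target in loader:
            mel_pred = model(text_ids)
            loss = loss_fn(mel_pred, mel_target)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```

In this sketch, partial fine-tuning stands in for the broader family of adaptation strategies the project will investigate (voice conversion applied to TTS output in aim (i), and adaptation of input features and network parameters in aim (ii)); the key design choice it illustrates is updating only a small subset of parameters when only modest amounts of ALS speech data are available.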