Expanding articulatory information from ultrasound imaging of speech using MRI-based image simulations and audio measurements - PROJECT SUMMARY
Ultrasound imaging provides articulatory feedback useful for remediating speech sound disorders,
which affect 5% of children and can cause long-term deficits in social well-being and employment in adulthood.
However, ultrasound images can be difficult for clinicians and individuals to interpret, limiting both the
understanding of articulatory data and the speech outcomes of ultrasound biofeedback therapy. A likely source
of difficulty is the articulatory information missing from ultrasound images: the tongue tip and reference
vocal tract structures (e.g., the palate) cannot be consistently imaged because air at tissue boundaries reflects
the ultrasound signal.
Much of the information missing from ultrasound can be recovered with magnetic resonance imaging
(MRI), which images the entire vocal tract. Comparing ultrasound images with MRI will improve the
interpretation of ultrasound images by confirming that certain characteristics of ultrasound images (e.g., an
obscured tongue tip, double-edge artifacts) arise from characteristics of tongue shapes; in addition, models can
be trained to predict from ultrasound images the articulatory information shown in MRI. However, articulatory
variability between separately acquired recordings prevents direct comparison of the two modalities. A novel
approach that avoids this variability is to simulate ultrasound wave propagation through tissue segmented from
MRI. Recent advances in deep learning have also demonstrated the ability to address the inverse problem of
predicting articulation from acoustic data. Thus, to meet the need for improved ultrasound image interpretation,
the goal of this proposal is to use simulated ultrasound images and neural network models to characterize and
predict the articulatory information missing from 2D midsagittal ultrasound images. These models will be
trained on MRI and audio data.
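
As a rough illustration of the simulation approach (not the proposal's actual method), the sketch below propagates an acoustic pulse through a hypothetical 2D speed-of-sound map standing in for an MRI tissue segmentation. All grid sizes, values, and names are illustrative assumptions; a real study would use a validated simulator (e.g., a toolbox such as k-Wave) with absorption, density variation, and a realistic transducer model.

# Minimal sketch: 2D finite-difference acoustic wave propagation over a
# hypothetical MRI-derived tissue map. Illustrative only.
import numpy as np

# Hypothetical speed-of-sound map (m/s): air in the vocal tract above
# the tongue surface, soft tissue below.
nz = nx = 200
c = np.full((nz, nx), 1540.0)   # soft tissue
c[:80, :] = 343.0               # air above the tongue surface

dx = 1e-4                        # 0.1 mm grid spacing
dt = 0.3 * dx / c.max()          # time step within the CFL stability limit
n_steps = 1000

p_prev = np.zeros((nz, nx))      # pressure field at t - dt
p = np.zeros((nz, nx))           # pressure field at t

# Gaussian pulse injected at a single "transducer" point in the tissue.
src_z, src_x = 180, 100
t = np.arange(n_steps) * dt
source = np.exp(-((t - 1.5e-6) / 0.5e-6) ** 2)

for n in range(n_steps):
    # Five-point Laplacian on the grid interior.
    lap = np.zeros_like(p)
    lap[1:-1, 1:-1] = (p[2:, 1:-1] + p[:-2, 1:-1] +
                       p[1:-1, 2:] + p[1:-1, :-2] -
                       4.0 * p[1:-1, 1:-1]) / dx**2
    # Second-order time stepping of the scalar wave equation.
    p_next = 2.0 * p - p_prev + (c * dt) ** 2 * lap
    p_next[src_z, src_x] += source[n]
    p_prev, p = p, p_next

# Strong echoes return from the air-tissue boundary, as they do from the
# tongue surface in real ultrasound of speech.
print("peak |p| on the transducer row:", np.abs(p[src_z]).max())

In such a simulation, the tissue map that generated each image is known exactly, which is what permits the direct artifact-to-anatomy comparisons described below.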
We will characterize missing articulatory information by developing an efficient simulation of ultrasound
images from MRI tissue segmentations. One hypothesis to be tested is the guideline of treating the lower
edge of double-edge artifacts in ultrasound images as the tongue surface. To test this guideline across a greater
range of data (including child speakers with speech sound disorders and different simulated probe rotations),
double-edge artifacts will be compared with the tissue maps used to generate the simulated images. Another
comparison will estimate how much of the tongue tip is typically missing from /r/ tongue shapes. We will then
develop a deep learning model, trained on information from MRI, that predicts midsagittal vocal tract shapes
(including the tongue tip and palate) from ultrasound tongue contours and audio. Through these aims, we will
add insight into ultrasound imaging for speech and provide a tool with future applications in expanding
articulatory information, e.g., testing the outcomes of providing more complete vocal tract information in
ultrasound biofeedback therapy.
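
As a rough illustration of the prediction task (the proposal does not specify an architecture), the sketch below maps an ultrasound tongue contour plus per-frame audio features to a full midsagittal tract outline. The class name, layer sizes, and feature dimensions (VocalTractPredictor, n_tract_pts, 13 MFCC features, etc.) are all hypothetical.

# Minimal PyTorch sketch of the proposed mapping: ultrasound tongue
# contour + audio features -> full midsagittal vocal tract outline,
# including regions ultrasound cannot image (tongue tip, palate).
import torch
import torch.nn as nn

class VocalTractPredictor(nn.Module):
    def __init__(self, n_contour_pts=100, n_audio_feats=13, n_tract_pts=200):
        super().__init__()
        self.n_tract_pts = n_tract_pts
        # Separate encoders for the two input modalities.
        self.contour_enc = nn.Sequential(
            nn.Linear(n_contour_pts * 2, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.audio_enc = nn.Sequential(
            nn.Linear(n_audio_feats, 64), nn.ReLU(),
        )
        # Decoder emits (x, y) coordinates for every tract point.
        self.decoder = nn.Sequential(
            nn.Linear(128 + 64, 256), nn.ReLU(),
            nn.Linear(256, n_tract_pts * 2),
        )

    def forward(self, contour, audio):
        # contour: (batch, n_contour_pts, 2); audio: (batch, n_audio_feats)
        h = torch.cat([self.contour_enc(contour.flatten(1)),
                       self.audio_enc(audio)], dim=1)
        return self.decoder(h).view(-1, self.n_tract_pts, 2)

# Training would regress against tract outlines traced from MRI, e.g.,
# with a mean-squared-error loss on point coordinates.
model = VocalTractPredictor()
contour = torch.randn(8, 100, 2)   # ultrasound tongue contours
audio = torch.randn(8, 13)         # e.g., one MFCC frame per image
pred = model(contour, audio)       # (8, 200, 2) predicted tract shapes
loss = nn.functional.mse_loss(pred, torch.randn(8, 200, 2))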
Training for this fellowship will occur at the University of Cincinnati, with opportunities to visit labs at two
additional institutions. The proposed plan provides training from a range of investigators in topics such as
ultrasound imaging and its application to speech research, developing the skills needed for my future goals.