PROJECT SUMMARY
Early word learning is a major developmental achievement that rests on a foundation of visual category
learning: to learn that the word “dog” refers to a category dog that includes chihuahuas and excludes wolves,
children must make an impressive visual generalization. However, deep neural networks—our best models of
category learning—are unable to learn from the same visual diet as children, limiting our ability to construct
mechanistic accounts of early category and word learning. Infants experience a few categories (e.g., spoons,
cups) dramatically more often than others, and experience certain categories as drawings or illustrations, as
they learn the categories that words refer to. Current models, in contrast, learn from uniform distributions of
categories whose exemplars are photos taken from the adult perspective. The proposed work will overcome
these limitations and use deep neural networks to understand how children’s everyday visual experiences
interact with statistical learning mechanisms to yield the category representations that support early word
learning. In Aim 1 (K99 phase), I will determine how variability in children’s visual experiences relates to early
word learning outcomes. To do so, I will collect a representative dataset of the categories in the infant view
using a parent-report measure and photographs taken from the infant perspective, and determine whether
variance in visual experience with different categories predicts which words are learned earlier in development.
In Aim 2 (K99/R00 phase), I will evaluate how well current models and infants learn from diverse sets of
realistic visual inputs using looking-time experiments and model simulations, and test whether networks with
more neurally plausible architectures are better predictors of infant learning. In Aim 3 (R00 phase), I will adapt
an existing deep neural network for infant categorization. To do so, I will build output layers on top of a
state-of-the-art unsupervised model of object segmentation to identify the categories in the infant view and to
make principled generalizations from frequently experienced to infrequently experienced but similar
categories—much like young children in early development. The empirical findings and resulting computational
model will provide insight into the relevant visual experiences for learning the categories that words refer to.
This understanding of how typically developing children learn rapidly and efficiently in everyday environments
is essential to improve interventions for children struggling to learn the categories that words refer to, including
late talkers, children with ASD, and children recovering from blindness (e.g., after cataract surgery). This award
will build upon my strong background in visual category recognition and provide me with relevant training in
both early language acquisition and deep neural networks via interdisciplinary workshops, coursework, and the
scientific expertise of a team of mentors and consultants. This award will thus facilitate my transition to become
an independent investigator at the forefront of cognitive development, vision science, and machine learning.