Linking the dopaminergic mechanisms of reinforcement learning and timing

Project Summary

The ability to keep track of elapsed time is essential for many forms of reward-guided learning and behavior. Midbrain dopamine (DA) neurons are implicated in both reinforcement learning (RL) and timing; however, these two functions have largely been studied in parallel rather than under a unified framework. One of the most elegant demonstrations that DA dynamics contain timing information comes from trace conditioning tasks in which reward is unexpectedly omitted. This form of negative reward prediction error appears as a brief inhibition, or dip, in DA activity around the time of expected reward. While this DA dip is usually interpreted as a negative reinforcement signal, it also provides a highly effective neurophysiological readout of predicted reward timing. Yet although the dip is a well-known phenomenon, how DA neurons learn to predict the timing of reward remains unclear. This project addresses this significant gap by studying the neural circuit mechanisms that time the DA reward omission dip. We will examine the role of DA signals and neural dynamics in the ventral striatum, an area implicated in both reward and temporal processing. The project's central hypothesis is that DA and striatal timing processes are interdependent: the striatum relies on DA signals to learn a more refined representation of time, and DA neurons rely on this striatal code to predict the timing of reward and thus to produce properly timed dips when reward is omitted.

The project will combine experimental and computational approaches to measure, perturb, and model DA signals, striatal dynamics, and licking behavior in mice performing classical trace conditioning tasks, with reward omitted on a subset of trials to probe timing processes. Aim 1 will examine how the temporal precision of DA reward omission dips and of striatal dynamics changes across learning. This will provide a crucial test of the prediction that the representation of time in these circuits is plastic rather than fixed, as most reinforcement learning models assume. It will also reveal whether specific striatal populations, namely D1- or D2-receptor-expressing projection neurons, become better at encoding time over the course of learning. Aim 2 will pursue causal evidence that DA and striatal dynamics mutually shape each other's temporal coding, by transiently manipulating activity in one circuit while monitoring changes in the temporal precision of the other; we will also determine whether these manipulations alter the temporal precision of licking behavior. Finally, Aim 3 will develop a computational framework for the experimental results: we will build temporal difference reinforcement learning (TDRL) models as well as biologically inspired recurrent neural network models constrained by the experimental data, and test their predictions. Taken together, this highly synergistic experimental and computational effort is expected to yield a more unified understanding of the mechanisms underlying RL and timing.
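To make concrete the kind of TDRL model Aim 3 refers to, the sketch below shows how a minimal temporal-difference learner produces a dopamine-like dip at the time of omitted reward. It assumes a complete serial compound (tapped-delay-line) representation of time since cue onset, the standard choice in classic TD models of DA; the trial length, reward time, learning rate, and discount factor are illustrative assumptions, not parameters of the proposed work.

```python
import numpy as np

# Minimal TD(0) simulation of trace conditioning with a complete serial
# compound ("tapped delay line") time representation: each timestep after
# cue onset is its own state. All values below are illustrative choices,
# not parameters from the proposal.
n_steps = 20          # timesteps per trial; cue at t = 0
reward_time = 15      # reward delivered at this timestep on rewarded trials
alpha, gamma = 0.1, 0.98
w = np.zeros(n_steps + 1)  # value weights, one per post-cue timestep

def run_trial(w, rewarded=True):
    """Run one trial, updating w in place; return the TD errors delta_t."""
    deltas = np.zeros(n_steps)
    for t in range(n_steps):
        r = 1.0 if (rewarded and t == reward_time) else 0.0
        delta = r + gamma * w[t + 1] - w[t]   # TD error: the modeled DA signal
        w[t] += alpha * delta
        deltas[t] = delta
    return deltas

# Train on rewarded trials until the value function converges.
for _ in range(500):
    run_trial(w, rewarded=True)

# Probe trial with reward omitted: the TD error shows a negative deflection
# (the modeled DA "dip") at the learned time of expected reward.
deltas = run_trial(w, rewarded=False)
print("TD error at expected reward time:", round(deltas[reward_time], 3))
```

In a model of this kind the dip can be timed no more precisely than the underlying state representation, which is one way to see why the proposal treats the temporal code itself, and how it is learned, as the central question.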