Circuit mechanisms of arbitration between distinct reinforcement learning systems

PROJECT SUMMARY

Animals can exhibit goal-directed behaviors in novel environments despite limited experience with them. How does the brain infer the underlying statistics and generative structure of an environment, and how does it use those inferences to guide behavior? The field of reinforcement learning calls this capacity “model-based” reasoning, because it relies on an internal model of the structure of the world. Critically, this internal model can be used to flexibly estimate the best actions through mental simulation or planning, without direct experience. In contrast, in “model-free” reinforcement learning, an agent chooses the best action based on direct experience, without explicit knowledge of the sequential transition structure of a task or environment. Model-based and model-free mechanisms coexist in the brain and are mediated by distinct circuits, but the neural circuit mechanisms by which the brain arbitrates between these decision systems remain unknown. Theoretical and behavioral studies suggest that the human brain relies on whichever system yields value estimates with the lowest uncertainty. The lateral orbitofrontal cortex (lOFC) is a compelling candidate for performing this arbitration: it is implicated in model-based reasoning, for instance by enabling inferences about hidden task states, and it lies upstream of the dorsal striatum, which is critical for both model-based and model-free decision making. Intriguingly, we have found that lOFC neurons project exclusively to the dorsolateral striatum (DLS), a region critical for model-free behavior, and not to the dorsomedial striatum (DMS), which is critical for model-based behavior. We hypothesize that projection-specific neural circuits in lOFC arbitrate between these systems by suppressing the model-free system.
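To make the uncertainty-based arbitration principle concrete, the following is a minimal, illustrative sketch. It is not the model used in this project and says nothing about lOFC circuitry. It assumes that each system reports, for every action, a value estimate together with a variance expressing its uncertainty. For each action, the sketch trusts whichever system is less uncertain, then chooses greedily. The names `ValueEstimate` and `arbitrate` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValueEstimate:
    """One system's estimate of an action's value (hypothetical representation)."""
    mean: float      # estimated value of the action
    variance: float  # uncertainty about that estimate (assumed to be available)


def arbitrate(mb: dict[str, ValueEstimate],
              mf: dict[str, ValueEstimate]) -> tuple[str, str]:
    """For each action, use the less uncertain system's estimate, then pick the best action.

    Returns (chosen_action, system_whose_estimate_drove_the_choice).
    """
    best_action, best_value, best_source = None, float("-inf"), None
    for action in mb:
        # Uncertainty-based competition: the lower-variance estimate wins for this action.
        if mb[action].variance <= mf[action].variance:
            est, source = mb[action], "model-based"
        else:
            est, source = mf[action], "model-free"
        # Greedy choice over the arbitrated value estimates.
        if est.mean > best_value:
            best_action, best_value, best_source = action, est.mean, source
    return best_action, best_source


# Novel environment: little direct experience, so model-free estimates are very uncertain.
mb = {"left": ValueEstimate(1.0, 0.2), "right": ValueEstimate(0.5, 0.2)}
early_mf = {"left": ValueEstimate(0.2, 2.0), "right": ValueEstimate(0.8, 2.0)}
print(arbitrate(mb, early_mf))  # ('left', 'model-based')

# After extensive experience, model-free estimates become more certain and take over.
late_mf = {"left": ValueEstimate(0.2, 0.05), "right": ValueEstimate(0.8, 0.05)}
print(arbitrate(mb, late_mf))   # ('right', 'model-free')
```

In this toy setting, control shifts from the model-based to the model-free system as experience reduces model-free uncertainty, which is the behavioral signature that an arbitration mechanism must explain.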
I will use state-of-the-art viral, electrophysiological, and computational methods to determine whether DLS-projecting lOFC neurons mediate uncertainty-based arbitration between decision-making systems (Aim 1) and to characterize the circuit logic that supports arbitration (Aim 2). By optogenetically tagging DLS-projecting lOFC neurons, I will selectively characterize and perturb their activity while monitoring the behavioral strategies that rats use in a task with latent structure. To determine how arbitration is instantiated in the dorsal striatum, I will optogenetically activate lOFC→DLS neurons while recording from genetically defined striatal cell types in vivo and in vitro. We predict that lOFC→DLS neurons enable model-based behavior by activating inhibitory interneurons that suppress the DLS and thereby the model-free system.