Foveated Search Model for Real-World Scenes

Studying eye movements has been central to understanding active vision, attention, and cognition. Computational models have advanced the field by identifying the visual features and computations that guide eye movements and have helped explain human visual and cognitive dysfunctions. However, even after 20 years of computational modeling, we are still far from adequately modeling eye movements and decisions in natural tasks with real-world images. Models often fail to incorporate how vision degrades toward the visual periphery, do not represent the observer’s intention (task), and, critically, lack a learned understanding of scenes and objects, or of language, to guide fixations. Our goal is to combine advances in powerful vision Transformer models with computational models of human vision to create a Foveated Search Transformer (FST) model that can follow simple linguistic instructions and execute eye movements that gather task-relevant information while exploiting an understanding of the other objects in the scene. Our work will focus on visual search for objects in real-world scenes “never seen” by the model. We hypothesize that the FST model will reach human accuracy levels and capture landmark eye movement behaviors under manipulations of context (the location, size, and semantic relationship of the target object to the surrounding scene). We also hypothesize that the model will predict human behavior and fixations better than baseline models such as saliency models, DeepGaze, and a version of the FST model with contextual understanding disabled. To achieve our goal, we propose two specific aims. SA1. To develop a Foveated Search Transformer (FST) model that learns task-optimizing eye movements, understands scene semantics, and captures landmark contextual effects of human search; SA2.
To develop a vision-language Foveated Search Transformer (FST-L) model that can interpret language and search for specific targets described in a sentence. The developed FST models will be compared to human eye movements and search decisions as well as to baseline models. If successful, the newly developed model will open many new avenues of research on eye movements in more naturalistic tasks and allow prediction of the functional impact of visual disorders on eye movements and subsequent perceptual decisions. The model will also provide a tool to expand current investigations of search-related neural activity using computational models.
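The foveated degradation that the proposal argues current models omit can be illustrated with a minimal sketch: blend a sharp and a blurred copy of an image, with the blur weight growing with eccentricity from the current fixation. This is not the FST model's actual front end; the function name, the linear weight ramp, and all parameter values (`fovea_radius`, `periphery_sigma`) are illustrative placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation, fovea_radius=20.0, periphery_sigma=6.0):
    """Crude foveation sketch: output equals `image` at the fixation
    point and approaches a Gaussian-blurred copy in the periphery.
    `fixation` is a (row, col) pixel coordinate; all parameters are
    illustrative, not taken from the proposal."""
    img = image.astype(float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # eccentricity: pixel distance from the fixation point
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    # blur weight: 0 inside the fovea, ramping to 1 in the periphery
    weight = np.clip((ecc - fovea_radius) / fovea_radius, 0.0, 1.0)
    blurred = gaussian_filter(img, periphery_sigma)
    return (1.0 - weight) * img + weight * blurred
```

A search model consuming such foveated inputs must trade off where to fixate next against what each fixation will reveal, which is the core computation the proposed FST model is meant to learn.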