From Human-Powered to Automated Video Description for Blind and Low Vision Users - Project Summary

Approximately 12 million people in the United States have been diagnosed with a visual impairment. These individuals face unique challenges in our modern environment, where much critical information related to education, employment, entertainment, and community is presented in the form of digital videos. Inaccessible information can result in social exclusion or become life-threatening when individuals need it to make decisions about their health and safety. For example, during a personal or global health crisis, individuals may need the large volume of information conveyed through videos or dynamic infographics in order to make informed decisions. To address this need, the online platform YouDescribe allows blind and low vision (BLV) users to request that amateur volunteers create video descriptions, also referred to as audio descriptions (AD), of YouTube videos. However, the platform has been unable to keep up with the overwhelming demand, and 92.5% of videos on the YouDescribe user wish list remain undescribed.

The overall objective of this proposal is to build an AI-driven system, suitable for use on a wide scale, to automatically generate descriptions of online videos and to answer questions asked by BLV users about video content. The rationale for this project is that AI-based tools are necessary to facilitate timely access to the deluge of new videos appearing on the Internet every day. The proposed work encompasses three specific aims:

1) Develop an AI-based tool, in collaboration with sighted describers, that produces video descriptions more efficiently and increases the availability of accessible videos. The goal is to create an AI-driven NarrationBot that will decrease the time required for novice volunteers to produce video descriptions by 80%.

2) Develop an AI-based tool, in collaboration with BLV individuals, that offers user-driven access to visual information in online videos. The goal is to develop an AI-driven QABot that allows users to pause a video, ask questions about its content, and receive immediate answers (e.g., "What breed is the dog?", "German shepherd") that are accurate 80% of the time.

3) Develop and publicly release large-scale datasets to improve machine learning for video accessibility. These novel datasets will be used to increase the quality and accuracy of NarrationBot and QABot until AI-generated descriptions and answers require minimal intervention from human volunteers and can serve BLV users directly.

The proposed research is innovative because it focuses on videos, whereas existing AI-driven efforts to address this problem have focused primarily on static photos or images. It is also one of only a few efforts to partner directly with BLV individuals in developing AI-driven systems that produce visual descriptions or answer visual questions. The proposed research is significant because it will result in open-source, AI-driven tools that give BLV individuals unprecedented control over their ability to independently navigate the information-rich world of online videos, thus improving their health and wellbeing.