Supervised Weighted Distances: Patient Similarity to Explain Clinical AI Models - Summary

Explaining complex, multivariate predictive models to clinicians and life scientists is difficult, but it is necessary to increase their trust and to allow them to properly apply such models to data from their own patients or study participants. Although certain model visualizations may be helpful for data scientists, they are not always helpful for clinicians and life scientists. Explaining that similar patients get similar estimates, and visualizing such patient neighborhoods in a consistent manner regardless of the type of underlying model (e.g., logistic regression, random forests, deep neural networks), may be helpful to them.

In the context of a specific predictive model (e.g., a polygenic risk score, a deterioration index), seemingly very dissimilar patients may get close estimates when many variables are considered: there could be a mismatch between a neighborhood-based explanation and model behavior. One way of preventing this mismatch is to build context-based patient neighborhoods for visualization. Context is captured by constructing patient neighborhoods from only the dimensions (or variables) that are important for the predictive model; a minimal computational sketch of this construction appears at the end of this summary.

We find that context-based visualizations can work for deep neural network models built on real-world data. We propose to develop robust algorithms (using both data-driven and clinical-knowledge-driven approaches) and a tool that will help clinicians and life scientists understand the estimates of predictive models by exploring patient neighborhoods built from supervised weighted distances for different clinical problems. We also propose to develop versions of the algorithms that work with federated databases in a privacy-protecting manner.

Our algorithms and tools will be developed in coordination with clinician-informaticians, who will select relevant models, develop representative test cases, and help recruit clinicians for a formative evaluation. We will develop and evaluate our visualization tool under the guidance of human-computer interaction experts. We expect the neighborhood-based explanation to be intuitive and interactive, so that clinicians can ask "what if" questions and explore the placement of various patients in models built for their clinical domains: emergency medicine, pulmonary medicine, pediatrics, internal medicine, and endocrinology. These models can be developed in-house by our team or by others: the privacy-protecting federated version of our algorithm and tool will allow the proposed visualizations to work even without direct access to the database of patients used to build the models.
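
A minimal sketch of the supervised weighted distance behind these neighborhoods is given below. It assumes a scikit-learn random forest whose feature importances serve as the dimension weights; the function, data, and parameter names are illustrative placeholders, not part of the proposed tool, and any other importance measure (e.g., permutation importance, or clinician-assigned weights) could be substituted.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def supervised_weighted_neighbors(model, X, index_patient, k=5):
        # Weight each dimension by its importance to the predictive model,
        # so patients are compared only on model-relevant variables.
        w = model.feature_importances_
        diffs = X - X[index_patient]                # per-variable differences
        d = np.sqrt((w * diffs ** 2).sum(axis=1))   # importance-weighted Euclidean distance
        order = np.argsort(d)
        return order[1:k + 1], d[order[1:k + 1]]    # skip the index patient itself

    # Illustrative usage with synthetic data (200 patients, 12 variables):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    neighbors, distances = supervised_weighted_neighbors(model, X, index_patient=0)

In this sketch, variables the model largely ignores receive near-zero weight, so two patients who differ only on those variables are placed close together, matching the model's behavior; a clinical-knowledge-driven variant could instead supply the weights from expert judgment.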