Data-driven and science-informed methods for the discovery of biomedical mechanisms and processes - Abstract Text

Data-driven discovery methods are a novel class of methodologies and computational approaches that are revolutionizing the modeling, prediction, and control of complex systems while remaining scientifically explainable and interpretable. These methods learn governing equations directly from data and have found considerable success in a wide range of applications, including turbulence, climate, robotics, and autonomy. However, the first generation of these methods has proven poorly suited to the study of biomedical data. To realize the full potential of data-driven approaches, they must be extended and adapted to deal with the noise, sparsity, and variability intrinsic to experiments with living organisms. My group has extended the seminal Sparse Identification of Nonlinear Dynamics (SINDy) method to the Weak form SINDy (WSINDy). Weak form equations are a transform of the original data that enables learning of the equations even in the presence of substantial noise and sparsity. The approach effectively recasts scientific discovery from proposing and validating or refuting a single scientific hypothesis to simultaneously proposing (in many cases) more than 10^180 hypotheses and using sparse regression to prune the hypotheses that are not supported by the data. Moreover, our approach currently takes on the order of minutes on a standard laptop. The overarching goals of this research are to use the WSINDy method to 1) investigate the individual cell-based drivers of collective cell migration, 2) develop data-driven inference for unobserved processes in infectious disease dynamics, and 3) extend WSINDy to infer stochastic dynamical systems and discover critical, but hidden, compartments. The first goal continues a long collaboration with Xuedong Liu (CU-Boulder). We have adapted WSINDy to create individualized models of each cell in a migrating colony.
We learn the interaction rules and can classify them according to cell type. We plan to continue expanding the capabilities of WSINDy in this context, with the aim of learning the biochemical dynamics unique to each cell. This would be the first coupling of data-driven models for inter- and intra-cell processes, and it will bring us closer to understanding how cells make the decisions that lead to the emergent collective motion in wound healing. The second goal expands a collaboration with Beth Carlton (an epidemiologist) in infectious disease dynamics, centered around the COVID-19 modeling team (of which we are both members). During our efforts to develop a compartmental model for advising the State Epidemiologist and the Governor, several questions arose that could be efficiently answered by extensions to WSINDy. In particular, we will develop data-driven inference for infection and recovery rates as well as the distribution of dwell times in the infection timeline. The last goal involves extensions of WSINDy to learn models for situations that frequently arise in biomedical phenomena. First, we plan to learn stochastic dynamical systems. Previous efforts were only able to infer either the drift term or the mean-field equations. By recasting WSINDy to evaluate moments of the data, we can learn the stochastic models directly. Second, inference regarding unobserved compartments is challenging, but via an extension to WSINDy we plan to discover unobserved variables and their equations.