Machine Learning-Based Predictive Models for Disease Cure and Computationally Efficient Methods in High-Dimensional Settings

With recent advancements in screening, diagnosis, and treatment, early-stage detection of diseases has become commonplace, leading to a substantial number of clinically cured patients. For early-stage diseases, timely identification of cured patients based on pre-treatment characteristics is crucial to protect them from the added risks of high-intensity treatments. Equally vital is the early identification of uncured patients, which enables prompt intervention before their conditions progress to advanced stages with limited therapeutic options. Thus, there is a pressing need for predictive models that can leverage patient survival data and available information on patient-related characteristics (or features) to predict cured or uncured status with high accuracy. Existing state-of-the-art models capable of such prediction have several drawbacks that make it hard for them to meet the growing demands of advanced applications. These include rigid model assumptions and a lack of biological motivation; associated estimation methods that are not robust and have difficulty achieving global convergence; inefficiency in handling high-dimensional data; limited integration into clinical practice due to knowledge gaps in evaluating predictive accuracy and model utility; and the absence of readily available software packages, so that successful implementation demands substantial programming expertise. This proposal seeks to address these issues by developing next-generation models and associated methods that prioritize reduced complexity and lower computational costs. The goal is to achieve highly accurate predictions of cured or uncured status, particularly when dealing with high-dimensional and/or unstructured data.
The proposed approach integrates machine learning (ML) with modern predictive models to capture complex patterns within the data. It is hypothesized that capturing such patterns will significantly enhance the accuracy of cure prediction and lead to improved predictions of the survival distribution for uncured patients. The long-term goal is to study complex processes of disease occurrence (e.g., birth and death processes of competing risks) and to explore the integration of advanced ensemble learning algorithms to develop more powerful predictive models. Building on previous experience in model and method development, the PI's research program will: (i) develop novel, biologically motivated ML-based predictive models that can capture the patient population as a mixture of cured and uncured patients; (ii) develop new computationally efficient estimation and feature selection methods capable of handling high-dimensional/unstructured data; (iii) develop new methods for assessing a model's predictive accuracy and its clinical utility, while examining the distinctions between these two measures; and (iv) develop model validation methods and R software packages to facilitate the seamless implementation of the proposed models and methods. Successful completion of this research will aid in treatment assignment and in assessing the need to develop effective adjuvant therapies, for the overall benefit of patients.