Reliable Question-Answering Frameworks for Clinical Decision Support using Domain-specific Large Language Models

PROJECT SUMMARY/ABSTRACT

Timely and accurate clinical decision-making is critical to the quality of healthcare delivery, affecting everyone from individual patients to entire public health systems. Clinicians frequently raise questions during patient care (on average, two questions for every three patients seen) but rarely have the time or resources to obtain evidence-based answers, leading to sub-optimal patient care decisions and even diagnostic error. This is particularly true in emergency departments (EDs), with their chaotic, time-pressured, and high-stakes decision environments. Artificial intelligence (AI)-driven question-answering (QA) systems can fill this gap by providing real-time answers and predictive analytics that aid clinicians in timely, accurate decision-making. The rise of Large Language Models (LLMs) addresses this critical need, offering a transformative approach to understanding complex questions and generating human-like responses. Despite their promise, two critical issues hinder the adoption of LLMs in clinical practice. The foremost challenge is their unreliability: LLMs can generate incorrect medical information, which can lead to devastating outcomes such as misdiagnosis. The second hurdle is their lack of transparency: many of these systems produce answers without reasoning or justification, making their responses less useful and undermining clinicians’ trust.

The overall objective of this proposal is to develop and validate a clinically reliable and transparent LLM-based QA system and translate it into a chatbot for clinical decision support, providing clinicians with accurate, evidence-based information in high-stakes settings such as EDs. During the K99 phase, I will develop novel, clinically accurate LLMs (CliniGPT) from multi-modal clinical data, guided by a clinical domain-specific pre-training and fine-tuning framework (Aim 1). During the R00 phase, I will develop and validate a retrieval-augmented medical QA framework (CliniQARet) to guide CliniGPT in generating reliable answers to clinical questions in the ED setting (Aim 2). Using the best models from Aims 1 and 2, I will build a clinical chatbot following user-centered design principles, delivering evidence-based, timely support for common ED scenarios, including chest pain, headache, fever, and abdominal pain, to enhance decision-making. I will develop and validate this software in a simulated EHR environment using real patient data and recruited ED clinicians (Aim 3).

The expected outcomes are: a real-time, user-centered ED clinical chatbot; open-source, clinically accurate LLMs; an open-source, reliable, and trustworthy clinical QA framework; an open-source framework for pre-training, fine-tuning, and evaluating clinical LLMs with a focus on reliability; and an open-source framework for constructing and integrating multi-modal clinical datasets to enrich and ground the system’s clinical knowledge. During the K99 phase, the PI will be mentored by experts in clinical NLP and LLMs, emergency medicine, and clinical informatics, and will pursue additional training in clinical, evidence-based, and emergency medicine. This application will provide the necessary clinical training to supplement the PI’s expertise in clinical NLP and help her transition into an independent career in biomedical data science.