PROJECT SUMMARY/ABSTRACT
Timely and accurate clinical decision-making is critical for the quality of healthcare delivery, impacting everyone
from individual patients to entire public health systems. Clinicians frequently raise questions during practice
(on average, two questions for every three patients seen) but rarely have the time or resources to find
evidence-based answers, leading to suboptimal patient care decisions and even diagnostic errors. This is
particularly true in emergency departments (EDs), which are chaotic, time-pressured, high-stakes decision
environments. Artificial intelligence (AI)-driven question-answering (QA) systems can fill this gap by providing
real-time answers and predictive analytics that help clinicians make timely, accurate decisions. Addressing this
critical need, the rise of Large Language Models (LLMs) offers a transformative approach to understanding complex
questions and generating human-like responses. Despite their promise, two critical issues hinder the adoption of
LLMs in clinical practice. The foremost challenge is their unreliability: LLMs can generate incorrect medical
information, which can lead to devastating outcomes such as misdiagnosis. The second hurdle is the lack of transparency.
Many of these systems produce answers without providing reasoning and justification, making their responses
less useful and undermining the trust of clinicians. The overall objective of this proposal is to develop and validate
a clinically reliable and transparent LLM-based QA system and translate it into a clinical chatbot for clinical
decision support, providing clinicians with accurate evidence-based information in high-stakes scenarios like EDs.
During the K99 phase, I will develop novel, clinically accurate LLMs (CliniGPT) using multimodal clinical data,
guided by a clinical-specific pretraining and fine-tuning framework (Aim 1). During the R00 phase, I will develop
and validate the retrieval-augmented medical QA (CliniQARet) framework to guide CliniGPT in generating
reliable answers to clinical questions in the ED setting (Aim 2). Using the best-performing model from Aims 1 and 2, I will
build the clinical chatbot following user-centered principles, delivering evidence-based, timely support for common
ED scenarios, including chest pain, headache, fever, and abdominal pain, to enhance decision-making. I will
develop and validate the software in a simulated electronic health record (EHR) environment with real patient data and recruited ED
clinicians (Aim 3). The expected outcomes are a real-time, user-centered ED clinical chatbot; open-source,
clinically accurate LLMs; an open-source, reliable, and trustworthy clinical QA framework; an open-source
framework for pretraining, fine-tuning, and evaluating clinical LLMs with a focus on reliability; and an open-source
framework for constructing and integrating multimodal clinical datasets to enrich and ground the system’s clinical
knowledge. During the K99 phase, the PI will be mentored by experts in clinical NLP and LLMs, emergency
medicine, and clinical informatics, and will obtain the additional training she requires in clinical, evidence-based,
and emergency medicine. This application will provide the necessary training to supplement the PI’s expertise in
clinical NLP and clinical medicine and help her transition into an independent career in biomedical data science.