PROJECT SUMMARY/ABSTRACT
Large Language Models (LLMs) represent the latest advancement in Natural Language Processing (NLP) and
Artificial Intelligence (AI), holding tremendous potential to revolutionize biomedical and healthcare applications.
Extensive research has demonstrated the effectiveness of LLMs across a broad range of biomedical and health
applications, from medical question answering to summarizing systematic reviews and AI-assisted
disease diagnosis. However, the major barriers to applying LLMs in biomedical and health applications are
factual incorrectness – where LLM-generated responses are inaccurate or incomplete – and unfaithful
reasoning – where LLM-generated responses lack supporting evidence, contradict existing evidence, or even
rely on hallucinated evidence. Such issues further pose the risk of propagating misinformation, potentially
leading to misdiagnosis or incorrect treatment recommendations. Addressing these issues has been
challenging, primarily due to three fundamental obstacles: (1) from the data perspective, LLMs may capture
misinformation from lower-quality or unauthoritative sources in general-domain data during pretraining, lack
access to accurate and up-to-date biomedical knowledge, and consequently generate inaccurate, outdated, or
unfaithful results; (2) from the methods perspective, there is a lack of mechanisms for fact-checking and
evidence attribution throughout the lifecycle of LLMs when applied to biomedical and health studies, spanning
from training/fine-tuning to inference and post-hoc analysis; (3) from the accountability perspective, few
approaches have evaluated their effectiveness in downstream biomedical and health applications. Our overall
objective in this proposal is to systematically address the issues of factual incorrectness and unfaithful reasoning of LLMs in
biomedicine and healthcare. The specific aims include (1) from the data perspective, establishing a self-
augmentation framework to teach LLMs to automatically select and use relevant biomedical digital resources to
augment their responses; (2) from the methods perspective, developing an LLM curator by simulating the fact-checking
and evidence attribution performed in biocuration via a multi-stage, multi-task instruction tuning
pipeline; (3) from the methods perspective, introducing a step-level, automated feedback-guided paradigm for
LLMs to reflect on and improve their intermediate responses via fact-checking and evidence attribution; and (4)
from the accountability perspective, evaluating the proposed methods in downstream biomedical and health use cases. The proposed work is
expected to address factual incorrectness and unfaithful reasoning of LLMs – the key barriers to their use in
biomedical and health domains – and to enable LLMs to generate accurate and trustworthy responses that advance
biomedical discovery and healthcare. It is also expected to refine the current development and evaluation
pipelines of LLMs in biomedical and health domains by making fact-checking and evidence attribution essential
components and by providing related benchmarks, methods, and tools to facilitate their implementation.
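
The following is a minimal, purely illustrative Python sketch of how fact-checking and evidence attribution could be interleaved with step-level generation, in the spirit of Aims 1 and 3. The names and functions used here (Step, retrieve_evidence, feedback_guided_generation, generate_step, revise_step) are hypothetical stand-ins introduced only for illustration, not components of the proposed system; a real implementation would call an instruction-tuned LLM and query curated biomedical resources.

# Illustrative sketch only: hypothetical stand-ins, not the proposal's implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    claim: str                # one intermediate claim produced by the model
    evidence: Optional[str]   # supporting snippet attached after fact-checking, if any


def retrieve_evidence(claim: str, corpus: List[str]) -> Optional[str]:
    # Toy retriever: return the first snippet sharing a content word with the claim.
    # A real system would query curated biomedical resources selected by the model
    # itself (the self-augmentation idea sketched in Aim 1).
    claim_words = {w.lower().strip(".,") for w in claim.split() if len(w) > 4}
    for snippet in corpus:
        if claim_words & {w.lower().strip(".,") for w in snippet.split()}:
            return snippet
    return None


def feedback_guided_generation(
    question: str,
    generate_step: Callable[[str, List[Step]], str],
    revise_step: Callable[[str, str], str],
    corpus: List[str],
    max_steps: int = 3,
) -> List[Step]:
    # Step-level loop: every intermediate claim is fact-checked against retrieved
    # evidence; unsupported claims are sent back to the model for revision before
    # the next step is generated (the feedback-guided idea sketched in Aim 3).
    steps: List[Step] = []
    for _ in range(max_steps):
        claim = generate_step(question, steps)
        evidence = retrieve_evidence(claim, corpus)
        if evidence is None:
            claim = revise_step(question, claim)       # automated feedback signal
            evidence = retrieve_evidence(claim, corpus)
        steps.append(Step(claim=claim, evidence=evidence))
    return steps


if __name__ == "__main__":
    toy_corpus = [
        "Metformin is a first-line therapy for type 2 diabetes.",
        "Regular physical activity improves glycemic control.",
    ]

    def demo_generate(question: str, steps: List[Step]) -> str:
        # Stand-in for an instruction-tuned LLM call.
        return "Metformin is commonly used to manage type 2 diabetes."

    def demo_revise(question: str, claim: str) -> str:
        # Stand-in for the model revising an unsupported claim.
        return claim

    for step in feedback_guided_generation(
        "How is type 2 diabetes managed?", demo_generate, demo_revise, toy_corpus, max_steps=1
    ):
        print(step.claim, "| evidence:", step.evidence)

In this toy run, the single generated claim is matched against the hard-coded corpus and printed together with its supporting snippet; an unsupported claim would instead trigger one revision pass before being recorded.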