Addressing Factual Inaccuracy and Unfaithful Reasoning of Large Language Models in Biomedicine and Healthcare

Large Language Models (LLMs) represent the latest advancement in Natural Language Processing (NLP) and Artificial Intelligence (AI), and they hold tremendous potential to revolutionize biomedical and healthcare applications. Extensive research has demonstrated the effectiveness of LLMs across a range of biomedical and health applications, from medical question answering to summarizing systematic reviews and AI-assisted disease diagnosis. However, the major barriers to applying LLMs in biomedical and health applications are factual incorrectness, where LLM-generated responses are inaccurate or incomplete, and incorrect reasoning, where LLM-generated responses lack supporting evidence, contradict existing evidence, or even rely on hallucinated evidence. Such issues pose the further risk of propagating errors, potentially leading to incorrect diagnoses or treatment recommendations.

Addressing these issues has been challenging, primarily due to three fundamental obstacles: (1) from the data perspective, LLMs may capture errors from lower-quality or unauthorized sources in general-domain pretraining data, lack access to accurate and up-to-date biomedical knowledge, and consequently generate inaccurate or outdated results; (2) from the methods perspective, there is a lack of mechanisms for fact-checking and evidence attribution throughout the lifecycle of LLMs applied to biomedical and health studies, spanning training/fine-tuning, inference, and post hoc analysis; and (3) from the accountability perspective, few studies have evaluated the effectiveness of existing approaches in downstream biomedical and health applications.

Our overall objective in this proposal is to systematically address the factuality and reasoning issues of LLMs in biomedicine and healthcare. The specific aims are: (1) from the data perspective, to establish a self-augmentation framework that teaches LLMs to automatically select and use relevant biomedical digital resources to augment their responses; (2) from the methods perspective, to develop an LLM curator by simulating the fact-checking and evidence attribution performed in biocuration via a multi-stage, multi-task instruction tuning pipeline; (3) from the methods perspective, to introduce a step-level, automated feedback-guided paradigm in which LLMs reflect on and improve their intermediate responses via fact-checking and evidence attribution; and (4) from the accountability perspective, to evaluate the proposed methods in downstream use cases.

The proposed work is expected to address the factuality and reasoning issues of LLMs, the key barrier to their use in biomedical and health domains, and to enable LLMs to generate accurate responses that advance biomedical discovery and healthcare. It is also expected to refine the current development and evaluation pipelines of LLMs in biomedical and health domains by making fact-checking and evidence attribution essential components and by providing related benchmarks, methods, and tools to facilitate implementation.
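
To make Aim 1's self-augmentation idea concrete, the following minimal Python sketch shows one possible flow: the model first selects a relevant biomedical resource, the retrieved evidence is appended to the prompt, and the final answer is generated from the augmented context. The resource registry and the call_llm and search_resource helpers are hypothetical placeholders for illustration, not the proposal's implementation.

    RESOURCES = {
        "pubmed": "literature search over biomedical abstracts",
        "clinicaltrials": "registry of interventional and observational studies",
        "gene": "gene-centric reference records",
    }

    def call_llm(prompt: str) -> str:
        """Placeholder for an LLM completion call (hosted or local model)."""
        raise NotImplementedError("plug in an actual model client here")

    def search_resource(name: str, query: str) -> list:
        """Placeholder for querying the chosen resource; returns evidence snippets."""
        raise NotImplementedError("plug in an actual retrieval client here")

    def self_augmented_answer(question: str) -> str:
        # Step 1: ask the model which resource is most relevant to the question.
        menu = "\n".join(f"- {name}: {desc}" for name, desc in RESOURCES.items())
        choice = call_llm(
            f"Question: {question}\nAvailable resources:\n{menu}\n"
            "Reply with the single most relevant resource name."
        ).strip().lower()
        # Step 2: retrieve evidence snippets from the selected resource.
        evidence = search_resource(choice, question)
        # Step 3: answer from the retrieved evidence as grounding context,
        # asking the model to cite the snippets it relied on.
        context = "\n".join(evidence)
        return call_llm(
            "Using only the evidence below, answer the question and cite the "
            f"evidence you used.\nEvidence:\n{context}\nQuestion: {question}"
        )

The key design point this sketch tries to convey is that resource selection is itself delegated to the model rather than hard-coded, which is what "teaching LLMs to select and use" resources implies.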
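
The multi-stage, multi-task instruction tuning in Aim 2 can likewise be pictured through the shape of its training records. The two records below are purely illustrative: one frames a fact-checking task and the other an evidence-attribution task; the field names, labels, and schema are assumptions, not the proposal's actual data format.

    INSTRUCTION_TUNING_EXAMPLES = [
        {
            "task": "fact_checking",
            "instruction": "Decide whether the claim is supported, refuted, or "
                           "unverifiable given the passage.",
            "input": {"claim": "...", "passage": "..."},
            "output": "supported",
        },
        {
            "task": "evidence_attribution",
            "instruction": "List the sentence IDs in the passage that support "
                           "each statement in the answer.",
            "input": {
                "answer": "...",
                "passage_sentences": {"S1": "...", "S2": "..."},
            },
            "output": {"statement_1": ["S2"]},
        },
    ]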
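
Finally, a minimal sketch of the step-level, feedback-guided paradigm in Aim 3: each intermediate reasoning step is fact-checked, and a failing step is revised using the checker's feedback before the next step is generated. The generate_step and fact_check helpers are hypothetical placeholders for a model call and an evidence-attribution checker.

    def generate_step(question, accepted_steps, feedback=None):
        """Placeholder: produce the next reasoning step, optionally revising it with feedback."""
        raise NotImplementedError("plug in an actual model call here")

    def fact_check(step):
        """Placeholder: return (is_supported, feedback) from an evidence-checking module."""
        raise NotImplementedError("plug in an actual fact-checker here")

    def answer_with_step_level_feedback(question, max_steps=5, max_retries=2):
        """Build a chain of reasoning steps, revising any step that fails fact-checking."""
        steps = []
        for _ in range(max_steps):
            step, feedback = None, None
            for _ in range(max_retries + 1):
                step = generate_step(question, steps, feedback)
                supported, feedback = fact_check(step)
                if supported:
                    break
            # Keep the best-attempted step even if it never passed, and move on.
            steps.append(step)
            if step.strip().lower().startswith("final answer"):
                break
        return steps

The intent of operating at the step level, as described in the aim, is that the checker's feedback lets the model repair a specific intermediate response rather than regenerating the whole answer after a post hoc check.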