Quality assurance for safe use of AI systems in radiotherapy - Project Summary

Artificial intelligence (AI) systems are increasingly used in radiation oncology for tasks such as image reconstruction and registration, autosegmentation, synthetic CT generation, and treatment planning. However, the design of AI systems fundamentally challenges existing quality assurance (QA) paradigms, which imperils the quality and safety of AI for clinical use. Addressing this unmet need for QA of clinical AI is critical, as the potential for performance degradation of AI systems in the clinic is high. Domain shift, in which the distribution of data encountered during deployment differs from the distribution of data used during training, is a critical problem that can lead to significant errors in AI performance. Domain shift is common in clinical environments, where scanner performance varies over time due to changes in imaging protocols or sequences, equipment degradation, or replacement with a different make or model. Monitoring clinical AI system performance for signs of domain shift is of utmost importance to ensure safe and high-quality use, and the development of robust QA tools and practices to verify and monitor the performance of AI systems is therefore critical as these systems enter the clinical arena. In this project, we will develop a new type of QA approach amenable to closed-source clinical AI systems. Our approach, supported by our preliminary data, is to design a series of detectors that monitor the input imaging data and the AI system output for changes, and to link these changes to an actionable tolerance through a prediction model, without requiring access to the AI system internals. Our overall hypothesis is that the expected performance of clinical AI systems can be predicted to within 5% error by monitoring only the AI system inputs and outputs.
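To make the detector idea concrete, the following is a minimal, purely illustrative sketch of input monitoring for domain shift: summary features extracted from incoming images are compared against a commissioning-time reference distribution with a two-sample Kolmogorov-Smirnov-style statistic, and a change beyond an actionable tolerance is flagged. All names, feature choices, and thresholds here are assumptions for illustration, not the project's actual implementation.

```python
# Illustrative domain-shift detector on scalar input features (e.g., per-scan
# mean intensity). Reference values come from commissioning data; new values
# come from the live clinic. Thresholds are hypothetical.
import random


def ks_statistic(ref, new):
    """Maximum distance between the empirical CDFs of two samples."""
    ref, new = sorted(ref), sorted(new)
    grid = sorted(set(ref + new))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(ref, x) - ecdf(new, x)) for x in grid)


def drift_detected(ref_features, new_features, tolerance=0.3):
    """Flag domain shift when the KS statistic exceeds the tolerance."""
    return ks_statistic(ref_features, new_features) > tolerance


random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(200)]  # commissioning scans
same = [random.gauss(0.0, 1.0) for _ in range(200)]      # in-distribution scans
shifted = [random.gauss(1.5, 1.0) for _ in range(200)]   # e.g., new protocol

print(drift_detected(baseline, same))     # -> False (no shift flagged)
print(drift_detected(baseline, shifted))  # -> True (actionable shift)
```

In practice the monitored features would be learned latent encodings rather than a single scalar, but the control logic (reference distribution, distance statistic, tolerance) is the same.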
In Specific Aim 1, we will develop a QA framework for AI systems trained with a ground-truth set of labels, using autosegmentation as a model system. We will build compression algorithms that encode features from the distribution of inputs (images) and, separately, from the distribution of outputs (contours). We will then build a prediction model that takes these distributions as input and predicts contour accuracy. We will develop the QA framework on a set of existing AI systems, including two commercial and several in-house autosegmentation algorithms. In Specific Aim 2, we will focus on AI systems that do not use a ground truth during training, using synthetic CT generation as a model system. We will use an approach similar to that of Specific Aim 1, employing compression to build distributions of input and output latent features. Instead of predicting accuracy (which requires a ground truth), we will develop a model to monitor the distribution of outputs. In Specific Aim 3, we will deploy our QA frameworks in a prospective, multi-institutional clinical study and evaluate their effectiveness in ensuring the safe and high-quality deployment of clinical AI systems. We will also share our frameworks and data with the broader community to promote best practices in AI quality assurance. We expect that our QA framework will significantly improve the safety and effectiveness of clinical AI systems in radiation oncology by ensuring that these systems are robust to domain shift and other sources of error.
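The ground-truth-free monitoring of Specific Aim 2 can be sketched in the style of a statistical-process-control chart: a summary feature of each AI output is tracked against control limits derived from commissioning cases, with no reference segmentation or reference CT required. The feature, values, and limits below are hypothetical placeholders, not project data.

```python
# Illustrative output monitor for an AI system without ground truth (e.g.,
# synthetic CT generation): track one scalar output feature per case on a
# Shewhart-style control chart built from accepted commissioning cases.
import statistics


class OutputMonitor:
    def __init__(self, commissioning_features, n_sigma=3.0):
        # Control limits from outputs accepted at commissioning (hypothetical).
        self.center = statistics.mean(commissioning_features)
        self.sigma = statistics.stdev(commissioning_features)
        self.n_sigma = n_sigma

    def in_control(self, feature):
        """True if the new output's feature lies within the control limits."""
        return abs(feature - self.center) <= self.n_sigma * self.sigma


# Made-up commissioning values, e.g., mean HU of ten accepted synthetic CTs.
monitor = OutputMonitor(
    [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7, 12.0, 12.1]
)
print(monitor.in_control(12.05))  # -> True: typical output
print(monitor.in_control(35.0))   # -> False: gross shift, flag for review
```

A deployed version would monitor distributions of learned latent features rather than a single scalar, but the tolerance-based alerting is analogous.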