Improve Statistical Methods for Profiling of Healthcare Providers

PROJECT SUMMARY

Healthcare provider profiling is of nationwide importance. To identify extreme (poor or excellent) performance and intervene as necessary, government and private payers routinely monitor the outcomes of patients treated by specific healthcare providers. This monitoring can help patients make more informed decisions. It can also help consumers, stakeholders, and payers identify providers that may need improvement, and even close or fine those with extremely poor outcomes. Our work is motivated by the study of end-stage renal disease (ESRD), which accounts for 7.2% of the entire Medicare budget and places a heavy burden on patients, families, and the healthcare system. Existing profiling approaches for large-scale ESRD registry data assume that risk adjustment is perfect and that all between-provider variation is due to quality of care. This assumption is often invalid. As a result, these methods disproportionately flag larger providers, even though those providers need not be “extreme”. Because a provider's estimated effect becomes more precise as its patient volume grows, even modest unexplained variation appears statistically significant for large providers. To address this problem, Aim 1 develops an individualized empirical null approach to provider profiling that accounts for unexplained between-provider variation arising from imperfect risk adjustment.

The national dialysis data contain more than 3,000 comorbidities for over 2,000,000 patients treated at more than 7,000 facilities. The goal is to select important comorbidity indexes for risk adjustment in provider profiling. However, such large-scale databases introduce computational difficulties, particularly when the event of interest is recurrent and both the sample size and the number of parameters are large. Traditional methods that perform well for moderate sample sizes and low-dimensional data do not scale to such massive data.
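The size bias that motivates Aim 1 can be illustrated with a small Python sketch. This is not the proposed individualized empirical null method. It is a simplified, moment-based stand-in that makes three assumptions. First, observed event counts are Poisson. Second, each provider has a multiplicative effect with mean 1 and unexplained variance τ². Third, as a consequence, each provider's conventional z-score has null variance 1 + E_i·τ², where E_i is its expected count. The function names and the toy counts are hypothetical.

```python
import math


def standardized_z(observed, expected):
    # Conventional z-score (O - E) / sqrt(E); approximately N(0, 1)
    # only if risk adjustment is perfect (no unexplained provider variation)
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def estimate_tau2(z, expected):
    # Hypothetical method-of-moments estimate of the unexplained
    # between-provider variance, assuming Var(z_i) = 1 + E_i * tau^2
    # (Poisson counts times a provider effect with mean 1, variance tau^2).
    # Truncated at zero because a variance cannot be negative.
    tau2 = sum(zi * zi - 1.0 for zi in z) / sum(expected)
    return max(tau2, 0.0)


def individualized_z(z, expected, tau2):
    # Rescale each z-score by its own null standard deviation
    # sqrt(1 + E_i * tau^2), so larger providers get a wider null
    return [zi / math.sqrt(1.0 + e * tau2) for zi, e in zip(z, expected)]


def flag(z, cutoff=1.96):
    # Two-sided flag at roughly the 5% level against a standard normal
    return [abs(zi) > cutoff for zi in z]


# Toy data: two small and two large providers, each 10% above or below expectation
expected = [100.0, 100.0, 400.0, 400.0]
observed = [110.0, 90.0, 440.0, 360.0]

z = standardized_z(observed, expected)
tau2 = estimate_tau2(z, expected)
z_ind = individualized_z(z, expected, tau2)

print(flag(z))      # [False, False, True, True]
print(flag(z_ind))  # [False, False, False, False]
```

All four providers deviate from their expected counts by the same 10%. Under the N(0, 1) reference, however, only the two larger providers are flagged. After each z-score is compared with its own, wider null, none are flagged. A realistic version would need a more careful, robust estimate of τ² than this four-provider toy.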
Another challenge is that patient information in the national dialysis data is updated sequentially, so integrating streaming recurrent event data adds a further level of difficulty. To address these difficulties, Aim 2 proposes a nested divide-and-conquer boosting procedure for high-dimensional variable selection with large-scale clustered recurrent event data. This procedure is further combined with a model-updating procedure, based on time-dependent Kullback-Leibler discrimination information, to integrate streaming recurrent event data. Finally, the COVID-19 pandemic has dramatically changed how healthcare is delivered, and statisticians have an important role to play in supporting providers and patients through this evolution. Aim 3 proposes a latent illness-death model to account for temporal and geospatial variation in COVID-19 prevalence in provider profiling. This analysis is needed to evaluate provider performance more accurately, to help physicians focus on groups of patients with excess risk, and to help providers determine corrective actions to improve their performance. Aim 4 develops publicly available software that implements the proposed approaches.