Statistical methods and designs for correlated outcome and covariate errors in studies of HIV/AIDS - PROJECT SUMMARY/ABSTRACT Electronic health record (EHR) and other routinely collected data are often used as cost-effective data sources for HIV/AIDS research. These data sources, however, are known to be prone to errors, typically across multiple variables, which can lead to biased study results and misleading conclusions. In addition, EHR data sources often lack the gold-standard measurements needed to clearly define the presence or absence of co-morbidities (e.g., liver fibrosis). To address these limitations, researchers can validate or collect additional data on a subsample of their patient records. By combining the rich but error-prone EHR data on all study subjects with the gold-standard / validated data collected on a subsample of subjects, researchers can improve study estimates. Specifically, they can eliminate the bias that would result from using only the EHR data, and they can obtain more precise estimates (e.g., narrower confidence intervals) than they would from using only the subsample with gold-standard / validated data. In earlier research, we developed statistical methods and software to combine EHR data with validated subsamples of data. We developed optimal, multi-wave designs for targeting records for data validation. Importantly, we applied these methods to multiple HIV studies using retrospective observational data from the International epidemiology Databases to Evaluate AIDS (IeDEA). However, in our applications, we have encountered additional challenges that have not yet been addressed. In particular, there is great potential in combining expensive, prospectively collected, gold-standard data that are sparsely measured (e.g., once per year) on a subsample of patients with EHR data that are collected much more frequently on a larger number of patients.
We will develop methods to handle this setting, and we will develop statistical designs to better select which participants should be approached for prospective data collection and which patient records should be validated. We will also develop statistical methods to address other challenges encountered when using EHR data, including how to incorporate validation data into studies when inclusion in the study is itself error-prone, and how to handle more complex types of data (e.g., interval-censored data), for which techniques to handle error-prone data are lacking. Our methods and designs will focus on extensions of multiple imputation, maximum likelihood, and generalized raking techniques. Open-source tools and tutorials will be developed to help researchers implement these novel methods and study designs. The methods and designs will be applied to data from the IeDEA network to estimate the incidence of and risk factors for liver fibrosis/steatosis and frailty among people living with HIV in East Africa and Latin America.