Project Summary
Genomic data are vital for advancing medical research and enabling scientific breakthroughs. However, disclosure of
genomic data has serious privacy implications: it can erode the trust of data contributors and restrict researchers’
access to data. To facilitate data-driven genomic research, it is crucial to address the privacy risks of data sharing
and to develop privacy-preserving solutions that protect study participants. This project will study privacy risks
under realistic attack models and develop privacy-preserving methods that balance individual privacy against the
utility of shared data. Overall, the proposed solutions will enable institutions to share high-utility data while
providing strong privacy assurance to data contributors, facilitating data collection and improving data usability.
In the first aim, a privacy-preserving data publication framework will be developed to “safely anonymize” genomic
data and optimize the released data for downstream application needs. The framework will protect individuals from
re-identification and will also prevent inference attacks that exploit publicly observable phenotypes (e.g., eye or
hair color). In the second aim, customizable privacy solutions will be developed for the release of data statistics
under realistic adversarial models. Building on recent privacy models, the proposed solutions will account for the
adversary’s external knowledge and allow the sensitive information to be customized, striking a better balance
between privacy and utility and improving data usability compared with standard differential privacy. This
project will advance current solutions for genomic data anonymization and improve the usability of differential
privacy and its variants, with the goal of enabling data sharing that is both highly useful and privacy-preserving.
This work will widen access to genomic data, promote transparency, and facilitate reproducibility in genomic
research. This project aligns with the mission of the National Human Genome Research Institute (NHGRI), because
the proposed techniques will enhance data sharing and promote collaborative genomic research.
The applicant’s career goal is to become an independent investigator focusing on genome privacy technologies, with a
primary appointment in a biomedical informatics program at a major US research university. His long-term objective
is to develop new privacy-preserving technologies for data sharing and data analytics that facilitate collaborative
research in genomics and precision medicine. The applicant proposes a carefully designed career development plan
that includes a variety of training activities to complement his computer science skills with biomedical knowledge
and to ease his transition to independent research.
The UCSD Health Department of Biomedical Informatics will serve as an exceptional platform for his career
development, given the expertise of several of its faculty in privacy technologies, computational biology, and
genomic medicine, and the department’s close collaborations with other institutions worldwide.