Project Summary
Cultured cell lines are widely used in basic research to study cell function, as models for
disease, and for drug screening. Correct identification of the cell lines used is necessary to draw
valid scientific conclusions and to replicate experiments. Cell lines in culture can be
contaminated by foreign cells, which may rapidly displace the original cells. The identity of
cultured cells should therefore be verified routinely, but most laboratories do
not monitor the identity of their cell lines, and many cell lines are misidentified. Analyses of cells
submitted to major repositories such as the DSMZ (Deutsche Sammlung von Mikroorganismen
und Zellkulturen) and the ATCC (American Type Culture Collection) have found that 15-40% of
cell lines submitted by investigators are misidentified. The cost, effort, and time required to
confirm the identity of cell lines have been a barrier to the adoption of cell line identification as a
routine quality control measure. Current technologies for identifying cultured cells are limited,
and their cost remains a barrier to small-scale use. In this SBIR Phase I application, we aim to develop
a novel tool for cell line and cell type identification using DNA replication timing (RT) fingerprints,
which are RT values at specific genomic regions. In our previous studies, we found
that DNA RT is highly specific to individual cell lines and cell types and that this specificity can be
exploited for cell line/type identification (PLoS Comp Biol, 2011; Genome
Research, 2010; Genome Research, 2015). A patent for RT fingerprint identification and use was
issued in 2016. Recently, we applied this technology to identify common markers
between distinct progeroid diseases (PNAS, 2017). Our lab has also developed iSeg, a novel method
for segmenting genomic and epigenomic data (BMC
Bioinformatics, in press), which can be used to further improve the identification of RT
fingerprints. We propose to collect a large number of DNA RT profiles for a diverse set of cell
lines and cell types and to develop RT fingerprints for their identification. A web server will be built
to accept users’ RT data and return the cell line that best matches the input. The web
server will also accept data for new cell lines from users, allowing continual improvement of our models
and database. RT fingerprints can be measured cost-effectively using polymerase chain reaction
(PCR) experiments. Because the total cost of obtaining an RT fingerprint for one sample is around $100,
and can be reduced further at scale, our method makes it possible to routinely check
the identity of cultured cells.
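
As a rough illustration of the matching step the proposed web server would perform, the sketch below compares a query RT fingerprint against a small reference set and reports the closest cell line. It is a minimal sketch under stated assumptions, not the method proposed here: Pearson correlation is assumed as the similarity score, and the cell line names, the RT values, and the functions pearson and best_match are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of RT values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def best_match(query, references):
    """Return (name, score) for the reference fingerprint most similar to query.

    Assumption: similarity is scored by Pearson correlation; the proposal
    itself does not specify a scoring rule.
    """
    scores = {name: pearson(query, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical reference database: made-up RT values at five fingerprint regions.
REFERENCES = {
    "cell_line_A": [1.2, -0.8, 0.5, -1.1, 0.9],
    "cell_line_B": [-0.9, 1.0, -0.4, 0.8, -1.2],
    "cell_line_C": [0.3, 0.2, -1.0, 1.1, -0.5],
}

# Hypothetical user-submitted fingerprint measured at the same five regions.
query = [1.0, -0.7, 0.6, -0.9, 1.1]
name, score = best_match(query, REFERENCES)
print(name, round(score, 3))
```

In practice a score threshold would probably be needed so that a sample from a cell line absent from the database is reported as unmatched rather than assigned to its nearest neighbor.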