Since the turn of the century, the integration of computer technology into medical practice has enabled
scientists to collect massive volumes of electronic health records (EHR); meanwhile, deep learning has
emerged as the major tool for analyzing massive data. However, EHR data are heterogeneous (varying
substantially across different groups of patients) and fragmented (containing a high proportion of missing
values), which poses a significant barrier to the applicability and generalizability of
current deep neural networks. This project aims to build a health prediction system based on a new type
of stochastic neural network (StoNet) with massive, heterogeneous, and fragmented data, while
considering the integration of omics, imaging, and EHR data in training the system. The StoNet is
formulated as a composition of many simple regressions; it is asymptotically equivalent to a deep neural
network (DNN) in function approximation as the training sample size becomes large, but its structure is
more flexible for dealing with the complexity of EHR data.
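As a minimal sketch of this composition-of-regressions structure (the notation below, with weights $w_i$, biases $b_i$, activation function $\Psi$, latent layer variables $Y_i$, and Gaussian noise terms $e_i \sim N(0, \sigma_i^2 I)$, is illustrative rather than a fixed specification), a StoNet with $h$ hidden layers can be written as
\[
\begin{aligned}
Y_1 &= b_1 + w_1 X + e_1, \\
Y_i &= b_i + w_i \Psi(Y_{i-1}) + e_i, \qquad i = 2, \ldots, h, \\
Y  &= b_{h+1} + w_{h+1} \Psi(Y_h) + e_{h+1},
\end{aligned}
\]
where each line is a simple linear regression given the layer below; when the noise scales $\sigma_1, \ldots, \sigma_{h+1}$ are taken appropriately small relative to the training sample size, the composition behaves like the corresponding DNN, which is the sense of the asymptotic equivalence stated above.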
The StoNet is trained by an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm.
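To make the training scheme concrete, the sketch below illustrates the general impute-then-update pattern that stochastic gradient MCMC training of a StoNet follows, using a one-hidden-layer network, a plain (non-adaptive) Langevin sampler for the latent layer, and illustrative step sizes; it is a simplified stand-in for, not an implementation of, the project's adaptive algorithm.

# Sketch of SGMCMC-style training for a one-hidden-layer StoNet:
#   Y1 = W1 x + b1 + e1,   y = W2 tanh(Y1) + b2 + e2,
# with e1 ~ N(0, s1^2 I) and e2 ~ N(0, s2^2).  The loop alternates
# (i) Langevin-type imputation of the latent layer Y1 and
# (ii) a stochastic-gradient update of the parameters given the imputed Y1.
import numpy as np

rng = np.random.default_rng(0)

# synthetic data (illustrative only)
n, p, m = 500, 10, 8                      # samples, input dim, hidden dim
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# parameters and fixed layer-wise noise scales (fixed here for simplicity)
W1 = 0.1 * rng.normal(size=(m, p)); b1 = np.zeros(m)
W2 = 0.1 * rng.normal(size=m);      b2 = 0.0
s1, s2 = 0.5, 0.5

def grad_latent(Y1, x, yi):
    """Gradient of the complete-data negative log-likelihood w.r.t. the latent Y1."""
    h = np.tanh(Y1)
    r1 = Y1 - (W1 @ x + b1)               # first-layer regression residual
    r2 = yi - (W2 @ h + b2)               # output-layer regression residual
    return r1 / s1**2 - (1 - h**2) * (W2 * r2) / s2**2

n_iter, batch, eps, lr = 200, 32, 1e-3, 1e-2   # illustrative settings
for _ in range(n_iter):
    idx = rng.choice(n, size=batch, replace=False)
    for i in idx:
        x, yi = X[i], y[i]
        # (i) imputation: a few Langevin-type steps on Y1, targeting its
        #     conditional distribution given the data and current parameters
        Y1 = W1 @ x + b1                  # initialize at the first-layer regression mean
        for _ in range(10):
            Y1 = Y1 - eps * grad_latent(Y1, x, yi) \
                 + np.sqrt(2 * eps) * rng.normal(size=m)
        # (ii) parameter update: with Y1 imputed, each layer is a simple
        #      regression; take one gradient step on its squared-error loss
        h = np.tanh(Y1)
        r1 = Y1 - (W1 @ x + b1)
        r2 = yi - (W2 @ h + b2)
        W1 += lr * np.outer(r1, x) / s1**2;  b1 += lr * r1 / s1**2
        W2 += lr * r2 * h / s2**2;           b2 += lr * r2 / s2**2

The explicit imputation step is what turns each layer back into a simple regression, which is the structural flexibility referred to below.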
By leveraging the flexible structure of the StoNet and the sophisticated adaptive stochastic gradient
MCMC algorithm, this project provides a rigorous statistical framework for deep learning with massive,
heterogeneous, and fragmented EHR data.
We show that the StoNet forms a bridge from linear models to deep learning, enabling much of the theory
and many of the methods developed for linear models to be transferred to deep learning. In particular, we
show that the sparse learning theory developed for linear models under the Lasso penalty can be
transferred to the StoNet, leading to an innovative consistent sparse deep learning method; we address
the data heterogeneity issue by replacing each regression in the first hidden layer of the StoNet with a
mixture regression, as sketched below; and we address the missing data issue by training the StoNet with
an adaptive stochastic gradient MCMC algorithm in which the missing values are imputed in the same
way as in multiple imputation for linear models.
The Markovian structure of the StoNet enables the network parameters to be learned locally from
fragmented data and leads to an innovative approach to nonlinear sufficient dimension reduction of
high-dimensional data, facilitating the integration of different types of data in StoNet training. We also
show that the prediction uncertainty of the StoNet can be easily quantified with a recursive application of
Eve's law, as sketched below.
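To indicate how the recursion works (the conditioning follows the illustrative latent-variable notation used earlier), Eve's law, i.e., the law of total variance, gives for the output layer
\[
\mathrm{Var}(Y \mid X) \;=\; \mathrm{E}\big[\mathrm{Var}(Y \mid Y_h, X)\,\big|\,X\big] \;+\; \mathrm{Var}\big(\mathrm{E}[Y \mid Y_h, X]\,\big|\,X\big),
\]
and applying the same decomposition in turn to $Y_h, Y_{h-1}, \ldots, Y_1$ decomposes the prediction variance into layer-wise contributions.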