Program Summary (Abstract)
Health-related studies generally involve more than one longitudinal response composed of multiple types of
data, such as binary, ordinal, nominal or continuous variables. Since these responses are collected from the
same individual or unit, it is desirable to analyze them jointly instead of separately to understand the data as
a whole. Multivariate probit models have been widely used to analyze multivariate longitudinal binary
and ordinal data, and especially mixed binary/ordinal and continuous data, because they assume latent
multivariate normal variables. However, restricting the latent variables to the multivariate normal
distribution limits model comparison and diagnostics. Furthermore, identifiable multivariate probit models
constrain the covariance matrix of the latent multivariate normal variables to be a correlation matrix, which
complicates both likelihood-based estimation and Markov chain Monte Carlo (MCMC) sampling.
Similar issues also arise in multinomial probit models for analyzing nominal data.
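For concreteness, the identification constraint mentioned above can be written out for the binary case. The
notation below (subject index i, response index j, design matrix X_i, coefficients \beta, covariance \Sigma,
correlation matrix R) is assumed for illustration and is not taken from the proposal itself.

```latex
% Latent-variable form of a multivariate probit model for J binary responses
\begin{align*}
  z_i &= X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N_J(0, \Sigma), \\
  y_{ij} &= \mathbf{1}(z_{ij} > 0), \qquad j = 1, \dots, J.
\end{align*}
% For any diagonal matrix D with positive diagonal entries, D z_i has the same
% signs as z_i, so the latent vector with mean D X_i \beta and covariance
% D \Sigma D produces the same observed y_i. The scale of each z_{ij} is
% therefore not identified, and the identifiable model fixes
\begin{equation*}
  \Sigma = R, \qquad \operatorname{diag}(R) = (1, \dots, 1).
\end{equation*}
```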
In this proposal we focus on developing MCMC methods to analyze multivariate mixed longitudinal data with
three main purposes. The first purpose is to use scale mixtures of multivariate normal (SMMVN) distributions,
which provide flexible multivariate distributions for the latent variables, such as the multivariate normal,
multivariate-t and multivariate logistic distributions. The second purpose is to propose identifiable models
using SMMVN distributions and develop the corresponding MCMC sampling methods. The third purpose is to address
the model identification issue by proposing non-identifiable models and developing MCMC methods that avoid the
Metropolis-Hastings algorithm for sampling restricted covariance matrices by instead Gibbs sampling a
covariance matrix without restrictions.
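As a rough illustration of two ideas in this paragraph, the sketch below draws multivariate-t vectors as a
scale mixture of multivariate normals (a gamma-distributed mixing variable rescales each normal draw) and
rescales an unrestricted covariance matrix to a correlation matrix. The function names
sample_mvt_scale_mixture and cov_to_corr are hypothetical, the gamma mixing is the standard construction of
the multivariate-t, and none of this is the sampler proposed in this application.

```python
import numpy as np


def sample_mvt_scale_mixture(n, mean, cov, df, rng):
    """Draw n multivariate-t vectors as a scale mixture of normals.

    Assumed construction: lam ~ Gamma(df/2, rate=df/2) and
    z | lam ~ N(mean, cov / lam), so that z is multivariate-t
    with df degrees of freedom marginally.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # NumPy parameterizes the gamma by shape and scale (scale = 1 / rate).
    lam = rng.gamma(shape=df / 2.0, scale=2.0 / df, size=n)
    # Zero-mean normal draws sharing the same covariance matrix.
    eps = rng.multivariate_normal(np.zeros(mean.shape[0]), cov, size=n)
    # Dividing by sqrt(lam) inflates the scale of each draw individually.
    return mean + eps / np.sqrt(lam)[:, None]


def cov_to_corr(cov):
    """Rescale an unrestricted covariance matrix to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = sample_mvt_scale_mixture(
        5000, mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]], df=10, rng=rng
    )
    sample_cov = np.cov(draws, rowvar=False)
    print(cov_to_corr(sample_cov))
```

Setting the mixing variable to 1 for every draw recovers the multivariate normal case. Other mixing
distributions give other members of the SMMVN family.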
The Specific Aims are to: (1) Construct both identifiable and non-identifiable multivariate models for
multivariate longitudinal binary/ordinal data with SMMVN distributions and develop the MCMC sampling
methods; (2) Construct both identifiable and non-identifiable multivariate models for multivariate longitudinal
nominal data with SMMVN distributions and develop the MCMC sampling methods; (3) Extend the
multivariate models proposed in (1) and (2) to multivariate mixed longitudinal data and develop the MCMC
sampling methods for data with missing values and perform model assessment; (4) Implement, distribute,
support and maintain user-friendly software packages for the methods proposed in this application.
This proposal is consistent with the objectives of the NIH AREA Program (R15): it will enhance the infrastructure
of research and education at Michigan Technological University (MTU). This application will offer a unique
opportunity to expose a diverse group of undergraduate and graduate students to health-related research involving
statistical theory, statistical applications, computational methods and data applications at the cutting edge
of modern research, and it will strengthen health-related research and the research environment at MTU.