Bayesian Models for Sparse Regression Analysis of High Dimensional Data (chapter)
abstract

© Oxford University Press 2011. All rights reserved.

This paper considers the task of building efficient regression models for sparse multivariate analysis of high-dimensional data sets. In particular, it focuses on cases where the number $q$ of responses $Y = (y_k, 1 \le k \le q)$ and the number $p$ of predictors $X = (x_j, 1 \le j \le p)$ to be analysed jointly are both large relative to the sample size $n$, a challenging bi-directional task. The analysis of such data sets arises commonly in genetical genomics, with $X$ linked to DNA characteristics and $Y$ corresponding to measurements of fundamental biological processes such as transcription, protein or metabolite production. Building on the Bayesian variable selection set-up for the linear model and the associated efficient MCMC algorithms developed for single responses, we discuss the generic framework of hierarchical related sparse regressions. In this framework, parallel regressions of $y_k$ on the set of covariates $X$ are linked hierarchically, in particular through the prior model of the variable selection indicators $\gamma_{kj}$, which indicate, in each regression, which covariates $x_j$ are associated with the response $y_k$. Structures for the joint model of the $\gamma_{kj}$ that correspond to different compromises between controlling sparsity and enhancing the detection of predictors associated with many responses ("hot spots") will be discussed, and a new multiplicative model for the probability structure of the $\gamma_{kj}$ will be presented. Performing inference for these models in high-dimensional set-ups requires novel adaptive MCMC algorithms. Because sparsity is paramount and most associations are expected to be zero, new algorithms that progressively focus on the parts of the space where the most interesting associations occur are of great interest. We shall discuss their formulation and theoretical properties, and demonstrate their use on simulated and real data from genomics.
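The abstract does not spell out the multiplicative model, so what follows is only a minimal Python sketch under stated assumptions. It assumes that the prior inclusion probability factorises as $P(\gamma_{kj} = 1) = \omega_k \rho_j$. Here $\omega_k$ is a response-specific sparsity level and $\rho_j \ge 0$ is a predictor-specific propensity, so a predictor with large $\rho_j$ tends to be selected for many responses (a "hot spot"). The following are hypothetical choices made for illustration and are not the authors' specification:

- the Beta prior on $\omega_k$
- the Gamma prior on $\rho_j$
- the truncation $\min(1, \omega_k \rho_j)$
- the names `inclusion_probs`, `sample_gamma`, `hot_spot_counts`, `a_omega`, `b_omega`, `rho_shape` and `rho_scale`

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def inclusion_probs(omega: Sequence[float], rho: Sequence[float]) -> Matrix:
    """Prior P(gamma_kj = 1) = min(1, omega_k * rho_j) (assumed multiplicative form).

    omega[k]: response-specific sparsity level in [0, 1] (hypothetical).
    rho[j]:   predictor-specific propensity >= 0; rho_j > 1 boosts a potential hot spot.
    The min(1, .) cap is a simplification made in this sketch.
    """
    return [[min(1.0, w * r) for r in rho] for w in omega]


def sample_gamma(
    q: int,
    p: int,
    a_omega: float = 1.0,
    b_omega: float = 9.0,
    rho_shape: float = 1.0,
    rho_scale: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], List[float], List[List[int]]]:
    """Draw omega, rho and the q x p indicator matrix gamma from the assumed prior."""
    rng = random.Random(seed)
    # omega_k ~ Beta(a_omega, b_omega): prior mean a/(a+b) = 0.1 with the defaults
    omega = [rng.betavariate(a_omega, b_omega) for _ in range(q)]
    # rho_j ~ Gamma(shape, scale): prior mean shape * scale = 1 with the defaults
    rho = [rng.gammavariate(rho_shape, rho_scale) for _ in range(p)]
    probs = inclusion_probs(omega, rho)
    # gamma_kj ~ Bernoulli(probs[k][j]), independently given omega and rho
    gamma = [[1 if rng.random() < pkj else 0 for pkj in row] for row in probs]
    return omega, rho, gamma


def hot_spot_counts(gamma: List[List[int]]) -> List[int]:
    """Number of responses y_k selecting each predictor x_j (column sums of gamma)."""
    return [sum(col) for col in zip(*gamma)]


if __name__ == "__main__":
    omega, rho, gamma = sample_gamma(q=50, p=200, seed=1)
    counts = hot_spot_counts(gamma)
    # Predictors selected by the most responses; these tend to have large rho_j
    top = sorted(range(len(counts)), key=lambda j: counts[j], reverse=True)[:5]
    for j in top:
        print(f"x_{j}: rho={rho[j]:.2f}, selected by {counts[j]} of 50 responses")
```

For example, `inclusion_probs([0.1, 0.5], [1.0, 4.0])` returns `[[0.1, 0.4], [0.5, 1.0]]`. The second predictor, with $\rho = 4$, quadruples the baseline probability for every response, capped at 1. This illustrates how a single predictor-level factor can raise the chance of association across all $q$ regressions while $\omega_k$ keeps each regression sparse.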

authors
author list (cited authors)
Richardson, S., Bottolo, L., & Rosenthal, J. S.
publication date
2011
publisher
Oxford University Press
citation count
25