High-dimensional data have become ubiquitous, arising in fields ranging from genomics and neuroimaging to finance and text mining. Analysing such data sets poses great challenges and creates a pressing need for new methodologies and tools. The estimation of sparse precision matrices provides an important means of revealing the conditional dependencies between variables in these data sets. This estimation problem has been studied extensively with frequentist methods, whereas its Bayesian treatments remain limited. Compared to frequentist methods, Bayesian approaches readily provide uncertainty quantification for model parameters through their posterior distributions. We first introduce a new empirical Bayesian approach for estimating sparse precision matrices under the normal-beta prime (NBP) shrinkage prior. The hyperparameters of the proposed graphical normal-beta prime (GNBP) model need not be prespecified; all of them are obtained via the maximum marginal likelihood (MML) method. This data-driven choice of hyperparameters allows the proposed method to accommodate precision matrices of different sparsity levels, whereas existing frequentist and Bayesian approaches can typically handle only either a sparse setting or a dense setting. In addition, we establish the posterior convergence rate of the induced posterior under sparsity assumptions on the precision matrices. We then extend the GNBP to the tGNBP to handle heavy-tailed data following a multivariate t distribution, in which a zero in the inverse scale matrix indicates only that the corresponding two random variables are conditionally uncorrelated. We also show that the proposed tGNBP method can handle both normal data and multivariate t-distributed data.
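The link between precision-matrix zeros and conditional dependence structure can be illustrated with a minimal sketch (not the thesis's GNBP method; the matrix below is a toy example): in a Gaussian graphical model, a zero off-diagonal entry of the precision matrix means the corresponding pair of variables is conditionally independent given all the others.

```python
# Toy illustration: zeros in the precision matrix <=> zero partial
# correlation (conditional independence under normality).
import numpy as np

# A 3x3 precision matrix with Omega[0, 2] = 0 (positive definite).
Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])

# Partial correlation between i and j given the rest:
#   rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)
d = np.sqrt(np.diag(Omega))
partial_corr = -Omega / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(partial_corr[0, 2])  # 0.0: X1 and X3 conditionally independent
print(partial_corr[0, 1])  # -0.25: X1 and X2 conditionally dependent
```

Note that the marginal covariance `np.linalg.inv(Omega)[0, 2]` is nonzero: the zero pattern encoding conditional independence lives in the precision matrix, not the covariance matrix, which is why sparse precision estimation is the natural target for graphical models.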
Although the proposed GNBP method has attractive posterior concentration properties and empirical performance, it carries a heavy computational burden, especially when the dimension p is large. This issue is common to exact likelihood-based methods, including the GLASSO and the GHS, since each iteration of the resulting optimisation or MCMC algorithms involves inverting a (p-1) by (p-1) matrix. To mitigate this, we propose a new empirical quasi-Bayesian approach for estimating sparse precision matrices, which incorporates the NBP shrinkage prior into a quasi-Bayesian framework characterised by a quasi-likelihood. The proposed quasiGNBP method inherits the self-adaptivity of the GNBP, accommodating precision matrices of different sparsity levels, and is substantially more computationally efficient than existing exact likelihood-based Bayesian approaches. We also obtain the posterior convergence rate of the induced quasi-posterior. Similarly, we extend the quasiGNBP to the quasi-tGNBP to handle both normal data and multivariate t-distributed data. Finally, we demonstrate on simulated data that our proposed methods exhibit excellent performance compared with state-of-the-art methods in the literature. We then apply these methods to a human gene expression data set and use the estimated precision matrices to draw gene regulatory networks.
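For concreteness, the estimation problem the thesis addresses can be sketched with the frequentist GLASSO baseline named above, here via scikit-learn's implementation (a sketch of the problem setting, not the thesis's quasiGNBP method; the simulated AR(1)-type precision matrix is an assumed toy example):

```python
# Sketch: sparse precision matrix estimation with the GLASSO baseline.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Simulate n observations from a p-variate normal whose true precision
# matrix is sparse: tridiagonal with 0.4 on the off-diagonals.
p, n = 20, 500
true_prec = (np.eye(p)
             + np.diag(np.full(p - 1, 0.4), 1)
             + np.diag(np.full(p - 1, 0.4), -1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_prec), size=n)

# Penalised maximum likelihood; larger alpha yields a sparser estimate.
model = GraphicalLasso(alpha=0.1).fit(X)
prec_hat = model.precision_

# Zero off-diagonal entries of prec_hat encode estimated conditional
# independences, i.e. missing edges in the graphical model.
sparsity = np.mean(np.abs(prec_hat[np.triu_indices(p, k=1)]) < 1e-8)
print(f"estimated off-diagonal sparsity: {sparsity:.2f}")
```

The fixed penalty `alpha` here is exactly the kind of tuning the GNBP's MML-driven hyperparameter choice is designed to avoid: GLASSO's sparsity level is set by the user, whereas the empirical Bayes approach adapts it to the data.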
Date of Award | 31 Dec 2023
Original language | English
Awarding Institution | The University of Manchester
Supervisor | Yang Han (Supervisor) & Christiana Charalambous (Supervisor)
- Graphical Models
- Posterior Convergence Rate
- Block Gibbs Sampler
- High-dimensional Data
- Empirical Bayes Estimation
- Sparse Precision Matrix Estimation
- Quasi-Empirical Bayes Estimation
BAYESIAN ESTIMATION OF SPARSE PRECISION MATRICES FOR HIGH-DIMENSIONAL DATA
Zheng, P. (Author). 31 Dec 2023
Student thesis: PhD