Abstract

Markov chain Monte Carlo (MCMC) is a key algorithm in computational statistics, offering considerable versatility in both model and data. However, as datasets grow larger and models grow more complex, many popular MCMC algorithms become too computationally expensive to be practical. Recent progress has been made by developing MCMC algorithms based on Piecewise Deterministic Markov Processes (PDMPs). In particular, one such algorithm, the ZigZag sampler, has been shown to have the remarkable property of super-efficiency, meaning that the computational effort required to draw a sample from the posterior distribution in Bayesian inference need not grow with the size of the dataset. However, the ZigZag sampler also has limitations that may create barriers to its adoption by practitioners. This work develops algorithms based on the ZigZag dynamics that try to overcome these limitations. The first contribution establishes a reliable test distribution with a curved correlation structure, which is used in the second contribution. PDMPs have so far only been implemented for models where certain gradients can be bounded to allow for simulation via thinning. This is not possible in many statistical contexts, and in others, even where a bound is available, it introduces inefficiencies that negate the benefits of super-efficiency. The second contribution presents the NuZZ algorithm, which is applicable to general statistical models without the need for bounds on the gradient of the log posterior. The third contribution introduces FiZZ, a ZigZag-based algorithm that can use subsample-dependent bounds on the switching rate, at the price of selecting the subsample from a non-uniform distribution. Moreover, FiZZ can be used to analyse large datasets with many missing values efficiently, in a fully Bayesian way.
|Date of Award||1 Aug 2021|
|Supervisor||Simon Cotter (Supervisor) & Thomas House (Supervisor)|
- ZigZag Process
- Missing Data