Contributions to finite mixture models with applications

  • Hok Shing Kwong

Student thesis: PhD

Abstract

This thesis comprises eight chapters. Chapter 1 is introductory, and the main contributions are in Chapters 2 to 8, each of which contributes to the study of finite mixture models.

Inspired by Tarnopolski [Monthly Notices of the Royal Astronomical Society, 458(2), 2024-2031], Chapter 2 studies the statistical properties of gamma-ray bursts (GRBs). We show that GRB duration distributions may exhibit non-Gaussian properties. By showing that they can be better modelled by a finite mixture of two power exponential distributions, we suggest that GRB duration distributions may exhibit non-Gaussian tail behaviour rather than the asymmetry suggested by Tarnopolski.

Chapter 3 is motivated by Luckstead and Devadoss [Physica A: Statistical Mechanics and its Applications, 465, 573-578], who found that the city size distribution of the US exhibits power law behaviour in both the upper and lower tails. The chapter studies the power law behaviour of city size distributions in the US and India. We propose that the observed power laws are not true power laws and argue that they result from heterogeneity in the city growth process. In support of this proposal, we show that city size distributions can be better modelled by finite mixtures of exponential-type distributions than by power law distributions.

Chapter 4 is motivated by Lin and Meng [Physica A: Statistical Mechanics and its Applications, 490, 533-541], who found that the velocity distribution of marathon runners switches between a normal distribution and a lognormal distribution over different intervals. The chapter studies the dynamics of marathons in detail. We quantify the change in distribution and show that it can be accounted for by the pacing strategies of different groups of runners. We propose that there exist two distinct mechanisms that determine the velocity change of a runner, and we then derive a finite mixture regression model to describe the velocity change of runners throughout a marathon.

Motivated by the lack of flexible and practical multivariate skew distributions in the literature, Chapter 5 introduces a novel class of multivariate skew distributions, the skew elliptical distribution with independent skewing functions (SELIS). We study some statistical properties and some special cases of SELIS, and present an algorithm for estimating its parameters by quasi-maximum likelihood estimation (QMLE). We show that SELIS is more computationally feasible than other multivariate skew distributions of similar complexity in the literature. We also show that a nested form of SELIS, SELIS with a diagonal skewing matrix, which can be expressed in closed form, is quick to fit and is better able to model asymmetry in higher dimensions than other closed-form multivariate skew distributions.

As an extension of Chapter 5, Chapter 6 studies the use of a special case of SELIS, the multivariate skew $t$ distribution with independent logistic skewing function (MSTIL), in finite mixture models (FM-MSTIL). We propose an EM-type algorithm for FM-MSTIL and show, via numerical examples, that it allows the parameters of FM-MSTIL to be estimated within a feasible time frame. We also propose a divisive hierarchical method for obtaining initial parameters and the optimal number of clusters for FM-MSTIL. We show that this method allows FM-MSTIL-R, the nested form of FM-MSTIL with a diagonal skewing matrix, to be an efficient and high-performance clustering algorithm on the FlowCap-I challenge.

Variance and entropy are both commonly used measures of uncertainty, and there exist many cases where the variance is infinite while the entropy is finite. In Chapter 7, we derive an upper bound illustrating the relationship between the variance and the entropy of random variables following a special class of multimodal distributions. We also derive a sharp upper bound for the $k$-th absolute central moment proportional to entropy power for a special class of unimodal distributions.

The generalized exponentiated exponential Lindley distribution (GEEL) is a novel three-parameter distribution due to Hussain et al. [Journal of Agricultural, Biological and Environmental Statistics, 23(1), 63-82], who studied its properties, including estimation issues, and illustrated applications to four data sets. In Chapter 8, we correct errors in the derivatives of the likelihood function in the original paper and demonstrate the weakness of GEEL when the sample size is small. We also show that several known distributions are more appropriate than GEEL in many cases.
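
For readers unfamiliar with how a finite mixture model is fitted, the following is a minimal illustrative sketch of the standard EM algorithm for a two-component univariate Gaussian mixture. It is not the thesis's method: the Gaussian components stand in for the power exponential and MSTIL components described in the abstract, and the function names, the initialisation by splitting the sorted data in half, and the simulated data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, n_iter=200):
    """EM for a two-component univariate Gaussian mixture (illustrative only).

    Returns (weights, means, sigmas), with components ordered by mean.
    """
    xs = sorted(data)
    n = len(xs)
    # Crude initialisation (an assumption): split the sorted data in half.
    halves = [xs[: n // 2], xs[n // 2:]]
    means = [sum(h) / len(h) for h in halves]
    sigmas = [
        max(1e-3, math.sqrt(sum((x - m) ** 2 for x in h) / len(h)))
        for h, m in zip(halves, means)
    ]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0.0 else 0.5)

        # M-step: responsibility-weighted updates of weight, mean and sigma.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            nk = sum(resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its old parameters
            weights[k] = nk / n
            means[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(1e-3, math.sqrt(var))  # floor avoids degeneracy

    order = sorted((0, 1), key=lambda k: means[k])
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [sigmas[k] for k in order],
    )


# Simulated data (hypothetical): 60% from N(-2, 1) and 40% from N(3, 0.5^2).
rng = random.Random(0)
sample = [rng.gauss(-2.0, 1.0) for _ in range(600)]
sample += [rng.gauss(3.0, 0.5) for _ in range(400)]

weights, means, sigmas = fit_two_gaussians(sample)
print(weights, means, sigmas)
```

Replacing `normal_pdf` with another component density changes the M-step, which in general no longer has the closed-form weighted-mean and weighted-variance updates used here.
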
Date of Award: 31 Dec 2020
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Georgi Boshnakov (Supervisor) & Saraleesan Nadarajah (Supervisor)

Keywords

  • finite mixture model
  • clustering
  • EM algorithm
