Ensemble methods are frequently applied to classification problems, and generally improve upon the performance of individual models. Diversity is considered to be an important factor in this performance improvement; in the literature there is strong support for the idea that high diversity is crucial in ensembles. Voting margins provide an alternative explanation of the behaviour of ensembles; they have been prominently used in the interpretation of the AdaBoost algorithm, and the literature suggests that large margins are beneficial. In this thesis, we examine these two quantities - which in both cases the literature suggests should be increased - and show that (in 2-class problems) they are inversely related, high diversity corresponding to small absolute margins. From this it can be seen that the views expressed in the literature are contradictory; we argue that ensemble behaviour can be sufficiently understood without the need to quantify 'diversity'.

However, in non-stationary learning scenarios - where we must process data that is not independent and identically distributed - the model must not only generalise well, but also adapt to changes in the distribution. Building on the work of Minku, we hypothesise that high diversity might be of special significance in such problems in determining the rate at which the model can adapt. We use the correspondence between diversity and margins to formulate the reasoning behind this intuition formally, and then derive an algorithm that explicitly manages diversity in order to test this hypothesis. An empirical investigation shows that managing diversity can, under certain conditions, improve the ability of an ensemble to adapt to a new concept; however, it typically seems that other aspects of the learning algorithm, especially concept change detection, have a substantially larger impact on performance than diversity does.
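The inverse relationship between diversity and absolute margin in 2-class problems can be illustrated with a small sketch. It rests on assumptions that the abstract does not state: diversity is measured here as pairwise disagreement on a single example, votes are unweighted and take values in {-1, +1}, and the margin is normalised by the ensemble size T. The function names are hypothetical and are not taken from the thesis. Under these assumptions, if k of T voters are correct, the margin is m = (2k - T)/T and the disagreement equals T(1 - m^2) / (2(T - 1)), so disagreement is largest when the absolute margin is smallest.

```python
from itertools import combinations


def voting_margin(votes, label):
    """Normalised margin on one example: (correct votes - incorrect votes) / T.

    Assumes unweighted votes and a label in {-1, +1}.
    """
    return sum(label * v for v in votes) / len(votes)


def pairwise_disagreement(votes):
    """Fraction of classifier pairs whose votes differ on this example.

    This is one possible diversity measure, chosen here for illustration.
    """
    pairs = list(combinations(votes, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def disagreement_from_margin(margin, n_voters):
    """Closed form for the 2-class case: T * (1 - m^2) / (2 * (T - 1))."""
    return n_voters * (1 - margin ** 2) / (2 * (n_voters - 1))


# Sweep over every possible split of a 10-member ensemble (true label +1).
T = 10
for k in range(T + 1):
    votes = [1] * k + [-1] * (T - k)  # k correct voters, T - k incorrect
    m = voting_margin(votes, label=1)
    d = pairwise_disagreement(votes)
    print(f"correct={k:2d}  margin={m:+.2f}  disagreement={d:.3f}")
```

In the printed sweep, disagreement peaks for the evenly split ensemble (margin zero) and falls to zero when all members agree (margin +1 or -1), which is the sense in which increasing both quantities at once is contradictory.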
|Date of Award||1 Aug 2013|
- The University of Manchester
|Supervisor||Gavin Brown (Supervisor)|