Decision Variance in Risk-Averse Online Learning

Sattar Vakili, Alexis Boukouvalas, Qing Zhao

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Online learning has traditionally focused on the expected rewards. In this paper, a risk-averse online learning problem under the performance measure of the mean-variance of the rewards is studied. Both the bandit and full information settings are considered. The performance of several existing policies is analyzed, and new fundamental limitations on risk-averse learning are established. In particular, it is shown that although a logarithmic distribution-dependent regret in time T is achievable (similar to the risk-neutral problem), the worst-case (i.e., minimax) regret is lower bounded by Ω(T) (in contrast to the Ω(√T) lower bound in the risk-neutral problem). This sharp difference from the risk-neutral counterpart is caused by the variance in the player's decisions, which, while absent in the regret under the expected reward criterion, contributes to excess mean-variance due to the non-linearity of this risk measure. The role of the decision variance in regret performance reflects a risk-averse player's desire for robust decisions and outcomes.
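The role of decision variance can be illustrated with a small numerical sketch. It is not taken from the paper: the two arms, their deterministic rewards, the weight `rho`, and the convention "variance minus `rho` times mean" for the empirical mean-variance are all assumptions made here for illustration. Each arm has zero reward variance, so any variance in the observed reward sequence comes purely from switching between arms.

```python
from statistics import fmean, pvariance

def mean_variance(rewards, rho=1.0):
    """Empirical mean-variance under an assumed convention:
    population variance minus rho times the mean (lower is better)."""
    return pvariance(rewards) - rho * fmean(rewards)

# Hypothetical two-armed problem with deterministic rewards:
# arm 0 always pays 0.0 and arm 1 always pays 1.0.
ARM_REWARD = {0: 0.0, 1: 1.0}

def play(decisions):
    """Map a sequence of arm choices to the rewards they produce."""
    return [ARM_REWARD[arm] for arm in decisions]

T = 1000
committed = play([1] * T)                      # always pull arm 1
alternating = play([t % 2 for t in range(T)])  # switch arms every round

mv_committed = mean_variance(committed)
mv_alternating = mean_variance(alternating)

# Both arms are noiseless, so this variance is caused only by the
# player's switching between arms with different means.
decision_variance = pvariance(alternating)

print(mv_committed, mv_alternating, decision_variance)
```

Under the expected-reward criterion, the alternating player loses only the mean gap of 0.5 per round. Under the assumed mean-variance measure, the same player pays an additional 0.25 from the variance term, even though neither arm is noisy on its own.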

 


Original language: English
Title of host publication: 2019 IEEE 58th Conference on Decision and Control, CDC 2019
Subtitle of host publication: 11-13 December 2019, Nice, France
Place of Publication: New York
Publisher: IEEE
Pages: 2738-2744
Number of pages: 7
ISBN (Electronic): 9781728113982
ISBN (Print): 9781728113999
Publication status: Published - Dec 2019
Event: 58th IEEE Conference on Decision and Control, CDC 2019 - Nice, France
Duration: 11 Dec 2019 to 13 Dec 2019

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2019-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 58th IEEE Conference on Decision and Control, CDC 2019
Country/Territory: France
City: Nice
Period: 11/12/19 to 13/12/19
