Estimating Mutual Information in Under-Reported Variables

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Under-reporting occurs in survey data when respondents have a reason to systematically misreport their answer to a question. For example, in studies of low-birth-weight infants, the mother's smoking habits are very likely to be misreported. This creates problems, such as bias, when calculating effect sizes, but these problems are commonly ignored for lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and in particular of learning from positive and unlabelled data. This formalisation gives us a simple method for incorporating prior knowledge of the misreporting, and we show how to use this knowledge to derive corrected point and interval estimates of the mutual information. We then show that our corrected estimators outperform more complex approaches, and we present applications of our theoretical results to real-world problems and machine learning tasks.
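
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator; it is one simple plug-in correction under assumptions that are stated here and are not taken from the source: the true variable X is binary, the reported variable S has no false positives (S = 1 implies X = 1), positives are reported at random regardless of the outcome Y, and the prior knowledge takes the form of a known value of p(X = 1). All function names and simulation parameters are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Plug-in mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

def empirical_joint(a, b, n_a=2, n_b=2):
    """Empirical joint distribution of two integer-coded samples."""
    joint = np.zeros((n_a, n_b))
    np.add.at(joint, (a, b), 1)
    return joint / len(a)

def corrected_joint(s, y, prior_x1, n_y=2):
    """Joint of (X, Y) rebuilt from reported S and an assumed prior p(X=1).

    Assumes p(y | X=1) = p(y | S=1), i.e. reporting is independent of Y
    among true positives (an assumption of this sketch, not of the source).
    """
    p_y = np.bincount(y, minlength=n_y) / len(y)
    p_y_given_s1 = np.bincount(y[s == 1], minlength=n_y) / np.sum(s == 1)
    row_x1 = prior_x1 * p_y_given_s1              # p(X=1, y)
    row_x0 = np.clip(p_y - row_x1, 0.0, None)     # p(X=0, y), kept non-negative
    joint = np.vstack([row_x0, row_x1])
    return joint / joint.sum()

# Simulated example with made-up parameters (hypothetical, for illustration).
rng = np.random.default_rng(0)
n = 200_000
prior_x1 = 0.4                                    # assumed known p(X=1)
x = (rng.random(n) < prior_x1).astype(int)        # true exposure
y = (rng.random(n) < np.where(x == 1, 0.7, 0.2)).astype(int)  # outcome
s = x * (rng.random(n) < 0.5).astype(int)         # half of positives report

mi_true = mutual_information(empirical_joint(x, y))
mi_naive = mutual_information(empirical_joint(s, y))
mi_corrected = mutual_information(corrected_joint(s, y, prior_x1))
```

In this simulation the naive estimate computed from the reported variable understates the dependence, while the corrected estimate should land close to the value obtained from the true variable. How well this carries over to real data depends entirely on how accurate the assumed prior and the reporting assumption are.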
Original language: English
Title of host publication: Proceedings of the Eighth International Conference on Probabilistic Graphical Models
Pages: 449-461
Volume: 52
Publication status: Published - 15 Aug 2016
Event: International Conference on Probabilistic Graphical Models - Università della Svizzera Italiana (USI), Lugano, Switzerland
Duration: 6 Sept 2016 to 9 Sept 2016
http://www2.idsia.ch/cms/pgm/venue/

Conference

Conference: International Conference on Probabilistic Graphical Models
Abbreviated title: PGM 2016
Country/Territory: Switzerland
City: Lugano
Period: 6/09/16 to 9/09/16
