Abstract
Under-reporting occurs in survey data when respondents have a reason to systematically misreport their answer to a question. For example, in studies of low birth weight infants, the mother's smoking habits are very likely to be misreported. This causes problems, such as bias, when calculating effect sizes, but these problems are commonly ignored because there are no generally accepted solutions. We reinterpret under-reporting as a problem of learning from missing data, and in particular of learning from positive and unlabelled data. This formalisation gives a simple method for incorporating prior knowledge of the misreporting, and we show how to use this knowledge to derive corrected point and interval estimates of the mutual information. We then show that our corrected estimators outperform more complex approaches, and we present applications of our theoretical results to real-world problems and machine learning tasks.
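The positive-and-unlabelled view described in the abstract can be illustrated with a small sketch. This is not the paper's estimator; it is a minimal illustration under two assumptions that are not stated in the abstract: a reported "yes" is always true, and a true "yes" is reported with the same probability whatever the outcome. The function names (`mutual_information`, `corrected_joint`) and the `prevalence` parameter, which stands in for the prior knowledge of the true rate of "yes", are hypothetical.

```python
import numpy as np


def mutual_information(joint):
    """Plug-in mutual information (in nats) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def corrected_joint(reported_joint, prevalence):
    """Recover the joint of (true answer, outcome) from the reported joint.

    reported_joint: rows are [reported "no", reported "yes"], columns are
        the outcome values (counts or probabilities).
    prevalence: assumed prior knowledge of p(true answer = "yes").
    Assumption: a true "yes" is reported with a probability that does not
    depend on the outcome, and a reported "yes" is never false.
    """
    q = np.asarray(reported_joint, dtype=float)
    q = q / q.sum()
    report_rate = q[1].sum() / prevalence   # p(reported "yes" | true "yes")
    if not 0 < report_rate <= 1:
        raise ValueError("prevalence is inconsistent with the reported data")
    true_yes = q[1] / report_rate
    # Whatever mass is left in each outcome column belongs to the true "no".
    # Clipping guards against small negative values caused by sampling noise.
    true_no = np.clip(q.sum(axis=0) - true_yes, 0.0, None)
    joint = np.vstack([true_no, true_yes])
    return joint / joint.sum()


# Toy example: a true joint, and the same joint after 40% of true "yes"
# answers are reported as "no".
true_joint = np.array([[0.5, 0.2],
                       [0.1, 0.2]])
reported = np.array([true_joint[0] + 0.4 * true_joint[1],
                     0.6 * true_joint[1]])

naive_mi = mutual_information(reported)
fixed_mi = mutual_information(corrected_joint(reported, prevalence=0.3))
true_mi = mutual_information(true_joint)
```

In this toy setting the naive estimate computed from the reported answers is smaller than the true mutual information, while the corrected table reproduces it. With finite samples the correction is only as good as the assumed prevalence.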
Original language | English |
---|---|
Title of host publication | Proceedings of the Eighth International Conference on Probabilistic Graphical Models |
Pages | 449-461 |
Volume | 52 |
Publication status | Published - 15 Aug 2016 |
Event | International Conference on Probabilistic Graphical Models, Università della Svizzera Italiana (USI), Lugano, Switzerland. Duration: 6 Sept 2016 → 9 Sept 2016. http://www2.idsia.ch/cms/pgm/venue/ |
Conference
Conference | International Conference on Probabilistic Graphical Models |
---|---|
Abbreviated title | PGM 2016 |
Country/Territory | Switzerland |
City | Lugano |
Period | 6/09/16 → 9/09/16 |