The application of machine learning (ML) in medicine has led to a paradigm shift in medical research. Medical machine learning (MML) research is an example of multidisciplinary research: it involves disciplines with different research cultures, which makes its literature more intricate and heterogeneous. As a result, retrieving multidisciplinary literature can be challenging, and information retrieval methods are not well validated in this context. In addition, the quality of scientific reporting has been widely questioned, so identifying well-reported papers is as important as retrieving them. We believe that evaluating the complexity of such literature, and then automating the identification of relevant and well-reported papers within it, could help researchers working in multidisciplinary contexts. This thesis addresses these needs by developing the means by which MML literature can be searched efficiently and thoroughly. Its contribution is the Awareness Retrieval Quality (ARQ) framework. At the awareness level, we assessed different components of the literature reviewing process, including (a) literature indexing, (b) systematic reviews, (c) bibliographic keywords, and (d) bibliometrics. To do so, we applied meta-research and bibliometric methods, including the Visualisation Of Similarities (VOS) mapping method. At the retrieval level, we designed and developed a system for retrieving relevant papers. It uses text mining techniques for text representation, namely (a) Term Frequency-Inverse Document Frequency (TF-IDF), (b) Latent Dirichlet Allocation (LDA), and (c) Document to vector (Doc2vec), and for text classification, namely (a) Support Vector Machine (SVM) and (b) Random Forest (RF).
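The retrieval level pairs a text representation with a classifier that labels papers as relevant or not. The abstract does not give the actual pipeline, so the following is only a minimal, dependency-free sketch of the TF-IDF branch under stated assumptions: documents are whitespace-tokenised, IDF uses the common smoothed form, and a perceptron (a simple linear classifier with the same sign(w · x) decision rule) stands in for the linear SVM to keep the code short. The toy training texts and the names `fit_idf`, `train_perceptron` and `predict` are hypothetical. A production setup would more likely use scikit-learn's `TfidfVectorizer` with `LinearSVC` or `RandomForestClassifier`.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-case whitespace tokenisation, keeping alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (assumed variant).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_perceptron(vectors, labels, max_epochs=100):
    # Linear classifier through the origin; labels are +1 (relevant) or -1.
    # Hypothetical stand-in for the linear SVM: same decision rule, simpler training.
    w = {}
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(vectors, labels):
            if y * dot(w, x) <= 0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + y * v
                mistakes += 1
        if mistakes == 0:  # converged: every training document is classified correctly
            break
    return w


def predict(w, idf, text):
    return 1 if dot(w, tfidf(text, idf)) > 0 else -1


# Hypothetical toy titles: +1 = ML-in-medicine paper, -1 = unrelated paper.
train_docs = [
    "machine learning model predicts patient outcome",
    "deep learning classifier for precision medicine",
    "random forest predicts cancer risk from genomic data",
    "hospital staffing policy review",
    "nursing workforce survey results",
    "history of medical education",
]
train_labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_docs)
weights = train_perceptron([tfidf(d, idf) for d in train_docs], train_labels)
print(predict(weights, idf, "learning model for precision medicine"))
print(predict(weights, idf, "workforce policy survey"))
```

Swapping the perceptron for an SVM changes how the weights are fitted (maximising the margin rather than stopping at the first separating hyperplane) but not how a new paper is scored.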
We evaluated the system (a) internally, on retrieving literature on ML in precision medicine, and (b) externally, on retrieving literature on ML in Electronic Patient Records (EPR). At the quality level, we analysed the impact of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist on scientific papers, in terms of their (a) reporting language and (b) reporting completeness. We also designed and developed a system that scores the reporting completeness of papers against seven questions from TRIPOD. To do so, we used Named Entity Recognition (NER) methods, including (a) dictionary-based methods and (b) neural network (NN)-based methods. We evaluated this system on scoring the reporting quality of diagnostic and prognostic prediction studies. In conclusion, we found that multidisciplinary literature is difficult to find using standard methods. Compared with standard methods, the ARQ framework achieved higher recall and was able to retrieve relevant, high-quality literature. We suggest that ARQ can be incorporated into the literature searching process as a means of exploring the nature of the literature, identifying relevant papers within it, and evaluating their reporting quality.
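To make the dictionary-based side of the quality level concrete, the following sketch scores reporting completeness by matching cue terms in a paper's text. The abstract does not list the seven TRIPOD questions or the lexicons used, so the item names and cue terms in `CUE_DICTIONARY` are hypothetical placeholders, and the scoring rule (fraction of items with at least one match) is an assumption rather than the thesis's definition. The NN-based NER component is not shown.

```python
import re

# Hypothetical cue dictionaries: NOT the thesis's seven TRIPOD questions or its lexicons.
CUE_DICTIONARY = {
    "sample_size": ["sample size", "participants", "patients were included"],
    "missing_data": ["missing data", "multiple imputation", "complete case"],
    "validation": ["external validation", "internal validation", "cross-validation", "bootstrap"],
    "performance": ["c-statistic", "auc", "calibration", "discrimination"],
}


def compile_patterns(dictionary):
    # One case-insensitive, whole-word regex per cue term.
    return {
        item: [re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE) for term in terms]
        for item, terms in dictionary.items()
    }


def find_mentions(text, patterns):
    # Dictionary-based NER: collect (start, end, surface form) spans for each item.
    return {
        item: [(m.start(), m.end(), m.group(0)) for rx in regexes for m in rx.finditer(text)]
        for item, regexes in patterns.items()
    }


def completeness_score(text, patterns):
    # Assumed rule: an item counts as reported if at least one cue term is found.
    mentions = find_mentions(text, patterns)
    reported = sorted(item for item, spans in mentions.items() if spans)
    return len(reported) / len(patterns), reported


patterns = compile_patterns(CUE_DICTIONARY)
abstract = (
    "We performed external validation in 1,200 patients and report the AUC and calibration. "
    "Missing data were handled by multiple imputation."
)
score, reported = completeness_score(abstract, patterns)
print(score, reported)  # 0.75 ['missing_data', 'performance', 'validation']
```

A dictionary matcher like this is transparent and easy to audit, but it misses paraphrases (for example, "we imputed absent values"), which is one reason to complement it with learned NER models.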
Date of Award: 31 Dec 2020
Institution: The University of Manchester
Supervisors: Goran Nenadic and Andrew Brass