Developing and Validating a Survival Prediction Model for NSCLC Patients Through Distributed Learning Across 3 Countries

Arthur Jochems, Timo M. Deist, Issam El Naqa, Marc Kessler, Chuck Mayo, Jackson Reeves, Shruti Jolly, Martha Matuszak, Randall Ten Haken, Johan van Soest, Cary Oberije, Corinne Faivre-Finn, Gareth Price, Dirk De Ruysscher, Philippe Lambin, Andre Dekker

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: Tools for predicting the survival of non-small cell lung cancer (NSCLC) patients treated with (chemo)radiotherapy are of limited quality. In this work, we develop a model that predicts survival at two years. The model is based on a large volume of historical patient data and serves as a proof of concept to demonstrate the distributed learning approach.

Patients and methods: Clinical data from 698 lung cancer patients treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected and stored at 2 different cancer institutes: 559 patients at Institute 1 (Country 1) and 139 at Institute 2 (Country 2). The model was further validated on 196 patients from Institute 3 (Country 3). A Bayesian network model was adapted for distributed learning (watch the animation: link censored). Two-year post-treatment survival was chosen as the endpoint. The Institute 1 cohort data are publicly available at (link censored), and the developed models can be found at (link censored).

Results: Variables included in the final model were T and N stage, age, performance status, and total tumor dose. The model has an AUC of 0.66 on the external validation set and an AUC of 0.62 in 5-fold cross-validation. A model based only on T and N stage achieved an AUC of 0.47 on the validation set, significantly worse than our model (P<0.001). Learning the model in a centralized or a distributed fashion yields a minor difference in the probabilities of the conditional probability tables (0.6%); the discriminative performance of the two models on the validation set is similar (P=0.26).

Conclusion: Distributed learning from federated databases allows predictive models to be learned on data originating from multiple institutions while avoiding many of the data-sharing barriers. We believe that distributed learning is the future of sharing data in health care.
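To illustrate the general idea of learning conditional probability tables without moving patient records, the following is a minimal Python sketch, not the authors' algorithm. It assumes a fixed network structure in which each site shares only aggregate counts, which a coordinator then sums and normalizes. The function names (`local_counts`, `cpt_from_counts`), the variable names (`t_stage`, `alive_2y`), the smoothing parameter `alpha`, and the toy records are all hypothetical.

```python
from collections import Counter


def local_counts(records, child, parents):
    """Count (parent configuration, child value) pairs at a single site."""
    counts = Counter()
    for record in records:
        parent_config = tuple(record[p] for p in parents)
        counts[(parent_config, record[child])] += 1
    return counts


def cpt_from_counts(counts, child_values, alpha=1.0):
    """Normalize counts into P(child | parents) with additive smoothing alpha."""
    parent_configs = {parent_config for parent_config, _ in counts}
    cpt = {}
    for pc in parent_configs:
        total = sum(counts[(pc, v)] + alpha for v in child_values)
        cpt[pc] = {v: (counts[(pc, v)] + alpha) / total for v in child_values}
    return cpt


# Hypothetical toy records held at two separate sites.
site_a = [
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T4", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 1},
]
site_b = [
    {"t_stage": "T1", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 0},
]

# Each site computes counts locally; only the counts leave the site.
counts_a = local_counts(site_a, "alive_2y", ["t_stage"])
counts_b = local_counts(site_b, "alive_2y", ["t_stage"])

# The coordinator sums the counts and builds the table.
distributed_cpt = cpt_from_counts(counts_a + counts_b, child_values=[0, 1])

# Reference: the same table learned from pooled records.
pooled = local_counts(site_a + site_b, "alive_2y", ["t_stage"])
central_cpt = cpt_from_counts(pooled, child_values=[0, 1])

print(distributed_cpt == central_cpt)
```

In this simplified counting scheme the distributed and pooled tables coincide exactly. The abstract reports a 0.6% difference between the two settings, so the sketch should be read only as an illustration of the principle that aggregate statistics, rather than patient-level data, are exchanged.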
Original language: English
Pages (from-to): 344-352
Number of pages: 9
Journal: International Journal of Radiation Oncology, Biology, Physics
Volume: 99
Issue number: 1
Early online date: 24 Apr 2017
Publication status: Published - 1 Oct 2017

Keywords

  • Bayesian networks
  • Distributed learning
  • Privacy-preserving data mining
  • Survival
  • Machine learning

Research Beacons, Institutes and Platforms

  • Manchester Cancer Research Centre
