Development of breast cancer risk prediction models using the UK Biobank dataset

  • Kawthar Al-Ajmi

Student thesis: PhD


Aim: The work presented in this thesis has the following aims: 1) to systematically review non-clinical/non-genetic breast cancer (BC) risk prediction models, 2) to review the published risk factors for BC (reproductive, anthropometric, lifestyle and dietary) as a basis for model development, 3) to assess BC risk factors using the UK Biobank (UKB) prospective cohort, 4) to explore the effects of adherence to "healthier lifestyles" in groups with different genetic predispositions, and 5) to develop BC risk prediction models (epidemiological and genetic models).

Methods: For aim 1, a PRISMA approach was employed to carry out the systematic review. For aim 2, the literature was reviewed and a summary of the evidence was presented. For aim 3, the UKB data were analysed using generalised linear models (GLMs) to derive relative risks and 95% confidence intervals. For aim 4, the hazard ratios of different lifestyle categories were calculated within tertile groups of a genetic predisposition score (based on 305 SNPs). For aim 5, backward stepwise logistic regression and bootstrap regression approaches were used to derive the best-fitting models (epidemiological and genetic).

Results: For aim 1, 14 epidemiological (non-clinical and non-genetic) models were identified. All of the models were well calibrated but had poor or moderate discriminatory ability in internal validation analyses. External validation was missing for most of the models. Generalisability is also problematic, as some variables are specific to particular populations. For aim 2, a list of modifiable risk factors (physical activity, alcohol, smoking, BMI, OC use, HRT, and diet), partially modifiable risk factors (age at first birth, nulliparity, and breastfeeding), and non-modifiable risk factors (age, genetic factors, family history of breast cancer, early menarche age, late menopause age, benign breast disease, breast density, height, abortion, and radiation) was summarised and evaluated. For aim 3, among pre-menopausal females, age, height, low BMI, low waist-to-hip ratio, first-degree family history of BC, early menarche age, nulliparity, late age at first live birth, high reproductive interval index, and long-duration contraceptive use were all significantly associated with an increased BC risk. Among post-menopausal females, age, height, high BMI, first-degree family history of BC, nulliparity, late age at first live birth, and high reproductive interval index were all significantly associated with an increased BC risk. For aim 4, our analysis showed potential BC risk modification as a consequence of selected modifiable lifestyle factors (more exercise, healthy weight, low alcohol intake, no contraceptive use, and no or limited HRT use). The results were significant regardless of whether women had a higher genetic risk. For aim 5, two epidemiological models based on menopausal status (pre- and post-menopausal models) were developed, together with a computation of the absolute 5-year risk. The discriminatory power of the models was then significantly improved by adding a polygenic risk score (PRS) for breast cancer in the extended genetic models.

Conclusions: The work presented in this dissertation can be used for a) increasing public awareness of the possible risk factors for BC, b) encouraging females to adopt a healthier lifestyle to reduce their BC risk, c) using the models as an educational tool for the community and primary care as a strategy for cancer education and prevention, and d) encouraging females at higher BC risk to attend screening when invited.
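As an illustration only, the sketch below shows in general terms how a genetic score can be built, split into tertiles, and checked for discrimination. It is not the thesis's own code: the abstract does not give the SNP weights, the software used, or any individual-level data, so the three SNP weights, the six toy genotypes, and the function names polygenic_score, tertile_labels and auc are all hypothetical. The score is assumed to be a weighted sum of risk-allele counts, and discrimination is assumed to be summarised by the area under the ROC curve (AUC).

```python
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def tertile_labels(scores: Sequence[float]) -> List[str]:
    """Label each score 'low', 'intermediate' or 'high' by rank-based tertile."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    names = ("low", "intermediate", "high")
    labels = [""] * n
    for rank, i in enumerate(order):
        labels[i] = names[min(3 * rank // n, 2)]
    return labels


def auc(scores: Sequence[float], cases: Sequence[int]) -> float:
    """Chance that a random case scores above a random non-case (ties count 0.5)."""
    case_scores = [s for s, c in zip(scores, cases) if c == 1]
    control_scores = [s for s, c in zip(scores, cases) if c == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for cs in case_scores:
        for ns in control_scores:
            if cs > ns:
                wins += 1.0
            elif cs == ns:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: 3 SNPs and 6 women (not UK Biobank values).
weights = [0.10, 0.05, 0.20]  # assumed per-allele log odds ratios
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [2, 0, 1],
    [2, 2, 2],
]
cases = [0, 0, 1, 0, 1, 1]  # 1 = developed BC, 0 = did not

scores = [polygenic_score(g, weights) for g in genotypes]
groups = tertile_labels(scores)
print(groups)
print(round(auc(scores, cases), 3))
```

In a real analysis the score would be one predictor among the epidemiological variables, and the AUC of the model with and without it would be compared, ideally on data not used for fitting.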
Date of Award: 1 Aug 2022
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Kenneth Muir (Supervisor) & Artitaya Lophatananon (Supervisor)
