The distinctive nature of cancer as a disease prompts an exploration of the special characteristics exhibited by the genes and mutations implicated in it. Currently, we have no clear explanation for why certain patterns of amino acid replacements are frequent in cancer, or what their effects on the protein may be. Such patterns would be expected to show how these amino acid replacements drive cancer progression and to reveal the properties that distinguish them from non-cancer-associated replacements. Moreover, identifying cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease. These characteristics can be used to recognise and prioritise therapeutic drug targets with an enhanced likelihood of success. However, the rate at which cancer genes are being identified experimentally is slow. Predictive analysis, through building accurate machine learning models, is a potentially useful approach to increasing the rate at which these genes and their characteristics are identified. In this work, we identified certain amino acid residues and replacements that are highly enriched in cancer. In particular, we highlight 17 substitutions showing high enrichment rates. We also find that, very frequently in cancer, a residue is replaced with either a Cys or an aromatic residue. We explain the role of Cys in forming disulphide bonds and of the aromatic amino acids in forming stacking interactions; both are known to be vital in binding activities that are highly enriched in cancer-associated gene functions. We also identified properties, such as protein stability and hydrophobicity, that show distinct patterns in these cancer-associated replacements compared with non-cancer-associated mutations.
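As a rough illustration of what "enrichment" of a substitution can mean, the sketch below compares the relative frequency of each replacement in a cancer-associated set with its frequency in a background set. This is only one common way to define an enrichment ratio and is not necessarily the measure used in the thesis; the function name `substitution_enrichment`, the pseudocount, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def substitution_enrichment(cancer_subs, background_subs, pseudocount=1.0):
    """Return {substitution: cancer frequency / background frequency}.

    Hypothetical helper: each substitution is a (wild_type, replacement) pair.
    """
    cancer_counts = Counter(cancer_subs)
    background_counts = Counter(background_subs)
    kinds = set(cancer_counts) | set(background_counts)

    # Pseudocounts avoid division by zero for substitutions seen in only one set.
    cancer_total = len(cancer_subs) + pseudocount * len(kinds)
    background_total = len(background_subs) + pseudocount * len(kinds)

    ratios = {}
    for sub in kinds:
        cancer_freq = (cancer_counts[sub] + pseudocount) / cancer_total
        background_freq = (background_counts[sub] + pseudocount) / background_total
        ratios[sub] = cancer_freq / background_freq
    return ratios


# Toy, made-up data (not from the thesis): (wild-type residue, replacement residue).
cancer = [("R", "C"), ("R", "C"), ("G", "W"), ("A", "V")]
background = [("A", "V"), ("A", "V"), ("R", "C"), ("G", "W")]

ratios = substitution_enrichment(cancer, background)
for sub, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{sub[0]}->{sub[1]}: {ratio:.2f}")
```

A ratio above 1 marks a replacement that is relatively more common in the cancer set than in the background; real analyses would normally add a statistical test on top of such a ratio.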
We used these properties to train a machine learning model that predicts cancer-associated replacements for a specific protein using only the amino acid residue position and physico-chemical properties. In terms of cancer-associated genes, we investigated gene essentiality and found that essentiality scores tend to be higher for cancer-associated genes than for other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine learning model; this model reached 89% accuracy and an Area Under the Curve (AUC) above 0.85. The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not previously been linked to cancer. Prioritising genes that score highly by our methods could aid scientists in identifying novel genes and targets for further research.
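For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and AUC can be computed from a classifier's scores. It is a generic illustration, not the thesis's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions. The AUC here is the area under the ROC curve, computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the true label."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy, made-up data: 1 = cancer-associated gene, 0 = other gene.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two are usually reported together.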
| Date of Award | 1 Aug 2023 |
|---|---|
| Original language | English |
| Awarding Institution | The University of Manchester |
| Supervisor | Andrew Doig (Supervisor), David Robertson (Supervisor) & Simon Lovell (Supervisor) |
- Characteristics of Cancer
- cancer
- Machine Learning
- Cancer genes prediction
- Cancer Replacements Prediction
Distinctive Characteristics of Cancer-associated Genes and Point Mutations Driving Carcinogenesis Through Computational Modelling
Safadi, A. (Author). 1 Aug 2023
Student thesis: PhD