A database, AmyProt, was developed that collates details of 32 human amyloid proteins associated with disease and 488 associated mutations and polymorphisms, of which 316 are classified as amyloid. A detailed profile of the mutations was developed in terms of their location within domains and secondary structures of the proteins and their functional effects. The data were used to test the hypothesis that mutations that enhance amyloidosis in human amyloid proteins have distinctive characteristics, in terms of specific location within proteins and physico-chemical properties, which differentiate them from non-amyloid-forming polymorphisms in amyloid proteins and from disease mutations and polymorphisms in non-amyloid disease-linked proteins. The aim was to use these characteristics to train a prediction algorithm for amyloid mutations that provides a more accurate prediction than current general disease prediction tools and amyloid prediction tools that focus on aggregating regions. A total of 66 location-specific features and the changes upon mutation in 366 amino acid propensities, derived from the amino acid index database AAindex, were analysed. A significant proportion of mutations were located within aggregating regions; however, the majority of mutations were not associated with these regions. An analysis of motifs showed that amyloid mutations had a significant association with transmembrane helix motifs such as GxxxG. Statistical analysis of substitution mutations, using substitution matrices, showed that amyloid mutations have a decrease in alpha-helix propensity and overall secondary structure propensity compared to the disease mutations and the disease and amyloid polymorphisms. Machine learning was used to reduce the large set of features to a set of 18 features.
These included location near transmembrane helices; secondary structure features; transmembrane and extracellular domains; and 4 amino acid propensities: knowledge-based membrane propensity scale from 3D helix, alpha-helix propensity, partition coefficient, and normalized frequency of coil. The AmyProt mutations and non-amyloid polymorphisms were used to train and test the novel amyloid mutation prediction tool, AmyPred, the first tool developed purely to predict amyloid mutations. AmyPred predicts the amyloidogenicity of mutations as a consensus of 5 classifiers, by majority vote (CMV) and by mean probability (CMP). Validation of AmyPred with 27 amyloid mutations and 20 non-amyloid mutations from the APP, Tau and TTR proteins gave classification accuracies of 0.70 (CMV) and 0.71 (CMP), with a Matthews correlation coefficient (MCC) of 0.40 (CMV) and 0.41 (CMP). AmyPred outperformed other tools such as SIFT (0.37) and PolyPhen (0.36) and the amyloid consensus prediction tool MetAmyl (0.13). Finally, AmyPred was used to analyse p53 mutations to characterize amyloid and non-amyloid mutations within this protein.
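The two consensus schemes and the MCC reported above can be illustrated with a minimal sketch. This is not the AmyPred implementation: the function names `consensus` and `mcc`, the 0.5 decision threshold, and the example probabilities are all assumptions made for illustration, and the abstract does not state how the 5 classifiers are combined beyond majority vote and mean probability.

```python
from math import sqrt


def consensus(probabilities, threshold=0.5):
    """Combine per-classifier amyloid probabilities for one mutation.

    Hypothetical helper: returns (cmv_label, cmp_label, mean_probability),
    where True means "predicted amyloid". The 0.5 threshold is an assumption.
    """
    n = len(probabilities)
    # Consensus by majority vote (CMV): each classifier casts one vote.
    votes = sum(1 for p in probabilities if p >= threshold)
    cmv_label = votes > n / 2
    # Consensus by mean probability (CMP): average first, then threshold.
    mean_probability = sum(probabilities) / n
    cmp_label = mean_probability >= threshold
    return cmv_label, cmp_label, mean_probability


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (True = amyloid)."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when the denominator is 0.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Made-up probabilities from 5 hypothetical classifiers for one mutation.
example = [0.9, 0.6, 0.55, 0.1, 0.2]
cmv_label, cmp_label, mean_probability = consensus(example)
```

In this made-up example three of the five classifiers vote amyloid, so CMV calls the mutation amyloid, while the mean probability of 0.47 falls below the assumed threshold, so CMP calls it non-amyloid. This shows how the two consensus schemes can disagree, which would be consistent with the slightly different CMV and CMP scores reported.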
|Date of Award||31 Dec 2016|
- The University of Manchester
|Supervisor||Andrew Doig (Supervisor) & Simon Hubbard (Supervisor)|
- bioinformatic tool