Determination of protein fold class from Raman or Raman optical activity spectra using random forests

Myra Kinalwa, Ewan W. Blanch, Andrew J. Doig

    Research output: Contribution to journal › Article › peer-review


    Knowledge of the fold class of a protein is valuable because fold class gives an indication of protein function and evolution. Fold class can be accurately determined from a crystal structure or NMR structure, though these methods are expensive, time-consuming, and cannot be applied to every protein. In contrast, vibrational spectra [infrared, Raman, or Raman optical activity (ROA)] can be obtained rapidly for a wide range of proteins under diverse experimental and physiological conditions. Here, we show that the fold class of a protein can be determined from Raman or ROA spectra by converting a spectrum into bins of 10 cm⁻¹ width and applying the random forest machine learning algorithm. Spectral data from 605 to 1785 cm⁻¹ were analyzed, as well as the amide I, II, and III regions in isolation and in combination. ROA amide II and III data gave the best performance, with 33 of 44 proteins assigned to the correct one of four top-level Structural Classification of Proteins (SCOP) fold classes (all α, all β, α and β, and disordered). The method also shows which spectral regions are most valuable in assigning fold class. Published by Wiley-Blackwell. © 2011 The Protein Society.
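
    The binning and classification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `bin_spectrum` and `fit_fold_classifier` are hypothetical, averaging intensities within each bin is an assumption (the abstract gives only the bin width), and the forest settings are illustrative defaults. The classifier step assumes scikit-learn is installed.

```python
from statistics import mean


def bin_spectrum(wavenumbers, intensities, start=605.0, stop=1785.0, width=10.0):
    """Reduce a spectrum to fixed-width bins covering [start, stop).

    Assumption: each bin value is the mean intensity of the points that
    fall in it; the abstract specifies only the 10 cm^-1 bin width.
    """
    n_bins = int(round((stop - start) / width))  # 118 bins for the defaults
    buckets = [[] for _ in range(n_bins)]
    for w, i in zip(wavenumbers, intensities):
        if start <= w < stop:
            index = min(int((w - start) // width), n_bins - 1)
            buckets[index].append(i)
    # Assumption: bins with no data points are set to 0.0.
    return [mean(b) if b else 0.0 for b in buckets]


def fit_fold_classifier(binned_spectra, fold_labels, n_trees=500, seed=0):
    """Fit a random forest on binned spectra (one row per protein).

    scikit-learn is imported here so that bin_spectrum remains usable
    without it. n_trees and seed are illustrative, not from the paper.
    """
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    forest.fit(binned_spectra, fold_labels)
    return forest
```

    With a fitted forest, `forest.oob_score_` gives an out-of-bag accuracy estimate and `forest.feature_importances_` gives one importance value per bin, which is one way to see which spectral regions contribute most to the assignment.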
    Original language: English
    Pages (from-to): 1668-1674
    Number of pages: 6
    Journal: Protein Science
    Issue number: 10
    Publication status: Published - Oct 2011


    • α-helix
    • β-sheet
    • Machine learning
    • Protein structure
    • Vibrational spectroscopy

