Determination of Protein Secondary Structure from Infrared Spectra Using Partial Least-Squares Regression

Kieaibi E. Wilcox, Ewan Blanch, Andrew Doig

    Research output: Contribution to journal › Article › peer-review

    Infrared (IR) spectra contain substantial information about
    protein structure. This has previously most often been exploited by using
    known band assignments. Here, we convert spectral intensities in bins within
    Amide I and II regions to vectors and apply machine learning methods to
    determine protein secondary structure. Partial least squares was performed
    on spectra of 90 proteins in H2O. After preprocessing and removal of
    outliers, 84 proteins were used for this work. Standard normal variate and
    second-derivative preprocessing methods on the combined Amide I and II
    data generally gave the best performance, with root-mean-square values for
    prediction of ∼12% for α-helix, ∼7% for β-sheet, 7% for antiparallel β-sheet,
    and ∼8% for other conformations. Analysis of Fourier transform infrared
    (FTIR) spectra of 16 proteins in D2O showed that secondary structure
    determination was slightly poorer than in H2O. Interval partial least squares
    was used to identify the critical regions within spectra for secondary
    structure prediction and showed that the sides of bands were most valuable, rather than their peak maxima. In conclusion, we
    have shown that multivariate analysis of protein FTIR spectra can give α-helix, β-sheet, other, and antiparallel β-sheet contents
    with good accuracy, comparable to that of circular dichroism, which is widely used for this purpose.
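
As a rough illustration of the workflow the abstract describes (binned intensities treated as vectors, standard normal variate or second-derivative preprocessing, partial least-squares regression, and a root-mean-square error), the Python sketch below uses only the standard library. Everything in it is an assumption made for illustration: the function names, the finite-difference second derivative (a stand-in, since the abstract does not say how derivatives were computed), the single-response NIPALS form of PLS, and the synthetic numbers, which are not data from the paper.

```python
from math import sqrt

def snv(spectrum):
    """Standard normal variate: centre one spectrum, scale to unit sample std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

def second_derivative(spectrum):
    """Central finite-difference second derivative, assuming unit bin spacing."""
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                / len(observed))

def pls1_fit(X, y, n_components):
    """Single-response PLS (NIPALS). X is a list of rows, y a list of floats."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y for X to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return x_mean, y_mean, W, P, Q

def pls1_predict(model, x):
    """Predict the response for one new (already preprocessed) spectrum."""
    x_mean, y_mean, W, P, Q = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Synthetic stand-in data (NOT from the paper): 5 "spectra" with 3 bins each,
# and a made-up response that is an exact linear function of the bins.
X = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 3, 1], [3, 2, 2]]
y = [2 * a - b + 0.5 * c + 3 for a, b, c in X]
model = pls1_fit(X, y, n_components=3)
fitted = [pls1_predict(model, row) for row in X]
print("training RMSE:", rmse(fitted, y))
```

In this sketch each structure class (α-helix, β-sheet, and so on) would get its own single-response model. The abstract does not say whether the authors fitted the classes jointly or separately, and a real error estimate would be computed on held-out proteins, not on the training set as done here.
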
    Original language: English
    Pages (from-to): 3794–3802
    Issue number: 27
    Early online date: 20 Jun 2016
    Publication status: Published - 2016

