Abstract
Predicting the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the α-helix, which shows specific residue preferences at its N-terminal positions. Propensities derived from the observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method that exploits these data to improve protein secondary structure prediction by identifying the correct N-terminal sequences of α-helices, building on existing popular methods for secondary structure prediction. With this algorithm, the proportion of correctly predicted α-helix start positions rose from 30% to 38% in cross-validated testing, while the overall prediction accuracy (Q3) remained the same. Although the algorithm was developed and tested on secondary structure predictions based on multiple sequence alignments, it also improved the prediction of start locations by methods that use single sequences. Furthermore, the residue frequencies at the N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of α-helices in proteins. This has implications for areas such as comparative modeling, where more accurate prediction of the N-terminal regions of α-helices should benefit attempts to model adjacent loop regions. © 2004 Wiley-Liss, Inc.
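The two measures quoted in the abstract, the proportion of correctly predicted α-helix start positions and Q3, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-state string encoding (`H` helix, `E` strand, `C` coil) and the rule that a start counts as correct only when it is predicted at exactly the same residue index are all assumptions made for illustration, and the abstract does not state the matching criterion actually used.

```python
def helix_starts(ss: str) -> list[int]:
    """Return the index of the first residue of every run of 'H' in ss."""
    return [i for i, state in enumerate(ss)
            if state == "H" and (i == 0 or ss[i - 1] != "H")]


def start_accuracy(observed: str, predicted: str) -> float:
    """Fraction of observed helix starts predicted at exactly the same index.

    Assumption: exact positional match; the paper may use another criterion.
    """
    obs = helix_starts(observed)
    if not obs:
        return 0.0
    pred = set(helix_starts(predicted))
    return sum(i in pred for i in obs) / len(obs)


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label (H/E/C) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        return 0.0
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical 24-residue example: the first helix is predicted one residue late.
observed = "CCHHHHHHCCCEEEECCHHHHHCC"
predicted = "CCCHHHHHCCCEEEECCHHHHHCC"

print(helix_starts(observed))                # [2, 17]
print(start_accuracy(observed, predicted))   # 0.5
print(q3(observed, predicted))               # 23 of 24 residues correct
```

In this toy case a single misplaced residue costs one of two helix starts while Q3 barely moves, which is why start-position accuracy can change substantially while Q3 stays the same.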
Original language | English |
---|---|
Pages (from-to) | 322-330 |
Number of pages | 8 |
Journal | Proteins: Structure, Function and Bioinformatics |
Volume | 57 |
Issue number | 2 |
Publication status | Published - 1 Nov 2004 |
Keywords
- α-helix
- N-cap
- N-terminus
- Secondary structure
- Structure prediction