A Polarisable Multipolar Force Field for Peptides Based on Kriging: towards Application in Protein Crystallography and Enzymatic Reactions

  • Yongna Yuan

Student thesis: Phd


Molecular dynamics (MD) simulations enable the prediction of structure and function of biomacromolecules. For example, the activity of small drug molecules can be ranked by molecular docking thereby reducing the number of candidates. To date, several popular empirical force fields are available for MD simulations, such as CHARMM, AMBER, OPLS, and GROMOS. These force fields are commonly used but limited to relatively low accuracy in their predictions as all these force fields compute the atom-atom electrostatic interaction energy using point charges. It is well known that the non-bonded interaction energy, in particular the electrostatic energy, plays a crucial role in determing the structure of proteins, including their docking regions. Therefore, the electrostatic energy is one of the most important parts to be correctly predicted in any force field. As a result, improvements in a force field are critical to obtain more reliable protein simulations. In our work, a novel force field based on Quantum Chemical Topology (QCT) has been actively developed in order to increase the accuracy of the simulations. Currently this force field is referred to as QCTFF. QCTFF calculates the atom-atom electrostatic energy based on the atomic multipole moments instead of point charges. The smallest natural protein, crambin, is used as a pilot system to compare the multipole moment electrostatic interaction energy with the point charge energy. The atom-atom electrostatic interaction energy calculated based on atomic multipole moments is remarkably different from the values calculated by point charges. There are in total fifteen possible atom-atom interactions, including HH, HC, HO, HN, HS, CC, CO, CN, CS, OO, ON, OS, NN, NS, SS, typically occurring in proteins. In order to study the convergence of the multipole expansion behind QCTFF, the minimum internuclear distance in the convergent region for each of the fifteen atom-atom interactions are calculated. The levels of theory used are HF/6-31G(d,p), B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ. This study determines which multipole moment rank is 14 required to obtain the electrostatic energy between two atoms, at a given distance, to a given energy accuracy. In addition, we define an atom type for each element C, H, O, N and S. The element in question is taken as the central atom and a "horizon sphere" centred at its nucleus is systematically enlarged to find out at which radius the atomic multipole moments are no further influenced by atoms entering the growing sphere. All 20 natural amino acids (AAs), from which any protein can be built, are studied in the construction of QCTFF. The minimum energy geometries of each amino acid are optimised and computed at HF/6-31G(d,p), B3LYP/apc-1 and MP2/cc-pVDZ levels of theory. The number of minima of the 20 AAs increased with the size and the flexibility of the side-chain. The multipole moments of the atoms in the five-atom fragment [-NH-Calpha-C(=O)]- varied according to the same trends in all three levels of theory. The properties, including bond length and bond angles, of the common seven-atom fragment [-NH-CalphaH(Cβ)-C(=O)-] (in all but proline and glycine) are investigated to probe the transferability between different AAs. A machine learning method called Kriging is applied to carry out the prediction of the atomic multipole moments in QCTFF. To build the Kriging models, the minima of serine are used to produce a sufficient number of geometries following the distortion procedure of both the normal mode and the redundant internal coordinate approaches. Ultimately, a thousand geometries are used in the training set to build Kriging models and the prediction reliability of the models is tested by the remaining thousand geometries in the test set. The same three levels of theory as before are applied to calculate the wave functions of the distorted geometries using GAUSSIAN03. Moreover, a large number of serine geometries are also extracted from proteins in
Date of Award1 Aug 2013
Original languageEnglish
Awarding Institution
  • The University of Manchester
SupervisorPaul Popelier (Supervisor)

Cite this