TY - JOUR

T1 - Gaussian Process Regression Models for Predicting Atomic Energies and Multipole Moments

AU - Burn, Matthew

AU - Popelier, Paul

PY - 2023/1/23

Y1 - 2023/1/23

N2 - Developing a force field is a difficult task because its design is typically pulled in opposite directions by speed and accuracy. FFLUX breaks this trend by utilising Gaussian Process Regression (GPR) to predict, at ab initio accuracy, atomic energies and multipole moments as obtained from the quantum theory of atoms in molecules (QTAIM). This work demonstrates that the in-house FFLUX training pipeline can generate successful GPR models for six representative molecules: peptide-capped glycine and alanine, glucose, paracetamol, aspirin and ibuprofen. The molecules were sufficiently distorted to represent configurations from an AMBER-GAFF2 molecular dynamics run. All internal degrees of freedom were covered, corresponding to 93 dimensions in the case of the largest molecule, ibuprofen (33 atoms). Benefiting from active learning, the GPR models contain only about 2000 training points and return largely sub-kcal/mol prediction errors for the validation sets. A proof of concept has been reached for transferring the model produced through active learning on one atomic property to the remaining atomic properties. The prediction of electrostatic interactions can be assessed at the intermolecular level, and the vast majority of interactions have a root-mean-square error of less than 0.1 kJ mol⁻¹, with a maximum value of ~1 kJ mol⁻¹ for a glycine and paracetamol dimer.

AB - Developing a force field is a difficult task because its design is typically pulled in opposite directions by speed and accuracy. FFLUX breaks this trend by utilising Gaussian Process Regression (GPR) to predict, at ab initio accuracy, atomic energies and multipole moments as obtained from the quantum theory of atoms in molecules (QTAIM). This work demonstrates that the in-house FFLUX training pipeline can generate successful GPR models for six representative molecules: peptide-capped glycine and alanine, glucose, paracetamol, aspirin and ibuprofen. The molecules were sufficiently distorted to represent configurations from an AMBER-GAFF2 molecular dynamics run. All internal degrees of freedom were covered, corresponding to 93 dimensions in the case of the largest molecule, ibuprofen (33 atoms). Benefiting from active learning, the GPR models contain only about 2000 training points and return largely sub-kcal/mol prediction errors for the validation sets. A proof of concept has been reached for transferring the model produced through active learning on one atomic property to the remaining atomic properties. The prediction of electrostatic interactions can be assessed at the intermolecular level, and the vast majority of interactions have a root-mean-square error of less than 0.1 kJ mol⁻¹, with a maximum value of ~1 kJ mol⁻¹ for a glycine and paracetamol dimer.

M3 - Article

JO - Journal of Chemical Theory and Computation

JF - Journal of Chemical Theory and Computation

SN - 1549-9618

ER -