TY - JOUR

T1 - pKa prediction from "quantum chemical topology" descriptors

AU - Popelier, P. L. A.

AU - Harding, A. P.

AU - Wedge, D. C.

N1 - Harding, A. P. Wedge, D. C. Popelier, P. L. A.

PY - 2009/8/24

Y1 - 2009/8/24

N2 - Knowing the pKa of a compound gives insight into many properties relevant to several industries, in particular the pharmaceutical industry during drug development. In light of this, we have used the theory of Quantum Chemical Topology (QCT) to provide ab initio descriptors that accurately predict pKa values for 228 carboxylic acids. This Quantum Topological Molecular Similarity (QTMS) study compared five levels of theory of increasing computational cost and concluded that HF/6-31G(d) and B3LYP/6-311+G(2d,p) provided an accurate representation of the compounds studied. We created global and subset models for the carboxylic acids using Partial Least Squares (PLS), Support Vector Machines (SVM), and Radial Basis Function Neural Networks (RBFNN). The models were extensively validated using 4-, 7-, and 10-fold cross-validation, with the validation sets selected by systematic and random sampling. Taking into account the large increase in CPU time required to optimize the geometries at the B3LYP/6-311+G(2d,p) level, HF/6-31G(d) in conjunction with SVM provided the best statistics. The SVM models gave an average q2 of 0.886 and an RMSE of 0.293 for all the carboxylic acids, a q2 of 0.825 and RMSE of 0.378 for the ortho-substituted acids, a q2 of 0.923 and RMSE of 0.112 for the para- and meta-substituted acids, and a q2 of 0.906 and RMSE of 0.268 for the aliphatic acids. Our method compares favorably with the pKa prediction software of ACD/Laboratories, VCCLAB, SPARC, and ChemAxon, based on the RMSE calculated by the leave-one-out method. © 2009 American Chemical Society.

AB - Knowing the pKa of a compound gives insight into many properties relevant to several industries, in particular the pharmaceutical industry during drug development. In light of this, we have used the theory of Quantum Chemical Topology (QCT) to provide ab initio descriptors that accurately predict pKa values for 228 carboxylic acids. This Quantum Topological Molecular Similarity (QTMS) study compared five levels of theory of increasing computational cost and concluded that HF/6-31G(d) and B3LYP/6-311+G(2d,p) provided an accurate representation of the compounds studied. We created global and subset models for the carboxylic acids using Partial Least Squares (PLS), Support Vector Machines (SVM), and Radial Basis Function Neural Networks (RBFNN). The models were extensively validated using 4-, 7-, and 10-fold cross-validation, with the validation sets selected by systematic and random sampling. Taking into account the large increase in CPU time required to optimize the geometries at the B3LYP/6-311+G(2d,p) level, HF/6-31G(d) in conjunction with SVM provided the best statistics. The SVM models gave an average q2 of 0.886 and an RMSE of 0.293 for all the carboxylic acids, a q2 of 0.825 and RMSE of 0.378 for the ortho-substituted acids, a q2 of 0.923 and RMSE of 0.112 for the para- and meta-substituted acids, and a q2 of 0.906 and RMSE of 0.268 for the aliphatic acids. Our method compares favorably with the pKa prediction software of ACD/Laboratories, VCCLAB, SPARC, and ChemAxon, based on the RMSE calculated by the leave-one-out method. © 2009 American Chemical Society.

U2 - 10.1021/ci900172h

DO - 10.1021/ci900172h

M3 - Article

SN - 1549-9596

VL - 49

SP - 1914

EP - 1924

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

IS - 8

ER -