TY - JOUR
T1 - pKa prediction from "quantum chemical topology" descriptors
AU - Popelier, P. L. A.
AU - Harding, A. P.
AU - Wedge, D. C.
N1 - Harding, A. P. Wedge, D. C. Popelier, P. L. A.
PY - 2009/8/24
Y1 - 2009/8/24
AB - Knowing the pKa of a compound gives insight into many properties relevant to several industries, in particular the pharmaceutical industry during drug development. In light of this, we have used the theory of Quantum Chemical Topology (QCT) to provide ab initio descriptors that are able to accurately predict pKa values for 228 carboxylic acids. This Quantum Topological Molecular Similarity (QTMS) study compared 5 increasingly expensive levels of theory and concluded that HF/6-31G(d) and B3LYP/6-311+G(2d,p) provided an accurate representation of the compounds studied. We created global and subset models for the carboxylic acids using Partial Least Squares (PLS), Support Vector Machines (SVM), and Radial Basis Function Neural Networks (RBFNN). The models were extensively validated using 4-, 7-, and 10-fold cross-validation, with the validation sets selected by systematic and random sampling. HF/6-31G(d) in conjunction with SVM provided the best statistics when taking into account the large increase in CPU time required to optimize the geometries at the B3LYP/6-311+G(2d,p) level. The SVM models provided an average q2 value of 0.886 and an RMSE value of 0.293 for all the carboxylic acids, a q2 of 0.825 and RMSE of 0.378 for the ortho-substituted acids, a q2 of 0.923 and RMSE of 0.112 for the para- and meta-substituted acids, and a q2 of 0.906 and RMSE of 0.268 for the aliphatic acids. Our method compares favorably to ACD/Laboratories, VCCLAB, SPARC, and ChemAxon's pKa prediction software based on the RMSE calculated by the leave-one-out method. © 2009 American Chemical Society.
U2 - 10.1021/ci900172h
DO - 10.1021/ci900172h
M3 - Article
SN - 1549-9596
VL - 49
SP - 1914
EP - 1924
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 8
ER -