TY - JOUR
T1 - Prediction of Aqueous pKa Values for Guanidine-Containing Compounds Using Ab Initio Gas-Phase Equilibrium Bond Lengths
AU - Caine, Bethan
AU - Dardonville, Christophe
AU - Popelier, Paul
PY - 2018/4/30
Y1 - 2018/4/30
N2 - In this work, we demonstrate the existence of linear relationships between gas-phase equilibrium bond lengths of the guanidine skeleton of 2-(arylamino)imidazolines and their aqueous pKa values. For a training set of 22 compounds, in the most stable conformation of their lowest energy tautomeric form, three bonds were found to exhibit r2 and q2 values >0.95 and root-mean-squared error of estimation values ≤0.25 when regressed individually against pKa. The equations describing these one-bond-length linear relationships, in addition to a multiple linear regression model using all three bond lengths, were then used to predict the experimental pKa values of an external test set of 27 further derivatives. The optimal protocol we derive here shows an overall mean absolute error (MAE) of 0.20 and standard deviation of errors of 0.18 for the test set. Predictions for a second test set of diphenyl-based bis(2-iminoimidazolidines) yielded an MAE of 0.27 and a standard deviation of 0.10. The predictive power of the optimal model is further demonstrated by its ability to correct erroneously reported experimental values. Finally, a previously established guanidine model is recalibrated at a new level of theory, and predictions are made for novel phenylguanidine derivatives, showing an MAE of just 0.29. The protocols established and tested here pass both of Roy’s modern and stringent MAE-based criteria for a “good” quantitative structure–activity relationship/quantitative structure–property relationship model predictivity. Notably, the ab initio bond length high correlation subset protocol developed in this work demonstrates lower MAE values than the Marvin program by ChemAxon for all test sets.
AB - In this work, we demonstrate the existence of linear relationships between gas-phase equilibrium bond lengths of the guanidine skeleton of 2-(arylamino)imidazolines and their aqueous pKa values. For a training set of 22 compounds, in the most stable conformation of their lowest energy tautomeric form, three bonds were found to exhibit r2 and q2 values >0.95 and root-mean-squared error of estimation values ≤0.25 when regressed individually against pKa. The equations describing these one-bond-length linear relationships, in addition to a multiple linear regression model using all three bond lengths, were then used to predict the experimental pKa values of an external test set of 27 further derivatives. The optimal protocol we derive here shows an overall mean absolute error (MAE) of 0.20 and standard deviation of errors of 0.18 for the test set. Predictions for a second test set of diphenyl-based bis(2-iminoimidazolidines) yielded an MAE of 0.27 and a standard deviation of 0.10. The predictive power of the optimal model is further demonstrated by its ability to correct erroneously reported experimental values. Finally, a previously established guanidine model is recalibrated at a new level of theory, and predictions are made for novel phenylguanidine derivatives, showing an MAE of just 0.29. The protocols established and tested here pass both of Roy’s modern and stringent MAE-based criteria for a “good” quantitative structure–activity relationship/quantitative structure–property relationship model predictivity. Notably, the ab initio bond length high correlation subset protocol developed in this work demonstrates lower MAE values than the Marvin program by ChemAxon for all test sets.
U2 - 10.1021/acsomega.8b00142
DO - 10.1021/acsomega.8b00142
M3 - Article
SN - 2470-1343
JO - ACS Omega
JF - ACS Omega
ER -