Predicting pKa for Small Molecules on Public and In‐house Datasets Using Fast Prediction Methods Combined with Data Fusion
A data fusion approach was investigated for pKa prediction on 391 small molecules from a public data source and 681 compounds from an internal corporate database. Four pKa prediction methods (Simulations Plus ADMET‐Predictor S+pKa, ACD/Labs Percepta Classic, ACD/Labs Percepta GALAS and Epik) were used to predict the most acidic or most basic pKa of each compound. For the internal compounds, data fusion reduced the median absolute error from 0.69, the value for the best-performing single method, to 0.50. Beyond improving accuracy, data fusion also provided a prediction for every compound in the dataset, whereas individual methods failed on some of the molecules.
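The abstract does not specify how the four predictions are fused. The sketch below assumes the simplest scheme, an unweighted mean over whichever methods returned a value, to show why fusion can cover every compound even when single methods fail. The functions `fuse` and `median_absolute_error` and the sample values are hypothetical illustrations, not the paper's implementation or data.

```python
from statistics import mean, median
from typing import Optional, Sequence


def fuse(predictions: Sequence[Optional[float]]) -> Optional[float]:
    """Assumed fusion rule: average the pKa values from methods that succeeded.

    None marks a method that failed on the molecule. The result is None only
    if every method failed, so coverage is at least that of the best method.
    """
    available = [p for p in predictions if p is not None]
    return mean(available) if available else None


def median_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Median of |predicted - observed| over paired compounds (in pKa units)."""
    return median(abs(p - o) for p, o in zip(predicted, observed))


if __name__ == "__main__":
    # Hypothetical predictions from four methods for three compounds (None = failure).
    per_compound = [
        [4.2, 4.6, None, 4.4],
        [None, 9.1, 8.7, 8.9],
        [3.0, None, None, 3.4],
    ]
    observed = [4.5, 8.8, 3.1]
    fused = [fuse(p) for p in per_compound]
    print("fused:", fused)
    print("median absolute error:", median_absolute_error(fused, observed))
```

A weighted mean or a learned combination could replace the unweighted mean without changing the structure. The key property is that a compound receives a fused value as long as at least one method succeeds.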