hERG Liability Classification Models Using Machine Learning Techniques
The voltage-dependent hERG K+ channel plays a critical role in cardiac action potential repolarization and is one of the most prominent anti-targets determining the cardiotoxicity of drugs. Inhibition of hERG can lead to QT interval prolongation, Torsades de Pointes and sudden death. Because testing for hERG liability is mandatory and standard experimental screening is complex, expensive and time-consuming, there is considerable interest in developing predictive computational (in silico) tools to identify and filter out potential hERG blockers early in the drug discovery process. Here, we aimed to build robust descriptor-based QSAR models for predicting hERG liability at different potency thresholds (1 µM, 10 µM and 30 µM) by applying machine learning techniques to 2D descriptors computed for a large curated dataset of 8,705 compounds. When tested on three external validation sets, the pipeline yielded accuracies of 0.87 (N=222, literature-mined test set), 0.95 (N=277, PubChem BioAssay) and 0.98 (N=86, in-house data). Incorporating similarity information into the combined models enhanced prediction performance. On the overall external dataset (N=499, literature-mined and PubChem BioAssay datasets only), the consensus model achieved accuracy = 0.92, CCR = 0.876, MCC = 0.786, sensitivity = 0.963 and specificity = 0.786, which is better than several existing models. Compared with other existing models, the consensus model shows an improvement of 16% to 52% in accurately discriminating blockers from non-blockers. Owing to this improved prediction performance, our consensus model can serve as a useful tool for screening molecules, reducing time, cost and animal testing.
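
For readers less familiar with the reported statistics, the following minimal sketch shows how accuracy, sensitivity, specificity, CCR and MCC can be derived from a binary confusion matrix. It assumes that blockers are the positive class and that CCR (correct classification rate) is the mean of sensitivity and specificity, as is common in QSAR work. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute binary classification metrics from confusion-matrix counts.

    Assumption: positives are hERG blockers, negatives are non-blockers.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # fraction of blockers correctly identified
    specificity = tn / (tn + fp)   # fraction of non-blockers correctly identified
    accuracy = (tp + tn) / total
    # CCR is taken here as the mean of sensitivity and specificity (an assumption).
    ccr = (sensitivity + specificity) / 2
    # Matthews correlation coefficient; defined as 0.0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": ccr,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are not the study's data.
metrics = classification_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because CCR and MCC account for both classes, they are more informative than accuracy alone when blockers and non-blockers are unevenly represented in a test set.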