Abstract
The prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties remains a central bottleneck in small-molecule discovery. We present the third-place solution from the PolarisHub Antiviral Competition, covering five end points broadly relevant to small-molecule design: human and mouse liver microsomal stability (HLM, MLM), MDR1-MDCKII permeability, kinetic solubility, and lipophilicity (LogD). Rather than pursuing complex machine learning architectures, we adopted a descriptor-first strategy: we systematically curated descriptors and model outputs from ADMET Predictor as meta-features and then applied high-capacity tabular learners. A pretrained foundation model for tabular data (TabPFN), used in single-task regression, consistently matched or outperformed a strong gradient boosting baseline (CatBoost). It reduced mean absolute error (MAE) by up to 44% across end points and simplified deployment, because it required no extensive hyperparameter search and produced compact models. We also engineered two feature sets that delivered modest gains in randomized cross-validation runs: (i) tuned fragment representations and (ii) site-of-metabolism pattern features. Overall, we used four groups of features: mechanistic, physicochemical, fragment, and metabolic. These results indicate that in practical ADMET modeling scenarios, where rich, validated descriptors are available, competitive advantages often arise from principled feature engineering combined with robust, rather than overly complex, modeling approaches.
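The "up to 44% MAE reduction" figure is a per-end-point relative comparison. The sketch below shows one way to compute such a figure. It assumes the reduction is measured relative to the baseline's MAE, (MAE_baseline - MAE_model) / MAE_baseline, and that "up to" means the maximum over end points. The end-point names follow the abstract. The helper names `mae` and `relative_mae_reduction` and every score value are hypothetical placeholders, not results from the competition.

```python
# Minimal sketch (assumption): relative MAE reduction of a model vs. a baseline,
# computed per end point; "up to" is taken as the maximum over end points.
# All scores below are illustrative placeholders, not competition results.

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline, mae_model):
    """Fractional reduction: (MAE_baseline - MAE_model) / MAE_baseline."""
    return (mae_baseline - mae_model) / mae_baseline


# Hypothetical held-out MAEs per end point: (baseline, model).
scores = {
    "HLM": (0.50, 0.40),
    "MLM": (0.60, 0.45),
    "MDR1-MDCKII": (0.40, 0.40),
    "Solubility": (0.80, 0.60),
    "LogD": (0.50, 0.35),
}

# Relative reduction for each end point, then the largest one.
reductions = {ep: relative_mae_reduction(b, m) for ep, (b, m) in scores.items()}
best = max(reductions, key=reductions.get)
print(f"Largest reduction: {best} ({reductions[best]:.0%})")
```

With these placeholder values, LogD shows the largest relative reduction (30%). MDR1-MDCKII shows none, which corresponds to the "matched" case.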
By Vladimir Chupakhin and John DiBella