Interpretable-ADMET: a web service for ADMET prediction and optimization based on deep neural representation

The authors wish it to be known that, in their opinion, Yu Wei, Shanshan Li and Zhonglin Li should be regarded as Joint First Authors.

Published: 29 March 2022. Received: 15 October 2021. Revision received: 05 March 2022. Editorial decision: 22 March 2022. Accepted: 28 March 2022. Corrected and typeset: 15 April 2022.


Yu Wei, Shanshan Li, Zhonglin Li, Ziwei Wan, Jianping Lin, Interpretable-ADMET: a web service for ADMET prediction and optimization based on deep neural representation, Bioinformatics, Volume 38, Issue 10, May 2022, Pages 2863–2871, https://doi.org/10.1093/bioinformatics/btac192


Abstract

Motivation

In the process of discovering and optimizing lead compounds, it is difficult for non-expert pharmacologists to intuitively determine how a substructure contributes to a particular property of a molecule.

In this work, we develop a user-friendly web service named interpretable-absorption, distribution, metabolism, excretion and toxicity (ADMET). It predicts 59 ADMET-associated properties using 90 qualitative classification models and 28 quantitative regression models based on graph convolutional neural network and graph attention network algorithms. Interpretable-ADMET contains 250 729 entries associated with 59 kinds of ADMET-associated properties for 80 167 chemical compounds. In addition to making predictions, interpretable-ADMET provides interpretation models based on the gradient-weighted class activation map to identify the substructures that are important to a particular property. Interpretable-ADMET also provides an optimization module that automatically generates a set of novel virtual candidates based on matched molecular pair rules. We believe that interpretable-ADMET could serve as a useful tool for lead optimization in drug discovery.

Availability and implementation

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

The research and development of new drug therapies is a complicated process involving disease identification, target discovery, lead discovery and optimization, preclinical studies and clinical trials. For a compound entering Phase I trials, the chance of success is low and has remained slightly under 10% over the past couple of decades (Dowden and Munro, 2019). Problems with drug efficacy and/or safety account for about 75–80% of trial failures, which cause great economic losses (Dowden and Munro, 2019; Sertkaya et al., 2016). Marketed and investigational drugs are generally expected to possess high clinical efficacy and safety. However, besides lack of efficacy, unfavorable pharmacokinetic (PK) properties and unacceptable toxicity, which are closely related to efficacy, are the primary factors behind drug failure in clinical trials (Honorio et al., 2013). Lack of efficacy is largely due to PK issues, for instance, whether a drug reaches the intended site in the body at an appropriate rate and concentration, and how long it takes to be eliminated from the patient's body. Therefore, early assessment of absorption, distribution, metabolism, excretion and toxicity (ADMET) has become an important consideration for pharmacologists developing new therapeutic drugs.

Experimental evaluation of ADMET-associated properties can be time-consuming and costly, and it requires extensive animal experiments in the early stage of drug discovery. To identify the risk of trial failure early, computational strategies are used to predict ADMET properties automatically. A variety of ADMET predictive models based on artificial intelligence (AI) approaches have been developed (Tao et al., 2015). Examples include models for predicting P-glycoprotein substrates (pgp-substrate) (Li et al., 2014), drug–human serum albumin binding interactions (Zsila et al., 2011), the unbound brain-to-plasma concentration ratio (Chen et al., 2011), human Ether-a-go-go-Related gene (hERG) potassium channel blockage (Wang et al., 2012) and cytochrome P450 inhibition (Lapins et al., 2013). These models have significant predictive ability and can rapidly filter out compounds with undesirable PK properties or inappropriate toxicity. However, their complex algorithms and their requirements for particular operating systems and software compilers limit their use. Hence, free web servers that predict molecular ADMET-associated properties have been developed to address this usability problem. For example, SwissADME is a web tool that evaluates molecular PK using nine ADMET evaluation models constructed with multiple linear regression and support vector machines (SVMs) (Daina et al., 2017). ProTox-II is a free web server that uses 33 predictive machine-learning (ML) models to assess various types of chemical toxicity (Banerjee et al., 2018). These include acute toxicity, cytotoxicity, hepatotoxicity (H-HT), carcinogenicity, immunotoxicity, mutagenicity, adverse outcome pathways (Tox21) and toxicity targets. AdmetSAR 2.0 (Yang et al., 2019), a new version of admetSAR, is a comprehensive resource that collects a variety of available ADMET properties. In admetSAR 2.0, 47 predictive models for evaluating chemical ADMET properties are built with ML methods, including random forest (RF), SVM, k-nearest neighbors and convolutional neural networks (CNNs). ADMETlab (Dong et al., 2018) is a web server that predicts 31 types of ADMET properties from MACCS keys or topological descriptors using RF or SVM. ADMETlab 2.0 (Xiong et al., 2021) predicts 88 ADMET-related endpoints using a multi-task graph attention framework. Although the servers described above cover a broad range of properties, they are far from sufficient for handling the massive and growing amounts of data generated during drug discovery. Jurgen Bajorath argued that data could be a more important factor than algorithms in medicinal chemistry (Bajorath, 2021). A platform is therefore still needed that offers more features for predicting and optimizing chemical ADMET properties and that uses advanced algorithms and ever-increasing data to produce more accurate models.

Traditional ML represents molecules as fixed-length feature vectors. In contrast, a graph convolutional neural network (GCNN) (Kearnes et al., 2016) automatically extracts features from graph-structured data and represents each molecule by learning a differentiable feature vector through multiple graph convolutional layers. In a GCNN, each molecule is represented by a feature matrix and an adjacency matrix derived from its structure. Atoms are the nodes, whose features form the feature matrix, and chemical bonds form the adjacency matrix. However, GCNN does not adequately capture protein–ligand binding interactions when predicting the binding affinities of small molecules. In addition, the molecular graph does not fully describe the 3D conformational flexibility of small molecules and is insensitive to electrostatic and quantum effects. Despite these weaknesses in representing molecular properties, GCNN has shown great potential in computational drug discovery and development (Sun et al., 2020). Applications include synthesis prediction, de novo molecular design, drug–protein interaction prediction, drug–drug interaction prediction and drug property prediction. For example, Long et al. (2020) developed an ensemble framework of graph attention networks (GATs) and applied it to human microbe–drug association prediction. Cynthia Rudin noted that deep neural networks perform no better than traditional ML on problems with structured data or small datasets (Rudin, 2019). Several comparative studies have also shown that GCNN and other state-of-the-art AI methods achieve comparable performance (Harigua-Souiai et al., 2021; Sun et al., 2020). In some cases, however, such as de novo compound generation, performance may differ substantially (Bajorath, 2021). Even so, GCNN encodes compound features simply and extracts features automatically from the drug molecular structure, treated as a graph (Rifaioglu et al., 2019; Sun et al., 2020). We therefore used graph convolutional features to represent irregular molecular structures when building the neural network models in this study.
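The following sketch illustrates how a graph convolution turns the feature matrix and adjacency matrix into a single molecular vector. It applies the Kipf and Welling propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), which PyTorch Geometric's `GCNConv` implements, and then global max pooling (the pooling operation reported in Section 2.2). This is an illustrative assumption. The exact layer type and atom features used by interpretable-ADMET are not specified here, and the ethanol graph, one-hot features and identity weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W); each atom mixes in its bonded neighbors."""
    return [[max(0.0, x) for x in row] for row in matmul(a_norm, matmul(h, w))]


def global_max_pool(h: Matrix) -> List[float]:
    """Column-wise max over atoms, giving one fixed-length vector per molecule."""
    return [max(column) for column in zip(*h)]


# Hypothetical toy molecule: ethanol heavy atoms C0-C1-O2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0, 0.0],   # one-hot atom features: [is_carbon, is_oxygen]
     [1.0, 0.0],
     [0.0, 1.0]]
w = [[1.0, 0.0],   # identity weights stand in for learned parameters
     [0.0, 1.0]]

a_norm = normalized_adjacency(adj)
h = x
for _ in range(3):          # three stacked graph convolutional layers
    h = gcn_layer(a_norm, h, w)
molecule_vector = global_max_pool(h)
print(molecule_vector)
```

Because the pooled vector has one entry per feature channel, molecules with different numbers of atoms all map to vectors of the same length.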

Recently, several interpretability methods have been developed for CNNs or GCNN to overcome the limited ability of general deep neural networks to explain their predictions (Pope et al., 2019; Zhang and Zhu, 2018). Contrastive gradients, class activation mapping (CAM) and excitation backpropagation (EB) are popular interpretability methods for CNNs. These methods have been extended to GCNN in forms such as gradient-weighted CAM (Grad-CAM) and contrastive EB. McCloskey et al. (2019) applied integrated gradients to a GCNN model of protein–ligand binding to reveal the molecular features responsible for small-molecule binding and to analyze the binding mechanism. Gligorijevic et al. (2020) proposed DeepFRI, an approach that uses a GCNN architecture to predict protein functions. DeepFRI uses Grad-CAM to interpret these predictions by identifying functional regions in proteins. GAT (Veličković et al., 2017) is another type of graph neural network. GAT differs from GCNN in that it adds an attention mechanism, which assigns different weights to different nodes within a neighborhood. To our knowledge, GCNN architectures and GCNN interpretability methods have so far seen limited application to ADMET endpoint prediction. It could therefore be helpful to expand the availability of GCNN- or GAT-based ADMET predictive models by mining the large and growing body of useful data in the literature and databases.
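The following sketch makes the contrast with GCNN concrete by computing single-head GAT attention coefficients as defined by Veličković et al. (2017). Each neighbor j of node i, including i itself, receives the score LeakyReLU(a^T [W h_i || W h_j]), and the scores are then normalized with a softmax. The weight matrix `w`, attention vector `a` and toy ethanol graph are hypothetical values chosen for illustration, not parameters from interpretable-ADMET. Multi-head attention and the output nonlinearity are omitted.

```python
import math
from typing import Dict, List


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def linear(w: List[List[float]], h: List[float]) -> List[float]:
    """W h for a weight matrix W of shape (out_dim, in_dim)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]


def gat_attention(i: int, neighbors: List[int], feats: List[List[float]],
                  w: List[List[float]], a: List[float]) -> Dict[int, float]:
    """Softmax-normalized attention alpha_ij over node i and its neighbors."""
    wh_i = linear(w, feats[i])
    scores = {}
    for j in [i] + neighbors:
        concat = wh_i + linear(w, feats[j])          # list concatenation: [W h_i || W h_j]
        scores[j] = leaky_relu(sum(a_k * z_k for a_k, z_k in zip(a, concat)))
    top = max(scores.values())                         # subtract the max for numerical stability
    exp_scores = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exp_scores.values())
    return {j: e / total for j, e in exp_scores.items()}


def gat_aggregate(i: int, neighbors: List[int], feats: List[List[float]],
                  w: List[List[float]], a: List[float]) -> List[float]:
    """Attention-weighted sum of transformed neighbor features (before the nonlinearity)."""
    alpha = gat_attention(i, neighbors, feats, w, a)
    out = [0.0] * len(w)
    for j, coeff in alpha.items():
        out = [o + coeff * v for o, v in zip(out, linear(w, feats[j]))]
    return out


# Hypothetical toy graph: ethanol heavy atoms C0-C1-O2, one-hot [is_carbon, is_oxygen].
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, -1.0]        # scores depend only on the neighbor's transformed features
alpha = gat_attention(1, [0, 2], feats, w, a)
print(alpha)                      # the two carbons receive more weight than the oxygen
print(gat_aggregate(1, [0, 2], feats, w, a))
```

In a GCNN layer, the neighbor weights are fixed by node degrees. In a GAT layer, they depend on the learned vector `a`, so the same graph can be weighted differently for different properties.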

In this study, we present interpretable-ADMET, a user-friendly web service for ADMET property prediction that uses Grad-CAM (Pope et al., 2019; Selvaraju, 2020) for interpretation (Fig. 1). Compared with other existing ADMET assessment tools, interpretable-ADMET has the following unique features. First, it provides the predicted value of a specific ADMET property as well as the contribution of each substructure to that property. Second, the platform can automatically optimize a lead compound to improve a particular ADMET property.
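Grad-CAM for graph networks (Pope et al., 2019) scores each atom n by ReLU(sum_k alpha_k F_{k,n}). Here, F is the feature map of the last graph convolution, and alpha_k is the gradient of the predicted score with respect to feature k, averaged over atoms. The following sketch assumes a toy readout so that the gradients can be written out by hand: global max pooling followed by a linear head with hypothetical weights. In a real PyTorch model, the gradients would come from autograd. How interpretable-ADMET maps atom scores onto substructures is not shown here.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_linear_grads(feature_map: Matrix, head: List[float]) -> Matrix:
    """Gradients of score = sum_k head[k] * max_n F[n][k] with respect to F[n][k].

    Only the atom holding the max of feature k receives gradient head[k]
    (ties go to the first such atom); every other entry is 0.
    """
    n_atoms, n_feats = len(feature_map), len(feature_map[0])
    grads = [[0.0] * n_feats for _ in range(n_atoms)]
    for k in range(n_feats):
        column = [feature_map[n][k] for n in range(n_atoms)]
        grads[column.index(max(column))][k] = head[k]
    return grads


def grad_cam_atom_scores(feature_map: Matrix, grads: Matrix) -> List[float]:
    """Graph Grad-CAM: alpha_k = mean_n dScore/dF[n][k]; atom score = ReLU(sum_k alpha_k F[n][k])."""
    n_atoms, n_feats = len(feature_map), len(feature_map[0])
    alphas = [sum(grads[n][k] for n in range(n_atoms)) / n_atoms for k in range(n_feats)]
    return [max(0.0, sum(alphas[k] * feature_map[n][k] for k in range(n_feats)))
            for n in range(n_atoms)]


# Hypothetical last-layer activations for a 3-atom molecule (rows = atoms, columns = features).
feature_map = [[0.9, 0.1],
               [0.4, 0.2],
               [0.1, 0.8]]
head = [1.0, -0.5]   # hypothetical head weights: feature 0 raises the score, feature 1 lowers it
scores = grad_cam_atom_scores(feature_map, max_pool_linear_grads(feature_map, head))
print(scores)        # atom 0 is highlighted most; atom 2 is clipped to 0 by the ReLU
```

The ReLU keeps only atoms that push the prediction up. A heatmap over these scores is what lets a user see which part of the molecule drives a given property.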

Fig. 1. Data components and functions of interpretable-ADMET


2 Materials and methods

2.1 Data preparation

The ADMET data in interpretable-ADMET were collected from various public sources, such as ChEMBL (Mendez et al., 2019), PubChem (Kim et al., 2021), DrugBank (Wishart et al., 2018) and literature publications. Structural information and experimental data were collected together. Entries with missing experimental values or missing SMILES were removed. All retained compounds were then desalted, standardized and restored to their original protonation states using RDKit (http://rdkit.org). Salts and solvents in RDKit's predefined list were removed to obtain the parent structures. For duplicate compounds, one entry was retained for classification data, and the arithmetic mean was used for regression data. To obtain robust results, the performance of the GCNN and GAT algorithms was evaluated through five repeated experiments. In each experiment, the data for each ADMET property were randomly split into training, validation and test sets at a ratio of 8:1:1. For each ADMET property, models were trained on all five splits with both the GCNN and GAT algorithms. In each run, the classification and regression models were trained on the training set and optimized on the validation set, and the test set was used to evaluate model performance. Repeating these steps yielded five models per ADMET property. The final GCNN and GAT predictions were obtained by averaging the output probabilities of the five models for each ADMET property.
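The deduplication, repeated 8:1:1 splitting and five-model averaging described above can be sketched in plain Python. Several details are assumptions rather than details from the paper:

- Records are keyed on an already standardized canonical SMILES; the RDKit desalting and standardization step is not reproduced.
- The first record is the one kept for classification duplicates.
- Seeds 0 to 4 stand in for the five random splits.
- `models` is a list of hypothetical callables that each return a prediction.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[str, float]  # (standardized canonical SMILES, experimental value)


def deduplicate(records: Sequence[Record], task: str) -> List[Record]:
    """Keep one record per structure (classification) or average replicates (regression)."""
    if task == "classification":
        kept: Dict[str, float] = {}
        for smiles, value in records:
            kept.setdefault(smiles, value)   # assumption: the first occurrence is kept
        return list(kept.items())
    grouped: Dict[str, List[float]] = defaultdict(list)
    for smiles, value in records:
        grouped[smiles].append(value)
    return [(smiles, mean(values)) for smiles, values in grouped.items()]


def split_8_1_1(items: Sequence, seed: int) -> Tuple[list, list, list]:
    """Shuffle with a fixed seed and cut into training/validation/test sets at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def ensemble_predict(models: Sequence[Callable[[str], float]], smiles: str) -> float:
    """Final prediction = mean output of the five models trained on the five splits."""
    return mean(model(smiles) for model in models)


raw = [("CCO", 1.2), ("CCO", 1.6), ("c1ccccc1", 2.1), ("CC(=O)O", 0.4)]
data = deduplicate(raw, "regression")
splits = [split_8_1_1(data, seed) for seed in range(5)]   # five repeated random splits
print(data)
```

Seeding each split makes the five experiments reproducible. Averaging the five models smooths out the variance that comes from any single random partition.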

2.2 Predictive models

In this study, GCNN (Kearnes et al., 2016) and GAT (Veličković et al., 2017) were used to develop qualitative classification and quantitative regression predictive models. The GCNN model was built with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Wang et al., 2021). It consisted of three graph convolutional layers, with global max pooling as the pooling operation. Non-linear transformations used the rectified linear unit (ReLU) activation function (Nair and Hinton, 2010). A dropout rate of 0.2 was used to prevent overfitting, and a learning rate of 0.001 was used to obtain good and stable network performance. For the classification models, the batch size was set to 128, so that each training step used 128 samples from the training set. To address data imbalance, focal loss was used as the loss function; it reshapes the cross-entropy loss to down-weight well-classified examples (Lin et al., 2020). The regression models used the mean squared error loss (MSELoss) as the loss function. Like the GCNN model, the GAT model was constructed with PyTorch and PyTorch Geometric. It differed in using three GAT layers, again with global max pooling, and a batch size of 512.
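The focal loss of Lin et al. down-weights examples that the model already classifies confidently by multiplying the cross-entropy term by (1 - p_t)^gamma. The following sketch is a minimal binary version. The defaults gamma = 2 and alpha = 0.25 are the values popularized by Lin et al. They are an assumption here, since the hyperparameters used in interpretable-ADMET are not stated. A real training loop would use PyTorch tensors; plain Python keeps the arithmetic visible.

```python
import math
from typing import Sequence


def _clip(p: float, eps: float) -> float:
    return min(max(p, eps), 1.0 - eps)    # avoid log(0)


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    losses = [-math.log(_clip(p, eps) if y == 1 else 1.0 - _clip(p, eps))
              for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Mean of -alpha_t * (1 - p_t)**gamma * log(p_t) over the batch."""
    losses = []
    for p, y in zip(probs, labels):
        p = _clip(p, eps)
        p_t = p if y == 1 else 1.0 - p                # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha    # class-balancing weight
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


# A confident correct prediction (0.95 for a positive) versus a hard one (0.30 for a positive).
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
print(easy / binary_cross_entropy([0.95], [1]), hard / binary_cross_entropy([0.30], [1]))
```

The easy example keeps only a tiny fraction of its cross-entropy, while the hard example keeps far more. The abundant, easily classified majority class therefore contributes less to the gradient. With gamma = 0 and alpha = 0.5, the loss reduces to half the ordinary cross-entropy.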

2.3 Evaluation of predictive models

Five evaluation indicators were used to evaluate the classification ability of the models: sensitivity (SE) (Formula 1), specificity (SP) (Formula 2), accuracy (ACC) (Formula 3), Matthews correlation coefficient (MCC) (Formula 4) and area under the ROC curve (AUC). The performance of the regression models was assessed using the root mean squared error (RMSE) (Formula 5), the squared correlation coefficient for the validation set (Q²) (Formula 6) and the squared correlation coefficient for the test set (R²) (Formula 7). The formulas are as follows:
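The following sketch uses the standard textbook definitions of these metrics, since the numbered formulas are not reproduced in this excerpt:

- SE = TP/(TP + FN)
- SP = TN/(TN + FP)
- ACC = (TP + TN)/(TP + TN + FP + FN)
- MCC from the full confusion matrix
- AUC as the probability that a random positive is scored above a random negative, with ties counting one half
- RMSE
- R² as the coefficient of determination, 1 - SS_res/SS_tot

The last definition is an assumption. If Formulas 6 and 7 instead define Q² and R² as a squared Pearson correlation, `r_squared` would need to change. Q² is the same function applied to the validation set.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """SE, SP, ACC and MCC from binary labels (1 = positive) and hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "ACC": (tp + tn) / len(pairs),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot (assumed form of Q2 and R2)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
# → {'SE': 0.75, 'SP': 0.75, 'ACC': 0.75, 'MCC': 0.5}
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]), rmse([1, 2, 3], [1, 2, 4]),
      r_squared([1, 2, 3], [1, 2, 4]))
```

MCC uses all four confusion-matrix cells, which makes it more informative than ACC on the imbalanced datasets that motivated the use of focal loss.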