Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Dr. T. S. Baskaran, M. Arunkumar
DOI Link: https://doi.org/10.22214/ijraset.2023.55500
With the growing needs of public health and drug development, combination therapy has become widely used in clinical settings. However, the risk of unexpected adverse effects and unknown toxicity caused by drug-drug interactions (DDIs) is a serious public health issue for polypharmacy safety. Traditional experimental methods for detecting DDIs are expensive and time-consuming. Therefore, with the growing availability of data and advances in artificial intelligence, many computational methods have been developed in recent years to predict DDIs. In silico methods have proven effective in predicting DDIs, but detecting potential interactions, especially for newly discovered drugs without an existing DDI network, remains a challenge. In this study, we propose a DDI prediction method named HGAA-DDI based on graph attention networks. We consider the differences in mechanisms between DDIs and add learning of semantic-level attention, which can focus on higher-level representations of DDIs. By treating interactions as nodes and the presence of a shared drug as edges, and by constructing small subnetworks during training, we mitigate potential bias arising from limited data availability. Our experimental results show that our method achieves an F1-score of 0.952, indicating that our model is a viable alternative for DDI prediction. The code is available at: https://github.com/xtnenu/DDIFramework.
I. INTRODUCTION
Drug-drug interactions (DDIs) are changes in the effect of one drug due to the presence of another drug [1]. They can enhance efficacy or reduce side effects, but they can also affect drug absorption or produce adverse side effects. With the development of drugs, combination therapy is widely used clinically, and DDIs play an important part [2]. Therefore, research on DDIs is of great importance for new drugs and clinical pharmacy treatment. The medical research methods for DDIs are diverse, including in vitro experiments, animal experiments and pharmaceutical experiments, as well as research based on clinical results. However, these methods have limitations: they cannot predict DDIs on large-scale datasets, so developing low-cost, high-efficiency DDI research methods is important [3]. In silico methods offer this possibility and can provide references for clinical experiments.
There are two types of computational experiments for DDI prediction [4]. The first type uses medical literature, databases and clinical records as research objects, and analyzes them using natural language processing [5, 6] or data mining methods [7–14]. Deep learning methods have been widely employed in these studies [5, 6, 8, 13]. The second type directly uses drug features to predict whether there is a DDI between two drugs [15–22]. Because we aim to predict potential interactions of new drugs with limited information, we focus on the second type.
Conventional machine learning methods for DDI prediction use data features flexibly and generally do not require high-end experimental environments. Kastrin et al. [15] treated DDI prediction as a link prediction problem and used data from five databases to train five classifiers. Yan et al. [16] developed DDIGIP based on Gaussian interaction profile kernels. Qian et al. [17] developed a gradient boosting-based classifier and showed that the targets of adverse DDIs are significantly more likely to have synergistic genetic interactions than those of non-interacting drug pairs.
Deep learning methods for DDI prediction place higher demands on the computing power of the experimental equipment. Compared with conventional machine learning, deep learning can learn more abstract data representations. Rohani et al. [18] developed a deep learning model based on drug substructure, target, side effect, off-label side effect, pathway, transporter and indication data, making full use of the computing power of deep learning. Ryu et al. [19] developed a deep learning framework that can concurrently predict DDIs and drug-food interactions. Deng et al. [20] developed an architecture that integrates four deep learning sub-models that learn different features. Liu et al. [21] developed an autoencoder-based deep learning framework that can predict interactions for new drugs with unknown interaction relationships.
Network-based methods for DDI prediction operate on the graph structure of the data. Since much of the data in biology and medicine has a network structure, network methods can more easily reflect the similarity between data features. Chen et al. [22] applied the Laplacian regularized least squares method to synergistic drug combinations to develop the model NLLSS. Tripodi et al. [23] proposed a semantic-reasoning-based approach that can infer DDIs through network computing over biological knowledge bases. Yan et al. [24] proposed IDNDDI, which uses cosine similarity to calculate the similarity of drug features and infer whether a DDI exists. Huang et al. [25] developed a prediction method based on the S-score calculation mechanism.
In recent years, with the advancement of deep learning technology, the integration of deep learning and network analysis methods for DDI research has been on the rise. Karim et al. [26], Wang et al. [27] and Xu et al. [28] have each proposed their own deep network model to address various issues encountered in previous DDI studies. Graph convolutional neural networks and graph attention networks, widely used algorithms in the field of bioinformatics, have also been applied to DDI research. Graph neural networks [29] have been proposed as a powerful method for processing graph representations based on deep learning. The graph structure can effectively represent various kinds of complex network data. Although graph neural networks can predict known graph structures, handling unknown graph structures remains challenging. To overcome this limitation, researchers have developed graph attention networks [30]. In DDI research, Nyamabo et al. [31] proposed a graph attention network model based on drug substructures, while Feng et al. [32] proposed a graph attention network model based on chemical molecular graph calculations. Both studies have demonstrated the potential of graph attention networks in DDI research.
Although methods based on drug features have made progress in DDI research and have been confirmed as feasible in silico methods, there are still some limitations. First, deep learning methods are mostly trained on independent samples, and a large amount of data is required to discover the similarity and correlation between samples. Second, network-based models lack the ability to mine advanced representations, and some methods that use graph structure cannot make predictions for a drug outside the network. Third, for new drugs, many methods are unable to extract their features and predict their related interactions.
In this study, we propose a novel DDI prediction model based on heterogeneous graph attention networks, named HGAA-DDI. HGAA-DDI uses DDIs as nodes and shared drugs as edges. To accommodate predictions for new drugs, we use only the substructure molecular fingerprints from PubChem as features. To strengthen the attention of the model to higher-level features, we use the node-level attention and semantic-level attention mechanisms originally used in heterogeneous graph attention networks. Our experimental results show that this new model achieves good performance.
In summary, the major contributions of this work are:
II. MATERIALS AND METHODS
A. Datasets
Our data is sourced from two databases, DrugBank and Twosides. DrugBank [33] consists of two parts: bioinformatics data and cheminformatics data. It integrates a vast amount of drug biochemical data, target structures and other information for drug research. The Twosides [34–36] database collects only DDIs and is a sub-database of adverse DDIs derived from the FAERS (FDA Adverse Event Reporting System) database. From DrugBank version 5.1.7, we screened 1017 small molecule drugs and 202,304 DDIs that fit the FDA standards and the feature extraction requirements of this work. Subsequently, we selected the 39,813 of these interactions that are also recorded in Twosides as positive samples. To facilitate experimental grouping and address the imbalance of actual drug effects, we randomly generated 60,187 negative samples from drug pairs that appear in neither DrugBank nor Twosides, which brings the dataset total to 100,000.
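The negative sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sample_negative_pairs`, the toy drug identifiers and the fixed seed are all hypothetical, and drug pairs are treated as unordered.

```python
import random
from itertools import combinations


def sample_negative_pairs(drugs, known_pairs, n_samples, seed=0):
    """Draw unordered drug pairs that are absent from the known-interaction set."""
    known = {frozenset(pair) for pair in known_pairs}
    # Every unordered pair that no database lists is a candidate negative.
    candidates = [
        frozenset(pair)
        for pair in combinations(drugs, 2)
        if frozenset(pair) not in known
    ]
    if n_samples > len(candidates):
        raise ValueError("not enough unlabeled pairs to sample from")
    rng = random.Random(seed)
    return [tuple(sorted(pair)) for pair in rng.sample(candidates, n_samples)]


# Toy data (hypothetical identifiers, not taken from the paper).
drugs = ["A", "B", "C", "D", "E"]
known = [("A", "B"), ("C", "B"), ("D", "E")]
negatives = sample_negative_pairs(drugs, known, n_samples=4)
```

Note that a pair missing from both databases is only presumed to be non-interacting, so negatives drawn this way may contain undiscovered interactions.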
Molecular fingerprint [37] encodes molecule information into a bit string where each bit represents a molecular feature. In this study, we used molecular fingerprints to represent drugs.
Through the PubChem database, we extracted the substructure fingerprints of drugs as the learning features. The substructure fingerprint has 881 bits, covering a wide range of different substructures and functional groups. To build the graph data structure, we used the DDIs as nodes of the graph, and edges represent whether a drug is involved in two DDIs. For each DDI, we integrate the features of the relationship by comparing the features at each position between the two corresponding drugs. If a position is equal to 1 for both drugs, it is set to 1 in the integrated feature. If the position is equal to 0 for both drugs, it is set to 0 in the integrated feature. Otherwise, it is set to 0.5. Finally, each DDI is encoded as an 881-dimensional vector.
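The integration rule above can be written in a few lines. The sketch below assumes each fingerprint is a list of 0/1 integers; the function name `integrate_fingerprints` and the 4-bit toy vectors are hypothetical (real PubChem substructure fingerprints have 881 bits).

```python
def integrate_fingerprints(fp_a, fp_b):
    """Combine two binary fingerprints into one DDI feature vector.

    Positions where both drugs agree keep the shared bit (1.0 or 0.0);
    positions where they differ are set to 0.5.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return [float(a) if a == b else 0.5 for a, b in zip(fp_a, fp_b)]


# Toy 4-bit fingerprints for illustration only.
drug_a = [1, 0, 1, 0]
drug_b = [1, 0, 0, 1]
ddi_feature = integrate_fingerprints(drug_a, drug_b)
print(ddi_feature)  # [1.0, 0.0, 0.5, 0.5]
```

The encoding is symmetric, so the feature of a DDI does not depend on the order in which the two drugs are listed.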
To verify the model on unknown drugs with limited data, we divided the data into 200 random sub-nets for batch training and randomly selected 2% of each sub-net as public validation and testing datasets. The final data distribution is shown in Figure 1.
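One way to read this partitioning is sketched below. It is an assumption-laden illustration: the function name `split_into_subnets` is hypothetical, the held-out 2% is pooled into a single list (the text does not say how it is divided between validation and testing), and integer indices stand in for DDI samples.

```python
import random


def split_into_subnets(items, n_subnets, holdout_fraction, seed=0):
    """Shuffle items, deal them into n_subnets groups, and hold out a
    fixed fraction of every group for validation/testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Dealing with a stride gives groups whose sizes differ by at most one.
    subnets = [shuffled[i::n_subnets] for i in range(n_subnets)]
    train_subnets, holdout = [], []
    for subnet in subnets:
        k = round(len(subnet) * holdout_fraction)
        holdout.extend(subnet[:k])
        train_subnets.append(subnet[k:])
    return train_subnets, holdout


# 100,000 samples, 200 sub-nets, 2% held out from each sub-net.
train_subnets, holdout = split_into_subnets(range(100_000), 200, 0.02)
```

With the dataset size reported above, each sub-net holds 500 samples, of which 10 are held out.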
Figure 2. Overview of HGAA-DDI. (a) The subnets retrieve drug molecular fingerprints from the PubChem database. (b) The heterogeneous graph attention network layer extracts graph embeddings of DDIs, which are then fed into the MLP classifier for classification. (c) We compare the performance of our proposed graph attention network layer with baseline graph embedding algorithms. (d) We conduct an analysis of the model to evaluate its effectiveness.
Because DDI networks are complex graph structures, in this work we applied the graph attention network [30], which is suitable for graph problems, as the core algorithm. The graph attention network employs an attention mechanism as its main algorithm, eliminating the need for complex calculations involving matrices such as the Laplacian. Instead, it updates node features through the representations of neighboring nodes. In the graph attention network, the learned weights from a target node to its neighbor nodes differ. The adjacency matrix defines which nodes are relevant to a given node, and the calculation of the relationship weights depends on the features of both the node and its neighbor. Specifically, the weight of the neighbor to the node is calculated as follows:
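For reference, in the standard graph attention network formulation [30] the weight of neighbor j for node i is the softmax, taken over the neighbors of i, of LeakyReLU(a · [W h_i ‖ W h_j]), where W is a shared linear transform and a is a learnable attention vector. The sketch below computes these weights in plain Python. The symbols, the LeakyReLU slope of 0.2 and the toy values are conventional choices and assumptions here; they may not match the notation or settings of HGAA-DDI.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_weights(h_i, neighbors, W, a):
    """alpha_ij = softmax_j(LeakyReLU(a . [W h_i || W h_j]))."""
    wh_i = matvec(W, h_i)
    scores = []
    for h_j in neighbors:
        concat = wh_i + matvec(W, h_j)  # list concatenation, i.e. [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbors, W, alphas):
    """Updated node feature: attention-weighted sum of transformed neighbors."""
    transformed = [matvec(W, h_j) for h_j in neighbors]
    dim = len(transformed[0])
    return [sum(al * t[d] for al, t in zip(alphas, transformed)) for d in range(dim)]


# Toy example with illustrative values only.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.0]
h_i = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
alphas = attention_weights(h_i, neighbors, W, a)
updated = aggregate(neighbors, W, alphas)
```

In this toy case the first and third neighbors are identical, so they receive equal weights, and both outweigh the second neighbor.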
After computing the embeddings of the DDI nodes, the second part of HGAA-DDI is a Multilayer Perceptron (MLP) classifier that takes the embeddings as input. To train the MLP classifier, we performed ten-fold cross-validation. In each fold, 10% of the data from the test set was randomly selected as the testing portion for evaluating the MLP classifier, while the remaining 90% was used for training it. We repeated this process ten times, each time using a different 10% portion for testing and the rest for training. The final performance of the model was obtained by averaging the results of the ten folds.
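The fold construction and averaging can be sketched as follows. This is a generic ten-fold split, not the authors' training script: `k_fold_indices`, `cross_validate` and the `evaluate` callback (which would train and score a classifier) are hypothetical names, and the callback used in the example is a placeholder that only reports the test fraction.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Placeholder callback: a real one would fit the MLP on `train`
# and return a metric such as accuracy measured on `test`.
mean_score = cross_validate(100, lambda train, test: len(test) / (len(train) + len(test)))
```

Every sample is used for testing exactly once, which is what makes the averaged score an estimate over the whole dataset.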
In this study, we propose two meta-paths: the Interaction Independent Feature Meta-Path (IIFM) and the Interaction-Drug-Interaction Meta-Path (IDIM). IIFM is represented by a diagonal matrix, which indicates that the network uses its own features. The mathematical expression of IIFM is as follows:
IDIM is represented by an interaction-drug-interaction matrix. For a given node, if that node shares a drug with another node, the corresponding element in IDIM is set to 1. Conversely, if there is no shared drug, the element is set to 0. Notably, all diagonal elements in the matrix are set to 1.
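Both meta-path matrices can be built directly from the list of DDIs. In the sketch below, IIFM is taken to be the identity matrix, which is an assumption consistent with "a diagonal matrix" in which each node uses only its own features. The function name and the toy drug labels are hypothetical.

```python
def build_meta_path_matrices(ddis):
    """Return (IIFM, IDIM) adjacency matrices for a list of DDI nodes.

    Each DDI is a pair of drug identifiers. IIFM links every node only to
    itself; IDIM additionally links two nodes when they share a drug.
    """
    n = len(ddis)
    drug_sets = [set(pair) for pair in ddis]
    iifm = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    idim = [
        [1 if i == j or drug_sets[i] & drug_sets[j] else 0 for j in range(n)]
        for i in range(n)
    ]
    return iifm, idim


# Three DDI nodes over four placeholder drugs; the first two share drug "B".
ddis = [("A", "B"), ("B", "C"), ("D", "E")]
iifm, idim = build_meta_path_matrices(ddis)
print(idim)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

A dense n-by-n list is used only for readability; for sub-nets of realistic size a sparse representation would be the more practical choice.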
C. Baseline models
To validate the overall effectiveness of the model, we compared it with two state-of-the-art open-source drug-drug interaction (DDI) models based on graph attention networks. The first model, SSI-DDI [31], has released all its training code, while the second model, GNN-DDI [32], has shared its model architecture. We successfully reproduced both models and compared the results by extracting the SMILES representations of drug molecules from the test set used in this study.
Additionally, to test the ability of the graph attention network used in HGAA-DDI to extract embeddings, we compared it with the embedding features extracted by five baseline models based on graph algorithms, using the same MLP classifier for the comparison. The five algorithms are introduced as follows:
1. DeepWalk
DeepWalk [39] is a network-based language modeling algorithm that utilizes local information obtained from truncated random walks to learn latent representations. It treats walks as the equivalent of sentences and consists of a random walk generator and an update procedure.
2. SDNE
Structural Deep Network Embedding (SDNE) [40] is a semi-supervised deep learning algorithm that incorporates two orders of similarity. The first-order similarity primarily reflects the local characteristics of the graph and is used as supervised information in the supervised component. The second-order similarity mainly reflects the global characteristics of the graph, which is used by the unsupervised component.
3. LINE
Large-scale Information Network Embedding (LINE) [41] optimizes an objective function and proposes an edge-sampling algorithm that improves both the effectiveness and efficiency of stochastic gradient descent.
4. Node2Vec
Node2Vec [42] learns continuous feature representations of networks and maps nodes to low-dimensional feature representations that maximize the likelihood of preserving the network neighborhoods of nodes. It defines a flexible notion of a node’s network neighborhood, designs a biased random walk procedure and learns to explore a variety of neighbor representations.
5. Struc2Vec
Struc2Vec [43] uses a hierarchy to measure node similarity at different scales and constructs a multi-layer graph to encode structural similarities and generate structural context for nodes.
To validate the choice of the MLP classifier in HGAA-DDI, we compared it with several machine learning methods: Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and K-Nearest Neighbor (KNN) classifiers.
III. RESULTS AND DISCUSSION
A. Metrics
To evaluate the performance, we use precision (PRE), sensitivity (SEN), specificity (SPE), accuracy (ACC), F1 score and Matthews correlation coefficient (MCC) as metrics.
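These six metrics have standard definitions in terms of the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below computes them from confusion-matrix counts; the function name and the example counts are hypothetical and are not results from this study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute PRE, SEN, SPE, ACC, F1 and MCC from confusion-matrix counts.

    No guard is included for empty classes, so a zero denominator
    raises ZeroDivisionError.
    """
    pre = tp / (tp + fp)                   # precision
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    f1 = 2 * pre * sen / (pre + sen)       # harmonic mean of PRE and SEN
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"PRE": pre, "SEN": sen, "SPE": spe, "ACC": acc, "F1": f1, "MCC": mcc}


# Made-up counts for illustration.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

F1 and MCC combine both error types, which is why they are more informative than PRE or SPE alone when a classifier favors one class.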
| Model | ACC | PRE | SEN | SPE | F1 | MCC |
|---|---|---|---|---|---|---|
| SSI-DDI | 0.931 | 0.920 | 0.943 | 0.918 | 0.931 | 0.862 |
| GNN-DDI | 0.908 | 0.913 | 0.903 | 0.912 | 0.908 | 0.816 |
| HGAA-DDI | 0.952 | 0.964 | 0.939 | 0.965 | 0.952 | 0.904 |
Compared with the two state-of-the-art graph attention network models, HGAA-DDI achieved the best performance on every metric except SEN, where SSI-DDI scored slightly higher (0.943 vs. 0.939), indicating the overall advantage of the model. However, during the comparison, we noted that one limitation of HGAA-DDI is its inability to predict the types of drug interactions. This will be addressed in our future work.
B. Sensitivity Analysis Of Graph Embedding Methods
To evaluate the graph attention network, we ran five baseline models for computing graph embeddings on the testing datasets of this work. The results are in Table 2.
Table 2. Comparison with other graph embedding algorithms.
| Model | ACC | PRE | SEN | SPE | F1 | MCC |
|---|---|---|---|---|---|---|
| DeepWalk | 0.834 | 0.972 | 0.688 | 0.980 | 0.806 | 0.698 |
| SDNE | 0.779 | 0.941 | 0.595 | 0.963 | 0.729 | 0.600 |
| LINE | 0.809 | 0.958 | 0.645 | 0.972 | 0.771 | 0.653 |
| Node2Vec | 0.846 | 0.978 | 0.708 | 0.984 | 0.821 | 0.720 |
| Struc2Vec | 0.761 | 0.921 | 0.570 | 0.951 | 0.704 | 0.564 |
| HGAA-DDI | 0.952 | 0.964 | 0.939 | 0.965 | 0.952 | 0.904 |
Table 2 presents the comparison results, showing the superior performance of HGAA-DDI in terms of ACC, SEN, F1-score and MCC. These results indicate the competence of HGAA-DDI in DDI prediction. The other graph embedding algorithms appear to be biased by the data imbalance: they reach high PRE and SPE but markedly lower SEN, which is why HGAA-DDI does not achieve the best PRE and SPE. However, on comprehensive metrics such as F1 and MCC, HGAA-DDI is clearly better at distinguishing between positive and negative samples. This discriminative ability highlights the advantage of the heterogeneous graph attention network employed by HGAA-DDI. Furthermore, the highest ACC value obtained by our model suggests that it identifies DDI samples accurately and that the graph attention network extracts graph embeddings effectively. Figure 3 provides a visual representation of the comparison results.
As shown in Figure 5, the meta-path IDIM is given a higher weight on our training datasets, which means that the method regards IDIM as the most critical meta-path for identifying drug interactions. The experimental results also show that IDIM has more effective features than IIFM. This further confirms the validity of semantic-level attention and the difference in the effectiveness of the meta-paths.
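Meta-path weights of this kind are typically obtained as in the heterogeneous graph attention network formulation: each meta-path's node embeddings are passed through a small one-layer network, the scores are averaged over nodes, and a softmax over meta-paths gives the weights used to fuse the embeddings. The sketch below follows that general recipe in plain Python. The parameters `W`, `b`, `q` and all embedding values are made-up placeholders rather than learned values from HGAA-DDI, so the printed weights say nothing about the real model.

```python
import math


def semantic_attention(embeddings_per_path, W, b, q):
    """Return (beta, fused): softmax meta-path weights and fused embeddings.

    embeddings_per_path maps a meta-path name to a list of node embeddings.
    Importance of a path = mean over nodes of q . tanh(W z + b).
    """
    importance = {}
    for name, Z in embeddings_per_path.items():
        total = 0.0
        for z in Z:
            hidden = [
                math.tanh(sum(w * x for w, x in zip(row, z)) + bias)
                for row, bias in zip(W, b)
            ]
            total += sum(qk * hk for qk, hk in zip(q, hidden))
        importance[name] = total / len(Z)
    top = max(importance.values())
    exps = {name: math.exp(v - top) for name, v in importance.items()}
    norm = sum(exps.values())
    beta = {name: e / norm for name, e in exps.items()}
    names = list(embeddings_per_path)
    n_nodes = len(embeddings_per_path[names[0]])
    dim = len(embeddings_per_path[names[0]][0])
    fused = [
        [sum(beta[p] * embeddings_per_path[p][i][d] for p in names) for d in range(dim)]
        for i in range(n_nodes)
    ]
    return beta, fused


# Placeholder embeddings for two nodes under each meta-path.
paths = {
    "IIFM": [[0.1, 0.0], [0.0, 0.1]],
    "IDIM": [[1.0, 0.5], [0.8, 0.9]],
}
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
beta, fused = semantic_attention(paths, W, b, q)
```

Because the weights come from a softmax, they sum to one and can be compared directly across meta-paths, which is what a plot such as Figure 5 reports.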
E. Case Study
To verify the ability of this method to predict real data, we conducted database and literature studies as case studies. In the database study, to demonstrate the advantage of treating interactions that share a drug as connected by edges, we focused on newly developed drugs related to COVID-19. We collected a total of 1734 drugs related to COVID-19 from PubChem and identified 57 of them as new drugs because they were added to PubChem between 2021 and 2022. We predicted a total of 98,718 potential relationships between each new drug and all drugs. Using HGAA-DDI for prediction, a total of 19,055 interactions were predicted as positive, with 8128 interactions classified as high-confidence samples (predicted probability of being positive greater than 95%).
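The high-confidence filter, together with a per-drug count of the interactions that pass it, can be sketched as follows. The function name, the placeholder drug identifiers and the probabilities are hypothetical, and counting high-confidence interactions per new drug is only one plausible way to rank drugs by their interaction propensity.

```python
from collections import Counter


def rank_new_drugs(predictions, new_drugs, threshold=0.95, top_k=20):
    """Count, for each new drug, its predicted interactions whose
    probability exceeds the threshold, and return the top_k drugs."""
    counts = Counter()
    for (drug_a, drug_b), prob in predictions:
        if prob > threshold:
            for drug in (drug_a, drug_b):
                if drug in new_drugs:
                    counts[drug] += 1
    return counts.most_common(top_k)


# Placeholder predictions: ((drug, drug), predicted probability).
predictions = [
    (("N1", "X"), 0.99),
    (("N1", "Y"), 0.97),
    (("N2", "X"), 0.96),
    (("N2", "Y"), 0.60),
    (("N1", "N2"), 0.98),
]
ranking = rank_new_drugs(predictions, new_drugs={"N1", "N2"})
print(ranking)  # [('N1', 3), ('N2', 2)]
```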
We statistically analyzed these high-confidence samples to identify the top 20 sensitive new drugs in terms of drug interactions. The results are in Table 4 (as some compounds were not named, they are represented by their PubChem ID and molecular formula):
Through the database study, we have demonstrated the potential of HGAA-DDI in predicting interactions for new drugs. Additionally, in the literature study, we collected reports on DDIs published in PubMed over the last three years and found a total of 36 DDIs, identified by clinical, pharmaceutical and in vitro experimental methods. Twenty of these DDIs are not composed entirely of small molecule drugs. Among the 16 DDIs that are composed of two small molecule drugs and have substructure molecular fingerprints in PubChem, only 2 are not included in DrugBank. They are the interaction between voriconazole and tamsulosin hydrochloride [44] and the interaction between voriconazole and methotrexate [45], both of which are predicted to be positive samples. The results show that our method is able to correctly predict data outside our datasets.
Drug-drug interactions are of high value for medical and clinical studies, especially drug development. Capturing richer and more comprehensive information about DDIs is one of the key tasks in public health and drug development. In silico methods for predicting drug interactions can effectively guide medical experiments, and modeling DDIs as a graph structure makes it possible to analyze their correlations effectively. In this work, we propose an interaction prediction method based on the graph attention mechanism, in which learning with a semantic-level attention mechanism is used effectively. The prediction performance of this model is better than that of five comparison models on our testing datasets. Moreover, the analysis of meta-path selection verifies the importance of neighbor node weights for this problem. Finally, through several test cases, we demonstrated the applicability of our method.
[1] K. Baxter, Stockley’s Drug Interactions: A Source Book of Interactions, Their Mechanisms, Clinical Importance and Management, Pharmaceutical Press, 2010.
[2] D. M. Qato, J. Wilder, L. P. Schumm, V. Gillet, G. C. Alexander, Changes in prescription and over-the-counter medication and dietary supplement use among older adults in the United States, 2005 vs 2011, JAMA Intern. Med., 176 (2016), 473–482. https://doi.org/10.1001/jamainternmed.2015.8581
[3] Y. Chen, T. Ma, X. Yang, J. Wang, B. Song, X. Zeng, Muffin: Multi-scale feature fusion for drug–drug interaction prediction, Bioinformatics, 37 (2021), 2651–2658. https://doi.org/10.1093/bioinformatics/btab169
[4] Y. Qiu, Y. Zhang, Y. Deng, S. Liu, W. Zhang, A comprehensive review of computational methods for drug-drug interaction detection, IEEE/ACM Trans. Comput. Biol. Bioinf., 19 (2022), 1968–1985. https://doi.org/10.1109/TCBB.2021.3081268
[5] Z. Zhao, Z. Yang, L. Luo, H. Lin, J. Wang, Drug drug interaction extraction from biomedical literature using syntax convolutional neural network, Bioinformatics, 32 (2016), 3444–3453. https://doi.org/10.1093/bioinformatics/btw486
[6] R. Kavuluru, A. Rios, T. Tran, Extracting drug-drug interactions with word and character-level recurrent neural networks, in 2017 IEEE International Conference on Healthcare Informatics, IEEE, (2017), 5–12.
[7] S. Kim, H. Liu, L. Yeganova, W. J. Wilbur, Extracting drug–drug interactions from literature using a rich feature-based linear kernel approach, J. Biomed. Inf., 55 (2015), 23–30. https://doi.org/10.1016/j.jbi.2015.03.002
[8] I. N. Dewi, S. Dong, J. Hu, Drug-drug interaction relation extraction with deep convolutional neural networks, in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, (2017), 1795–1802.
[9] Y. Shen, K. Yuan, Y. Li, B. Tang, M. Yang, N. Du, et al., Drug2vec: Knowledge-aware feature-driven method for drug representation learning, in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, (2018), 757–800.
[10] J. S. Almenoff, W. DuMouchel, L. A. Kindman, X. Yang, D. Fram, Disproportionality analysis using empirical Bayes data mining: A tool for the evaluation of drug interactions in the post-marketing setting, Pharmacoepidemiol. Drug Saf., 12 (2003), 517–521. https://doi.org/10.1002/pds.885
[11] G. N. Norén, A. Bate, R. Orre, I. R. Edwards, Extending the methods used to screen the WHO drug safety database towards analysis of complex associations and improved accuracy for rare events, Stat. Med., 25 (2006), 3740–3757. https://doi.org/10.1002/sim.2473
[12] A. Suzuki, N. Yuen, K. Ilic, R. T. Miller, M. J. Reese, H. R. Brown, et al., Comedications alter drug-induced liver injury reporting frequency: Data mining in the WHO VigiBase™, Regul. Toxicol. Pharm., 72 (2015), 481–490. https://doi.org/10.1001/jamaneurol.2015.0365
[13] R. Harpaz, H. S. Chase, C. Friedman, Mining multi-item drug adverse effect associations in spontaneous reporting systems, BMC Bioinf., 11 (2010), 1–8.
[14] Y. Noguchi, A. Ueno, M. Otsubo, H. Katsuno, I. Sugita, Y. Kanematsu, et al., A new search method using association rule mining for drug-drug interaction based on spontaneous report system, Front. Pharmacol., 9 (2018), 197.
[15] A. Kastrin, P. Ferk, B. Leskošek, Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning, PLoS One, 13 (2018), e0196865. https://doi.org/10.1371/journal.pone.0196865
[16] C. Yan, G. Duan, Y. Pan, F. X. Wu, J. Wang, Ddigip: Predicting drug-drug interactions based on Gaussian interaction profile kernels, BMC Bioinf., 20 (2019), 1–10. https://doi.org/10.1186/s12859-018-2565-8
[17] S. Qian, S. Liang, H. Yu, Leveraging genetic interactions for adverse drug-drug interaction prediction, PLoS Comput. Biol., 15 (2019), e1007068. https://doi.org/10.1371/journal.pcbi.1007068
[18] N. Rohani, C. Eslahchi, Drug-drug interaction predicting by neural network using integrated similarity, Sci. Rep., 9 (2019), 13645.
[19] J. Y. Ryu, H. U. Kim, S. Y. Lee, Deep learning improves prediction of drug–drug and drug–food interactions, Proc. Natl. Acad. Sci., 115 (2018), E4304–E4311.
[20] Y. Deng, X. Xu, Y. Qiu, J. Xia, W. Zhang, S. Liu, A multimodal deep learning framework for predicting drug–drug interaction events, Bioinformatics, 36 (2020), 4316–4322. https://doi.org/10.1093/bioinformatics/btaa501
Copyright © 2023 Dr. T. S. Baskaran, M. Arunkumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET55500
Publish Date : 2023-08-25
ISSN : 2321-9653
Publisher Name : IJRASET