Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Pranali Pali, Abhishek Jain, Pooja Kyadarkunte, Sampada Patil, Shruti Jais, Rajesh Nasare, Sharda Chhabria
DOI Link: https://doi.org/10.22214/ijraset.2022.47504
An objective of drug discovery is to identify novel substances with specific chemical properties for the treatment of diseases. A significant amount of biological data has recently been produced from a variety of sources, and molecular analysis of this data has been used to identify the most successful treatments. Trial-and-error medicine is frequently frustrating and significantly more expensive. Predicting whether a drug will be active or not makes this work easier, and the information about the drug can also be used to develop new medications. Quantitative Structure-Activity Relationship (QSAR) analysis is one application in which machine learning improves decision-making on pharmaceutical data. Predictive models based on machine learning have recently grown substantially in prominence in the stage prior to preclinical research. In this stage, the expense and research time of new drug discovery are significantly reduced. By using pattern recognition algorithms to uncover mathematical relationships between the chemical and biological features of compounds, machine learning has been applied to drug development with increasing frequency and with positive outcomes. Its limitations include the need for a large volume of data and a lack of interpretability. Compared with physical models, machine learning approaches can be applied to large data sets without the need for extensive computational resources.
I. INTRODUCTION
The purpose of the drug discovery process is to identify a therapeutically effective molecule for the diagnosis and treatment of disease. According to the Precision Medicine Initiative, precision medicine is a new approach to disease treatment and prevention that considers a person's unique genetic makeup as well as their surroundings and way of life. With this approach, medical professionals and researchers can anticipate with greater precision which treatment and prevention techniques will be effective for which groups of people. The search for innovative medications continues to be a time-consuming and expensive process. A new medicine typically takes between 10 and 15 years to produce through research and testing, and the cost of drug discovery and development rises dramatically when a standard approach is used. Many chemical compounds with different properties have been known for ages, and studying these compounds makes it easier to build a medication for a particular ailment. Whole-person care is the foundation for drug research. This project uses machine learning approaches to address the issues of cost and time described above. The major purpose of the model is to cut down on the expense and time spent on research when finding new drugs. The ChEMBL database, which contains information on the biological and chemical properties of compounds, is used in this study. Acetylcholinesterase is the target that this research focuses on. Let us understand some of the important terminologies:
II. METHODOLOGY
Even though all study domains share some steps in the experimental design, the use of an ML approach must be cross-disciplinary. The ML methodology used in drug discovery can be divided into five essential steps: Data Collection, Creating Mathematical Descriptors, Searching for the Best Selection of Variables, Model Training, and Model Validation.
2. Creating Mathematical Descriptors: Although certain machine learning (ML) models do not require labelling, supervised learning models are frequently used in the field of drug development. In this case, the labelling defined by the researchers is crucial to the experimental procedure. Generating the mathematical descriptors produces a set of data that the ML model can process. This dataset is split into two subsets: a larger one used to train the model and a smaller one used to test the model (both shown in Fig. 2). The best subset of variables is then sought within the training set. Unsurprisingly, a large number of numerical variables is produced when creating mathematical descriptors, and the fundamental goal of this step is to eliminate as many redundant or superfluous variables as feasible. Several methods exist for this purpose, including PCA (Principal Component Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), FS (Feature Selection), and autoencoders.
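The split into a larger training subset and a smaller test subset can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 80/20 ratio, and the toy `compounds` list are assumptions made for the example and are not taken from the study.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in the test subset.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: one (descriptor vector, activity label) pair per compound.
compounds = [([float(i), float(i % 3)], i % 2) for i in range(10)]
train, test = train_test_split(compounds, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

The test subset is set aside at this point and is only used again for the final validation of the selected model.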
3. Searching for Best Selection of Variables: FS approaches select a subset of the original collection of features but do not alter the contents of the variables. The algorithms and their input parameters must be chosen first, and they must be picked carefully to make sure they are appropriate for the task at hand and for the quantity and type of data available. Because the selected variables keep their original meaning, the result can be given a biologically comprehensible justification, which is why most researchers employ these methods when creating their experimental designs.
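One simple family of FS methods is the filter approach, sketched below under stated assumptions: descriptors with (near) zero variance are dropped, and a descriptor is also dropped when it is highly correlated with one that has already been kept. The thresholds, the function `select_features`, and the small `X_train` matrix are hypothetical and chosen only to illustrate that the surviving columns are returned unchanged.

```python
import math


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0


def select_features(matrix, min_variance=1e-8, max_abs_corr=0.95):
    """Return the indices of the columns to keep; values are not modified."""
    kept = []
    for j in range(len(matrix[0])):
        col = column(matrix, j)
        if variance(col) <= min_variance:
            continue  # constant descriptor carries no information
        if any(abs(pearson(col, column(matrix, k))) > max_abs_corr for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(j)
    return kept


# Hypothetical training descriptors: column 1 duplicates column 0 (scaled),
# column 2 is constant.
X_train = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.0, 5.0, 0.9],
    [3.0, 6.0, 5.0, 0.1],
    [4.0, 8.0, 5.0, 0.7],
]
kept = select_features(X_train)
X_reduced = [[row[j] for j in kept] for row in X_train]
print(kept)
```

The selection is computed on the training subset only; the same column indices would then be applied to the test subset so that no information from the test data influences the choice.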
4. Model Training: The model is trained once the best collection of variables has been identified. The experiment is then run a number of times using the training data. To ensure that the model remains applicable to unseen inputs, excessive training (overfitting) should be avoided. Cross-validation (CV) techniques are frequently used for this purpose. CV enables performance evaluation, estimation of performance on unseen data, and monitoring of the model's degree of generalisation throughout the training phase.
5. Model Validation: For each execution of the experiment, the training data is separated once more into two subsets: the training set and the validation set. Figure 2 shows the evolution of the CV approach over the course of 10 runs. The blue set represents the training set for each of these runs, and the red set represents the validation set. The final result of the CV process is the optimal parameter combination for each approach. Using these parameters, the performance of each model is evaluated. The best model is the one with the highest performance value at the lowest total cost. The test set that was taken from the original set (shown in Fig. 2) is then retrieved, and the best model produced by the CV process is validated on it a final time. If the validation results are statistically significant, a new predictive drug model may have been developed.
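The training and validation procedure described above can be sketched as a 10-fold cross-validation followed by a single final check on the held-out test set. This is an illustrative sketch, not the authors' implementation: the k-nearest-neighbour classifier is a stand-in model, the data is synthetic, and all function and variable names are hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx); each sample is used for validation once."""
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def knn_predict(train_X, train_y, x, k):
    """Majority vote (labels 0/1) among the k nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label
               for x, label in zip(eval_X, eval_y))
    return hits / len(eval_X)


def cross_validate(X, y, k, n_folds=10, seed=0):
    """Mean validation accuracy over the folds for one hyperparameter value."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), n_folds, seed):
        scores.append(accuracy(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx], k))
    return sum(scores) / len(scores)


# Synthetic two-class data standing in for descriptor vectors and activity labels.
rng = random.Random(1)
X = [[rng.gauss(label * 3.0, 1.0), rng.gauss(0.0, 1.0)]
     for label in (0, 1) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

# Hold out a test set that plays no part in model selection.
order = list(range(len(X)))
random.Random(2).shuffle(order)
test_idx, dev_idx = order[:20], order[20:]
dev_X, dev_y = [X[i] for i in dev_idx], [y[i] for i in dev_idx]
test_X, test_y = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Choose the hyperparameter by cross-validation, then validate once on the test set.
candidates = (1, 3, 5, 7)
cv_scores = {k: cross_validate(dev_X, dev_y, k) for k in candidates}
best_k = max(candidates, key=cv_scores.get)
test_accuracy = accuracy(dev_X, dev_y, test_X, test_y, best_k)
print(best_k, round(test_accuracy, 3))
```

Keeping the test indices out of `cross_validate` is the design point here: the cross-validated score guides the choice of parameters, while the test score gives a separate estimate of performance on unseen data.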
Machine learning techniques are widely applied, and the number of published articles has grown in recent years. However, few machine learning publications on open access platforms are concerned with drug development.
III. DATABASES, SOFTWARE, PACKAGES AND THEIR REPRESENTATION
A. Databases
B. Software
In order to make model interpretation easier, a number of software tools have been developed in light of the current interest deep learning applications are receiving. Captum, an addition to the PyTorch deep learning and automatic differentiation package that offers support for the majority of the feature attribution strategies discussed in this paper, is a notable example. Alibi, another well-liked package, offers instance-specific justifications for individual models developed using the scikit-learn or TensorFlow libraries. Anchors, descriptive explanations, and counterfactual examples are a few of the explanation techniques used.
C. Packages
Based on the prior work, Sakakibara created a web service called Comprehensive Predictor of Interactions between Chemical Compounds and Target Proteins, which uses an SVM as the Drug-Target Interaction (DTI) predictor. It appears that this server is no longer accessible.
In order to integrate chemoinformatics, bioinformatics, proteochemometrics, and chemogenomics for DTI prediction, Cao created the Python tool PyDPI, which is based on Random Forest. The proposed approach uses prepared dictionaries for categorization and requires chemical characteristics to be chosen. The package can be used to build web-based servers and offers an interface to databases including PubChem, DrugBank, UniProt, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). In the same year, the same team also developed PreDPI-Ki, a web-based service. PreDPI-Ki is built on a random forest predictor and considers the binding affinities of drug-target pairs to better anticipate interactions.
D. Representation
IV. ALGORITHMS
V. CHALLENGES
VI. FUTURE SCOPE
Future study should concentrate on techniques that incorporate several kinds of similarity. Compared with methods that employ only one form of similarity, ensemble-based models are more likely to produce accurate findings. Repurposed medications, for instance, have so far been discovered by accident, by pharmacological analysis, or by retrospective clinical analysis (such as examining adverse effects). In light of the surprisingly successful early examples (repurposing minoxidil from hypertension to hair loss, sildenafil from angina to erectile dysfunction, and thalidomide from morning sickness to multiple myeloma), research is now concentrating on the most effective ways to adopt a more comprehensive, systematic approach. Medical science and web innovation have been combined to increase the predictive power of deep learning algorithms with respect to biomarkers, side effects of treatments, and therapeutic benefits. Certain applications of this kind contribute to success in clinical trials, which in turn motivates potential investment in pharmaceutical firms.
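As a hedged illustration of combining several kinds of similarity, the sketch below averages Tanimoto (Jaccard) similarities computed on two different views of a drug. The two views (`structure` as a set of fingerprint bit indices and `side_effects` as a set of terms), the equal weights, and the example drugs are all hypothetical; a weighted mean is only one of many possible ways to fuse similarities.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of features."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def combined_similarity(drug_a, drug_b, weights):
    """Weighted mean of the per-view Tanimoto similarities."""
    total = sum(weights.values())
    return sum(w * tanimoto(drug_a[view], drug_b[view])
               for view, w in weights.items()) / total


# Hypothetical drugs described by two views.
drug_x = {"structure": {1, 4, 7, 9}, "side_effects": {"nausea", "headache"}}
drug_y = {"structure": {1, 4, 8}, "side_effects": {"nausea", "headache", "rash"}}
weights = {"structure": 0.5, "side_effects": 0.5}

score = combined_similarity(drug_x, drug_y, weights)
print(round(score, 3))
```

A pair of drugs that looks only moderately similar in structure can still score well overall when another view, such as the side-effect profile, agrees closely, which is the intuition behind multi-similarity approaches to repurposing.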
Future drug discovery and development plans anticipate using AI technologies to address every stage of the process. For new applications, automated AI has to integrate findings from different sources, such as chemistry data, omics data, and medical data. Additionally, we anticipate that further confirmation will be needed for drug discovery campaigns.
VII. ACKNOWLEDGEMENT
The authors would like to express their appreciation to Prof. Rajesh Nasare for his wise counsel and ongoing assistance during the project. They would also like to give particular thanks to Dr. Sharda Chhabria for her diligent supervision of the improvements to this work. We successfully finished this paper with your helpful guidance, and we are grateful to have both of you as our mentors.
In the world of medicine, ML models can replace more traditional methods like PPT inhibitors and macrocycles by making predictions based on learned data inside a known framework, i.e., the compound structure. Deep learning models can also take into account chemical structures and QSAR models from pharmaceutical data, because these were pertinent for molecules with the correct characteristics and had a high clinical trial success rate. Deep learning techniques and machine learning algorithms are frequently employed in the pharmaceutical business. In drug development and healthcare service hubs, notably in image analysis and omics data, several problems have been overcome using ML algorithms. AI technology has advanced by entering computer-aided drug development in an effort to exploit its powerful capabilities in data mining. The development of machine learning methodologies and disciplines will benefit from the proliferation of data. The application of these models in cheminformatics, and more specifically in drug development, has greatly benefited the pharmaceutical industry. Until recently, descriptors derived from the structure of peptides or small molecules were the sole tool available. More recently, ANNs have been used to work directly with graph-based representations of molecules. As the area develops, researchers are looking for new drugs, treatments, or cures that are more efficient than those already available. Understanding the fundamental processes behind disease progression, the effects of already available medications, and the genetic makeup of patients can aid in the development of novel, highly targeted therapies that will eventually improve patients' health and quality of life.
Copyright © 2022 Pranali Pali, Abhishek Jain, Pooja Kyadarkunte, Sampada Patil, Shruti Jais, Rajesh Nasare, Sharda Chhabria. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET47504
Publish Date : 2022-11-17
ISSN : 2321-9653
Publisher Name : IJRASET