1
Priyadarsinee L, Jamir E, Nagamani S, Mahanta HJ, Kumar N, John L, Sarma H, Kumar A, Gaur AS, Sahoo R, Vaikundamani S, Murugan NA, Priyakumar UD, Raghava GPS, Bharatam PV, Parthasarathi R, Subramanian V, Sastry GM, Sastry GN. Molecular Property Diagnostic Suite for COVID-19 (MPDS COVID-19): an open-source disease-specific drug discovery portal. GIGABYTE 2024; 2024:gigabyte114. [PMID: 38525218 PMCID: PMC10958779 DOI: 10.46471/gigabyte.114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 03/11/2024] [Indexed: 03/26/2024]
Abstract
Molecular Property Diagnostic Suite (MPDS) was conceived and developed as an open-source, disease-specific web portal based on Galaxy. MPDS COVID-19 was developed as a one-stop solution for COVID-19 drug discovery research. Galaxy platforms enable the creation of customized workflows that connect various modules in the web server. The architecture of MPDS COVID-19 employs Galaxy v22.04 features, ported to CentOS 7.8 and Python 3.7. MPDS COVID-19 brings significant updates and several new tools after six years. Tools developed by our group in Perl/Python and open-source tools are collated and integrated into MPDS COVID-19 using XML scripts. MPDS aims to facilitate transparent and open innovation. This approach promotes inclusiveness in the community along with free access and participation in software development. Availability & Implementation: The MPDS COVID-19 portal can be accessed at https://mpds.neist.res.in:8085/.
Affiliation(s)
- Lipsa Priyadarsinee
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Esther Jamir
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- Selvaraman Nagamani
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Hridoy Jyoti Mahanta
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Nandan Kumar
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- Lijo John
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- Himakshi Sarma
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- Asheesh Kumar
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- Anamika Singh Gaur
  - CSIR-Indian Institute of Toxicology Research, Lucknow, 226001, Uttar Pradesh, India
- Rosaleen Sahoo
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- S. Vaikundamani
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
- N. Arul Murugan
  - Indraprastha Institute of Information Technology, Delhi, 110020, India
- U. Deva Priyakumar
  - International Institute of Information Technology, Gachibowli, Hyderabad, 500032, India
- G. P. S. Raghava
  - Indraprastha Institute of Information Technology, Delhi, 110020, India
- Prasad V. Bharatam
  - National Institute of Pharmaceutical Education and Research, S.A.S. Nagar (Mohali), 160062, India
- Ramakrishnan Parthasarathi
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
  - CSIR-Indian Institute of Toxicology Research, Lucknow, 226001, Uttar Pradesh, India
- V. Subramanian
  - Department of Chemistry, Indian Institute of Technology, Chennai, 600036, India
- G. Madhavi Sastry
  - Schrödinger Inc., Octave, Salarpuria Sattva Knowledge City, 1st Floor, Unit 3A, Hyderabad, 500081, India
- G. Narahari Sastry
  - CSIR–North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
  - Indian Institute of Technology (IIT) Hyderabad, Kandi, Sangareddy, Telangana, 502284, India
2
John L, Nagamani S, Mahanta HJ, Vaikundamani S, Kumar N, Kumar A, Jamir E, Priyadarsinee L, Sastry GN. Molecular Property Diagnostic Suite Compound Library (MPDS-CL): a structure-based classification of the chemical space. Mol Divers 2023:10.1007/s11030-023-10752-1. [PMID: 37902900 DOI: 10.1007/s11030-023-10752-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023]
Abstract
Molecular Property Diagnostic Suite Compound Library (MPDS-CL) is an open-source, Galaxy-based cheminformatics web portal that presents a structure-based classification of molecules. A structure-based classification was carried out on nearly 150 million unique compounds, obtained from 42 publicly available databases and curated for redundancy removal through 97 hierarchically well-defined atom composition-based portions. These were then subjected to a 56-bit fingerprint-based classification algorithm, which produced 56 structurally well-defined classes. The classes were further divided into clusters based on molecular weight. The entire set of molecules was thus placed into 56 classes and 625 clusters. Each of these 149,169,443 molecules was then assigned a unique ID, named MPDS-AadharID. MPDS-AadharID is akin to the unique number given to citizens in India (similar to the SSN in the US and the NINO in the UK). The unique features of MPDS-CL are (a) several search options, such as exact structure search, substructure search, property-based search, and fingerprint-based search, using SMILES, InChIKey, and key-in; (b) automatic generation of information for processing by MPDS and other Galaxy tools; (c) reporting the class and cluster of a molecule, which makes searching for similar molecules easier and faster; and (d) information on the presence of molecules in multiple databases. MPDS-CL can be accessed at https://mpds.neist.res.in:8086/.
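The two-level grouping described in this abstract (class first, then a molecular-weight cluster within the class) can be illustrated with a minimal Python sketch. It is not the MPDS-CL implementation: the abstract does not say how the class is derived from the fingerprint, how cluster boundaries are chosen, or what an MPDS-AadharID looks like. The class indices, the 50 Da bin width, and the `(class, cluster, serial)` tuple below are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical molecular-weight bin width in daltons; the abstract does not
# state how cluster boundaries were chosen.
MW_BIN_WIDTH = 50.0


def assign_ids(molecules):
    """Group molecules by class, then by molecular-weight bin.

    `molecules` is an iterable of (name, class_index, molecular_weight).
    Returns {name: (class_index, cluster_index, serial)}, a hypothetical
    stand-in for an identifier built from class and cluster membership.
    """
    counters = defaultdict(int)  # running serial number per (class, cluster)
    ids = {}
    for name, class_index, mol_weight in molecules:
        cluster_index = int(mol_weight // MW_BIN_WIDTH)
        key = (class_index, cluster_index)
        counters[key] += 1
        ids[name] = (class_index, cluster_index, counters[key])
    return ids


# Class indices here are made up for illustration only.
example = [
    ("aspirin", 3, 180.16),
    ("ibuprofen", 3, 206.28),
    ("salicylic acid", 3, 138.12),
    ("caffeine", 7, 194.19),
    ("paracetamol", 3, 151.16),
]
ids = assign_ids(example)
```

Because the class and cluster are part of the key, a similarity search only needs to look inside one bucket instead of the whole library, which is the speed-up the abstract attributes to feature (c).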
Affiliation(s)
- Lijo John
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Selvaraman Nagamani
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- S Vaikundamani
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Nandan Kumar
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Asheesh Kumar
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Esther Jamir
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Lipsa Priyadarsinee
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
3
Mazumdar B, Deva Sarma PK, Mahanta HJ, Sastry GN. Machine learning based dynamic consensus model for predicting blood-brain barrier permeability. Comput Biol Med 2023; 160:106984. [PMID: 37137267 DOI: 10.1016/j.compbiomed.2023.106984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 03/27/2023] [Accepted: 04/27/2023] [Indexed: 05/05/2023]
Abstract
The blood-brain barrier (BBB) is an important defence mechanism that prevents disease-causing pathogens and toxins from entering the brain from the bloodstream. In recent years, many in silico methods have been proposed for predicting BBB permeability; however, the reliability of these models is questionable because they rely on small, class-imbalanced datasets, which leads to a very high false positive rate. In this study, machine learning and deep learning-based predictive models were built using XGBoost, Random Forest, and Extra-tree classifiers and a deep neural network. A dataset of 8153 compounds, comprising both BBB-permeable and BBB-non-permeable compounds, was curated, and molecular descriptors and fingerprints were calculated to generate the features for the machine learning and deep learning models. Three balancing techniques were then applied to the dataset to address the class-imbalance issue. A comprehensive comparison showed that the deep neural network model built on the balanced MACCS fingerprint dataset outperformed all other models, with an accuracy of 97.8% and a ROC-AUC score of 0.98. Additionally, a dynamic consensus model was built from the machine learning models and validated on a benchmark dataset for predicting BBB permeability with higher confidence scores.
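The abstract does not name the three balancing techniques. As an illustration of what class balancing does, the following Python sketch applies random oversampling, one common option that may or may not be among those the authors used. The function name, the toy bit-vector features, and the labels are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: short bit vectors standing in for fingerprints, with an
# imbalanced binary label (four samples of class 1, one of class 0).
X = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]
y = [1, 1, 1, 1, 0]
X_bal, y_bal = random_oversample(X, y)
class_counts = Counter(y_bal)
```

In practice, balancing is applied to the training portion only, so that duplicated samples cannot leak into the data used for evaluation.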
Affiliation(s)
- Bitopan Mazumdar
  - Department of Computer Science, Assam University, Silchar, 788011, Assam, India; Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
4
John L, Mahanta HJ, Soujanya Y, Sastry GN. Assessing machine learning approaches for predicting failures of investigational drug candidates during clinical trials. Comput Biol Med 2023; 153:106494. [PMID: 36587568 DOI: 10.1016/j.compbiomed.2022.106494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 11/30/2022] [Accepted: 12/27/2022] [Indexed: 12/30/2022]
Abstract
One of the major challenges in drug development is maintaining acceptable levels of efficacy and safety through all phases of clinical trials, followed by a successful market launch. While many factors, such as molecular properties, toxicity parameters, and mechanism of action at the target site, regulate the therapeutic action of a compound, a holistic approach directed towards data-driven studies will invariably strengthen the predictive toxicological sciences. The current study seeks to find out why an investigational candidate would fail in clinical trials after multiple iterations of refinement and optimization. We compiled a dataset comprising approved and withdrawn drugs as well as toxic compounds, and used a time-split approach to generate the training and validation sets. Five highly robust and scalable machine learning binary classifiers were used to develop predictive models, which were trained on features such as molecular descriptors and fingerprints and then validated rigorously against a set of performance metrics. The mean AUC scores of the five classifiers on the hold-out test set ranged from 0.66 to 0.71. The models were further used to predict probability scores for the clinical candidate dataset. The top compounds predicted to be toxic were analyzed to estimate different dimensions of toxicity. Through this study, we propose that, with appropriate use of feature extraction and machine learning methods, one can estimate the likelihood of success or failure of investigational drug candidates, opening an avenue for future computational toxicology studies. The models developed in the study can be accessed at https://github.com/gnsastry/predicting_clinical_trials.git.
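A time-split differs from a random split in that the partition is decided by date alone. The Python sketch below shows the idea under stated assumptions: the record fields, the compound names, the dates, and the cutoff year are placeholders, and the abstract does not say which date field or cutoff the authors used.

```python
from datetime import date


def time_split(records, cutoff):
    """Split records into (train, validation) by date.

    Records dated before `cutoff` go to training and the rest go to
    validation, so the model is evaluated only on entries newer than
    anything it was fitted on.
    """
    train = [r for r in records if r["date"] < cutoff]
    validation = [r for r in records if r["date"] >= cutoff]
    return train, validation


# Hypothetical records: names, dates, and labels are placeholders.
records = [
    {"name": "compound_a", "date": date(1998, 5, 1), "label": 0},
    {"name": "compound_b", "date": date(2004, 9, 15), "label": 1},
    {"name": "compound_c", "date": date(2012, 3, 20), "label": 0},
    {"name": "compound_d", "date": date(2016, 11, 2), "label": 1},
]
train, validation = time_split(records, cutoff=date(2010, 1, 1))
```

Splitting by time mimics prospective use: the validation compounds play the role of candidates that did not yet exist when the model was trained.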
Affiliation(s)
- Lijo John
  - Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Y Soujanya
  - Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR- North East Institute of Science and Technology, Jorhat, 785006, Assam, India; Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
5
Kiewhuo K, Gogoi D, Mahanta HJ, Rawal RK, Das D, S V, Jamir E, Sastry GN. OSADHI - An online structural and analytics based database for herbs of India. Comput Biol Chem 2023; 102:107799. [PMID: 36512929 DOI: 10.1016/j.compbiolchem.2022.107799] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/27/2022] [Revised: 11/27/2022] [Accepted: 11/28/2022] [Indexed: 12/03/2022]
Abstract
The current study aims to develop a pan-India database of medicinal plants along with their phytochemicals and geographical availability. The database consists of 6959 unique medicinal plants belonging to 348 families, which are available across 28 states and 8 union territories of India. The database presents its information in four sections: traditional knowledge, geographical indications, phytochemicals, and chemoinformatics. The traditional knowledge section reports the plant taxonomy along with vernacular names. A total of 27,440 unique phytochemicals associated with these plants were curated from various sources in this study. However, because general information such as IUPAC names and InChIKeys was not available from reliable sources, only 22,314 phytochemicals are currently reported in the database. Various analyses have been performed on the phytochemicals, including analysis of physicochemical and ADMET properties calculated from open-source web servers using in-house Python scripts. The phytochemical data set has also been classified by class, superclass, and pathway using NPClassifier, a deep learning framework. Additionally, the antiviral potency of the phytochemicals was predicted using two machine learning models, Random Forest and XGBoost. The database aims to provide accurate and exhaustive data on the traditional practice of medicinal plants in India on a single platform, integrating and analyzing the rich customary practices and facilitating the development and identification of plant-based therapeutics for a variety of diseases. The database can be accessed at https://neist.res.in/osadhi/.
Affiliation(s)
- Kikrusenuo Kiewhuo
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Dipshikha Gogoi
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Ravindra K Rawal
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Debabrata Das
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Vaikundamani S
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- Esther Jamir
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
6
Towards systematic exploration of chemical space: building the fragment library module in molecular property diagnostic suite. Mol Divers 2022:10.1007/s11030-022-10506-5. [PMID: 35925528 PMCID: PMC9362107 DOI: 10.1007/s11030-022-10506-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/11/2022] [Accepted: 07/23/2022] [Indexed: 11/04/2022]
Abstract
A fragment-based drug discovery (FBDD) approach has traditionally been of utmost significance in drug design studies. It allows the exploration of large chemical space to find novel scaffolds and chemotypes that can be improved into selective inhibitors with good affinity. In the current work, several public domain chemical libraries (ChEMBL, DrugCentral, PDB ligands, COCONUT, and SAVI) comprising bioactive and virtual molecules were retrieved to develop a fragment library. A systematic fragmentation method that breaks a given molecule into rings, linkers, and substituents was used to cleave the molecules, and the fragments were analyzed. Further, only the ring framework was taken into consideration to develop a fragment library consisting of a total of 107,614 unique fragments. This set represents a rich, diverse set of structural frameworks that covers a wide variety of yet-to-be-explored fragments for a wide range of small molecule-based applications. This fragment library is an integral part of the Molecular Property Diagnostic Suite (MPDS) and can be used with other modeling and informatics methods for FBDD approaches. The fragment library module of MPDS can be accessed at http://mpds.neist.res.in:8085.
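The split into rings, linkers, and substituents can be pictured at the level of a plain molecular graph. The Python sketch below is not the fragmentation method used in the paper, whose rules the abstract does not spell out. It is a toolkit-free illustration that assumes a molecule is given as an adjacency list of integer atom indices, ignores bond orders and element types, and uses function names invented for this example.

```python
def decompose(adjacency):
    """Label each atom of a molecular graph as 'ring', 'linker', or 'substituent'."""
    # Step 1: repeatedly strip terminal atoms. Everything stripped is a
    # substituent; what remains (the core) is rings plus the chains joining them.
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    removed = set()
    stack = [atom for atom, d in degree.items() if d <= 1]
    while stack:
        atom = stack.pop()
        if atom in removed:
            continue
        removed.add(atom)
        for nb in adjacency[atom]:
            if nb not in removed:
                degree[nb] -= 1
                if degree[nb] <= 1:
                    stack.append(nb)
    core = set(adjacency) - removed

    def still_connected(u, v):
        """True if v is reachable from u inside the core without using bond u-v."""
        seen, queue = {u}, [u]
        while queue:
            cur = queue.pop()
            for nb in adjacency[cur]:
                if nb not in core or nb in seen or {cur, nb} == {u, v}:
                    continue
                if nb == v:
                    return True
                seen.add(nb)
                queue.append(nb)
        return False

    # Step 2: a bond lies on a ring exactly when its two atoms stay connected
    # after the bond is ignored. Core atoms with no such bond are linkers.
    ring = set()
    for u in core:
        for v in adjacency[u]:
            if v in core and u < v and still_connected(u, v):
                ring.update((u, v))

    labels = {}
    for atom in adjacency:
        if atom in ring:
            labels[atom] = "ring"
        elif atom in core:
            labels[atom] = "linker"
        else:
            labels[atom] = "substituent"
    return labels


# Two three-membered rings (0-1-2 and 4-5-6) joined through atom 3,
# with one terminal atom (7) attached to ring atom 0.
molecule = {
    0: [1, 2, 7], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5], 7: [0],
}
labels = decompose(molecule)
```

Keeping only the atoms labelled `ring` corresponds to the abstract's statement that only the ring framework was retained for the library. A production workflow would rely on a cheminformatics toolkit for ring perception and canonical fragment output.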
7
Kiewhuo K, Gogoi D, Mahanta HJ, Rawal RK, Das D, Sastry GN. North East India Medicinal Plants Database (NEI-MPDB). Comput Biol Chem 2022; 100:107728. [DOI: 10.1016/j.compbiolchem.2022.107728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/08/2022] [Accepted: 07/08/2022] [Indexed: 11/03/2022]
8
Artificial intelligence in virtual screening: models versus experiments. Drug Discov Today 2022; 27:1913-1923. [PMID: 35597513 DOI: 10.1016/j.drudis.2022.05.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/30/2021] [Revised: 05/08/2022] [Accepted: 05/12/2022] [Indexed: 12/22/2022]
Abstract
A typical drug discovery project involves identifying active compounds with significant binding potential for selected disease-specific targets. Experimental high-throughput screening (HTS) is a traditional approach to drug discovery, but is expensive and time-consuming when dealing with huge chemical libraries with billions of compounds. The search space can be narrowed down with the use of reliable computational screening approaches. In this review, we focus on various machine-learning (ML) and deep-learning (DL)-based scoring functions developed for solving classification and ranking problems in drug discovery. We highlight studies in which ML and DL models were successfully deployed to identify lead compounds for which the experimental validations are available from bioassay studies.